Upgrading and Repairing PCs

Troubleshooting Tips and Techniques

This section lists basic and general system troubleshooting procedures and guidelines. For specific procedures for troubleshooting a component in the system, use Appendix C, "Troubleshooting Index," as a quick reference for finding the chapter or section dedicated to that part of the PC.

Basic Troubleshooting Guidelines

Troubleshooting PC hardware problems can seem daunting to the uninitiated, but in reality it is much simpler than it seems. Most problems can be diagnosed and corrected using few, if any, special tools, by anybody who can apply simple deductive reasoning and logical thinking. PCs have become more complicated and yet simpler at the same time. Increasingly complex internal circuits mean that there are potentially more things that can go wrong and more ways the system can fail. On the other hand, today's complex circuits are embedded into fewer boards, with fewer chips on each board and more serial interconnections using fewer pins (fewer wires). This internal consolidation means that isolating which replaceable component has failed is in many ways simpler than ever before. An understanding of the basics of how PCs work, combined with some very simple tools, some basic troubleshooting tips, and logical thinking and common sense, will enable you to effectively diagnose and repair your own systems, saving a tremendous amount of money over taking them to a shop. In some cases, you can save enough money to practically pay for an entire new system. The bottom line with troubleshooting PC problems is that a solution exists for every problem, and through simple practices combined with deductive reasoning, that solution can easily be found.

Modern PCs—More Complicated and More Reliable

Consider this: The modern PC is an incredible collection of hardware and software. Focusing specifically on the hardware, between 50 and more than 400 million transistors exist in modern processors (see the following note). In addition, nearly 4.3 billion transistors are in 512MB of RAM; hundreds of millions of transistors exist in the motherboard chipset, video processor, and video RAM; and millions more are in the other adapter cards or logic boards in the system. Each of these billions of interconnected transistors must not only function properly, but also operate in an orderly fashion within strictly enforced timing windows, some of which are measured in picoseconds (trillionths of a second). When you realize that your PC will lock up or crash if any one of these transistors fails to operate properly and on time—and/or any one of the billions of circuit paths and interconnections between the transistors or devices containing them fails in any way—it is a wonder that PCs work at all!
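
The RAM figure can be checked with simple arithmetic. The sketch below assumes one transistor per stored bit (the usual DRAM cell design) and ignores the supporting circuitry on each chip; the function name is made up for this illustration.

```python
# Rough check of the transistor count quoted for 512MB of RAM.
# Assumption: one transistor per stored bit, support circuitry ignored.
BYTES_PER_MB = 2**20
BITS_PER_BYTE = 8

def dram_cell_transistors(megabytes):
    """Return the number of storage-cell transistors, one per bit."""
    return megabytes * BYTES_PER_MB * BITS_PER_BYTE

count = dram_cell_transistors(512)
print(f"{count:,} transistors")  # prints "4,294,967,296 transistors"
```

That works out to 2 to the 32nd power, or roughly 4.3 billion storage transistors in the memory alone.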

Note

There are 77 million transistors in the Pentium M (code named Banias), 55 million transistors in the Pentium 4 (code named Northwood), and more than 54 million in the latest Athlon XP (code named Barton). In the high-end server/workstation processor market, the 100-million-transistor mark was breached by a single-die CPU on May 22, 2000, when Intel introduced a 700MHz version of the Pentium III Xeon (code named Cascades) with 2MB of on-die L2 cache and 140 million transistors. This chip was built using older 0.18-micron technology and has an enormous die size of 385 sq.mm. That's more than 19.6mm on each side, or nearly three times the size of the current Pentium 4 die of 131 sq.mm.

The 1GHz Itanium 2 (code named McKinley) processor was introduced on July 8, 2002, and includes 32KB L1, 256KB L2, and up to 3MB of L3 cache integrated into a die containing a whopping 221 million transistors. This chip also uses older 0.18-micron technology and is the biggest I've ever heard of for a processor at 421 sq.mm (more than 20.5mm square), which is more than 3.2 times the size of the current Pentium 4.

Finally, the Madison-based 1.5GHz Itanium 2 with 6MB of integrated L3 cache includes an incredible 410 million transistors on a 374 sq.mm (more than 19.3mm square) die using newer 0.13-micron technology. It has established new records for transistor count as well as the amount of on-die cache.

Although these server/workstation processors are extreme, the technology used in them eventually filters down to the desktop and mobile processors. We will likely see the 1-billion-transistor mark breached by the follow-ons to these higher-end server chips during the next-generation 90-nanometer process lifecycle.
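
The die-size comparisons in this note follow from the quoted areas. As a quick sketch, treating each die as a perfect square (a simplification, since real dies are usually rectangular) and using a helper name invented for this example:

```python
import math

PENTIUM4_DIE_MM2 = 131  # Pentium 4 die area quoted in the note

def die_summary(area_mm2):
    """Return (side length in mm of an equivalent square die,
    area ratio relative to the Pentium 4 die)."""
    return math.sqrt(area_mm2), area_mm2 / PENTIUM4_DIE_MM2

dies = [
    ("Pentium III Xeon (Cascades)", 385),
    ("Itanium 2 (McKinley)", 421),
    ("Itanium 2 (Madison)", 374),
]

for name, area in dies:
    side, ratio = die_summary(area)
    print(f"{name}: {side:.1f}mm per side, {ratio:.2f}x the Pentium 4 die")
```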

Every time I turn on one of my systems and watch it boot up, I think about the billions upon billions of components and trillions upon trillions of machine/program steps and sequences that have to function properly to get there. As you can now see, many opportunities exist for problems to arise.

Although modern PCs are vastly more complicated than their predecessors, from another point of view they have become simpler and more reliable. Given the complexity of the modern PC, it is not surprising that problems occasionally arise. However, modern design and manufacturing techniques have made PCs more reliable and easier to service despite their ever-increasing internal complexity. Paradoxically, as PCs have become more complex internally, they have come to contain fewer replaceable components and individual parts, which makes them simpler and easier to service in many ways.

Industry-Standard Replaceable Components

The use of industry-standard components is one of the key features of a PC. This means that virtually all the parts that make up a system are interchangeable with other systems in some manner. This also means that the parts are plentiful, inexpensive, and generally very easy to install. A typical PC contains the following replaceable components, most of which are made to industry standards for design and form factor:

  • Motherboard

  • Processor

  • CPU heatsink/fan

  • RAM

  • CMOS battery

  • Chassis with optional fan

  • Power supply

  • Video card[*]

  • Monitor

  • Sound card[*]

  • Speakers

  • Network card[*]

  • Hard drive

  • CD-ROM/RW drive

  • DVD-ROM/+RW drive

  • Floppy drive

  • Drive cables

  • Keyboard

  • Mouse

[*] May be integrated into the motherboard in some systems

Although some of the more well-optioned systems might have even more components than listed here, you can see that most PCs have fewer than 20 replaceable "parts." Some can have as few as 10–15, depending on how many options are present and how they are integrated. From a hardware troubleshooting or repair perspective, a problem means that one of these components is either improperly installed (or configured) or defective. If it's improperly installed or configured, the component can be repaired by merely reinstalling it or configuring it properly. If it's truly defective, the component must simply be replaced. When a PC is broken down to the basic replaceable parts, you can see that it really isn't that complicated, which is why I've spent my career helping people to easily perform their own repairs or upgrades and even build entire systems from scratch.

Reinstall or Replace?

When dealing with hardware problems, the first simple truth to understand is that you do not usually repair anything; you reinstall or replace it instead. You reinstall because the majority of PC hardware problems are caused by a particular component being improperly installed or configured. I remember hearing from IBM many years ago that it had found that 60% or more of the problems handled by its service technicians were due to improper installation or configuration, meaning the hardware was not actually defective. This was, in fact, the major impetus behind the plug-and-play revolution, which has eliminated the need to manually configure jumpers and switches on most hardware devices. This has minimized the expertise necessary to install hardware properly and has also minimized installation, configuration, and resource conflict problems. Still, plug and play has sometimes been called plug and pray because it does not always work perfectly, sometimes requiring manual intervention to make it work properly.

You replace because of the economics of computer hardware. The bottom line is that it is much cheaper to replace a failed circuit board with a new one than to repair it. For example, you can purchase a new, state-of-the-art motherboard for around $100, but repairing an existing board normally costs much more than that. Modern boards use surface-mounted chips that have pin spacings measured in hundredths of an inch, requiring sophisticated and expensive equipment to attach and solder the chip. Even if you could figure out which chip had failed and had the equipment to replace it, the chips themselves are usually sold in quantities of thousands and obsolete chips are usually not available. The net effect of all of this is that the replaceable components in your PC have become disposable technology. Even a component as large and comprehensive as the motherboard is replaced rather than repaired.

Troubleshooting by Replacing Parts

You can troubleshoot a PC in several ways, but in the end it often comes down to simply reinstalling or replacing parts. That is why I normally use a simple "known-good spare" technique that requires very little in the way of special tools or sophisticated diagnostics. In its simplest form, say you have two identical PCs sitting side by side. One of them has a hardware problem; in this example let's say the memory module (DIMM) is defective. Depending on how and where the defect lies, this could manifest itself in symptoms ranging from a completely dead system to one that boots up normally but crashes when running Windows or software applications. You observe that the system on the left has the problem but the system on the right works perfectly, and they are otherwise identical. The simplest technique for finding the problem is to swap parts from one system to the other, one at a time, retesting after each swap. In this case, testing is nothing more than allowing the system to boot up and run some of the installed applications. When you swap the DIMMs and then power up and test, the problem moves from one system to the other. Knowing that the last item swapped over was the DIMM, you have just identified the source of the problem! This did not require an expensive ($2,000 or more) DIMM test machine or any diagnostics software. Because components such as DIMMs are not economical to repair, replacing the defective DIMM is the final solution.

Although this is very simplistic, it is often the quickest and easiest way to identify a problem component as opposed to specifically testing each item with diagnostics. Instead of having an identical system standing by to borrow parts from, most technicians have an inventory of what they call "known-good spare" parts. These are parts that have been previously used, are known to be functional, and can be used to replace a suspicious part in a problem machine. However, this is different from new replacement parts because, when you open a box containing a new component, you really can't be 100% sure that it works. I've been in situations in which I've had a defective component and replaced it with another (unknown to me) defective new component and the problem remained. Not knowing that the new part I just installed was also defective, I wasted a lot of time checking other parts that were not the problem. This technique is also effective because so few parts are needed to make up a PC and the known-good parts don't always have to be the same (for example, a lower-end video card can be substituted in a system to verify that the original card had failed).
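
The swap procedure can be sketched as a short loop. This is only an illustration of the logic, not a diagnostic tool: the function name, the part labels, and the `system_works` callback (which stands in for powering up and testing the machine by hand) are all hypothetical.

```python
def find_faulty_part(installed_parts, spare_parts, system_works):
    """Swap in known-good spares one at a time, retesting after each swap.

    installed_parts: dict of slot name -> part in the problem system
    spare_parts:     dict of slot name -> known-good spare for that slot
    system_works:    callable taking a parts dict and returning True
                     if the system passes the test
    Returns the slot whose swap made the system work, or None.
    """
    trial = dict(installed_parts)  # work on a copy of the configuration
    for slot, spare in spare_parts.items():
        trial[slot] = spare        # swap exactly one part
        if system_works(trial):    # retest after each swap
            return slot            # the last item swapped was the culprit
    return None

# Hypothetical example: the installed DIMM is the defective part.
installed = {"power supply": "psu-A", "video card": "vga-A", "DIMM": "dimm-A"}
spares = {"power supply": "psu-B", "video card": "vga-B", "DIMM": "dimm-B"}

def boots_ok(parts):
    return parts["DIMM"] != "dimm-A"

print(find_faulty_part(installed, spares, boots_ok))  # prints "DIMM"
```

As in the side-by-side example, earlier swaps stay in place while later ones are tried, so the part identified is the last one swapped before the system started working.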

Troubleshooting by the Bootstrap Approach

Another variation on this theme is the "bootstrap approach," which is especially good for what seems to be a dead system. In this approach, you strip the system down to the bare minimum of components necessary for it to function and test it to see whether it works. For example, you might strip down a system to the chassis/power supply, bare motherboard, CPU (with heatsink), one bank of RAM, and a video card with display and then power it up to see whether it works. In that stripped configuration, you should see the POST or splash (logo) screen on the display, verifying that the motherboard, CPU, RAM, video card, and display are functional. If a keyboard is connected, you should see the three LEDs (Caps Lock, Scroll Lock, and Num Lock) flash within a few seconds after powering on. This indicates that the CPU and motherboard are functioning because the POST routines are testing the keyboard. After you get the system to a minimum of components that are functional, you should reinstall or add one part at a time, testing the system each time you make a change to verify it still works and that the part you added or changed is not the cause of a problem. Essentially, you are rebuilding the system from scratch using the existing parts, but doing it one step at a time.
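
The bootstrap approach amounts to the following loop, shown here as a minimal sketch. The names are invented for illustration, and `system_works` again stands in for a manual power-up test of whatever is currently installed.

```python
MINIMUM_CONFIG = "<minimum configuration>"  # marker for a failure in the stripped system

def bootstrap_rebuild(minimum_parts, remaining_parts, system_works):
    """Start from a stripped-down system and add one part at a time.

    Returns MINIMUM_CONFIG if even the stripped system fails, the first
    added part that makes the test fail, or None if everything passes.
    """
    installed = list(minimum_parts)
    if not system_works(installed):
        return MINIMUM_CONFIG
    for part in remaining_parts:
        installed.append(part)           # change only one thing
        if not system_works(installed):  # test after every change
            return part
    return None

# Hypothetical example: the sound card prevents the system from starting.
minimum = ["power supply", "motherboard", "CPU", "RAM", "video card"]
others = ["hard drive", "sound card", "network card"]

def powers_up(parts):
    return "sound card" not in parts

print(bootstrap_rebuild(minimum, others, powers_up))  # prints "sound card"
```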

Many times problems are caused by corrosion on contacts or connectors, so the mere act of disassembling and reassembling a PC will "magically" repair it. Over the years, I've disassembled, tested, and reassembled many systems only to find no problems after the reassembly. How can merely taking it apart and reassembling repair a problem? Although it might seem that nothing was changed and everything is installed exactly like it was before, in reality simply unplugging and replugging renews all the slot and cable connections between devices, which is often all the system needs.

Some useful troubleshooting tips include

  • Eliminate unnecessary variables or components that are not pertinent to the problem.

  • Reinstall, reconfigure, or replace only one component at a time.

  • Test after each change you make.

  • Keep a detailed record (write it down) of each step you take.

  • Don't give up! Every problem has a solution.

  • If you hit a roadblock, take a break or work on another problem. A fresh approach the next day often reveals things you overlooked.

  • Don't overlook the simple or obvious. Double- and triple-check the installation and configuration of each component.

  • Keep in mind that the power supply is one of the most failure-prone parts in a PC, as well as one of the most overlooked components. A high-output "known-good" spare power supply is highly recommended to use for testing suspect systems.

  • Cables and connections are also a major cause of problems, so keep replacements of all types on hand.

Before starting any system troubleshooting, a few basic steps should be performed to ensure a consistent starting point and to enable isolating the failed component:

  1. Turn off the system and any peripheral devices. Disconnect all external peripherals from the system, except for the keyboard and video display.

  2. Make sure the system is plugged in to a properly grounded power outlet.

  3. Make sure the keyboard and video display are connected to the system. Turn on the video display, and turn up the brightness and contrast controls to at least two-thirds of the maximum. Some displays have onscreen controls that might not be intuitive. Consult the display documentation for more information on how to adjust these settings. If you can't get any video display but the system seems to be working, try moving the card to a different slot (not possible with AGP adapters) or try a different video card or monitor.

  4. To enable the system to boot from a hard disk, make sure no floppy disk is in the floppy drive. Alternatively, put a known-good bootable floppy with DOS or diagnostics on it in the floppy drive for testing.

  5. Turn on the system. Observe the power supply, chassis fans (if any), and lights on either the system front panel or power supply. If the fans don't spin and the lights don't light, the power supply or motherboard might be defective.

  6. Observe the power-on self test (POST). If no errors are detected, the system beeps once and boots up. Nonfatal errors, which do not lock up the system, display an onscreen text message that varies according to BIOS type and version. Record any errors that occur, and refer to the DVD accompanying this book for a list of BIOS error codes and more information on any specific codes you see. Fatal errors, which lock up the system, are indicated by a series of audible beeps. Refer to the DVD for a list of beep error codes.

  7. Confirm that the operating system loads successfully.

Note

The Technical Reference section of the DVD accompanying this book contains an exhaustive listing of BIOS error codes, error messages, and beep codes for BIOSs from Phoenix, AMI, Award, Microid Research, and IBM.

Problems During the POST

Problems that occur during the POST are usually caused by incorrect hardware configuration or installation. Actual hardware failure is a far less frequent cause. If you have a POST error, check the following:

  1. Are all cables correctly connected and secured?

  2. Are the configuration settings correct in Setup for the devices you have installed? In particular, ensure the processor, memory, and hard drive settings are correct.

  3. Are all drivers properly installed?

  4. Are switches and jumpers on the motherboard correct, if changed from the default settings?

  5. Are all resource settings on add-in boards and peripheral devices set so that no conflicts exist—for example, two add-in boards sharing the same interrupt?

  6. Is the power supply set to the proper input voltage (110V–120V or 220V–240V)?

  7. Are adapter boards and disk drives installed correctly?

  8. Is a keyboard attached?

  9. Is a bootable hard disk (properly partitioned and formatted) installed?

  10. Does the BIOS support the drive you have installed, and if so, are the parameters entered correctly?

  11. Is a bootable floppy disk installed in drive A:?

  12. Are all memory SIMMs or DIMMs installed correctly? Try reseating them.

  13. Is the operating system properly installed?
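
The resource-conflict check in item 5 is easy to do on paper, but the idea can also be sketched in a few lines. The device names and IRQ numbers below are made up for illustration; on a real system you would take them from the BIOS Setup screens or the operating system's device listing.

```python
from collections import defaultdict

def find_irq_conflicts(assignments):
    """Return a dict of IRQ -> sorted list of devices sharing that IRQ.

    assignments: dict of device name -> IRQ number.
    Only IRQs claimed by two or more devices are reported.
    """
    by_irq = defaultdict(list)
    for device, irq in assignments.items():
        by_irq[irq].append(device)
    return {irq: sorted(devices)
            for irq, devices in by_irq.items()
            if len(devices) > 1}

# Hypothetical settings recorded from two add-in boards and a serial port.
settings = {"sound card": 5, "network card": 5, "COM1": 4}
print(find_irq_conflicts(settings))  # prints {5: ['network card', 'sound card']}
```

Note that a shared interrupt is not automatically a fault, because PCI devices can share interrupts; the listing only points out where to look first.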

Hardware Problems After Booting

If problems occur after the system has been running, and you have not made any hardware or software changes, a hardware fault might have occurred. Here is a list of items to check in that case:

  1. Try reinstalling the software that has crashed or refuses to run.

  2. Try clearing CMOS RAM and running Setup.

  3. Check for loose cables, a marginal power supply, or other random component failures.

  4. A transient voltage spike, power outage, or brownout might have occurred. Symptoms of voltage spikes include a flickering video display, unexpected system reboots, and the system not responding to user commands. Reload the software and try again.

  5. Try reseating the memory modules (SIMMs, DIMMs, or RIMMs).

Problems Running Software

Problems running application software (especially new software) are usually caused by the software itself or by an incompatibility between the software and the system. Here is a list of items to check in that case:

  1. Does the system meet the minimum hardware requirements for the software? Check the software documentation to be sure.

  2. Check to see that the software is correctly installed. Reinstall if necessary.

  3. Check to see that the latest drivers are installed.

  4. Scan the system for viruses using the latest antivirus software.

Problems with Adapter Cards

Problems with add-in boards are usually caused by improper board installation or resource (interrupt, DMA, or I/O address) conflicts. Chapter 4, "Motherboards and Buses," has a detailed discussion of these system resources, what they are, how to configure them, and how to troubleshoot them. Also be sure to check drivers for the latest versions and ensure that the card is compatible with your system and the operating system version you are using.

Sometimes adapter cards can be picky about which slot they are running in. Despite the fact that, technically, a PCI or ISA adapter should be able to run in any of the slots, minor timing or signal variations sometimes occur from slot to slot. I have found on numerous occasions that simply moving a card from one slot to another can make a failing card begin to work properly. Sometimes moving a card works just by the inadvertent cleaning (wiping) of the contacts that takes place when removing and reinstalling the card, but in other cases I can duplicate the problem by inserting the card back into its original slot. When all else fails, try moving the cards around! Because some motherboards share a single IRQ between two PCI slots or between a PCI and an AGP slot, changing one of the PCI cards to another slot can resolve conflicts.

Caution

Note that PCI cards become slot specific after their drivers are installed. By this I mean that if you move the card to another slot, the plug-and-play resource manager sees it as if you have removed one card and installed a new one. You therefore must install the drivers all over again for that card. Don't move a PCI card to a different slot unless you are prepared with all the drivers at hand to perform the driver installation. ISA cards don't share this quirk because the system is not aware of which slot an ISA card is in.
