Linux hardware stability guide, Part 1
CPU and memory troubleshooting
Daniel Robbins, President/CEO, Gentoo
One of Linux's claims to fame is its legendary stability.
However, the most stable operating system in the world won't do
you any good if your hardware is defective or misconfigured. In
this article, Daniel Robbins shows you how to diagnose and fix
CPU flakiness, as well as how to test your RAM for defects. By
the end of this article, you'll have the skills to ensure that
your Linux system is as stable as it possibly can be.
Rescuing your CPU
If your CPU produces random, intermittent errors when placed
under heavy load, it may not be defective at all; it may simply
not be cooled properly. Here are some things to check:
* Is your CPU fan plugged in?
* Is it relatively dust-free?
* Does the fan actually spin (and spin at the proper speed)
when the power is on?
* Is the heat sink seated properly on the CPU?
* Is there thermal grease between the CPU and the heat sink?
* Does your case have adequate ventilation?
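If your motherboard's sensor chip is supported by a Linux hwmon
driver, you may be able to check the "does the fan spin" item
without opening the case. The following Python sketch is only an
illustration under stated assumptions: it assumes the kernel
exposes fan readings as /sys/class/hwmon/hwmon*/fan*_input files
(values in RPM), and the 500 RPM "stalled" threshold is an
arbitrary placeholder, not a manufacturer figure.

```python
from pathlib import Path

# Assumed location of the Linux hwmon sysfs tree. Whether any fan*_input
# files appear here depends on the motherboard's sensor chip and on the
# matching kernel driver being loaded.
HWMON_ROOT = Path("/sys/class/hwmon")


def read_fan_speeds(root: Path = HWMON_ROOT) -> dict[str, int]:
    """Return {"hwmonN/fanM": rpm} for every readable fan*_input file."""
    speeds: dict[str, int] = {}
    for fan_file in sorted(root.glob("hwmon*/fan*_input")):
        try:
            rpm = int(fan_file.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric sensor: skip it
        name = fan_file.name.removesuffix("_input")
        speeds[f"{fan_file.parent.name}/{name}"] = rpm
    return speeds


def stalled_fans(speeds: dict[str, int], min_rpm: int = 500) -> list[str]:
    """List fans reporting fewer than min_rpm (an arbitrary threshold)."""
    return [name for name, rpm in speeds.items() if rpm < min_rpm]


if __name__ == "__main__":
    speeds = read_fan_speeds()
    if not speeds:
        print("No fan sensors found; check the fan by eye instead.")
    for name, rpm in speeds.items():
        print(f"{name}: {rpm} RPM")
    for name in stalled_fans(speeds):
        print(f"WARNING: {name} looks stalled or very slow")
```

A reading of 0 can also mean that nothing is plugged into that fan
header, so treat a warning as a prompt to look inside the case,
and compare normal readings against your fan's rated speed rather
than against the placeholder threshold.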
If everything seems fine, you may want to rerun the kernel
compile tests with the case open. Let the kernel compile run for
about five minutes, then reach inside the running machine and
touch the outside metal casing of the power supply to ground
yourself. Next, carefully test the temperature of the heat sink
with the tip of your finger. If it feels unusually hot, it's very
possible that your heat sink and fan just aren't adequate for
your particular CPU. In that case, upgrade your system's cooling
hardware; with luck, your CPU hasn't sustained any permanent
damage and is still functional.
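As a software complement to the fingertip test, here is a minimal
Python sketch of the same procedure: rerun a build command back to
back for a fixed period, then compare temperatures read before and
after. It assumes the kernel exposes
/sys/class/thermal/thermal_zone*/temp in millidegrees Celsius;
which zone (if any) tracks the CPU varies by machine, so check
each zone's "type" file. The build command is supplied by you and
is not a reproduction of the article's own test.

```python
import subprocess
import time
from pathlib import Path

# Assumed location of the Linux thermal sysfs tree; each zone's "temp" file
# holds a reading in millidegrees Celsius. Not every machine exposes one.
THERMAL_ROOT = Path("/sys/class/thermal")


def read_temps_c(root: Path = THERMAL_ROOT) -> dict[str, float]:
    """Return {"thermal_zoneN": degrees_C} for every readable zone."""
    temps: dict[str, float] = {}
    for temp_file in sorted(root.glob("thermal_zone*/temp")):
        try:
            millideg = int(temp_file.read_text().strip())
        except (OSError, ValueError):
            continue  # missing permission or odd format: skip this zone
        temps[temp_file.parent.name] = millideg / 1000.0
    return temps


def stress_for(cmd: list[str], seconds: float) -> list[int]:
    """Rerun cmd back to back until `seconds` elapse; return each exit code.

    A run still in progress at the deadline is allowed to finish, and the
    loop stops early at the first non-zero exit code.
    """
    deadline = time.monotonic() + seconds
    codes: list[int] = []
    while time.monotonic() < deadline:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        codes.append(result.returncode)
        if result.returncode != 0:
            break  # a build that dies under load is the symptom to report
    return codes
```

For example, from inside a configured kernel source tree you might
call read_temps_c(), then stress_for(["sh", "-c", "make clean &&
make bzImage"], 300), then read_temps_c() again (a hypothetical
invocation). Stopping at the first non-zero exit code is a
deliberate choice: a compile that dies partway through under load
is exactly the kind of intermittent error this section describes.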