
8. Possible causes of the lockups

Many possible causes for the lockups have been postulated:

8.1 Cooling of the BX Chipset

This was probably the first theory presented. On the BP6, the BX chipset, which provides much of the core functionality of the computer, has a single heat sink with no thermal grease and no fan. In SMP mode, it is known to get quite hot. However, while it may be the root of some problems, many people have taken steps to ensure it is adequately cooled, with no effect. In any case, cooling the chipset can't hurt and certainly helps to rule out the chipset as a problem.

The typical suggestion is to get some thermal grease and apply it between the heat sink and the chipset (the relevant heat sink is the only green one on the BP6). Radio Shack will sell you some for about $3.50; they call it "heat sink compound." Another popular suggestion is to take the fan off a 486 heat sink and screw it onto the green BX heat sink. This is a fairly easy procedure. The FAQ author used a 486 heat sink, also from Radio Shack, but almost any unit that consists of a fan screwed onto a standard heat sink will likely fit. For a detailed view of how to perform this procedure, complete with graphics, take a look at http://www.bp6.com/bx/.

8.2 Case heat

Like the BX chipset, this is probably a problem for some people, but also doesn't seem to be the final answer. Any SMP motherboard generates a fair bit of heat, and this one is no exception. You should make sure your case moves an adequate amount of air, but there are plenty of people with very good cooling for their BP6 who still have problems.

8.3 Overclocking

Like the previous problems, this may be a confounding factor, but there are enough people having problems with non-overclocked CPUs that this has been ruled out as the cause.

Rogier Wolff adds: Any problems reported with "overclocked" CPUs should be ignored: You're only allowed to overclock your CPUs if it doesn't lead to any problems. If you're getting hangs, overclocking is the first possible cause, so if you want to complain about the hangs, please run the CPUs at their rated voltage and clockspeed.

8.4 The Power Supply

Some people have reported having better success when they moved from a smaller to a larger power supply (typically 300 watts), perhaps pointing to a deficiency in the voltage regulators on the BP6. This is definitely a possible factor; many PC power supplies are notoriously poor at delivering proper voltage. Unfortunately, if this is the problem, it doesn't appear to be one easily fixable by the user as many people who are using high quality 300 watt power supplies are also reporting problems.

8.5 The Highpoint ATA66 Controller

The Highpoint IDE controller is known to have some problems, and the Linux driver isn't quite stable. However, a number of people experiencing lockups are running 100% SCSI systems, or IDE without the Highpoint. [ed: Anyone want to write a more detailed description of issues with the Highpoint?]

8.6 Linux

Since many of the people complaining of lockups are running Linux, some people have suggested it could be a kernel bug. If we take a step back, however, and look at the number of people running SMP Linux on other motherboards, the consensus is that Linux is unlikely to be causing the problem. It is also questionable whether the "others" running other OSes are putting the right stress on their systems to trigger the bug; Linux users tend to be more adept at both stressing and debugging their machines. In addition, Linux's general stability means people running it can come to more intelligent conclusions when they do get crashes (i.e., if a Microsoft product crashes, are you really surprised?).

People running other operating systems (NT, FreeBSD, etc.) have also reported problems, though apparently less frequently than Linux users, possibly for the reasons mentioned above. It's also possible that Linux triggers crashes more easily due to its architecture. Among other things, it's known to have about the fastest VM system on the x86 platform at the moment. Peter Bell reports that BeOS hangs in a similar manner to Linux.

8.7 Linux 2.2 vs Linux 2.3

Many people have noticed that upon upgrading to Linux 2.3, they begin to get spurious APIC errors, such as the following:

    Jan 29 19:46:00 KeyserSoze kernel: APIC error interrupt on CPU#1, should never happen.
    Jan 29 19:46:00 KeyserSoze kernel: ... APIC ESR0: 00000002
    Jan 29 19:46:00 KeyserSoze kernel: ... APIC ESR1: 00000002
    Jan 29 19:46:00 KeyserSoze kernel: ... bit 1: APIC Receive CS Error (hw problem).

Their frequency often depends on system load, speed, etc. While this may seem to point the finger at Linux 2.3, the fact is that older kernels and other operating systems simply drop the errors silently; you're still getting the errors under Linux 2.2, FreeBSD or WinNT; they're just not getting reported.
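If you want to see how often these errors occur on your own machine, a short script can tally them per CPU. The following is only a minimal sketch: it assumes your kernel messages contain lines in the same format as the example quoted above, and the function name count_apic_errors is made up for illustration.

```python
import re
from collections import Counter

# Matches the first line of an APIC error report, in the format
# quoted in the FAQ ("APIC error interrupt on CPU#1, ...").
# Assumption: your kernel prints the message in this same form.
APIC_ERROR = re.compile(r"APIC error interrupt on CPU#(\d+)")


def count_apic_errors(log_lines):
    """Return a Counter mapping CPU number -> number of APIC error reports.

    log_lines may be any iterable of strings, e.g. an open log file.
    """
    counts = Counter()
    for line in log_lines:
        match = APIC_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

To use it, pass an open log file to count_apic_errors(). Which file holds kernel messages depends on your syslog configuration; /var/log/messages is a common location, but that is an assumption, not something specific to the BP6. Comparing the totals under different loads, or before and after a BIOS or cooling change, gives a rough measure of whether the change helped.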

There is some anecdotal evidence that Linux 2.3 will trigger the crashes earlier; this may be due to the increasingly fast and streamlined Linux kernel; chances are 2.3 pushes the hardware even harder than 2.2.

8.8 lm_sensors

lm_sensors is the package for Linux that allows one to monitor various things on the motherboard, such as CPU temperature, line voltages and fan speed. At least one person has reported that removing lm_sensors may have improved his stability. However, other people have reported that they only started using lm_sensors after they experienced hangs, so this is probably just one more confounding factor.

8.9 XFree86, AGP, PCI or DMA

Many of the "no-crash" stories have come from people running only in text mode, and some people have confirmed that their machine is unstable while in X, but stable at the command line. At least one person has reported success after switching to a non-XFree86 X server. Something related to either graphics cards or their resources currently seems the most probable root of the problems. Matrox cards in particular have been suspected as a culprit, but it's difficult to point the finger with any assurance since the majority of people running the BP6 in Linux seem to have Matrox cards. If your machine crashes and you can spare X for a few days, it's a very useful data point if you can run on the console for a while and report whether your stability improves.

8.10 BIOS

This is possibly the answer, at least for some people. The author's system, when running non-overclocked with the QQ BIOS, produces nearly no APIC errors and has run for 5 or more days solidly.

