Re: errors during buildworld

From: Peter Jeremy <PeterJeremy_at_optushome.com.au>
Date: Sun, 9 Oct 2005 12:07:25 +1000
On Sun, 2005-Oct-09 14:03:04 +1300, Mark Kirkwood wrote:
>- Tested cpus with cpuburn (2xburnP6 for 1 hour).
>- Tested memory with memtest-86 (about 6 hours).

memtest-86 and cpuburn can demonstrate that there is a fault but not
that there isn't.  Pattern-sensitive memory errors, in particular, are
very unlikely to be detected.  Also, the above tests are focussed on
specific subsystems and would not pick up a problem that was triggered
by interactions between different subsystems (e.g. there is no disk or
PCI I/O in the above tests).
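
As a rough illustration of a check that involves more than one
subsystem at once, here is a minimal Python sketch that writes random
data to disk, reads it back and compares checksums.  The function name
and parameters are my own invention, not an existing tool.  The
read-back may well be served from the buffer cache rather than the
disk, and, like the tests above, a pass proves nothing: only a
mismatch is informative.

```python
import hashlib
import os
import tempfile


def disk_roundtrip_mismatches(rounds=4, size=1 << 20):
    """Write random data, read it back, and count checksum mismatches.

    Hypothetical helper: exercises CPU (hashing), memory (buffers) and
    file I/O together.  A non-zero result points at a fault somewhere;
    a zero result does not show the hardware is good.
    """
    mismatches = 0
    for _ in range(rounds):
        data = os.urandom(size)
        expected = hashlib.sha256(data).hexdigest()

        # Write the buffer out and ask the OS to flush it to the device.
        with tempfile.NamedTemporaryFile(delete=False) as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
            path = f.name

        try:
            # Read the file back and hash what we actually got.
            with open(path, "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
        finally:
            os.remove(path)

        if actual != expected:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("mismatches:", disk_roundtrip_mismatches())
```

Running it with a file size larger than RAM, and for many rounds,
makes it more likely that the data really travels to the disk and
back.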

>The system passes these tests easily, so I am finding it hard to see 
>hardware problems (Indeed the system is well ventilated and cooled

Cooling isn't the only hardware problem.  Marginal PSUs or marginal
electrolytic capacitors on the motherboard are also quite common -
especially if the hardware is getting old.  Electrolytic capacitors
have a finite life and this is shortened by heat and high ripple
currents - both of which are common in computers.

>I removed /usr/obj/usr/src/* and tried buildworld again, and it went 
>through that time (am running the updated system now)...

That suggests a hardware problem to me.

>So it's a bit confusing, could we be seeing a real gcc bug?

gcc is deterministic so a real gcc bug is more likely to manifest as a
consistent failure at the same point.  A problem that moves around and
isn't always there is more indicative of a hardware issue.
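
To make that reasoning concrete, here is a small Python sketch that
classifies the outcome of several buildworld attempts by where each
one stopped.  It is only a heuristic under my own assumptions: the
function name and the idea of recording the failing file per run (or
None for a successful run) are hypothetical, not part of any existing
tool.

```python
def classify_failures(failure_points):
    """Classify repeated build runs by where they failed.

    failure_points: one entry per run, either a string naming the
    point of failure (e.g. the file being compiled) or None if that
    run succeeded.  Hypothetical helper for illustration only.
    """
    failures = [p for p in failure_points if p is not None]
    if not failures:
        return "no failures observed"

    every_run_failed = len(failures) == len(failure_points)
    same_point = len(set(failures)) == 1

    if every_run_failed and same_point:
        # Deterministic tool, identical input, identical failure.
        return "consistent: same point every run (suspect software)"
    # Failures that move around or disappear on a retry.
    return "intermittent: failures move or vanish (suspect hardware)"


if __name__ == "__main__":
    print(classify_failures(["foo.c", "foo.c", "foo.c"]))
    print(classify_failures(["foo.c", None, "bar.c"]))
```

A build that failed once and then went through after a clean rebuild
falls into the second category.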

-- 
Peter Jeremy
Received on Sun Oct 09 2005 - 00:07:29 UTC