Re: make buildworld: Signal 11; Illegal instruction

From: Bruce Cran <bruce_at_cran.org.uk>
Date: Fri, 1 Aug 2003 13:54:02 +0100
On Fri, Aug 01, 2003 at 02:41:16PM +0200, Karel J. Bosschaart wrote:
> On Fri, Aug 01, 2003 at 01:04:16PM +0200, Tobias Roth wrote:
> > On Thu, Jul 31, 2003 at 09:52:08PM +0100, Bruce Cran wrote:
> > > On Thu, Jul 31, 2003 at 03:03:01PM -0400, Chris Shenton wrote:
> > > > Chris Shenton <chris_at_shenton.org> writes:
> > > > 
> > > > >   *** Signal 11
> > > > >... 
> > > > >   Illegal instruction (core dumped)
> > > > >   *** Error code 132
> > > > 
> > > > Also seeing
> > > > 
> > > > *** Signal 4
> > > > 
> > > > if it matters.  This sounds way too flakey to be SW.
> > > 
> > > I'm seeing the same symptoms.   I got a signal 4 when running 'clean' in the
> > > pam authentication directory, and I've just had a signal 11 running
> > > 'rm -f libradius.so'.  This is an install from a snapshot I built today -
> > > during the install I had panics in _mtx_init_ and a backtrace traced through
> > > vfs and ffs functions, and I only managed to install successfully when I
> > > had the CPU throttled to 30%.  This is the same computer which ran memtest86
> > > for 8 hours without a single fault last night, so I doubt the hardware's
> > > faulty, at least not the memory or the CPU.
> > 
> > memtest86 does not always catch memory errors. sig11 and sig4 at varying
> > locations during buildworld are a sure indicator for a hardware problem.
> > most likely a memory or overheating issue, though other hardware related
> > causes are possible.
> > 
> > if you still are not convinced that this is a hardware issue, run
> > buildworld on a -stable system.
> > 
> > more and more latest generation laptops from different manufacturers show
> > these symptoms during hot days. my guess is that mobile pentium 4 systems
> > are just not as stable as they should be. let's hope things get better with
> > the pentium m chips, or that the manufacturers deploy better quality control
> > to catch the numerous faulty systems.
> 
> My stock Dell Optiplex GX260, P4 based with 256 MB RAM, running -current,
> would spit signals 4, 10 and 11 (and also 6, I don't remember exactly) all
> over the place during buildworld unless the kernel had these options:
> 
> options         DISABLE_PSE
> options         DISABLE_PG_G
> 
> Search the -current archive, it's due to a processor bug but there is
> no detailed public information about it and hence no 'official' fix.
> 
> You might try and see if it helps for you. memtest86 and other hardware
> testers won't notice anything because it's in the CPU and officially
> unknown.
> 
> But yes, also keep in mind that there might be overheating issues if
> the weather is hot; yesterday my -stable machine at home rebooted during
> a port build: turned out to be a flat cable being too close to the CPU fan...
> 
> Karel.

Thanks, I'd come to the conclusion it must have been the P4 bug.   The system
gets hot, sometimes 65 deg C during builds, but it very rarely aborts on a
signal 11.   I don't quite understand what happened yesterday to break it so
badly; maybe it was because I was newly installing a -CURRENT snapshot I'd
built with pentium2 optimisations, but I don't know.

--
Bruce Cran
Received on Fri Aug 01 2003 - 03:54:04 UTC
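
For readers puzzling over the numbers quoted in this thread: the "Error code 132"
that make printed next to "Illegal instruction" is most likely just signal 4
again, since sh-style shells report a child killed by signal N as exit status
128 + N. The sketch below decodes such a status. The function name and the
table are made up for illustration, and the table covers only the signals
mentioned in the thread, using FreeBSD's numbering (other systems number some
signals differently).

```python
# Signal numbers as used on FreeBSD (an assumption of this sketch;
# some other systems assign different numbers, e.g. to signal 10).
FREEBSD_SIGNALS = {
    4: "SIGILL",    # illegal instruction
    6: "SIGABRT",   # abort
    10: "SIGBUS",   # bus error
    11: "SIGSEGV",  # segmentation fault
}


def decode_exit_status(status):
    """Return the signal name implied by a shell exit status, or None."""
    # sh-style shells report "killed by signal N" as exit status 128 + N,
    # so anything at or below 128 is treated as an ordinary exit code.
    if status <= 128:
        return None
    signo = status - 128
    return FREEBSD_SIGNALS.get(signo, "signal %d" % signo)


print(decode_exit_status(132))  # prints SIGILL
```

Read this way, "Signal 4" and "Error code 132" in the build log describe the
same failure rather than two separate ones.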
