Re: 8.0RC2 amd64 - kernel panic running make buildworld

From: John Baldwin <jhb_at_freebsd.org>
Date: Fri, 13 Nov 2009 09:49:08 -0500
On Thursday 12 November 2009 1:59:32 pm Kai Gallasch wrote:
> On Wed, 11 Nov 2009 15:04:14 -0500,
> John Baldwin <jhb_at_freebsd.org> wrote:
> 
> > On Wednesday 11 November 2009 2:15:18 pm S.N.Grigoriev wrote:
> > > 
> > > 10.11.09, 09:15, "Mark Atkinson" <atkin901_at_yahoo.com>
> > > wrote:
> > > 
> > > > Andriy Gapon wrote:
> > > > > on 10/11/2009 17:22 gary.jennejohn_at_freenet.de said the
> > > > > following:
>  
> > > > > Not a trivial issue unless it is hardware indeed.
> > > > > 
> > > > Also, you can try adding:
> > > > hw.mca.enabled="1" in /boot/loader.conf, reboot,  and then see if
> > > > there is a machine check exception on the console during the
> > > > buildworld.
> > > 
> > > Mark,
> > > 
> > > I've added hw.mca.enabled="1" in /boot/loader.conf and got the
> > > following screen during the buildworld:
> > > 
> > > .....
> > > -c /usr/src/gnu/usr.bin/binutils/as/../../../../contrib/binutils/gas/sb.c
> > > 
> > > MCA: CPU3 UNCOR PCC OVER DTLB L1 error
> > > MCA: Address 0x8015fb000
> > 
> > Your hardware is broken and it is telling you so.  You have had
> > multiple machine checks with the most severe one being an
> > uncorrectable error in your data TLB (i.e. in the CPU itself).
> 
> John,
> 
> I also set hw.mca.enabled="1" and vm.pmap.pg_ps_enabled="1"
> in /boot/loader.conf on my Opteron ProLiant server, which reboots
> spontaneously under load.
> 
> Server was upgraded to FREEBSD-8.0-PRERELEASE today.
> 
> This is what happened..
> 
> 
> ---- machine check trap, first run ----
> 
> sonnenkraft:/usr/obj # MCA: CPU 5 UNCOR PCC OVER DTLB L1 error
> MCA: Address 0x80e5c8000

Hmm, normally I would suspect the CPU, but avg_at_ has been looking into a
possible interaction between the superpages code and the machine check
registers on AMD CPUs (either a CPU bug or perhaps a superpages bug).  I would
wait to see if he finds something.  An isolated MCA would most likely indicate
a hardware error, but the fact that several people are reporting this exact
machine check, and only when superpages are enabled, indicates it might be
something else.
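
For anyone reading the flags in those "UNCOR PCC OVER" lines by hand, here is
a minimal sketch in C.  It assumes the architectural x86 MCi_STATUS layout
(VAL = bit 63, OVER = bit 62, UC = bit 61, PCC = bit 57).  The helper name
mca_describe_flags is made up for illustration; this is not the kernel's
code, and it does not decode the bank-specific part such as "DTLB L1".

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural flag bits in an x86 MCi_STATUS register. */
#define MC_STATUS_VAL   (1ULL << 63)  /* register holds a valid error */
#define MC_STATUS_OVER  (1ULL << 62)  /* an earlier error was overwritten */
#define MC_STATUS_UC    (1ULL << 61)  /* error was not corrected by hardware */
#define MC_STATUS_PCC   (1ULL << 57)  /* processor context may be corrupt */

/*
 * Hypothetical helper: write the flag words for a raw status value into
 * buf, in the same order as the console lines quoted in this thread.
 * Returns whatever snprintf() returns.
 */
static int
mca_describe_flags(uint64_t status, char *buf, size_t len)
{
	if ((status & MC_STATUS_VAL) == 0)
		return (snprintf(buf, len, "no valid error"));

	return (snprintf(buf, len, "%s%s%s",
	    (status & MC_STATUS_UC) ? "UNCOR" : "COR",
	    (status & MC_STATUS_PCC) ? " PCC" : "",
	    (status & MC_STATUS_OVER) ? " OVER" : ""));
}
```

Read this way, UNCOR means the hardware could not correct the error, PCC
means the interrupted context cannot be trusted, and OVER means at least one
earlier error in the same bank was overwritten before it could be logged.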

-- 
John Baldwin
Received on Fri Nov 13 2009 - 13:58:00 UTC
