Re: 8.0RC2 amd64 - kernel panic running make buildworld

From: Mark Atkinson <atkin901_at_yahoo.com>
Date: Tue, 10 Nov 2009 11:29:00 -0800
Kai Gallasch wrote:
> On Tue, 10 Nov 2009 19:05:23 +0200
> Andriy Gapon <avg_at_icyb.net.ua> wrote:
> 
>> on 10/11/2009 17:22 gary.jennejohn_at_freenet.de said the following:
>>> Well, OK, I may have misinterpreted what you wrote or have chosen
>>> bad wording myself to convey the same message.  Nonetheless it
>>> looks like a hardware problem to me.
>> [Trying to make up for my previous mistake.]
>>
>> The symptom certainly looks like misbehaving hardware, but other
>> information from the reports seems to suggest that it is possible
>> that this misbehavior might be caused by software misconfiguring the
>> hardware.
> 
> Hi.
> 
> This thread was started by me. In the meantime I filed a PR:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=140338 
>  
>> I would re-test vm.pmap.pg_ps_enabled=0 just to be sure that it was
>> set correctly the first time.
> 
> I toggled vm.pmap.pg_ps_enabled three times between reboots and the
> result is always the same. superpages enabled: reboot, superpages not
> enabled: server stable
> 
>> I would try to see how 8.0-RC1 kernel behaves and in general try to
>> find last working, first non-working version.
> 8.0RC1, 8.0BETA4 already showed the same behaviour
> 
>> It would be useful to know any (if any) non-default loader.conf and
>> rc.conf settings or kernel config (if not GENERIC).
> 
> loader.conf untouched, rc.conf had just settings for networking active
> when testing. In the end I enabled some other stuff to have it ready for
> 8.0 RELEASE, *after* I found out that disabling superpages helped
> against the crashes.
> 
> Ah yes. I also ran memtest86 on the server for about half a day - no
> problems.
> 
> But read for yourself in the PR.
> 
> I don't rule out that this behaviour with vm.pmap.pg_ps_enabled may be
> hardware related, but why then is the server running stable
> with RELENG_7 and memtest and server diagnostics don't report any
> problem? 

See the following, where I first noticed this problem a long time
ago on my HP DL385 G5.  It also passed memtest86 for days, and swapping
out memory modules gave the same result.

http://article.gmane.org/gmane.os.freebsd.current/111307

I suspect this is actually a machine check exception you're seeing,
which you'll notice if you enable

hw.mca.enabled="1", and superpages, then do buildworld. Using -j doesn't
matter, it just takes longer to throw an exception.
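
For reference, a minimal sketch of the /boot/loader.conf entries this
implies. It assumes both settings are applied as loader tunables at boot;
the superpages line uses the vm.pmap.pg_ps_enabled tunable discussed
earlier in the thread, and the comments are illustrative only:

```
# /boot/loader.conf (sketch; assumes both are boot-time tunables)
hw.mca.enabled="1"          # report machine check exceptions
vm.pmap.pg_ps_enabled="1"   # superpages on, the configuration that crashes
```

After a reboot, `sysctl hw.mca.enabled vm.pmap.pg_ps_enabled` should show
whether both values took effect before starting buildworld.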

I'm hoping this is the rev E lfence problem, even though my chips are
not targeted.   When and if a patch goes into -current, I'll try it out
to see if the problem with superpages goes away.

-Mark
Received on Tue Nov 10 2009 - 18:29:30 UTC