Kai Gallasch wrote:
> On Tue, 10 Nov 2009 19:05:23 +0200,
> Andriy Gapon <avg_at_icyb.net.ua> wrote:
>
>> on 10/11/2009 17:22 gary.jennejohn_at_freenet.de said the following:
>>> Well, OK, I may have misinterpreted what you wrote or have chosen
>>> bad wording myself to convey the same message. Nonetheless it
>>> looks like a hardware problem to me.
>> [Trying to make up for my previous mistake.]
>>
>> The symptom certainly looks like misbehaving hardware, but other
>> information from the reports seems to suggest that it is possible
>> that this misbehavior might be caused by software misconfiguring the
>> hardware.
>
> Hi.
>
> This thread was started by me. In the meantime I filed a PR:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=140338
>
>> I would re-test vm.pmap.pg_ps_enabled=0 just to be sure that it was
>> tested correctly the first time.
>
> I toggled vm.pmap.pg_ps_enabled three times between reboots and the
> result is always the same. Superpages enabled: reboot; superpages
> disabled: server stable.
>
>> I would try to see how 8.0-RC1 kernel behaves and in general try to
>> find last working, first non-working version.
> 8.0-RC1 and 8.0-BETA4 already showed the same behaviour.
>
>> It would be useful to know any (if any) non-default loader.conf and
>> rc.conf settings or kernel config (if not GENERIC).
>
> loader.conf untouched; rc.conf had just settings for networking active
> when testing. In the end I enabled some other stuff to have it ready for
> 8.0-RELEASE, *after* I found out that disabling superpages helped
> against the crashes.
>
> Ah yes. I also ran memtest86 on the server for about half a day: no
> problems.
>
> But read for yourself in the PR.
>
> I don't rule out that this behaviour with vm.pmap.pg_ps_enabled may be
> hardware related, but why then is the server running stable
> with RELENG_7, and why do memtest and server diagnostics not report any
> problem?

See the following, where I first noticed this problem a long time ago on my HPDL385g5.
It also passed memtest86 for days, and swapping out memory modules gave the same result.

http://article.gmane.org/gmane.os.freebsd.current/111307

I suspect this is actually a machine check exception you're seeing, which you'll notice if you set hw.mca.enabled="1", enable superpages, and then run buildworld. Using -j doesn't matter; it just takes longer to throw an exception.

I'm hoping this is the rev E lfence problem, even though my chips are not targeted. When and if a patch goes into -current, I'll try it out to see if the problem with superpages goes away.

-Mark

Received on Tue Nov 10 2009 - 18:29:30 UTC