On Tue, 03 Nov 2009 10:42:40 +0000, Gavin Atkinson <gavin_at_FreeBSD.org> wrote:

> On Sat, 2009-10-31 at 23:15 +0100, Kai Gallasch wrote:
> > Hi.
> >
> > I installed 8.0RC2-amd64 on an 8-core opteron server a few days ago.
> >
> > When I try to do a make buildworld or make buildkernel the server
> > reboots without any message left in the logs. The same happens
> > when building bigger ports (for example ruby18 or perl58)
>
> First place I think I'd start is by running memtest86 on the machine
> overnight. This sounds like a possible hardware issue to me, it would
> be good to see if we can confirm that that is the case.

I will do so tomorrow.

I have already taken the following actions to rule out a hardware problem:

- ran several passes with diagnostic software from the manufacturer
- reset BIOS settings to default
- upgraded BIOS to newest release
- booted server from 2 year old backup BIOS
- took out the only pair of RAM modules that was different from the
  rest of the modules
- installed freebsd 7.2-STABLE on the server to repeat the kernel panic
  (no panic with 7.2)
- installed 8.0-BETA4 (crash)

Besides: The server was in production with 7.2 for some time, without
showing any such problems.

> > Now my idea was to install the old 8.0-BETA4 and upgrade to RC2
> > through makeworld + buildkernel (gdb+witness). But no luck. When
> > trying to upgrade to RC2 the 8.0-BETA4 also crashes. At least
> > 8.0-BETA4 has debug + witness active in the GENERIC kernel..
> >
> > So below some debug output of 8.0-BETA4 crashing. Has a vfs/ffs LOR
> > problem with the BETA4 already been fixed?
>
> The debug output you included were just lock order reversals, and
> don't seem to be related to your crash.

Sorry for causing possible confusion about this. I realized this after
my mail was already out.

> I think 8.0-BETA4 still had the debugger compiled in (you can test by
> pressing ctrl-alt-escape on the console, if you do drop to the
> debugger, give the "c" command to continue).
>
> If the debugger is compiled in, then the spontaneous reboot without
> dropping to the debugger suggests even more that it may be hardware
> related. If you do get to the debugger, a copy of all of the messages
> on screen and the output of the "bt" command would be very useful.
> When you do your kernel recompile, please include full debugging,
> including WITNESS, INVARIANTS, KDB, DDB etc.

In the meantime I managed to install a RELENG_8 world + GENERIC kernel
with all debug options enabled on the crashing server. (I mounted /usr/src
and /usr/obj on another server running 8.0RC1 through NFS and did
buildworld + buildkernel over there.)

So now I have a debug kernel available with dumpdev + dumpdir defined.

Here are my latest findings on this issue:

- In about 80% of the cases, running a makeworld leads to a server crash
  without the server writing a crashdump to dumpdir. The server just
  reboots.
- In about 20% of the cases makeworld gets stuck in a non-terminating
  process that eats up 100% cpu. This process cannot be killed. When
  restarting makeworld the server then reboots again.
- It makes no difference whether I do makeworld -j1 or -j8, the result
  is the same.

> It depends what the bug is to be honest. So far there isn't really
> enough information to determine the cause, and therefore there isn't
> really enough info for a PR.

Mark Atkinson also commented on my mail and gave this hint:

"If vm.pmap.pg_ps_enabled is 1 in 8.0-rc2, you might try rebooting with
c in /boot/loader.conf and try another buildworld."

So I thought why not and just tried it - and surprise: Setting
vm.pmap.pg_ps_enabled=0 in loader.conf resolves my problem with 8.0RC2
crashing when doing a makeworld.

After a successful buildworld and buildkernel I rebooted the server
again with the vm.pmap.pg_ps_enabled=0 line commented out and the
problem was there again. Then I disabled the option again in
loader.conf, rebooted and ran make buildworld: no problem. Seems to be
deterministic.
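
For reference, the workaround described above comes down to a single tunable in /boot/loader.conf. This is a minimal sketch of that entry, assuming the value is set at boot through the loader as described in this mail; nothing else in the file needs to change for the test:

```
# /boot/loader.conf
# Workaround from this thread: turn vm.pmap.pg_ps_enabled off at boot.
# Commenting this line out again restores the default, with which the
# crash reappeared on this machine.
vm.pmap.pg_ps_enabled="0"
```

After the reboot, running `sysctl vm.pmap.pg_ps_enabled` should show whether the setting took effect.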

With vm.pmap.pg_ps_enabled=1 the server crashes without being able to
write crashdumps to dumpdev (at least on this specific Proliant DL385G2
server).

--Kai.

-- 
You need more time; and you probably always will.

Received on Tue Nov 03 2009 - 23:17:36 UTC