[amd64] Reproducible cold boot failure (reboot succeeds) in -CURRENT

From: Stefan Esser <se_at_freebsd.org>
Date: Thu, 10 Nov 2011 09:16:30 +0100
For a few weeks I have been suffering from a problem that requires 
manual intervention to get my home workstation boot -CURRENT.

The kernel panics at varying places and with different panic messages, 
e.g. (hand transcribed since kernel dumps don't work at that stage):

privileged instruction fault while in kernel mode

kmem_alloc_nofault +0x37
kmem_init +0x9e
vm_kmem_init +0x39
mi_startup +0x77
btext +0x2c


On another cold boot attempt:

kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode

elf_relocinternal +0xa8
link_elf_reloc_local +0x2fe
link_elf_link_preload +0x69d
linker_preload +0x101
mi_startup +0x77
btext +0x2c


In all the cases observed, the system starts without any problems on 
second attempt (pressing RESET or the reboot command in the debugger).
The system is working reliably, once booted.

This started a few weeks back (after the switch-over to 10-CURRENT,
IIRC), and I did not bother to report it at the time, since I thought it 
was caused by a temporary instability in the code base.

The system is an i2600K on ASUS P8H67-M EVO with 8GB of RAM and an amd64 
kernel booting from ZFS (gptzfsboot). The kernel is a stripped down 
GENERIC plus IPFW and ath (but I doubt that the configuration is causing 
this, since the failure happens before any devices are probed and the 
identically configured kernels used to cold boot just fine for half a year).

Any hint how to further diagnose this case is welcome (but my spare
time is very limited and I cannot easily bisect to find a revision
that boots, for example).

I can produce further debug output on demand, but I do not have a serial 
or firewire console setup for debugging.

Is anybody else affected by this boot problem?

Regards, STefan
Received on Thu Nov 10 2011 - 07:30:05 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:20 UTC