Re: [amd64] Reproducible cold boot failure (reboot succeeds) in -CURRENT

From: John Baldwin <jhb_at_freebsd.org>
Date: Wed, 16 Nov 2011 11:16:24 -0500
On Sunday, November 13, 2011 12:56:12 pm Stefan Esser wrote:
> On 11.11.2011 13:15, Attilio Rao wrote:
> > Can you try rebuilding your kernel and modules from scratch and see if
> > it fixes your problem?
> 
> Sorry for the delay, but my system seems to need to be powered off (S5)
> for many hours (a whole night) to reproduce the problem ...
> 
> I had already rebuilt my kernel multiple times in the last weeks. But
> just to be sure, I removed the build directories for kernel and world
> and built a new kernel after building and installing world from scratch.
> The next reboot (with boot blocks from the freshly built world) failed
> again ...
> 
> But the first lines of boot messages look strange:
> 
> ...
> WARNING: WITNESS option enabled, expect reduced performance.
> Table 'FACP' at 0xba918a58
> Table 'APIC' at 0xba918b50
> Table 'SSDT' at 0xba918be8
> Table 'MCFG' at 0xba918dc0
> Table 'HPET' at 0xba918e00
> ACPI: No SRAT table found
> Preloaded elf kernel "/boot/kernel/kernel" at 0xffffffff81109000
> Preloaded elf obj module "/boot/kernel/zfs.ko" at 0xffffffff81109370 <--
> kldload: unexpected relocation type 67108875
> kernel trap 12 with interrupts disabled
> 
> The puzzling detail is the load address of "zfs.ko", which is just
> 0x370 bytes above the kernel load address ...

That isn't unusual.  Those are the addresses of the metadata provided by the 
loader, not the base address of the kernel or zfs.ko object themselves.  The 
unexpected relocation type is interesting, however.  That value in hex is 
0x400000b.  0xb is the R_X86_64_32S relocation type, which is normal for the 
kernel, so the reported value differs from a valid type only in bit 26.  I 
think you just have a single-bit memory error due to a failing DIMM.
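
The arithmetic above can be checked with a short Python sketch.  It is
illustrative only and not taken from the kernel sources: the helper name
flipped_bits is made up here, and the only inputs are the value from the
boot message and 11, the R_X86_64_32S type number.  It XORs the two values
and lists the bit positions that differ.

```python
# Relocation type number for R_X86_64_32S (0xb), as named in the message.
R_X86_64_32S = 11


def flipped_bits(observed, expected):
    """Return the bit positions at which observed and expected differ."""
    diff = observed ^ expected
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Value printed by "kldload: unexpected relocation type 67108875".
observed = 67108875

print(hex(observed))                          # 0x400000b
print(flipped_bits(observed, R_X86_64_32S))   # [26]
```

Exactly one differing bit is what a single flipped bit in memory would
produce; several differing bits would point more toward an overwrite.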

> A verbose boot scrolls these lines off the screen too fast (and is too
> long to be preserved in dmesg.boot from the start), so I do not have any
> idea whether other values are reported in case of a successful boot.
> 
> I had already assumed that memory was corrupted during early start-up,
> but now I think that gptzfsboot writes the zfs kernel module over the
> start of the loaded kernel. I'll try some more tests later today.

Nah, if zfs.ko were loaded over the beginning of the kernel you wouldn't even 
get to the point of the first kernel printf.

-- 
John Baldwin
Received on Wed Nov 16 2011 - 15:23:29 UTC
