Re: Enabling NUMA in BIOS stop booting FreeBSD

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Wed, 14 Dec 2016 13:39:27 +0200
On Wed, Dec 14, 2016 at 01:52:11PM +0300, Slawa Olhovchenkov wrote:
> Booting...
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> SMAP type=01 base=0000000000000000 len=0000000000099c00
> SMAP type=02 base=0000000000099c00 len=0000000000006400
> SMAP type=02 base=00000000000e0000 len=0000000000020000
> SMAP type=01 base=0000000000100000 len=000000007906b000
> SMAP type=02 base=000000007916b000 len=0000000000936000
> SMAP type=04 base=0000000079aa1000 len=0000000000509000
> SMAP type=02 base=0000000079faa000 len=0000000002056000
> SMAP type=01 base=0000000100000000 len=0000001f80000000
> SMAP type=02 base=000000007c000000 len=0000000014000000
> SMAP type=02 base=00000000fed1c000 len=0000000000029000
> SMAP type=02 base=00000000ff000000 len=0000000001000000
> TTT1 0xfffff8207ff00000 0xfffff8207fffffb8 100000
> . 0
> . 1000
> . 2000
> . 3000
> . 4000
> . 5000
> . 6000
> . 7000
> . 8000
> . 9000
> . a000
> . b000
> . c000
> . d000
> . e000
> . f000
> . 10000
> . 11000
> . 12000
> . 13000
> . 14000
> . 15000
> . 16000
> . 17000
> . 18000
> . 19000
> . 1a000
> . 1b000
> . 1c000
> . 1d000
> . 1e000
> . 1f000
> . 20000
> . 21000
> . 22000
> . 23000
> . 24000
> . 25000
> . 26000
> . 27000
> . 28000
> . 29000
> . 2a000
> . 2b000

In other words, it is almost certainly a hang and not a fault that
leads to a hang. This means that the machine is not compliant with the
IA32 architecture: the region reported as normal memory by the E820
BIOS service does not behave as normal memory.

Since the memory map is the same regardless of the option setting, and
the bootstrap page table depends only on the memory map, we use the same
page table when hanging and when operating correctly. We do not fault or
hang when the option is turned off, which, together with the improved
early fault handling in the patch, makes it almost certain that the
problem is in the hardware configuration and not in our early setup.

Of course, the most puzzling part is that the memory test makes the hang
go away, while repeating the memory test operation only on the msgbuf
region does not. msgbuf is special in that it is located at TOHM (top of
high memory). It spans the 128KB that end at the last byte of the last
physical segment.
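
To make the location concrete, here is a minimal sketch that derives the
top of usable memory from the type=01 SMAP lines quoted above and then
takes the 128KB just below it. The helper names and the exact window
arithmetic are illustrative assumptions, not the kernel's actual msgbuf
placement code.

```python
import re

# type=01 (usable RAM) SMAP lines copied from the quoted boot log.
SMAP_LOG = """\
SMAP type=01 base=0000000000000000 len=0000000000099c00
SMAP type=01 base=0000000000100000 len=000000007906b000
SMAP type=01 base=0000000100000000 len=0000001f80000000
"""

SMAP_RE = re.compile(r"SMAP type=([0-9a-f]+) base=([0-9a-f]+) len=([0-9a-f]+)")

# From the text: msgbuf occupies 128KB at the top of high memory.
MSGBUF_SIZE = 128 * 1024


def usable_segments(log):
    """Return (start, end_exclusive) for every type=01 SMAP entry."""
    segs = []
    for m in SMAP_RE.finditer(log):
        typ, base, length = (int(g, 16) for g in m.groups())
        if typ == 1:
            segs.append((base, base + length))
    return segs


def top_of_memory(log):
    """Highest end address among the usable segments (exclusive)."""
    return max(end for _, end in usable_segments(log))


def msgbuf_window(top, size=MSGBUF_SIZE):
    """Hypothetical helper: the last `size` bytes below `top`."""
    return (top - size, top)


top = top_of_memory(SMAP_LOG)
start, end = msgbuf_window(top)
print(f"top of usable memory: {top:#x}")
print(f"assumed msgbuf window: [{start:#x}, {end:#x})")
```

With the map above, the last usable segment starts at 4GB and ends at
physical address 0x2080000000, so under these assumptions the window is
the final 128KB below that address.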

The only ideas I have right now are these. Either there is a bug in the
Caching Agent/Home Agent/IMC configuration in the BIOS, in which case
there is nothing the OS can do to mitigate it. Or the memory map
reported by CSM is wrong (you said that you use legacy boot, right?).
This would not be too surprising if true, because the non-EFI boot code
path definitely gets less and less testing.

For the latter case (a potential bug in CSM), could you switch to EFI
boot mode and see whether the issue magically heals itself? You could
boot from a USB stick in EFI mode for the test, without reinstalling.

Do you use the latest BIOS for your motherboard?
Received on Wed Dec 14 2016 - 10:39:37 UTC
