On Wed, Dec 14, 2016 at 01:52:11PM +0300, Slawa Olhovchenkov wrote:
> Booting...
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> SMAP type=01 base=0000000000000000 len=0000000000099c00
> SMAP type=02 base=0000000000099c00 len=0000000000006400
> SMAP type=02 base=00000000000e0000 len=0000000000020000
> SMAP type=01 base=0000000000100000 len=000000007906b000
> SMAP type=02 base=000000007916b000 len=0000000000936000
> SMAP type=04 base=0000000079aa1000 len=0000000000509000
> SMAP type=02 base=0000000079faa000 len=0000000002056000
> SMAP type=01 base=0000000100000000 len=0000001f80000000
> SMAP type=02 base=000000007c000000 len=0000000014000000
> SMAP type=02 base=00000000fed1c000 len=0000000000029000
> SMAP type=02 base=00000000ff000000 len=0000000001000000
> TTT1 0xfffff8207ff00000 0xfffff8207fffffb8 100000
> . 0
> . 1000
> . 2000
> . 3000
> . 4000
> . 5000
> . 6000
> . 7000
> . 8000
> . 9000
> . a000
> . b000
> . c000
> . d000
> . e000
> . f000
> . 10000
> . 11000
> . 12000
> . 13000
> . 14000
> . 15000
> . 16000
> . 17000
> . 18000
> . 19000
> . 1a000
> . 1b000
> . 1c000
> . 1d000
> . 1e000
> . 1f000
> . 20000
> . 21000
> . 22000
> . 23000
> . 24000
> . 25000
> . 26000
> . 27000
> . 28000
> . 29000
> . 2a000
> . 2b000

In other words, it is almost certainly a genuine hang, and not a fault
that causes the hang. This means that the machine is not compliant with
the IA32 architecture: in particular, the region reported as normal
memory by the E820 BIOS service does not behave as normal memory.

The memory map is the same regardless of the option setting, and the
bootstrap page tables depend only on the memory map, so we use the same
page tables when hanging and when operating correctly. We do not fault
or hang when the option is turned off. Together with the improved early
fault handling in the patch, this makes it almost certain that the
problem is in the hardware configuration and not in our early setup.
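To make the memory map quoted above easier to read, here is a minimal Python sketch that parses the SMAP lines and reports where the highest usable (type 01) region ends. It also checks where the first address on the TTT1 line falls, assuming that address is an amd64 direct-map virtual address and that the direct map starts at 0xfffff80000000000. Both are assumptions about the debugging patch and the kernel layout, not something the log states. The helper names (`parse_smap`, `top_of_usable`) are made up for this illustration.

```python
import re

# SMAP lines copied verbatim from the boot log quoted above.
SMAP_LOG = """\
SMAP type=01 base=0000000000000000 len=0000000000099c00
SMAP type=02 base=0000000000099c00 len=0000000000006400
SMAP type=02 base=00000000000e0000 len=0000000000020000
SMAP type=01 base=0000000000100000 len=000000007906b000
SMAP type=02 base=000000007916b000 len=0000000000936000
SMAP type=04 base=0000000079aa1000 len=0000000000509000
SMAP type=02 base=0000000079faa000 len=0000000002056000
SMAP type=01 base=0000000100000000 len=0000001f80000000
SMAP type=02 base=000000007c000000 len=0000000014000000
SMAP type=02 base=00000000fed1c000 len=0000000000029000
SMAP type=02 base=00000000ff000000 len=0000000001000000
"""

SMAP_RE = re.compile(r"SMAP type=([0-9a-f]+) base=([0-9a-f]+) len=([0-9a-f]+)")

# ASSUMPTION: start of the amd64 direct map; not stated anywhere in the log.
DMAP_BASE = 0xFFFFF80000000000

# First address printed on the TTT1 line of the log.
TTT1_START_VA = 0xFFFFF8207FF00000


def parse_smap(text):
    """Return (type, base, end) tuples, one per SMAP line, sorted by base."""
    entries = []
    for m in SMAP_RE.finditer(text):
        typ, base, length = (int(g, 16) for g in m.groups())
        entries.append((typ, base, base + length))
    return sorted(entries, key=lambda e: e[1])


def top_of_usable(entries):
    """End address (exclusive) of the highest type-1 (usable RAM) region."""
    return max(end for typ, _, end in entries if typ == 1)


entries = parse_smap(SMAP_LOG)
top = top_of_usable(entries)
ttt1_start_pa = TTT1_START_VA - DMAP_BASE  # valid only under the assumption above

print(hex(top))                  # 0x2080000000
print(hex(top - ttt1_start_pa))  # 0x100000
```

Under those assumptions, the first TTT1 address sits exactly 0x100000 bytes below the end of the last usable segment, which matches the third field on the TTT1 line.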
Of course, the most puzzling part is that the memory test makes the hang
go away, while repeating the memory test operation only on the msgbuf
region does not. msgbuf is special in that it is located at TOHM (top of
high memory): it spans the 128KB that end at the last byte of the last
physical segment.

The only ideas I have right now are these. Either there is a bug in the
Caching Agent/Home Agent/IMC configuration done by the BIOS, in which
case there is nothing the OS can do to mitigate it. Or the memory map
reported by the CSM is wrong (you said that you use legacy boot, right?).
That would not be too surprising, because the non-EFI boot code path
definitely gets less and less testing.

For the latter case (a potential bug in the CSM), could you switch to EFI
boot mode and see whether the issue magically heals itself? For the
test, you could boot from a USB stick in EFI mode without reinstalling.

Are you using the latest BIOS for your motherboard?

Received on Wed Dec 14 2016 - 10:39:37 UTC