Re: Enabling NUMA in BIOS stop booting FreeBSD

From: Slawa Olhovchenkov <slw_at_zxy.spb.ru>
Date: Fri, 16 Dec 2016 01:45:00 +0300
On Thu, Dec 15, 2016 at 03:56:56PM +0200, Konstantin Belousov wrote:

> > > Possibly, the dmesg of the boot (with late_console=0) with this and only
> > > this patch applied against stock HEAD.  This might be long.
> > 
> > Do you need all (262144?) lines?
> > 
> > Testing system
> > memory........................................................................................................................pb 0x2040000000
> > pb 0x2040001000
> > pb 0x2040002000
> > pb 0x2040003000
> > pb 0x2040004000
> > pb 0x2040005000
> > pb 0x2040006000
> > [...]
> > pb 0x207ffff000
> > 
> > > diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> > > index 682307f5fe4..072c8d76acf 100644
> > > --- a/sys/amd64/amd64/machdep.c
> > > +++ b/sys/amd64/amd64/machdep.c
> > > @@ -1400,6 +1400,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
> > >  			 */
> > >  			*(int *)ptr = tmp;
> > >  
> > > +if (page_bad) printf("pb 0x%lx\n", pa);
> > >  skip_memtest:
> > >  			/*
> > >  			 * Adjust array of valid/good pages.
> > 
> > PS: memtest86 hung while testing 128-130G (the server has 128G installed).
> Well, the physical memory is 128G, but it is not mapped contiguously into
> the address space accessible to the processors.  E.g. in the SMAPs you
> posted above, there are several holes (type 2) used for the PCIe config
> window, PCI BARs, APICs, and other I/O register pages.  Intel chipsets
> allow remapping the RAM hidden by the I/O pages, which is probably not
> done correctly by the BIOS.
> 
> The SMAP clearly reports segment 0x100000000-0x2080000000 as populated
> by RAM; this is 4G-130G.  The very primitive memory test in the kernel
> rejects every page starting at 129G.  A possibly important detail is that
> the kernel memory test only touches the first 4 bytes of each page.  So if
> the BIOS erroneously mapped any I/O registers into that range, the memory
> test might luckily avoid touching anything critical, while still noticing
> that the page does not behave as RAM.
> 
> Update the BIOS, and if the issue persists, contact Supermicro.  This
> interesting detail adds even more evidence that the BIOS is problematic.

Updating the BIOS didn't solve this.
Received on Thu Dec 15 2016 - 21:45:10 UTC
