On Thu, Dec 15, 2016 at 03:56:56PM +0200, Konstantin Belousov wrote:
> > > Possibly, the dmesg of the boot (with late_console=0) with this and only
> > > this patch applied against stock HEAD. This might be long.
> >
> > Do you need all (262144?) lines?
> >
> > Testing system
> > memory........................................................................................................................pb 0x2040000000
> > pb 0x2040001000
> > pb 0x2040002000
> > pb 0x2040003000
> > pb 0x2040004000
> > pb 0x2040005000
> > pb 0x2040006000
> > [...]
> > pb 0x207ffff000
> >
> > > diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> > > index 682307f5fe4..072c8d76acf 100644
> > > --- a/sys/amd64/amd64/machdep.c
> > > +++ b/sys/amd64/amd64/machdep.c
> > > @@ -1400,6 +1400,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
> > >  */
> > >  *(int *)ptr = tmp;
> > >
> > > +if (page_bad) printf("pb 0x%lx\n", pa);
> > >  skip_memtest:
> > >  /*
> > >  * Adjust array of valid/good pages.
> >
> > PS: memtest86 hung at test 128-130G (server has 128G installed).
> Well, the physical memory is 128G, but it is not mapped contiguously into
> the address space accessible to the processors. E.g. in the SMAPs you
> posted above, there are several holes (type 2) used for PCIe config
> window, PCI BARs, APICs, and other i/o register pages. Intel chipsets
> allow remapping the RAM hidden by the io pages, which is probably not
> done correctly by BIOS.
>
> The SMAP clearly reports segment 0x100000000-0x2080000000 as populated
> by RAM, this is 4G-130G. Very primitive memory test in kernel does
> not like all pages starting at 129G. Possibly important detail is that
> kernel memory test only touches first 4 bytes on each page. So if BIOS
> erroneously mapped any io registers into that range, memory test might
> luckily avoid touching anything critical, but still noting that the
> page does not behave as RAM.
>
> Update BIOS, and if the issue persists, contact supermicro. This
> interesting detail adds even more evidence that BIOS is problematic.

An updated BIOS does not solve this.

Received on Thu Dec 15 2016 - 21:45:10 UTC