Re: Weird performance behaviour in 7.0

From: Richard Todd <rmtodd_at_ichotolot.servalan.com>
Date: Sat, 26 Jan 2008 13:58:06 -0600

bruce_at_zuhause.mn.org writes:
> Richard Todd writes:
>  > This wouldn't by any chance be an Intel 965-chipset-based motherboard
>  > with 4G or more of memory, would it?  Because there's an interesting
>  > little bug in the BIOS on some of those boards which causes the
>  > cache-control registers to incorrectly declare a chunk of main memory
>  > as uncacheable.  This results in random slowdowns depending on whether
>  > your process lands in the "bad" zone of memory or not.  See 
>  > http://article.gmane.org/gmane.os.freebsd.stable/50135/ for more details. 
>
> Bingo!  This is a Intel DG965WH with 4 GB of memory.  I don't think I
> can downgrade to the 1669 firmware because of the processor I'm
> using.  The Fedora thread says that there's a hack to do the following
> in linux to fix the "bad" zone

> echo "base=0x1a8000000 size=0x4000000 type=write-back" >| /proc/mtrr

Yeah, but the exact numbers depend on what exactly the bad zone is, and
that seems to vary from system to system (depending on what memory is
installed and what rev of the BIOS).  Let's see if we can figure it out.
It's gonna be one of the ranges in the list you posted below (the last few
are the important ones):

> 0x0/0xf080000000 BIOS write-back set-by-firmware active bogus
> 0x80000000/0xf040000000 BIOS write-back set-by-firmware active bogus
> 0xc0000000/0xf010000000 BIOS write-back set-by-firmware active bogus
> 0xcf800000/0xf000800000 BIOS uncacheable set-by-firmware active bogus
> 0xcf700000/0xf000100000 BIOS uncacheable set-by-firmware active bogus

Each of these entries specifies the start address and the length of a range
of memory.  The ranges look a lot bigger than they really are thanks to that
leading "f" digit on all of the lengths, which tells me that you're running in
64-bit mode; apparently in 64-bit mode the Intel cache-control registers have
fewer active bits than the AMD64 equivalents for which the existing amd64
machdep code in the kernel was written, so we get 4 gratuitous extra high bits.
So, e.g., the first range is actually of length 0x80000000 (which is 2G)
starting at address zero; that one's "write-back", so cache is enabled on that
range.  Ditto the next one, which is length 0x40000000 (1G) at address 2G.
I'm guessing the bad ones are the two "uncacheable" entries at the end, so
you might try doing

    memcontrol clear -b 0xcf800000 -l 0xf000800000
    memcontrol clear -b 0xcf700000 -l 0xf000100000

and see if that fixes things, and doesn't break anything.
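
If you want to check the arithmetic on those lengths, here's a rough Python
sketch that strips the extra high bits and prints what each range really
covers.  The 36-bit mask is my assumption (it's what makes the leading "f"
go away), and the names real_length, describe, and RANGES are just made up
for illustration.  The numbers are copied from your memcontrol listing.

```python
# Assumption: the bogus bits are the four just above a 36-bit physical
# address (bits 36-39), which is what the leading "f" corresponds to.
PHYS_ADDR_BITS = 36
PHYS_MASK = (1 << PHYS_ADDR_BITS) - 1

# (base, reported length, type), copied from the memcontrol listing.
RANGES = [
    (0x00000000, 0xf080000000, "write-back"),
    (0x80000000, 0xf040000000, "write-back"),
    (0xc0000000, 0xf010000000, "write-back"),
    (0xcf800000, 0xf000800000, "uncacheable"),
    (0xcf700000, 0xf000100000, "uncacheable"),
]


def real_length(reported_length):
    """Drop the gratuitous high bits from a reported range length."""
    return reported_length & PHYS_MASK


def describe(ranges):
    """Return one human-readable line per range, using the real length."""
    lines = []
    for base, reported, kind in ranges:
        length = real_length(reported)
        last = base + length - 1
        lines.append(f"{base:#011x}-{last:#011x} {length >> 20:5d}M {kind}")
    return lines


if __name__ == "__main__":
    for line in describe(RANGES):
        print(line)
```

Note that the memcontrol commands above still take the length exactly as the
listing reports it, "f" and all; the masked values are only for working out
what the ranges mean.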

I am a bit puzzled that the write-back ranges listed above don't seem
to actually add up to enough to cover 4G of memory, though.  This worries me,
in that you may need to add additional cache-control entries.
Before trying the memcontrol stuff above, it might help if you could boot
the system in verbose mode and record the SMAP entries that get printed
by getmemsize() in machdep.c as the system boots.  If you have a serial
console, this is easy.  If you don't, it gets tricky: those SMAP lines are
printed before the console is fully initialized, so they don't show up in
dmesg later.  I had to resort to booting with -d and setting
cleverly-placed ddb breakpoints.
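
To put a number on the "doesn't add up to 4G" worry, here's a small Python
sketch that totals the three write-back ranges after masking off the extra
high bits.  It assumes a 36-bit mask and that the write-back ranges don't
overlap each other (they look contiguous in your listing); the names are
hypothetical.  It says nothing about where the BIOS actually put the rest
of the memory, which is what the SMAP entries would tell us.

```python
# Assumption: only the low 36 bits of each reported length are meaningful.
PHYS_MASK = (1 << 36) - 1

# (base, reported length) for the write-back entries in the listing.
WRITE_BACK = [
    (0x00000000, 0xf080000000),
    (0x80000000, 0xf040000000),
    (0xc0000000, 0xf010000000),
]

INSTALLED = 4 << 30  # 4G of RAM, per the original report


def write_back_total(ranges):
    """Sum the real lengths of the ranges (assumes no overlap)."""
    return sum(length & PHYS_MASK for _base, length in ranges)


def top_of_coverage(ranges):
    """Highest address (exclusive) reached by any of the ranges."""
    return max(base + (length & PHYS_MASK) for base, length in ranges)


if __name__ == "__main__":
    covered = write_back_total(WRITE_BACK)
    print(f"write-back total: {covered >> 20}M")
    print(f"coverage ends at: {top_of_coverage(WRITE_BACK):#x}")
    print(f"short of 4G by:   {(INSTALLED - covered) >> 20}M")
```

By this reckoning the write-back entries cover 3328M, ending at 0xd0000000,
which leaves 768M of the 4G unaccounted for.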