Re: expanding past 1 TB on amd64

From: Alan Cox <alan.l.cox_at_gmail.com> Date: Mon, 15 Jul 2013 12:41:39 -0700

On Wed, Jun 19, 2013 at 1:32 AM, Chris Torek <chris.torek_at_gmail.com> wrote:

> In src/sys/amd64/include/vmparam.h is this handy map:
>
>  * 0x0000000000000000 - 0x00007fffffffffff   user map
>  * 0x0000800000000000 - 0xffff7fffffffffff   does not exist (hole)
>  * 0xffff800000000000 - 0xffff804020100fff   recursive page table (512GB slot)
>  * 0xffff804020101000 - 0xfffffdffffffffff   unused
>  * 0xfffffe0000000000 - 0xfffffeffffffffff   1TB direct map
>  * 0xffffff0000000000 - 0xffffff7fffffffff   unused
>  * 0xffffff8000000000 - 0xffffffffffffffff   512GB kernel map
>
> showing that the system can deal with at most 1 TB of address space
> (because of the direct map), using at most half of that for kernel
> memory (less, really, due to the inevitable VM fragmentation).
>
> New boards are coming soonish that will have the ability to go
> past that (24 DIMMs of 64 GB each = 1.5 TB).  Or, if some crazy
> people :-) might want to use most of a 768 GB board (24 DIMMs of
> 32 GB each, possible today although the price is kind of
> staggering) as wired-down kernel memory, the 512 GB VM area is
> already a problem.
>
> I have not wrapped my head around the amd64 pmap code but figured
> I'd ask: what might need to change to support larger spaces?
> Obviously NKPML4E in amd64/include/pmap.h, for the kernel start
> address; and NDMPML4E for the direct map.  It looks like this
> would adjust KERNBASE and the direct map appropriately.  But would
> that suffice, or have I missed something?
>
> For that matter, if these are changed to make space for future
> expansion, what would be a good expansion size?  Perhaps multiply
> the sizes by 16?  (If memory doubles roughly every 18 months,
> that should give room for at least 5 years.)
>
>
Chris, Neel,

The actual data that I've seen shows that DIMMs are doubling in size at
about half that pace, about every three years.  For example, see
http://users.ece.cmu.edu/~omutlu/pub/mutlu_memory-scaling_imw13_invited-talk.pdf,
slide 8.  So, I think that a factor of 16 is a lot more than we'll need in
the next five years.  I would suggest configuring the kernel virtual
address space for 4 TB.  Once you go beyond 512 GB, 4 TB is the next
"plateau" in terms of address translation cost.  At 4 TB all of the PML4
entries for the kernel virtual address space will reside in the same L2
cache line, so a page table walk on a TLB miss for an instruction fetch
will effectively prefetch the PML4 entry for the kernel heap and vice versa.
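
The cache-line arithmetic above can be checked with a small sketch in C.
It is only an illustration, not code from the FreeBSD tree: the helper
names are made up, and it assumes 64-byte cache lines, 8-byte PML4
entries that each map 512 GB, and a kernel map that occupies the top of
the address space (as the current 512 GB map does).

```c
#include <assert.h>
#include <stdint.h>

#define PML4_SHIFT       39   /* each PML4 entry maps 2^39 bytes = 512 GB */
#define PML4E_SIZE       8    /* bytes per page-table entry on amd64 */
#define CACHE_LINE_SIZE  64   /* assumption: 64-byte cache lines */

/* Index of the PML4 entry that translates va (address bits 47..39). */
static inline unsigned pml4_index(uint64_t va)
{
    return (unsigned)((va >> PML4_SHIFT) & 0x1ff);
}

/* Cache line, within the 4 KB PML4 page, that holds entry idx. */
static inline unsigned pml4_cache_line(unsigned idx)
{
    return idx * PML4E_SIZE / CACHE_LINE_SIZE;
}

/*
 * Nonzero if the PML4 entries for [start, end] all sit in one cache line.
 * Only valid for ranges that stay inside one half of the address space,
 * so that the PML4 index does not wrap between start and end.
 */
static inline int pml4_single_line(uint64_t start, uint64_t end)
{
    assert(start <= end);
    return pml4_cache_line(pml4_index(start)) ==
        pml4_cache_line(pml4_index(end));
}
```

Under those assumptions, a 4 TB kernel map starting at
0xfffffc0000000000 uses PML4 entries 504 through 511, which are the
last 64 bytes of the PML4 page, so pml4_single_line() is true for that
range.  An 8 TB map starting at 0xfffff80000000000 begins at entry 496
and spills into the preceding cache line.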

Also, I don't know if this is immediately relevant to the patch, but the
reason that the direct map is currently twice the size of the kernel
virtual address space is that the largest machine (in terms of physical
memory) that we were running on a couple of years ago had a sparse physical
address space.  Specifically, we needed to have a direct map spanning 1 TB
in order to support 256 GB of RAM on that machine.  This may, for example,
become an issue if you try to autosize the direct map based upon the amount
of DRAM.
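
The autosizing pitfall can be sketched in C as follows.  This is a
hypothetical illustration, not the pmap code: struct phys_seg, the
helper names, and the example layout (four 64 GB segments spread over
a 1 TB physical address space) are all invented here.  The point is
that the direct map has to be sized from the highest physical address,
not from the sum of the segment sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GB               (1ULL << 30)
#define PML4_SLOT_BYTES  (1ULL << 39)   /* one PML4 entry maps 512 GB */

/* Hypothetical descriptor for one physical memory segment: [start, end). */
struct phys_seg {
    uint64_t start;
    uint64_t end;
};

/* Invented sparse layout: 256 GB of RAM, one 64 GB segment per 256 GB. */
static const struct phys_seg example_segs[] = {
    {   0 * GB,  64 * GB },
    { 256 * GB, 320 * GB },
    { 512 * GB, 576 * GB },
    { 768 * GB, 832 * GB },
};

/* Number of 512 GB PML4 slots needed to cover `bytes`, rounded up. */
static inline uint64_t slots_for(uint64_t bytes)
{
    return (bytes + PML4_SLOT_BYTES - 1) / PML4_SLOT_BYTES;
}

/* Total amount of RAM: the sum of the segment sizes. */
static inline uint64_t ram_bytes(const struct phys_seg *segs, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        assert(segs[i].start < segs[i].end);
        total += segs[i].end - segs[i].start;
    }
    return total;
}

/* Highest physical address (exclusive) that the direct map must reach. */
static inline uint64_t phys_top(const struct phys_seg *segs, size_t n)
{
    uint64_t top = 0;

    for (size_t i = 0; i < n; i++)
        if (segs[i].end > top)
            top = segs[i].end;
    return top;
}
```

For the invented layout, slots_for(ram_bytes(...)) yields 1, which
would leave the upper segments outside the direct map, while
slots_for(phys_top(...)) yields 2, i.e. a 1 TB direct map for 256 GB
of RAM.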

Alan