Re: expanding past 1 TB on amd64

From: Alan Cox <alan.l.cox_at_gmail.com>
Date: Tue, 16 Jul 2013 14:12:42 -0700
On Tue, Jul 16, 2013 at 7:08 AM, Kurt Lidl <lidl_at_pix.net> wrote:

>> On Wed, Jun 19, 2013 at 1:32 AM, Chris Torek <chris.torek at gmail.com>
>> wrote:
>>
>>> In src/sys/amd64/include/vmparam.h is this handy map:
>>>
>>>  * 0x0000000000000000 - 0x00007fffffffffff   user map
>>>  * 0x0000800000000000 - 0xffff7fffffffffff   does not exist (hole)
>>>  * 0xffff800000000000 - 0xffff804020100fff   recursive page table (512GB slot)
>>>  * 0xffff804020101000 - 0xfffffdffffffffff   unused
>>>  * 0xfffffe0000000000 - 0xfffffeffffffffff   1TB direct map
>>>  * 0xffffff0000000000 - 0xffffff7fffffffff   unused
>>>  * 0xffffff8000000000 - 0xffffffffffffffff   512GB kernel map
>>>
>>> showing that the system can deal with at most 1 TB of address space
>>> (because of the direct map), using at most half of that for kernel
>>> memory (less, really, due to the inevitable VM fragmentation).
>>>
>>> New boards are coming soonish that will have the ability to go
>>> past that (24 DIMMs of 64 GB each = 1.5 TB).  Or, if some crazy
>>> people :-) might want to use most of a 768 GB board (24 DIMMs of
>>> 32 GB each, possible today although the price is kind of
>>> staggering) as wired-down kernel memory, the 512 GB VM area is
>>> already a problem.
>>>
>>> I have not wrapped my head around the amd64 pmap code but figured
>>> I'd ask: what might need to change to support larger spaces?
>>> Obviously NKPML4E in amd64/include/pmap.h, for the kernel start
>>> address; and NDMPML4E for the direct map.  It looks like this
>>> would adjust KERNBASE and the direct map appropriately.  But would
>>> that suffice, or have I missed something?
>>>
>>> For that matter, if these are changed to make space for future
>>> expansion, what would be a good expansion size?  Perhaps multiply
>>> the sizes by 16?  (If memory doubles roughly every 18 months,
>>> that should give room for at least 5 years.)
>>>
>>>
>> Chris, Neel,
>>
>> The actual data that I've seen shows that DIMMs are doubling in size at
>> about half that pace, about every three years.  For example, see
>> http://users.ece.cmu.edu/~omutlu/pub/mutlu_memory-scaling_imw13_invited-talk.pdf
>> slide #8.  So, I think that a factor of 16 is a lot more than we'll need in
>> the next five years.  I would suggest configuring the kernel virtual
>> address space for 4 TB.  Once you go beyond 512 GB, 4 TB is the next
>> "plateau" in terms of address translation cost.  At 4 TB all of the PML4
>> entries for the kernel virtual address space will reside in the same L2
>> cache line, so a page table walk on a TLB miss for an instruction fetch
>> will effectively prefetch the PML4 entry for the kernel heap and vice
>> versa.
>>
>
> The largest commodity motherboards that are shipping today support
> 24 DIMMs, at a max size of 32GB per DIMM.  That's 768GB, right now.
> (So FreeBSD is already "out of bits" in terms of supporting current
> shipping hardware.)



Actually, this scenario with 768 GB of RAM on amd64 as it is today is
analogous to the typical 32-bit i386 machine, where the amount of RAM has
long exceeded the default 1 GB size of the kernel virtual address space.
In theory, we could currently handle up to 1 TB of RAM, but the kernel
virtual address space would only be 512 GB.
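
To make the quoted numbers concrete: one PML4 entry maps 2^39 bytes, i.e.
512 GB, so the 512 GB kernel map and 1 TB direct map correspond to one and
two PML4 slots.  A minimal sketch (the NKPML4E/NDMPML4E names are from
amd64/include/pmap.h as quoted above; the values are just read off the
vmparam.h map, not the literal header contents):

    /*
     * Back-of-the-envelope sketch, not the literal amd64 headers.
     * One PML4 entry maps 2^39 bytes = 512 GB, so the sizes in the
     * vmparam.h map fall out of two small slot counts.
     */
    #include <stdio.h>

    #define PML4E_SPAN  (1ULL << 39)   /* 512 GB mapped per PML4 entry */
    #define NKPML4E     1              /* kernel map slots: 1 * 512 GB */
    #define NDMPML4E    2              /* direct map slots: 2 * 512 GB */

    int main(void)
    {
            printf("kernel map: %llu GB\n", NKPML4E * PML4E_SPAN >> 30);
            printf("direct map: %llu GB\n", NDMPML4E * PML4E_SPAN >> 30);
            return (0);
    }

Growing the direct map to the suggested 4 TB would then mean NDMPML4E = 8,
plus the matching adjustments to the vmparam.h map.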


> ... The Haswell line of CPUs is widely reported to
> support DIMMs twice as large, and it's due in September.  That would
> make the systems of late 2013 hold up to 1536GB of memory.
>
> Using your figure of doubling in 3 years, we'll see 3072GB systems by
> ~2016.  And in ~2019, we'll see 6TB systems, and need to finally expand
> to using more than a single cache line to hold all the PML4 entries.
>
>
Yes, this is a reasonable prognostication.
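
The cache-line arithmetic behind the "plateau" is easy to check: PML4
entries are 8 bytes, so a 64-byte cache line holds 8 of them, covering
8 * 512 GB = 4 TB.  A minimal sketch (plain x86-64 paging constants,
nothing FreeBSD-specific):

    /*
     * Sanity check of the cache-line argument: 8-byte PML4 entries,
     * 64-byte cache lines, 512 GB mapped per entry.
     */
    #include <stdio.h>

    int main(void)
    {
            unsigned long long pml4e_span = 1ULL << 39;  /* 512 GB */
            unsigned long long entries_per_line = 64 / 8;

            /* One cache line of PML4 entries covers 4 TB. */
            printf("one line maps %llu TB\n",
                entries_per_line * pml4e_span >> 40);

            /*
             * The ~2019 projection of 6 TB needs 12 entries for the
             * direct map alone, i.e. two cache lines.
             */
            unsigned long long six_tb = 6ULL << 40;
            printf("6 TB needs %llu PML4 entries\n",
                six_tb / pml4e_span);
            return (0);
    }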

Alan


> Of course, that's speculating furiously about two generations out, and
> assumes keeping the current memory architecture / board design
> constraints.
>
> -Kurt
>
>
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>
Received on Tue Jul 16 2013 - 19:12:45 UTC
