Re: New libc malloc patch

From: Jon Dama <jd_at_ugcs.caltech.edu>
Date: Tue, 29 Nov 2005 14:21:07 -0800 (PST)
Jason,

Actually I didn't mean to imply it was hopeless at all.  :-)  Obviously
there are solutions to the address space issues, and I agree it is
possible to change your code to improve the situation quite a bit.  I
think the best one might be to permit mmap to actually consume space
below maxdsiz, either when hinted to do so or when the space above is
consumed; but before I recommend that too strongly, I must admit that I
haven't looked into what that change would entail.

Let me take a closer look at what you are doing with regard to
cache lines.  You seem to be implying that you only take care with how
allocations are laid out within a given page?

I have a suspicion that it might just be better to push the problem onto
the application, in the sense that no allocation returned by malloc
should ever be smaller than one cache line.  Perhaps this is what you
are already doing?
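
Roughly, I mean something like the wrapper below -- just an untested
sketch to show the idea; cl_malloc() and CACHE_LINE are names I made up,
and the real line size would have to come from the architecture:

    #include <stdlib.h>

    #define CACHE_LINE      64      /* stand-in value; per-CPU in reality */

    /*
     * Never hand out an allocation smaller than one cache line, so two
     * unrelated small objects cannot end up sharing a line.  (This only
     * helps if the underlying allocator also keeps such chunks aligned
     * on cache-line boundaries.)
     */
    void *
    cl_malloc(size_t size)
    {

            if (size < CACHE_LINE)
                    size = CACHE_LINE;
            return (malloc(size));
    }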

-Jon


On Tue, 29 Nov 2005, Jason Evans wrote:

> Jon,
>
> Thanks for your comments.  Fortunately, I don't think things are
> quite as hopeless as you think, though you may be right that some
> adjustments are necessary.  Specific replies follow.
>
> On Nov 29, 2005, at 12:06 PM, Jon Dama wrote:
> > There exists a problem right now--localized to i386 and any other arch
> > based on 32-bit pointers: address space is simply too scarce.
> >
> > Your decision to switch to using mmap as the exclusive source of
> > malloc buckets is admirable for its modernity, but it simply cannot
> > stand unless someone steps up to change the way mmap and brk
> > interact within the kernel.
> >
> > The trouble arises from the need to set MAXDSIZ and the resulting
> > effect it has in determining the start of the mmap region--which, I
> > might add, is where the shared library loader is placed.  This
> > effectively (and explicitly) sets the limit on how large a
> > contiguous region can be allocated with brk.
> >
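
(For anyone following along, the i386 picture I have in mind is roughly
the one below; the exact boundaries depend on the kernel configuration,
so treat the addresses as approximate:

    low addresses   +----------------------------+
                    |  program text              |
                    |  data / bss                |
                    |  heap (brk), grows up      |  capped at MAXDSIZ
    ~MAXDSIZ        +----------------------------+
                    |  rtld + shared libraries   |
                    |  mmap'ed regions           |
                    |  ...                       |
                    |  stack, grows down         |
    top of user VA  +----------------------------+

so raising MAXDSIZ to give brk more room pushes the start of the mmap
region up, and vice versa.)
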
> > What you've done by switching the system malloc to exclusively
> > using mmap is give sysadmins a lot of motivation to push that
> > brk/mmap boundary down.
> >
> > This wouldn't be a problem except that you've effectively shot
> > dozens of alternative C malloc implementations in the foot, not to
> > mention the memory allocator routines used in obscure languages such
> > as Modula-3 and Haskell that rely on brk-derived buckets.
> >
> > This isn't playing very nicely!
>
> Where should MAXDSIZ be?  Given scarce address space, the best we can
> hope for is setting it to the "least bad" default, as measured by what
> the programs we care about do.  No matter what we do, some programs
> lose.
>
> That said, it turns out that adding the ability to allocate via brk
> isn't hard.  The code already contains the logic to recycle address
> ranges, in order to reduce mmap system call overhead.  All of the
> locking infrastructure is also already in place.  The only necessary
> modifications are 1) explicit use of brk until all data segment space
> is consumed, 2) special case code for addresses in the brk range, so
> that madvise() is used instead of munmap(), and 3) preferential re-
> use of the brk address space over mmap'ed memory.
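
(If I follow, the chunk-allocation path would end up shaped roughly like
the sketch below.  To be clear, this is just my reading of your three
points -- the names, and the bookkeeping I've omitted, are my own, not
taken from your patch:

    #include <sys/mman.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <stddef.h>

    static void *brk_base, *brk_max;    /* data segment bounds, set at startup */

    static void *
    chunk_alloc(size_t size)
    {
            void *ret;

            /* (1) Prefer the data segment until MAXDSIZ is exhausted. */
            ret = sbrk((intptr_t)size);
            if (ret != (void *)-1)
                    return (ret);

            /* Point (3): the real thing would recycle previously freed
             * brk ranges here, before falling back to mmap. */
            ret = mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_PRIVATE, -1, 0);
            return (ret == MAP_FAILED ? NULL : ret);
    }

    static void
    chunk_dealloc(void *chunk, size_t size)
    {

            if ((char *)chunk >= (char *)brk_base &&
                (char *)chunk < (char *)brk_max) {
                    /* (2) brk space can't be unmapped; discard the pages
                     * and keep the range around for preferential reuse. */
                    madvise(chunk, size, MADV_FREE);
            } else
                    munmap(chunk, size);
    }

Is that about right?)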
>
> Do you agree that there is no need for using brk on 64-bit systems?
>
> > I looked into the issues and limitations with phkmalloc several
> > months ago and concluded that simply adopting ptmalloc2 (the Linux
> > malloc) was the better approach--notably, it is willing to draw from
> > both brk and mmap, and it also implements per-thread arenas.
>
> ptmalloc takes a different approach to per-thread arenas that has
> been shown in multiple papers to not scale as well as the approach I
> took.  The difference isn't significant until you get to 8+ CPUs, but
> we already have systems running with enough CPUs that this is an issue.
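
(For my own understanding: the general shape I picture is each thread
being bound to one of several arenas, so that unrelated threads rarely
fight over the same lock.  A generic illustration only, with made-up
names, not your actual scheme:

    #include <pthread.h>
    #include <stdint.h>

    #define NARENAS         16      /* e.g. some multiple of ncpus */

    typedef struct arena_s {
            pthread_mutex_t lock;   /* guards this arena only */
            /* ... per-arena allocation state ... */
    } arena_t;

    static arena_t  arenas[NARENAS];        /* mutex init omitted */

    static arena_t *
    choose_arena(void)
    {
            /* Hash the thread handle to pick an arena; a real allocator
             * would cache the choice per thread rather than rehash on
             * every call. */
            uintptr_t h = (uintptr_t)pthread_self();

            return (&arenas[(h >> 4) % NARENAS]);
    }

Contention then scales with the number of arenas rather than with one
global lock.)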
>
> > There is also cause for concern about your "cache-line" business.
> > Simply on the face of it, there is the problem that the scheduler
> > does not do a good job of pinning threads to individual CPUs.  The
> > threads are already bouncing from CPU to CPU and thrashing (really
> > thrashing) each CPU's cache along the way.
> >
> > Second, you've forgotten that there is a layer of indirection
> > between your address space and the cache: the mapping of logical
> > pages (what you can see in userspace) to physical pages (the
> > addresses of which actually matter for the purposes of the cache).
> > I don't recall off-hand whether or not the L1 cache on i386 is
> > tagged by virtual address, but I am certain that the L2 and L3
> > caches tag physical addresses, not virtual addresses.
> >
> > This means that your careful address selection based on cache lines
> > will only work out if it is done in the VM codepath: remember, the
> > mapping of physical addresses to the virtual addresses that come
> > back from mmap can be delayed arbitrarily far into the future,
> > depending on when the program actually goes to touch that memory.
> >
> > Furthermore, the answer may vary depending on the architecture or
> > even the processor version.
>
> I don't think you understand the intent for the "cache-line
> business".  There is only one intention: avoid storing data
> structures in the same cache line if they are likely to be accessed
> simultaneously by multiple threads.  For example, if two independent
> structures are stored right next to each other, although they do not
> require synchronization protection from each other, the hardware will
> send a cache line invalidation message to other CPUs every time
> anything in the cache line is modified.  This means horrible cache
> performance if the data are modified often.
>
> As a particular example, there are per-arena data structures that are
> modified quite often (arena_t).  If two arenas were right next to
> each other, then they could share a cache line, and performance would
> potentially be severely impacted.
>
> |---arena_t---|---arena_t---|
> |   |   |   |   |   |   |   |
>               ^^^
>               BAD!
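
(Understood -- so in effect each arena_t gets padded out to a whole
number of cache lines, so that a store into one arena never invalidates
the line holding its neighbor.  Something like the sketch below, where
CACHELINE is just a stand-in constant and the struct contents are my
guesses:

    #include <pthread.h>
    #include <stddef.h>

    #define CACHELINE       64      /* stand-in; architecture-dependent */

    /* Round a size up to a whole number of cache lines. */
    #define CACHELINE_CEILING(s)                                    \
            (((s) + (CACHELINE - 1)) & ~((size_t)CACHELINE - 1))

    typedef struct arena_s {
            pthread_mutex_t lock;   /* written constantly by its threads */
            /* ... run/bin bookkeeping ... */
    } arena_t;

    /* Each element occupies a cache-line multiple, so adjacent arenas
     * never share a line. */
    union arena_pad {
            arena_t arena;
            char    pad[CACHELINE_CEILING(sizeof(arena_t))];
    };

    static union arena_pad arenas[16];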
>
> Thanks,
> Jason
>
Received on Tue Nov 29 2005 - 21:23:35 UTC
