Chuck Swiger wrote:
> Can jemalloc only create per-CPU arenas only for processes which are themselves
> multithreaded, when it's running on a multi-CPU system? Would that help reduce
> the amount of allocated but unreferenced memory that is involved for the common
> case of /bin/sh and friends?

jemalloc already creates arenas (and their associated chunks) lazily, so support for multi-threaded programs costs single-threaded programs practically nothing.

Here's why a typical small program has a ~6 MB VSIZE on i386. Chunks have to be aligned at addresses that are multiples of the chunk size. Since the heap doesn't start at a chunk-aligned address, the first chunk that can be allocated from the heap starts well past the beginning of the heap. Additionally, we need at least *two* chunks: one for internal malloc data structures and one for application allocations.

At the cost of a bit of extra complexity, it is possible to start off with a runt chunk for the internal malloc data structures, since the chunk alignment requirements happen to be unimportant for internally used chunks. This would reduce VSIZE somewhat. It isn't clear to me that this optimization is worthwhile overall (though I do have a patch that implements it).

This problem doesn't even exist on the 64-bit architectures, since there we use mmap() for all chunks.

Jason
Received on Wed Mar 22 2006 - 17:14:28 UTC
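To make the alignment arithmetic described in the message concrete, here is a small sketch that rounds a heap start address up to the next chunk boundary and adds two chunks on top. The chunk size (2 MiB) and the heap start address are assumptions picked for illustration, not values measured from jemalloc or any real i386 process, and the helper names are hypothetical.

```python
# Illustrative numbers only (assumptions, not measured jemalloc values).
CHUNK_SIZE = 2 * 1024 * 1024  # assumed chunk size: 2 MiB, a power of two
HEAP_START = 0x0804C000       # assumed initial program break on i386


def chunk_ceiling(addr: int, chunk_size: int) -> int:
    """Round addr up to the next multiple of chunk_size (must be a power of two)."""
    return (addr + chunk_size - 1) & ~(chunk_size - 1)


def heap_span_for_chunks(heap_start: int, chunk_size: int, nchunks: int):
    """Return (alignment gap, total span) in bytes when nchunks chunk-aligned
    chunks are carved contiguously from the heap, starting at heap_start."""
    first_chunk = chunk_ceiling(heap_start, chunk_size)
    gap = first_chunk - heap_start  # unusable for aligned chunks
    return gap, gap + nchunks * chunk_size


# Two chunks: one for internal malloc data, one for application allocations.
gap, total = heap_span_for_chunks(HEAP_START, CHUNK_SIZE, 2)
print(f"alignment gap: {gap} bytes, total span: {total} bytes")
# prints "alignment gap: 1785856 bytes, total span: 5980160 bytes"
```

With these assumed inputs the span comes out to roughly 5.7 MiB, which is in the same ballpark as the ~6 MB VSIZE mentioned above. A real system's chunk size and initial break may differ, so treat this as an illustration of why the alignment gap plus two chunks adds up, not as a reproduction of actual numbers.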