On Dec 23, 2005, at 2:28 AM, David Xu wrote:

> I know what '>' does in phkmalloc. I found 'Q', he replaced '>' with
> 'Q', this is really strange to me. ;-)

Actually, the closest analogs to phkmalloc's '>' and '<' are 'C' and 'c'.
However, they don't have quite the same meaning, so I thought that
changing the designators was appropriate.  Here's a snippet from the man
page about some of the performance tuning flags supported by jemalloc:

     C       Increase/decrease the size of the cache by a factor of two.
             The default cache size is 256 objects for each arena.  This
             option can be specified multiple times.

     N       Increase/decrease the number of arenas by a factor of two.
             The default number of arenas is twice the number of CPUs, or
             one if there is a single CPU.  This option can be specified
             multiple times.

     Q       Increase/decrease the size of the allocation quantum by a
             factor of two.  The default quantum is the minimum allowed by
             the architecture (typically 8 or 16 bytes).  This option can
             be specified multiple times.
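As a rough sketch of how these letters get supplied (this assumes the
MALLOC_OPTIONS environment variable and the _malloc_options global
described in FreeBSD's malloc(3); check malloc(3) on your system for the
exact spelling), a program could bake its tuning in like this:

     /*
      * Sketch only.  Uppercase letters double a parameter, lowercase
      * letters halve it, and letters may be repeated.  The same string
      * can be given at run time instead:  MALLOC_OPTIONS=CCn ./a.out
      */
     #include <stdio.h>
     #include <stdlib.h>

     /* Quadruple the per-arena cache, halve the number of arenas. */
     const char *_malloc_options = "CCn";

     int
     main(void)
     {
             void *p = malloc(100);

             printf("%p allocated with tuned malloc options\n", p);
             free(p);
             return (0);
     }
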
The implications of each of these flags are described in some detail
later in the man page:

     This allocator uses multiple arenas in order to reduce lock
     contention for threaded programs on multi-processor systems.  This
     works well with regard to threading scalability, but incurs some
     costs.  There is a small fixed per-arena overhead, and additionally,
     arenas manage memory completely independently of each other, which
     means a small fixed increase in overall memory fragmentation.  These
     overheads aren't generally an issue, given the number of arenas
     normally used.  Note that using substantially more arenas than the
     default is not likely to improve performance, mainly due to reduced
     cache performance.  However, it may make sense to reduce the number
     of arenas if an application does not make much use of the allocation
     functions.

     This allocator uses a novel approach to object caching.  For objects
     below a size threshold (use the ``P'' option to discover the
     threshold), full deallocation and attempted coalescence with adjacent
     memory regions are delayed.  This is so that if the application
     requests an allocation of that size soon thereafter, the request can
     be met much more quickly.  Most applications heavily use a small
     number of object sizes, so this caching has the potential to have a
     large positive performance impact.  However, the effectiveness of the
     cache depends on the cache being large enough to absorb typical
     fluctuations in the number of allocated objects.  If an application
     routinely fluctuates by thousands of objects, then it may make sense
     to increase the size of the cache.  Conversely, if an application's
     memory usage fluctuates very little, it may make sense to reduce the
     size of the cache, so that unused regions can be coalesced sooner.

     This allocator is very aggressive about tightly packing objects in
     memory, even for objects much larger than the system page size.  For
     programs that allocate objects larger than half the system page size,
     this has the potential to reduce memory footprint in comparison to
     other allocators.  However, it has some side effects that are
     important to keep in mind.  First, even multi-page objects can start
     at non-page-aligned addresses, since the implementation only
     guarantees quantum alignment.  Second, this tight packing of objects
     can cause objects to share L1 cache lines, which can be a performance
     issue for multi-threaded applications.  There are two ways to
     approach these issues.  First, posix_memalign() provides the ability
     to align allocations as needed.  By aligning an allocation to at
     least the L1 cache line size, and padding the allocation request by
     one L1 cache line unit, the programmer can rest assured that no cache
     line sharing will occur for the object.  Second, the ``Q'' option can
     be used to force all allocations to be aligned with the L1 cache
     lines.  This approach should be used with care though, because
     although easy to implement, it means that all allocations must be at
     least as large as the quantum, which can cause severe internal
     fragmentation if the application allocates many small objects.
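To make the posix_memalign() approach concrete, here is a minimal sketch;
the 64-byte cache line size and the alloc_unshared() helper name are
assumptions for illustration only:

     #include <stdio.h>
     #include <stdlib.h>

     /* Assumed L1 cache line size; 64 bytes is common, but not universal. */
     #define CACHE_LINE 64

     /*
      * Align the allocation to the cache line size and pad the request by
      * one full cache line, so that no other object can share a line with
      * this one.
      */
     static void *
     alloc_unshared(size_t size)
     {
             void *p;

             if (posix_memalign(&p, CACHE_LINE, size + CACHE_LINE) != 0)
                     return (NULL);
             return (p);
     }

     int
     main(void)
     {
             unsigned long *counter = alloc_unshared(sizeof(*counter));

             if (counter == NULL)
                     return (1);
             *counter = 0;
             printf("counter at %p\n", (void *)counter);
             free(counter);
             return (0);
     }
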
Jason

Received on Fri Dec 23 2005 - 19:07:39 UTC