gnn_at_freebsd.org wrote:
> One of the folks I'm working with found this. The following code,
> which yes, is just an example, is 1/2 as fast on 7.0-RELEASE as on
> 6.3. Where should I look to find out why?

There is a definite performance problem in arena_run_alloc(), but I'm happy to report that it was fixed in -current a while back. I plan to MFC malloc to RELENG_7 within the next few weeks.

In a nutshell, the arena_run_alloc() performance problem is due to using a linear search to find sufficiently large runs of mapped (but currently unused) pages. There are caching mechanisms that speed up the searches to some degree, but there are still some linear aspects to the algorithm, so as memory usage increases, the searches take progressively longer. In -current, this problem is solved by maintaining red-black trees, so that arena_run_alloc() does an O(lg n) tree search, rather than an O(n) iterative search.

It's worth mentioning that the benchmark is of marginal use, due to a simple (but common) flaw. At a minimum, a malloc benchmark should touch all allocated memory at least once. Otherwise, the benchmark is IMO too far removed from reality to measure anything of value, since its memory access patterns look nothing like those of an actual application that dynamically allocates memory.

Both phkmalloc and jemalloc use data structures that are mostly disjoint from the allocations (no headers), so the benchmark never even faults most pages in. This is especially true for phkmalloc, so jemalloc is unjustly penalized. If we were to include, say, dlmalloc in this comparison, it would be even more heavily penalized, because it touches the pages while modifying allocation headers.

Jason

Received on Tue Mar 04 2008 - 18:34:09 UTC
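To illustrate the point about touching allocated memory, here is a minimal sketch in C of a benchmark body that can either leave allocations untouched or write to every byte. It is not the code from the original report. The names bench_malloc() and touch_block() are hypothetical, and timing is left to the caller. The writes go through a volatile pointer on the assumption that an optimizing compiler might otherwise drop stores to memory that is about to be freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: write to every byte of a block so that each page
 * backing it is actually faulted in.  The volatile qualifier keeps the
 * compiler from eliding the stores.
 */
static void
touch_block(void *p, size_t size)
{
	volatile unsigned char *b = p;
	size_t k;

	for (k = 0; k < size; k++)
		b[k] = 0xa5;
}

/*
 * Hypothetical benchmark body: allocate nallocs blocks of size bytes,
 * optionally touch each one, then free everything.  Returns the number
 * of bytes touched, or (size_t)-1 if an allocation failed.
 */
size_t
bench_malloc(size_t nallocs, size_t size, int touch)
{
	void **ptrs;
	size_t i, j, touched = 0;
	int failed;

	assert(nallocs > 0 && size > 0);

	ptrs = calloc(nallocs, sizeof(*ptrs));
	if (ptrs == NULL)
		return ((size_t)-1);

	for (i = 0; i < nallocs; i++) {
		ptrs[i] = malloc(size);
		if (ptrs[i] == NULL)
			break;
		if (touch) {
			touch_block(ptrs[i], size);
			touched += size;
		}
	}
	failed = (i < nallocs);

	/* Free only the blocks that were successfully allocated. */
	for (j = 0; j < i; j++)
		free(ptrs[j]);
	free(ptrs);

	return (failed ? (size_t)-1 : touched);
}
```

Timing bench_malloc(n, size, 0) against bench_malloc(n, size, 1) on the same allocator should show how much of the measured cost comes from page faults rather than from allocator bookkeeping.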