Re: New libc malloc patch

From: Kris Kennaway <kris_at_obsecurity.org>
Date: Sun, 11 Dec 2005 20:29:07 -0500
On Mon, Dec 12, 2005 at 08:50:01AM +0800, David Xu wrote:
> Julian Elischer wrote:
> 
> >>
> >>No new problems in the malloc code have been found for some time  
> >>now.  It has been tested on i386, sparc64, arm, and amd64.  In my  
> >>opinion, the malloc patch is ready to be committed.  I am now 
> >>working  on the assumption that new problems are more likely 
> >>application bugs  than malloc bugs.  This seems like a good time to 
> >>start sharing the  debugging load with the community. =)
> >>
> >>So, how about it?  Can this patch go in now?
> >
> >
> >
> >I may have missed it but some benchmark numbers could be good.
> >
> >Is there no way to make it an option for a while?
> >that would get good testing AND a fallback for people.
> >
> I would also like to see some benchmark numbers.  In fact, I had
> planned to import ptmalloc in the past; the malloc problem has been
> discussed several times on the thread_at_ list.

Here is the result of a benchmark in which each thread performs
1,000,000 malloc()/free() requests of 1024 bytes, run with varying
numbers of threads on a 14-CPU sparc64 machine.  This is a poor test
because sparc64 doesn't have TLS support, which jemalloc needs in
order to perform well.  It still shows jemalloc kicking the pants off
of phkmalloc for both single-threaded and multi-threaded malloc.
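
For readers who want to try something similar, here is a minimal
sketch of this kind of test in C with POSIX threads.  It is not the
source of malloc-test, which is not shown in this message; the names
run_malloc_test, worker_main and struct worker are made up for
illustration, and it reports plain elapsed time per thread rather
than whatever "adjusted timing" means in the real tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Per-thread work description and result (hypothetical names). */
struct worker {
    size_t size;      /* bytes per allocation, e.g. 1024 */
    long requests;    /* number of malloc()/free() pairs to perform */
    double seconds;   /* elapsed wall-clock time, filled in by the thread */
    int failed;       /* set to 1 if malloc() returned NULL */
};

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    double start = now_seconds();

    for (long i = 0; i < w->requests; i++) {
        volatile char *p = malloc(w->size);
        if (p == NULL) {
            w->failed = 1;
            break;
        }
        p[0] = (char)i;   /* touch the block so the pair is not optimized away */
        free((void *)p);
    }
    w->seconds = now_seconds() - start;
    return NULL;
}

/*
 * Run nthreads workers concurrently, each doing `requests` malloc/free
 * pairs of `size` bytes.  If times is non-NULL it receives one elapsed
 * time per thread.  Returns 0 on success, -1 on any failure.
 */
int run_malloc_test(size_t size, long requests, int nthreads, double *times)
{
    if (nthreads <= 0)
        return -1;

    pthread_t *tids = calloc((size_t)nthreads, sizeof(*tids));
    struct worker *ws = calloc((size_t)nthreads, sizeof(*ws));
    int started = 0, rc = 0;

    if (tids == NULL || ws == NULL) {
        free(tids);
        free(ws);
        return -1;
    }
    for (int i = 0; i < nthreads; i++) {
        ws[i].size = size;
        ws[i].requests = requests;
        if (pthread_create(&tids[i], NULL, worker_main, &ws[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (ws[i].failed)
            rc = -1;
        if (times != NULL)
            times[i] = ws[i].seconds;
    }
    free(tids);
    free(ws);
    return rc;
}
```

Calling run_malloc_test(1024, 1000000, 5, times) would correspond
roughly to "./malloc-test 1024 1000000 5" below, with times[] holding
the five per-thread figures.  Because every thread hammers the
allocator with same-sized requests and does nothing else, this is a
worst case for allocator lock contention, not a model of a real
application.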

phkmalloc:

# ./malloc-test 1024 1000000 1
Starting test with 1 thread...
 Thread 2114048 adjusted timing: 27.124817 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 2
Starting test with 2 threads...
 Thread 2114560 adjusted timing: 67.535854 seconds for 1000000 requests of 1024 bytes.
 Thread 2114048 adjusted timing: 70.330298 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 3
Starting test with 3 threads...
 Thread 2114048 adjusted timing: 74.154855 seconds for 1000000 requests of 1024 bytes.
 Thread 2115072 adjusted timing: 74.356363 seconds for 1000000 requests of 1024 bytes.
 Thread 2114560 adjusted timing: 77.038550 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 4
Starting test with 4 threads...
 Thread 2115072 adjusted timing: 217.741657 seconds for 1000000 requests of 1024 bytes.
 Thread 2115584 adjusted timing: 228.434310 seconds for 1000000 requests of 1024 bytes.
 Thread 2114048 adjusted timing: 228.941544 seconds for 1000000 requests of 1024 bytes.
 Thread 2114560 adjusted timing: 229.286134 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 5
Starting test with 5 threads...
 Thread 2114048 adjusted timing: 770.255000 seconds for 1000000 requests of 1024 bytes.
 Thread 2115072 adjusted timing: 770.749431 seconds for 1000000 requests of 1024 bytes.
 Thread 2116096 adjusted timing: 771.307654 seconds for 1000000 requests of 1024 bytes.
 Thread 2114560 adjusted timing: 772.293253 seconds for 1000000 requests of 1024 bytes.
 Thread 2115584 adjusted timing: 772.550847 seconds for 1000000 requests of 1024 bytes.

jemalloc:

# ./malloc-test 1024 1000000 1
Starting test with 1 thread...
 Thread -1610612656 adjusted timing: 5.428918 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 2
Starting test with 2 threads...
 Thread -1610612656 adjusted timing: 4.840497 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612176 adjusted timing: 4.948382 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 3
Starting test with 3 threads...
 Thread -1610611696 adjusted timing: 25.065195 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612656 adjusted timing: 25.218103 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612176 adjusted timing: 25.286181 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 4
Starting test with 4 threads...
 Thread -1610612656 adjusted timing: 38.176479 seconds for 1000000 requests of 1024 bytes.
 Thread -1610611216 adjusted timing: 38.221169 seconds for 1000000 requests of 1024 bytes.
 Thread -1610611696 adjusted timing: 38.294425 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612176 adjusted timing: 38.320669 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 5
Starting test with 5 threads...
 Thread -1610611216 adjusted timing: 50.376766 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612656 adjusted timing: 50.435407 seconds for 1000000 requests of 1024 bytes.
 Thread -1610611696 adjusted timing: 50.885393 seconds for 1000000 requests of 1024 bytes.
 Thread -1610610736 adjusted timing: 50.943412 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612176 adjusted timing: 50.953694 seconds for 1000000 requests of 1024 bytes.

i.e. jemalloc is about 5 times faster than phkmalloc for
single-threaded malloc, and about 15 times faster for 5 threads.  The
missing TLS on sparc64 shows up in the scaling (i.e. performance
should be even better with multiple threads where TLS is available),
and in some large performance variation between threads at larger
thread counts (probably due to hash collisions):

# ./malloc-test 1024 1000000 20
Starting test with 20 threads...
 Thread -1610604016 adjusted timing: 48.297304 seconds for 1000000 requests of 1024 bytes.
 Thread -1610604496 adjusted timing: 104.249693 seconds for 1000000 requests of 1024 bytes.
 Thread -1610602496 adjusted timing: 109.578616 seconds for 1000000 requests of 1024 bytes.
 Thread -1610607856 adjusted timing: 252.337973 seconds for 1000000 requests of 1024 bytes.
 Thread -1610610736 adjusted timing: 254.338225 seconds for 1000000 requests of 1024 bytes.
 Thread -1610606896 adjusted timing: 255.015353 seconds for 1000000 requests of 1024 bytes.
 Thread -1610607376 adjusted timing: 257.463410 seconds for 1000000 requests of 1024 bytes.
 Thread -1610609776 adjusted timing: 257.848283 seconds for 1000000 requests of 1024 bytes.
 Thread -1610605936 adjusted timing: 257.955005 seconds for 1000000 requests of 1024 bytes.
 Thread -1610604976 adjusted timing: 259.303220 seconds for 1000000 requests of 1024 bytes.
 Thread -1610611216 adjusted timing: 259.610871 seconds for 1000000 requests of 1024 bytes.
 Thread -1610606416 adjusted timing: 260.622687 seconds for 1000000 requests of 1024 bytes.
 Thread -1610611696 adjusted timing: 260.857706 seconds for 1000000 requests of 1024 bytes.
 Thread -1610610256 adjusted timing: 261.056716 seconds for 1000000 requests of 1024 bytes.
 Thread -1610608816 adjusted timing: 261.764455 seconds for 1000000 requests of 1024 bytes.
 Thread -1610609296 adjusted timing: 261.800319 seconds for 1000000 requests of 1024 bytes.
 Thread -1610605456 adjusted timing: 261.748707 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612176 adjusted timing: 262.108598 seconds for 1000000 requests of 1024 bytes.
 Thread -1610608336 adjusted timing: 262.119440 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612656 adjusted timing: 262.315112 seconds for 1000000 requests of 1024 bytes.
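
The ratios quoted above can be re-derived from the timings in this
message.  The small C sketch below only does that arithmetic; the
helper names (slowest, fastest, speedup) are made up here, and
comparing the slowest thread of each run is an assumption about how
the figures were compared, not something stated in the message.

```c
#include <stddef.h>

/* Largest value in times[0..n-1]; the caller guarantees n >= 1. */
double slowest(const double *times, size_t n)
{
    double max = times[0];
    for (size_t i = 1; i < n; i++)
        if (times[i] > max)
            max = times[i];
    return max;
}

/* Smallest value in times[0..n-1]; the caller guarantees n >= 1. */
double fastest(const double *times, size_t n)
{
    double min = times[0];
    for (size_t i = 1; i < n; i++)
        if (times[i] < min)
            min = times[i];
    return min;
}

/* How many times faster the candidate run is than the baseline run. */
double speedup(double baseline_seconds, double candidate_seconds)
{
    return baseline_seconds / candidate_seconds;
}
```

With the numbers above, speedup(27.124817, 5.428918) is about 5.0
for one thread, and the slowest 5-thread timings (772.550847 for
phkmalloc, 50.953694 for jemalloc) give about 15.2.  In the 20-thread
jemalloc run the slowest thread (262.315112) took about 5.4 times as
long as the fastest (48.297304), which is the variation mentioned
above.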

I'll try to test this on a 4 CPU amd64 machine next.

Kris

Received on Mon Dec 12 2005 - 00:29:09 UTC
