Re: New libc malloc patch

From: Kris Kennaway <kris_at_obsecurity.org>
Date: Sun, 11 Dec 2005 23:30:23 -0500
On Sun, Dec 11, 2005 at 08:29:07PM -0500, Kris Kennaway wrote:

> I'll try to test this on a 4 CPU amd64 machine next.

phkmalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
 Thread 5298176 adjusted timing: 4.173052 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
 Thread 5299200 adjusted timing: 325.108643 seconds for 10000000 requests of 1024 bytes.
 Thread 5298176 adjusted timing: 325.202485 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
 Thread 5414912 adjusted timing: 1133.238459 seconds for 10000000 requests of 1024 bytes.
 Thread 5299200 adjusted timing: 1134.525255 seconds for 10000000 requests of 1024 bytes.
 Thread 5298176 adjusted timing: 1134.539555 seconds for 10000000 requests of 1024 bytes.

jemalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
 Thread 1073760528 adjusted timing: 3.777175 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
 Thread 1073760560 adjusted timing: 3.851702 seconds for 10000000 requests of 1024 bytes.
 Thread 1073761584 adjusted timing: 3.887943 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
 Thread 1073760528 adjusted timing: 3.866206 seconds for 10000000 requests of 1024 bytes.
 Thread 1073761552 adjusted timing: 13.382795 seconds for 10000000 requests of 1024 bytes.
 Thread 1073762688 adjusted timing: 14.407229 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
 Thread 1073760528 adjusted timing: 3.782923 seconds for 10000000 requests of 1024 bytes.
 Thread 1073763792 adjusted timing: 6.668655 seconds for 10000000 requests of 1024 bytes.
 Thread 1073762688 adjusted timing: 14.346569 seconds for 10000000 requests of 1024 bytes.
 Thread 1073761584 adjusted timing: 14.680211 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
 Thread 1073760560 adjusted timing: 4.748248 seconds for 10000000 requests of 1024 bytes.
 Thread 1073761584 adjusted timing: 9.898153 seconds for 10000000 requests of 1024 bytes.
 Thread 1073764896 adjusted timing: 13.019884 seconds for 10000000 requests of 1024 bytes.
 Thread 1073762688 adjusted timing: 15.326908 seconds for 10000000 requests of 1024 bytes.
 Thread 1073763792 adjusted timing: 15.442164 seconds for 10000000 requests of 1024 bytes.

So jemalloc is 1.1 times faster than phkmalloc single-threaded, and 107
times faster with 3 threads.
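The speedup figures above can be reproduced from the per-thread timings. The following is a minimal sketch. It assumes each figure is the ratio of the mean per-thread "adjusted timing" in each run. The post doesn't say how the ratios were computed, but this reading matches both numbers.

```python
from statistics import mean

# Per-thread "adjusted timing" values (seconds) from the libpthread runs above.
# Only the 1- and 3-thread cases are compared, since phkmalloc was not run with more.
phkmalloc = {
    1: [4.173052],
    3: [1133.238459, 1134.525255, 1134.539555],
}
jemalloc = {
    1: [3.777175],
    3: [3.866206, 13.382795, 14.407229],
}


def speedup(baseline, candidate):
    """How many times faster `candidate` is, using mean per-thread time."""
    return mean(baseline) / mean(candidate)


for n in (1, 3):
    print(f"{n} thread(s): jemalloc is {speedup(phkmalloc[n], jemalloc[n]):.1f}x faster")
```

This prints 1.1x for one thread and 107.5x for three. Comparing the slowest thread in each 3-thread run instead gives roughly 79x (1134.54 / 14.41). The choice of summary statistic therefore matters when threads finish unevenly, as jemalloc's did here.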

With libthr instead of libpthread:

phkmalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
 Thread 5255680 adjusted timing: 2.357247 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
 Thread 5256192 adjusted timing: 10.964918 seconds for 10000000 requests of 1024 bytes.
 Thread 5255680 adjusted timing: 11.001288 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
 Thread 5255680 adjusted timing: 17.467754 seconds for 10000000 requests of 1024 bytes.
 Thread 5256704 adjusted timing: 17.724583 seconds for 10000000 requests of 1024 bytes.
 Thread 5256192 adjusted timing: 17.913381 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
 Thread 5255680 adjusted timing: 42.715420 seconds for 10000000 requests of 1024 bytes.
 Thread 5256192 adjusted timing: 43.481252 seconds for 10000000 requests of 1024 bytes.
 Thread 5256704 adjusted timing: 43.871452 seconds for 10000000 requests of 1024 bytes.
 Thread 5257216 adjusted timing: 43.887820 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
 Thread 5255680 adjusted timing: 139.316332 seconds for 10000000 requests of 1024 bytes.
 Thread 5257216 adjusted timing: 140.117720 seconds for 10000000 requests of 1024 bytes.
 Thread 5256192 adjusted timing: 140.134057 seconds for 10000000 requests of 1024 bytes.
 Thread 5256704 adjusted timing: 140.855289 seconds for 10000000 requests of 1024 bytes.
 Thread 5257728 adjusted timing: 140.865934 seconds for 10000000 requests of 1024 bytes.

jemalloc:

# ./malloc-test 1024 10000000 1
Starting test with 1 thread...
 Thread 1073742416 adjusted timing: 1.366353 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 2
Starting test with 2 threads...
 Thread 1073742416 adjusted timing: 1.429485 seconds for 10000000 requests of 1024 bytes.
 Thread 1073742896 adjusted timing: 1.530733 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 3
Starting test with 3 threads...
 Thread 1073742416 adjusted timing: 1.419813 seconds for 10000000 requests of 1024 bytes.
 Thread 1073743376 adjusted timing: 1.432790 seconds for 10000000 requests of 1024 bytes.
 Thread 1073742896 adjusted timing: 1.490218 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 4
Starting test with 4 threads...
 Thread 1073743376 adjusted timing: 1.447554 seconds for 10000000 requests of 1024 bytes.
 Thread 1073742416 adjusted timing: 1.503659 seconds for 10000000 requests of 1024 bytes.
 Thread 1073743856 adjusted timing: 1.503937 seconds for 10000000 requests of 1024 bytes.
 Thread 1073742896 adjusted timing: 1.504926 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 5
Starting test with 5 threads...
 Thread 1073743376 adjusted timing: 1.595239 seconds for 10000000 requests of 1024 bytes.
 Thread 1073742896 adjusted timing: 1.689753 seconds for 10000000 requests of 1024 bytes.
 Thread 1073742416 adjusted timing: 1.750115 seconds for 10000000 requests of 1024 bytes.
 Thread 1073744336 adjusted timing: 1.744271 seconds for 10000000 requests of 1024 bytes.
 Thread 1073743856 adjusted timing: 1.890269 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 6
Starting test with 6 threads...
 Thread 1073743856 adjusted timing: 1.847653 seconds for 10000000 requests of 1024 bytes.
 Thread 1073742416 adjusted timing: 2.018481 seconds for 10000000 requests of 1024 bytes.
 Thread 1073743376 adjusted timing: 2.059817 seconds for 10000000 requests of 1024 bytes.
 Thread 1073742896 adjusted timing: 2.129204 seconds for 10000000 requests of 1024 bytes.
 Thread 1073744336 adjusted timing: 2.223751 seconds for 10000000 requests of 1024 bytes.
 Thread 1073744816 adjusted timing: 2.293809 seconds for 10000000 requests of 1024 bytes.
# ./malloc-test 1024 10000000 20
Starting test with 20 threads...
 Thread 1073744816 adjusted timing: 5.113769 seconds for 10000000 requests of 1024 bytes.
 Thread 1073751136 adjusted timing: 4.973369 seconds for 10000000 requests of 1024 bytes.
 Thread 1073750176 adjusted timing: 5.295912 seconds for 10000000 requests of 1024 bytes.
 Thread 1073745296 adjusted timing: 5.502331 seconds for 10000000 requests of 1024 bytes.
 Thread 1073743856 adjusted timing: 5.614890 seconds for 10000000 requests of 1024 bytes.
 Thread 1073744336 adjusted timing: 5.608690 seconds for 10000000 requests of 1024 bytes.
 Thread 1073752096 adjusted timing: 5.555465 seconds for 10000000 requests of 1024 bytes.
 Thread 1073748736 adjusted timing: 5.650922 seconds for 10000000 requests of 1024 bytes.
 Thread 1073748256 adjusted timing: 6.608054 seconds for 10000000 requests of 1024 bytes.
 Thread 1073750656 adjusted timing: 7.144998 seconds for 10000000 requests of 1024 bytes.
 Thread 1073742896 adjusted timing: 7.390905 seconds for 10000000 requests of 1024 bytes.
 Thread 1073746256 adjusted timing: 7.364728 seconds for 10000000 requests of 1024 bytes.
 Thread 1073742416 adjusted timing: 7.556064 seconds for 10000000 requests of 1024 bytes.
 Thread 1073749216 adjusted timing: 7.357179 seconds for 10000000 requests of 1024 bytes.
 Thread 1073752576 adjusted timing: 7.349483 seconds for 10000000 requests of 1024 bytes.
 Thread 1073747776 adjusted timing: 7.375179 seconds for 10000000 requests of 1024 bytes.
 Thread 1073751616 adjusted timing: 7.557854 seconds for 10000000 requests of 1024 bytes.
 Thread 1073743376 adjusted timing: 7.915978 seconds for 10000000 requests of 1024 bytes.
 Thread 1073749696 adjusted timing: 7.795219 seconds for 10000000 requests of 1024 bytes.
 Thread 1073745776 adjusted timing: 8.007392 seconds for 10000000 requests of 1024 bytes.

So libthr is *much* faster than libpthread with both malloc
implementations, but jemalloc is still 1.7 times faster than phkmalloc
for 1 thread and 80 times faster for 5 threads.
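The libthr figures can be checked the same way. The following sketch makes the same assumption: each speedup is a ratio of mean per-thread times. It also compares the single-thread libthr and libpthread timings for each allocator.

```python
from statistics import mean

# Per-thread "adjusted timing" values (seconds) from the libthr runs above.
phkmalloc_libthr = {
    1: [2.357247],
    5: [139.316332, 140.117720, 140.134057, 140.855289, 140.865934],
}
jemalloc_libthr = {
    1: [1.366353],
    5: [1.595239, 1.689753, 1.750115, 1.744271, 1.890269],
}
# Single-thread times for each threading library, taken from both sets of runs.
libpthread_1thread = {"phkmalloc": 4.173052, "jemalloc": 3.777175}
libthr_1thread = {"phkmalloc": 2.357247, "jemalloc": 1.366353}


def speedup(baseline, candidate):
    """How many times faster `candidate` is, using mean per-thread time."""
    return mean(baseline) / mean(candidate)


for n in (1, 5):
    ratio = speedup(phkmalloc_libthr[n], jemalloc_libthr[n])
    print(f"libthr, {n} thread(s): jemalloc is {ratio:.1f}x faster than phkmalloc")

for name in ("phkmalloc", "jemalloc"):
    ratio = libpthread_1thread[name] / libthr_1thread[name]
    print(f"{name}, 1 thread: libthr is {ratio:.1f}x faster than libpthread")
```

This prints 1.7x and 80.9x for jemalloc over phkmalloc. For the single-threaded case, it shows libthr as 1.8x faster than libpthread with phkmalloc and 2.8x faster with jemalloc.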

Kris

P.S. Holy crap :)

Received on Mon Dec 12 2005 - 03:30:25 UTC
