Re: perl malloc slow?

From: Peter Jeremy <peterjeremy_at_optushome.com.au>
Date: Wed, 7 Jan 2004 11:49:16 +1100
On Tue, Jan 06, 2004 at 05:17:40PM +0100, Alexander Leidinger wrote:
>On Tue, 06 Jan 2004 15:36:34 +0000
>Holger Kipp <Holger.Kipp_at_alogis.com> wrote:
>
>> Kris Kennaway wrote: "For any algorithm one can come up with a workload that
>>   makes it perform badly.  The trick is making it perform well for
>>   common workloads, which FreeBSD's malloc does."
>> (http://lists.freebsd.org/pipermail/freebsd-hackers/2003-September/003059.html)
>> 
>> Even so, if Perl malloc and Linux malloc are O(n) with a small constant and
>> FreeBSD malloc behaves like O(n*n) or worse with a large constant in this
>> easy example (of a real-world application), this behaviour should imho be
>> investigated....
>
>You need to make a context switch for every malloc call.

This isn't true.  There are no circumstances under which phkmalloc
requires a context switch.  Since Unix uses a pre-emptive scheduler,
a context switch may occur at any time, though it is preferentially
performed during system calls.

If the free memory pool managed by phkmalloc has insufficient space to
fulfil the request, or is excessively large following a free() then it
will use brk(2) to allocate/return additional memory.  The kernel may
choose to schedule an alternative process during the brk() call.

> If the Zlib module allocates a lot of small memory
>regions with the same size one after another and frees a lot of them
>after a short period of time, this results in a huge amount of context
>switches in the non-perl-malloc case and in nearly no context switch in
>the with-perl-malloc case.

Phkmalloc makes it trivial to trace malloc() family calls.  Actually
tracing calls to malloc et al whilst running Compress::Zlib::memGunzip
on 4.9-STABLE shows that there are relatively few calls - about 500
calls whilst uncompressing 284K to 1484K in my test.
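
(One easy way to do this kind of tracing is phkmalloc's utrace support
plus ktrace - from memory, something along these lines on 4.x, where
gunzip-test.pl is just whatever script exercises memGunzip:

    env MALLOC_OPTIONS=U ktrace -t u perl gunzip-test.pl
    kdump | less

kdump then shows one utrace record per malloc/realloc/free call.)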

The reason for the poor performance is that most (~72%) of the calls
are of the form "realloc(ptr, size+=4096)", where size starts at 4097.
This probably triggers worst-case performance for any general-purpose
malloc - O(n^2) - since the buffer contents have to be copied to a new
buffer over and over again.  Similar
behaviour is demonstrated by
perl -e '$x=350; $y = ""; print ".\n"; while ($x-- > 0) { $y .= "." x 4096;} print length($y), "\n";'
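
For comparison, collecting the pieces in a list and joining them once
avoids growing a single buffer 4096 bytes at a time (just to illustrate
the growth pattern, not what Compress::Zlib does internally):
perl -e '$x=350; while ($x-- > 0) { push @p, "." x 4096 } print length(join "", @p), "\n";'
which sidesteps the realloc() pattern entirely.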

It's not clear why the builtin perl malloc is so much faster in this
case.  A quick check of the perl malloc code suggests that it uses a
geometric-progression bucket arrangement (whereas phkmalloc appears to
use page-sized buckets for large allocations) - this would
significantly reduce the number of realloc() copies.
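
As a rough back-of-the-envelope check (assuming every realloc() has to
copy the whole buffer, which won't always be true in practice):

    perl -e '
        $n = 1484*1024;
        for ($s = 4096; $s < $n; $s += 4096) { $lin += $s }  # grow 4096 bytes at a time
        for ($s = 4096; $s < $n; $s *= 2)    { $geo += $s }  # double each time
        print "linear: $lin bytes copied, doubling: $geo bytes copied\n";
    '

i.e. hundreds of megabytes copied with the 4096-byte increments versus
a couple of megabytes with geometric growth, for the same final size.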

>The right thing to do in such a case is to optimize the application to
>use e.g. a ring buffer. A static number of malloc() calls at startup,
>use them for a long time, free() them. Or to manage a malloc resource
>pool on its own. This is what perl does in the with-perl-malloc case.

In this particular case, the right thing to do is to use a larger-
than-default buffer size in Compress::Zlib::memGunzip.  Patching the
code to use a buffer size matching the input string size resulted in
an order of magnitude speedup in my test case (13.9 seconds to 1.16
seconds).  A similar problem presumably exists with
Compress::Zlib::memGzip, though I haven't checked this.

I have sent a suggested patch to the Compress::Zlib author.
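
Until a fixed module is available, one workaround from the Perl side
(assuming the compressed data lives in a file anyway - this is only an
untested sketch, and the filename is a placeholder) is to read it with
the gzopen() interface in reasonably large chunks and join the pieces
once at the end instead of calling memGunzip:

    use Compress::Zlib;

    my $gz = gzopen("data.gz", "rb") or die "gzopen failed: $gzerrno";
    my (@chunks, $buf);
    while ($gz->gzread($buf, 64 * 1024) > 0) {   # 64K chunks instead of 4K steps
        push @chunks, $buf;
    }
    $gz->gzclose();
    my $data = join "", @chunks;                 # assemble the result once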

Peter