Re: Pthreads performance

From: Maxim Sobolev <sobomax_at_portaone.com>
Date: Fri, 11 Feb 2005 19:32:50 +0200
Thank you for the analysis! Looks like you have at least some valid 
points. I've modified the code to count how many times producer calls 
malloc() to allocate a new slot, and got the following numbers:

-bash-2.05b$ ./aqueue_linuxthreads -n 10000000
pusher started
poper started
total 237482 slots used
-bash-2.05b$ ./aqueue_kse -n 10000000
pusher started
poper started
total 403966 slots used
-bash-2.05b$ ./aqueue_thr -n 10000000
pusher started
poper started
total 223634 slots used
-bash-2.05b$ ./aqueue_c_r -n 10000000
pusher started
poper started
total 55589 slots used

This suggests that indeed, it is unfair to compare KSE times to LT 
times, since KSE has done almost 2x more malloc()s than LT. However, as 
you can see, libthr has done a comparable number of allocations, while 
libc_r has done about 4 times fewer, so malloc() cost alone can't fully 
explain the difference in results.

-Maxim

Peter Edwards wrote:
> On Fri, 11 Feb 2005 17:06:31 +0200, Maxim Sobolev <sobomax_at_portaone.com> wrote:
> 
>>Hi,
>>
>>Note: I am purposely posting this to developers to avoid banging the
>>"FreeBSD 5/6 is slow" drum. My pthreads knowledge is pretty basic, so
>>it's possible that this is all a false alarm.
>>
> 
> 
> I haven't looked at this in detail yet: I'll tinker a bit with it
> tonight, but some initial comments.
> 
> FWIW: my UP -current (feb3) box runs the KSE test faster than the
> linuxthreads test: I'm getting ridiculously low numbers for sys/user
> time in linux, so either there's a kernel/userland mismatch or "time"
> isn't reporting the sys/user time properly (possible with linuxthreads
> using separate processes for children)
> 
> First off, when using kse/pthreads, you're probably not getting
> SCOPE_SYSTEM processes by default, which would be closer to what
> linuxthreads (and libthr) is doing.
> 
> Looking at the source file: (Disclaimer: just a quick look over it,
> and I'm stating opinion not fact)
> 
> You've two threads: the "pushing" thread pushes items onto a stack,
> possibly waking the "popping" thread. The popping thread waits
> for the stack to be non-empty, and pops an element off.
> 
> There's a big problem that the push thread doesn't put a bound on how
> much memory it will allocate, or cooperate with its consumer.
> 
> Depending on how the scheduling works out, if the threads get longer
> quanta, then the pushing thread may allocate much more memory before
> the popping thread gets a chance; i.e., a larger number of calls to
> malloc may account for the time difference, too. E.g., if the pattern of
> push (>) to pop (<) looks like this:
> 
> 
>>>>>>>>>>><<<<<<<<<<
> 
> There's 10 allocations, and 10 frees of different blocks of memory.
> While this:
> 
>><><><><><><><><><><
> 
> Is more likely to cause a single block of memory to be recycled over
> and over. (Of course, if it was switching for every push/pop, that'd
> be a huge overhead in terms of context switching)
> 
> It would be fairer to bound the work of the push thread, rather than
> have it continue on blindly, and relying on scheduling and chance to
> limit the work the push thread does.
> 
> Also, if the push thread could get more "operations" done per quantum
> than the pop thread, the process would just keep growing, carving
> out address space for itself.
> 
> Even fairer would be to pre-allocate the memory, to avoid measuring
> the performance of the heap allocator as well (which will do its own
> synchronisation operations), but you might be mildly interested in that
> effect anyway: I don't think much work has gone into FreeBSD's
> malloc in terms of multithreading (but I'm open to correction), and
> I've no idea about Linux in this regard, but there are plenty of drop-in
> replacements around.
> 
Received on Fri Feb 11 2005 - 16:33:03 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:28 UTC