Re: Pthreads performance

From: Peter Edwards <peadar.edwards_at_gmail.com> Date: Fri, 11 Feb 2005 17:03:20 +0000 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:28 UTC

On Fri, 11 Feb 2005 17:06:31 +0200, Maxim Sobolev <sobomax_at_portaone.com> wrote:
> Hi,
> 
> Note: I am purposely posting this to developers to avoid banging into
> "FreeBSD 5/6 is slow" drum. My pthreads knowellege is pretty basic, so
> that it's possible that this all is false alarm.
> 

I haven't looked at this in detail yet: I'll tinker a bit with it
tonight, but some initial comments.

FWIW: my UP -current (feb3) box runs the KSE test faster than the
linuxthreads test: I'm getting ridiculously low number for sys/user
time in linux, so either my kernel/userland mismatch or "time" isn't
reporting the sys/user time properly (possible with the linuxthreads
using separate processes for children)

First off, when using kse/pthreads, you're probably not getting
SCOPE_SYSTEM processes by default, which would be closer to what
linuxthreads (and libthr) is doing.

Looking at the source file: (Disclaimer: just a quick look over it,
and I'm stating opinion not fact)

You've two threads: the "pushing" thread pushes  items on to a stack,
possibly waking the "popping" thread.  the the popping thread waits
for the stack to be not empty, and pops an element off.

There's a big problem that the push thread doesn't put a bound on how
much memory it will allocate, or cooperate with it's consumer.

Depending on how the scheduling works out, if the threads get longer
quanta, then the pushing thread may allocate much more memory before
the popping thread gets a chance. ie, a larger number of calls to
malloc may account for the time difference, too. eg, if the pattern of
push (>) to pop (<) looks like this:

>>>>>>>>>><<<<<<<<<<
There's 10 allocations, and 10 frees of different blocks of memory.
While this:
><><><><><><><><><><
Is more likely to cause a single block of memory to be recycled over
and over. (Of course, if it was switching for every push/pop, that'd
be a huge overhead in terms of context switching)

It would be fairer to bound the work of the push thread, rather than
have it continue on blindly, and relying on scheduling and chance to
limit the work the push thread does.

Also, If the push thread could get more "operations" done per quantum
than the pop thread, (the process would just keep growing), carving
out address space for itself.

Even fairer would be to pre-allocate the memory, to avoid measuring
the performance of the heap allocator as well (which will do its own
synchronisation operations) but you might be mildly interested in that
affect anyway: I don't think there's much work gone into FreeBSD's
malloc in terms of multithreading (but I'm open to correction), and
I've no idea about linux in this regard, but there's plenty of drop-in
replacements around).