On Fri, 11 Feb 2005 17:41:26 -0500, David Schultz <das_at_freebsd.org> wrote:

> On Fri, Feb 11, 2005, Maxim Sobolev wrote:
> > Thank you for the analysis! Looks like you have at least some valid
> > points. I've modified the code to count how many times the producer
> > calls malloc() to allocate a new slot, and got the following numbers:
> >
> > -bash-2.05b$ ./aqueue_linuxthreads -n 10000000
> > pusher started
> > poper started
> > total 237482 slots used
> > -bash-2.05b$ ./aqueue_kse -n 10000000
> > pusher started
> > poper started
> > total 403966 slots used
> > -bash-2.05b$ ./aqueue_thr -n 10000000
> > pusher started
> > poper started
> > total 223634 slots used
> > -bash-2.05b$ ./aqueue_c_r -n 10000000
> > pusher started
> > poper started
> > total 55589 slots used
> >
> > This suggests that indeed, it is unfair to compare KSE times to LT
> > times, since KSE has done almost 2x more malloc()s than LT. However,
> > as you can see, libthr has done a comparable number of allocations,
> > while c_r did about 4 times fewer, so malloc() cost alone can't fully
> > explain the difference in results.
>
> The difference in the number of mallocs may be related to the way
> mutex unlocks work. Some systems do direct handoff to the next
> waiting thread. Suppose one thread does:
>
> pthread_mutex_lock()
> pthread_mutex_unlock()
> pthread_mutex_lock()
>
> With direct handoff, the second lock operation would automatically
> cause an immediate context switch, since ownership of the mutex
> has already been transferred to the other thread. Without direct
> handoff, the thread may be able to get the lock back immediately;
> in fact, this is almost certainly what will happen on a uniprocessor.
> Since the example code has no mechanism to ensure fairness, without
> direct handoff, one of the threads could perform thousands of
> iterations before the other one wakes up, and this could explain
> all the calls to malloc().
>
> The part of this picture that doesn't fit is that I was under the
> impression that KSE uses direct handoff...

Direct handoff is probably fine for a directly contended mutex, but for
condition variables, IMHO, it makes more sense _not_ to do direct
handoff. In a standard producer/consumer model, it seems better to have
the producer work to the point that it gets flow controlled, and then
let the consumer start processing the available data: i.e., rather than
deal with 100 context switches of (produce->consume)x50, it's likely
that (produce)x50->(consume)x50 will reduce context switching and
improve caching behaviour. In other words, I'd rather not lose my
quantum just because I created some productive work for a consumer to
process: that loses many locality-of-reference benefits. I think that's
a much more realistic scenario for the use of condition variables than
the sample under discussion.

Disclaimer: This is based on instinct and limited experience rather
than rigorous research. :-)
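
To make the (produce)x50->(consume)x50 pattern concrete, below is a
minimal bounded-buffer sketch using a pthread mutex and condition
variables. It is not code from the benchmark under discussion; the
buffer size QSIZE, the item count NITEMS, and all names are
illustrative assumptions. Without direct handoff on the condvar
signal, the producer tends to keep its quantum and fill the buffer
until it blocks at the flow-control limit, at which point the consumer
drains a batch.

	/*
	 * Illustrative sketch only: bounded producer/consumer with a
	 * mutex and two condition variables.  QSIZE and NITEMS are
	 * arbitrary values chosen for the example.
	 */
	#include <pthread.h>
	#include <stdio.h>

	#define	QSIZE	50	/* flow-control limit (assumed) */
	#define	NITEMS	1000	/* total items to produce (assumed) */

	static int buf[QSIZE];
	static int head, tail, count;
	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t notempty = PTHREAD_COND_INITIALIZER;
	static pthread_cond_t notfull = PTHREAD_COND_INITIALIZER;

	static void *
	producer(void *arg)
	{
		int i;

		for (i = 0; i < NITEMS; i++) {
			pthread_mutex_lock(&lock);
			/* Block only when flow-controlled (buffer full). */
			while (count == QSIZE)
				pthread_cond_wait(&notfull, &lock);
			buf[tail] = i;
			tail = (tail + 1) % QSIZE;
			count++;
			/*
			 * Wake the consumer; without direct handoff the
			 * producer normally keeps running and continues
			 * to fill the buffer.
			 */
			pthread_cond_signal(&notempty);
			pthread_mutex_unlock(&lock);
		}
		return (NULL);
	}

	static void *
	consumer(void *arg)
	{
		int i, item;

		for (i = 0; i < NITEMS; i++) {
			pthread_mutex_lock(&lock);
			while (count == 0)
				pthread_cond_wait(&notempty, &lock);
			item = buf[head];
			head = (head + 1) % QSIZE;
			count--;
			pthread_cond_signal(&notfull);
			pthread_mutex_unlock(&lock);
			(void)item;	/* process the item here */
		}
		return (NULL);
	}

	int
	main(void)
	{
		pthread_t prod, cons;

		pthread_create(&prod, NULL, producer, NULL);
		pthread_create(&cons, NULL, consumer, NULL);
		pthread_join(prod, NULL);
		pthread_join(cons, NULL);
		printf("done\n");
		return (0);
	}

Build it by linking against the threads library (e.g. cc -pthread).
Whether the producer actually runs in long batches here depends on the
scheduler and on how the condvar/mutex implementation handles wakeups,
which is exactly the behavioural difference being debated above.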