On Monday 08 May 2006 14:52, Kris Kennaway wrote:
> OK, David's patch fixes the umtx thundering herd (and seems to give a
> 4-6% boost). I also fixed a thundering herd in FILEDESC_UNLOCK (which
> was also waking up 2-7 CPUs at once about 30% of the time) by doing
> s/wakeup/wakeup_one/. This did not seem to give a performance impact
> on this test though.
>....
> filedesc contention is down by a factor of 3-4, with corresponding
> reduction in the average hold time. The process lock contention
> coming from the signal delivery wakeup has also gone way down for some
> reason.
>

I found that mysqld frequently calls alarm() in its file thr_alarm.c,
and thr_kill() to send SIGALRM to its timer thread to wake it up. The
timer thread itself is blocked in sigwait().

Normally the alarm timer expires within a second, so the kernel
periodically calls psignal() to find a thread that can handle the
signal. This means the kernel has to periodically walk the thread list
with the process lock and the scheduler lock held, which is very
expensive.

Most of the time thr_kill() wakes up the timer thread earlier. In the
thr_kill syscall, the kernel has to walk the thread list to find the
thread whose id matches the given id. The function thread_find() uses a
linear search, which is slow: if there are lots of threads in the
process, the process lock is held too long. I think that's the reason
why you have seen so much process lock contention.

If you define USE_ALARM_THREAD in the mysql header file, the contention
should decrease (I hope). Patch:

--- my_pthread.h.old Mon May 8 18:16:56 2006
+++ my_pthread.h Mon May 8 18:17:07 2006
@@ -267,6 +267,8 @@
 /* Test first for RTS or FSU threads */
+#define USE_ALARM_THREAD
+
 #if defined(PTHREAD_SCOPE_GLOBAL) && !defined(PTHREAD_SCOPE_SYSTEM)
 #define HAVE_rts_threads
 extern int my_pthread_create_detached;

> unp contention has risen a bit.
> The other big gain is to sleep
> mtxpool contention, which roughly doubled:
>
> /*
>  * Change the total socket buffer size a user has used.
>  */
> int
> chgsbsize(uip, hiwat, to, max)
>         struct uidinfo *uip;
>         u_int *hiwat;
>         u_int to;
>         rlim_t max;
> {
>         rlim_t new;
>
>         UIDINFO_LOCK(uip);
>
> So the next question is how can that be optimized?
>

You may use atomic_cmpset_int in a loop to avoid a context switch, or
use an adaptive mutex, but there is no adaptive mutex type you can
specify. rlim_t is a 64-bit integer, so that atomic operation cannot be
used on it, but a 64-bit integer might not be necessary for a socket
buffer size.

> Kris

David Xu

Received on Mon May 08 2006 - 08:43:37 UTC
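
The compare-and-set loop suggested above can be sketched in userland C as
follows. This is only an illustration under stated assumptions, not the
kernel code: it uses C11 atomic_compare_exchange_weak() as a stand-in for
the kernel's atomic_cmpset_int, it assumes a 32-bit counter is wide enough
for the per-user total, and the names sb_account, sb_used and sb_charge are
hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-user accounting record; not the kernel's struct uidinfo. */
struct sb_account {
    _Atomic unsigned int sb_used;   /* total buffer bytes charged to the user */
};

/*
 * Try to change one socket's reservation from *hiwat to `to` bytes without
 * taking a lock. Returns true and updates *hiwat on success; returns false
 * if growing the reservation would push the user's total above `max`.
 * Assumes *hiwat was previously charged to acct (so sb_used >= *hiwat) and
 * that *hiwat itself is only touched by the caller.
 */
static bool
sb_charge(struct sb_account *acct, unsigned int *hiwat, unsigned int to,
    unsigned int max)
{
    unsigned int old, new;
    uint64_t wide;

    old = atomic_load(&acct->sb_used);
    do {
        /* Compute in 64 bits so the limit check cannot wrap around. */
        wide = (uint64_t)old - *hiwat + to;
        if (to > *hiwat && wide > max)
            return (false);
        new = (unsigned int)wide;
        /* On failure, `old` is reloaded with the current value; retry. */
    } while (!atomic_compare_exchange_weak(&acct->sb_used, &old, new));

    *hiwat = to;
    return (true);
}
```

A caller that loses the race simply recomputes and retries instead of
sleeping on a mutex, which is the point of the suggestion. Whether this is
a net win depends on how often the counter is actually contended.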