Re: HEADS UP: UNIX domain socket locking changes merged to CVS HEAD

From: Randall Stewart <rrs_at_cisco.com>
Date: Wed, 28 Feb 2007 19:00:15 -0500
Robert Watson wrote:
> 
> On Wed, 28 Feb 2007, Stephane E. Potvin wrote:
> 
>>> Please let me know if you experience any problems with UNIX domain 
>>> sockets -- these changes will affect applications that consume UNIX 
>>> domain sockets directly, like MySQL and Postfix, as well as consumers 
>>> of POSIX fifos, which are implemented using UNIX domain sockets 
>>> in-kernel.
>>
>> Since this commit, I've been observing frequent deadlocks on my 
>> laptop, mostly when starting up GNOME. The deadlock usually occurs 
>> within 5 to 10 minutes.
>>
>> I was able to drop into ddb once and got the following information: 
>> (there might be some typos as I had to copy this manually)
> 
> Thanks, this information was very helpful, and indeed the problem is as 
> you surmise: cases existed where more than one unpcb lock was acquired 
> at a time when holding only a global read lock, not a global write 
> lock.  I guess these slipped through from an earlier version of the 
> patch.  In any case, could you try the patch at:
> 
>   http://www.watson.org/~robert/freebsd/netperf/20070228-unp_deadlock.diff
> 
> This eliminates overlapped unpcb lock acquisition in both datagram and 
> stream cases, and with any luck will fix the deadlock problem.  It may 
> also marginally improve performance by further reducing unpcb lock 
> contention.
> 
> Thanks,
> 
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
> 
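The locking rule described in Robert's reply can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the contents of the patch: a pthread rwlock stands in for unp_global_rwlock, a per-object pthread mutex stands in for unp_mtx, and the names struct pcb, pcb_connect(), pcb_send() and run_pingpong() are hypothetical. While holding the global lock shared, a thread holds at most one per-socket mutex at a time; an operation that must hold two at once takes the global lock exclusively.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: global_lock stands in for unp_global_rwlock and
 * each pcb's mtx for one socket's unp_mtx. Every pcb mutex is taken
 * only while global_lock is held in some mode. */
static pthread_rwlock_t global_lock = PTHREAD_RWLOCK_INITIALIZER;

struct pcb {
    pthread_mutex_t mtx;
    struct pcb *peer;   /* written only with global_lock held exclusively */
    long received;      /* protected by mtx */
};

static void
pcb_init(struct pcb *p)
{
    pthread_mutex_init(&p->mtx, NULL);
    p->peer = NULL;
    p->received = 0;
}

/* Needs both mutexes at once, so take the global lock exclusively:
 * no other thread can then hold any pcb mutex, and the order of the
 * two acquisitions below cannot lead to a deadlock. */
static void
pcb_connect(struct pcb *a, struct pcb *b)
{
    pthread_rwlock_wrlock(&global_lock);
    pthread_mutex_lock(&a->mtx);
    pthread_mutex_lock(&b->mtx);
    a->peer = b;
    b->peer = a;
    pthread_mutex_unlock(&b->mtx);
    pthread_mutex_unlock(&a->mtx);
    pthread_rwlock_unlock(&global_lock);
}

/* Common send path under the shared global lock: hold at most one pcb
 * mutex at a time. dst stays valid because peer links only change under
 * the exclusive lock, which cannot be taken while we read-hold it. */
static void
pcb_send(struct pcb *src, long n)
{
    pthread_rwlock_rdlock(&global_lock);
    pthread_mutex_lock(&src->mtx);      /* e.g. to inspect sender state */
    struct pcb *dst = src->peer;
    pthread_mutex_unlock(&src->mtx);    /* drop src before locking dst */
    if (dst != NULL) {
        pthread_mutex_lock(&dst->mtx);
        dst->received += n;
        pthread_mutex_unlock(&dst->mtx);
    }
    pthread_rwlock_unlock(&global_lock);
}

struct sender_arg {
    struct pcb *src;
    long iters;
};

static void *
sender(void *p)
{
    struct sender_arg *arg = p;
    for (long i = 0; i < arg->iters; i++)
        pcb_send(arg->src, 1);
    return NULL;
}

/* Two connected sockets sending to each other concurrently; returns the
 * total number of units delivered, or -1 if a thread cannot be created. */
static long
run_pingpong(long iters)
{
    struct pcb a, b;
    pcb_init(&a);
    pcb_init(&b);
    pcb_connect(&a, &b);

    struct sender_arg arg_a = { &a, iters };
    struct sender_arg arg_b = { &b, iters };
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, sender, &arg_a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, sender, &arg_b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    long total = a.received + b.received;
    pthread_mutex_destroy(&a.mtx);
    pthread_mutex_destroy(&b.mtx);
    return total;
}
```

run_pingpong(10000) terminates and returns 20000. If pcb_send() instead kept src->mtx held while locking dst, two senders targeting each other under the shared lock could block exactly like threads 100095 and 100126 in the ddb output that follows.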
>>
>> show alllocks
>> Process 906 (gnome-power-manager) thread 0xc553c570 (100126)
>> exclusive sleep mutex unp_mtx r = 0 (0xc5573bb8) locked _at_ 
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
>> shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked _at_ 
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768
>> Process 860 (dbus-daemon) thread 0xc4d001d0 (100095)
>> exclusive sleep mutex unp_mtx r = 0 (0xc5573b10) locked _at_ 
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
>> shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked _at_ 
>> /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768
>>
>> show lock 0xc5573bb8
>> class: sleep mutex
>> name: unp_mtx
>> flags: {DEF, RECURSE, DUPOK}
>> state: {OWNED, CONTESTED}
>> owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
>>
>> show turnstile 0xc5573bb8
>> Lock: 0xc5573bb8 - (sleep mutex) unp_mtx
>> Lock Owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
>> Shared Waiters:
>>     empty
>> Exclusive Waiters:
>>     0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
>> Pending Threads:
>>     empty
>>
>> show lock 0xc5573b10
>> class: sleep mutex
>> name: unp_mtx
>> flags: {DEF, RECURSE, DUPOK}
>> state: {OWNED, CONTESTED}
>> owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
>>
>> show turnstile 0xc5573b10
>> Lock: 0xc5573b10 - (sleep mutex) unp_mtx
>> Lock Owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
>> Shared Waiters:
>>     empty
>> Exclusive Waiters:
>>     0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
>> Pending Threads:
>>     empty
>>
>> show lock 0xc06d1dac
>> class: rw
>> name: unp_global_rwlock
>> state: RLOCK: 2 locks
>> waiters: writers
>>
>> show turnstile 0xc06d1dac
>> Lock: 0xc06d1dac - (rw) unp_global_rwlock
>> Lock Owner: none
>> Shared Waiters:
>>     empty
>> Exclusive Waiters:
>>     0xc4d00000 (tid 100096, pid 857, "gconfd-2")
>>     0xc4d01570 (tid 100085, pid 804, "login")
>>     0xc4fcaae0 (tid 100133, pid 887, "bonobo-activation-s")
>>     0xc48c23a0 (tid 100106, pid 897, "gaim")
>>     0xc4d01910 (tid 100120, pid 909, "gnome-screensaver")
>>     0xc553cae0 (tid 100123, pid 905, "gnome-mount")
>> Pending Threads:
>>     empty
>>
>> bt 100095
>> Tracing pid 860 tid 100095 td 0xc4d001d0
>> sched_switch(3301966288,0,1,3226391662,3310601584,...) at 3226314602 = 
>> sched_switch+303
>> mi_switch(1,0,3227647346,647,3228084884,...) at 3226245932 = 
>> mi_switch+489
>> turnstile_wait(3310828472,3310601584,0,3310601586,3310828472,...) at 
>> 3226393861 = turnstile_wait+633
>> _mtx_lock_sleep(3310828472,3301966288,0,3227660663,877,...) at 
>> 3226177946 = _mtx_lock_sleep+261
>> _mtx_lock_flags(3310828472,0,3227660663,877,3310833112,...) at 
>> 3226177102 = _mtx_lock_flags+102
>> uipc_send(3310832888,0,3296484864,0,0,...) at 3226561343 = uipc_send+1058
>> sosend_generic(3310832888,0,3302262848,3296484864,0,...) at 3226529764 
>> = sosend_generic+1067
>> sosend(3310832888,0,3302262848,0,0,...) at 3226530139 = sosend+63
>> soo_write(3304721288,3302262848,3297254528,0,3301966288,...) at 
>> 3226433647 = soo_write+121
>> dofilewrite
>> kern_writev
>> writev
>> syscall
>>
>> bt 100126
>> Tracing pid 906 tid 100126 td 0xc553c570
>> sched_switch
>> mi_switch
>> turnstile_wait
>> _mtx_lock_sleep
>> _mtx_lock_flags
>> uipc_send
>> sosend_generic
>> sosend
>> soo_write
>> dofilewrite
>> kern_writev
>> writev
>> syscall
>>
>> As you can see, threads 100095 and 100126 are each waiting on the 
>> other's lock. The function uipc_send tries to lock two unp_mtx locks 
>> without holding a write lock on unp_global_rwlock. It seems that 
>> uipc_send takes the write lock only if nam is not NULL or the 
>> PRUS_EOF flag is set; both conditions are false in this particular 
>> call scenario. According to the comments just above the second lock 
>> acquisition in uipc_usrreq.c, the global write lock should already be 
>> held by the time we get there. I'm not sure where, or under what 
>> condition, the write lock should be acquired to fix this correctly. 
>> I'll keep the core around in case you want me to provide more 
>> information.
>>
>> Regards,
>>
>> Steph
>>
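Stephane's reading of the ddb output can be checked mechanically by treating each turnstile as wait-for edges (waiter to owner) and looking for a cycle. The sketch below is a hypothetical helper, not a ddb command. The edges are transcribed from the "show turnstile" output above, and the rwlock waiters are simplified to a single edge from gconfd-2 to one of the two read holders (the other queued writers behave the same way).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One wait-for edge: thread `waiter` sleeps on a lock owned by `owner`.
 * Thread IDs are the ddb tids from the trace. */
struct wait_edge {
    int waiter;
    int owner;
};

/* Edges transcribed from "show turnstile". The rwlock has two read
 * holders; for simplicity the queued writer gets one edge, to 100126. */
static const struct wait_edge ddb_edges[] = {
    { 100095, 100126 },  /* dbus-daemon -> unp_mtx 0xc5573bb8 */
    { 100126, 100095 },  /* gnome-power-manager -> unp_mtx 0xc5573b10 */
    { 100096, 100126 },  /* gconfd-2 -> unp_global_rwlock (writer) */
};
#define NEDGES (sizeof(ddb_edges) / sizeof(ddb_edges[0]))

/* Owner of the lock `tid` is blocked on, or -1 if it is not blocked.
 * Assumes a thread sleeps on at most one lock at a time. */
static int
waits_on(const struct wait_edge *e, size_t n, int tid)
{
    for (size_t i = 0; i < n; i++)
        if (e[i].waiter == tid)
            return e[i].owner;
    return -1;
}

/* True if following wait-for edges from `start` leads back to `start`.
 * At most n + 1 steps are taken, so a thread that is merely queued
 * behind a cycle (but not part of it) terminates with false. */
static bool
in_wait_cycle(const struct wait_edge *e, size_t n, int start)
{
    int cur = start;
    for (size_t step = 0; step <= n; step++) {
        cur = waits_on(e, n, cur);
        if (cur == -1)
            return false;
        if (cur == start)
            return true;
    }
    return false;
}
```

For tids 100095 and 100126, in_wait_cycle() returns true, since each waits on the other's unp_mtx. For gconfd-2 (100096) it returns false: that thread is stuck behind the cycle on unp_global_rwlock rather than being part of it, which matches the list of blocked writers in the trace.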
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> 
Robert:

I have been having the same problem... and thought it was some
of my code ;-o.... but after more testing I see now it's not.

I will try your patch and get back to you :-D

R

-- 
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369 <or> 803-317-4952 (cell)
Received on Wed Feb 28 2007 - 23:04:20 UTC
