Re: FreeBSD deadlock (with fork?)

From: John Baldwin <jhb_at_freebsd.org>
Date: Thu, 18 Sep 2008 17:35:29 -0400
On Thursday 18 September 2008 12:31:42 am David Naylor wrote:
> Hi,
> 
> I have a program that spawns a lot of subprocesses (with pipes open) from 
> multiple threads.  The problem is the program often deadlocks, but not 
> consistently.  Sometimes the program can run over 5 times to completion 
> without incident, and yet other times it locks up within a few seconds.  
> 
> However if I limit the thread count to 1 the problem does not appear to be 
> present.  
> 
> Here are the logs from gdb:
> (gdb) info thread
>   5 Thread 7021c0 (LWP 100203)  0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
>   4 Thread a28480 (LWP 100174)  0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
>   3 Thread a61d80 (LWP 100175)  0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
>   2 Thread a61bc0 (LWP 100176)  0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> * 1 Thread a61840 (LWP 100177)  0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> 
> 
> (gdb) bt
> #0  0x00000008009a2e8c in _umtx_op_err () 
> at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> #1  0x00000008009a1331 in cond_wait_common (cond=Variable "cond" is not 
> available.

This is not waiting on a lock; this is a pthread_cond_wait() of some sort.

> (gdb) thr 2
> [Switching to thread 2 (Thread a61bc0 (LWP 100176))]#0  0x00000008009a2e8c in 
> _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> 37      RSYSCALL_ERR(_umtx_op)
> (gdb) bt
> #0  0x00000008009a2e8c in _umtx_op_err () 
> at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> #1  0x00000008009a1331 in cond_wait_common (cond=Variable "cond" is not 
> available.

Similarly here.  I don't think you have a deadlock.  I think you have a bug 
where you are missing a pthread_cond_signal() or pthread_cond_broadcast() or 
some such.  Or maybe you aren't holding the mutex when doing the signal or 
broadcast.

-- 
John Baldwin
Received on Thu Sep 18 2008 - 19:38:07 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:35 UTC