Deadlock involving truss -f, pdfork() and wait4()

From: Ryan Stone <rysto32_at_gmail.com>
Date: Thu, 12 Sep 2019 15:37:25 -0400
I've hit an issue with a simple use of pdfork().  I have a process
that calls pdfork() and the parent immediately does a wait4() on the
child pid.  This works fine under normal conditions, but if the parent
is run under truss -f, the three processes deadlock.  If I switch out
pdfork() for fork(), the deadlock does not occur.

This C file demonstrates the issue:

https://people.freebsd.org/~rstone/pdfork.c

If I run "truss -f ./pdfork", which uses fork(), it completes within a
second.  If I run "truss -f ./pdfork -p", which uses pdfork(), the
processes deadlock.  If I run "./pdfork -p" without truss, it
completes normally.

procstat reports the following kernel stacks:

27572 102043 truss               -                   mi_switch+0xe2
sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf
kern_wait6+0x695 sys_wait6+0x9f amd64_syscall+0x36e
fast_syscall_common+0x101
27573 102469 pdfork              -                   mi_switch+0xe2
sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _sleep+0x1bf
kern_wait6+0x695 sys_wait4+0x78 amd64_syscall+0x36e
fast_syscall_common+0x101
27574 102053 pdfork              -                   mi_switch+0xe2
thread_suspend_switch+0xd4 ptracestop+0x13b fork_return+0x14e
fork_exit+0x83 fork_trampoline+0xe

As near as I can tell, truss is blocked waiting for ptrace events, the
parent process is blocked in wait4, and the child process is perhaps
waiting for its parent to exit the kernel so it can send the ptrace
event?

I really don't see anything obvious in the pdfork() code path that
would cause this to happen when fork() doesn't have the problem.  It
may be that pdfork() just changes the timing enough to expose a latent
bug.

I'm seeing this on a recentish current (r351363).
Received on Thu Sep 12 2019 - 17:37:38 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:21 UTC