[Very late response: I just experienced the same problem and remembered
the issue had been brought up before]

On 2/14/05, Greg 'groggy' Lehey <grog_at_freebsd.org> wrote:
> I'm having some problems with userland gdb on recent -CURRENT builds:
> at some point it hangs.
>
> Specifically, I'm setting a conditional breakpoint like this:
>
>   b Minsert_blockletpointer if I->inode_num == 0x1f0bb
>
> inode_num increments for 1, so I hit this breakpoint about 100,000
> times.  Or I should.  What happens is that the debugger hangs at some
> point on the way.  ktrace shows multiple copies of:
>
>  12325 gdb      CALL  ptrace(12,0x3026,0xbfbfd5e0,0)
>  12325 gdb      RET   ptrace 0
>  12325 gdb      CALL  ptrace(PT_STEP,0x3026,0x1,0)
>  12325 gdb      RET   ptrace 0
>  12325 gdb      CALL  wait4(0xffffffff,0xbfbfd808,0,0)   <-- stops here
>  12325 gdb      RET   wait4 12326/0x3026
>  12325 gdb      CALL  kill(0x3026,0)
>  12325 gdb      RET   kill 0
>  12325 gdb      CALL  ptrace(PT_GETREGS,0x3026,0xbfbfd5c0,0)
>
> When it hangs, it's at the call to wait4, as shown.  It looks like the
> completion of the ptrace request isn't being reported back.

I think I know what's going on with this, and I have a feeling that a
couple of other wait()-related issues that were left open on the lists
might be explained by the same thing.

Here's my hypothesis: kern_wait() checks each child of the current
process to see if it has exited, or should otherwise report status to
wait/wait3/wait4/waitpid. If it finds that no candidate child has
anything to report, it goes to sleep, waiting to be woken by a child
reporting status, and then repeats the process. It looks a bit like
this:

    kern_wait()
    {
    loop:
        foreach child of self {
            if (child has status to report)
                return status;
        }
        lock self
        msleep(on "self")
        unlock self
        goto loop;
    }

The problem is that no lock guarantees that the conditions checked in
the inner loop still hold by the time the current process locks its own
"struct proc" and invokes msleep().
That is, you can miss the wakeup generated by a particular child between
checking that child in the inner loop and going to sleep. (The race is
most likely to happen on an SMP machine or with PREEMPTION, but
acquiring curproc's lock could possibly trigger it too, if the
acquisition needed to sleep.)

I can at least reproduce this for the ptrace/gdb case, but AFAICT, it
could happen for the standard wait()/exit() path, too.

I worked up a patch to fix the problem: those parts of the kernel that
wake the process up now record the fact in the parent's flags and do the
wakeup while holding the parent process lock, and kern_wait() checks
whether this flag has been set before sleeping. (A simpler solution
would be to hold the parent lock across the bulk of kern_wait, but from
what I can gather this will lead to at least one LOR, i.e. lock order
reversal.)

I've been unable to reproduce the problem on a kernel with this patch,
and a nice sprinkling of printfs shows that when gdb hangs, the race
has just occurred.

Anyone got opinions on this?

Cheers,
Peadar.
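
For illustration only, here is a userland sketch of the flag-based idea
described above, using pthreads in place of the kernel primitives. It is
not the patch itself: the struct and function names (struct parent,
struct child, wakeup_pending, child_report, parent_wait) are hypothetical,
the mutex stands in for the proc lock, and the condition variable stands
in for the msleep()/wakeup() channel. The point it tries to show is that
the flag is set and tested under the same parent lock, so a report that
arrives between the scan and the sleep is not lost.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a child process. */
struct child {
    pthread_mutex_t lock;
    bool            has_status;  /* child has something to report */
    int             status;
};

/* Hypothetical stand-in for the waiting parent's struct proc. */
struct parent {
    pthread_mutex_t lock;            /* plays the role of the proc lock */
    pthread_cond_t  cv;              /* plays the role of the sleep channel */
    bool            wakeup_pending;  /* set by children under 'lock' */
    struct child   *children;
    size_t          nchildren;
};

void child_init(struct child *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->has_status = false;
    c->status = 0;
}

void parent_init(struct parent *p, struct child *children, size_t n)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cv, NULL);
    p->wakeup_pending = false;
    p->children = children;
    p->nchildren = n;
}

/* Child side: publish the status, then flag and wake the parent while
 * holding the parent lock. */
void child_report(struct parent *p, struct child *c, int status)
{
    pthread_mutex_lock(&c->lock);
    c->status = status;
    c->has_status = true;
    pthread_mutex_unlock(&c->lock);

    pthread_mutex_lock(&p->lock);
    p->wakeup_pending = true;
    pthread_cond_broadcast(&p->cv);
    pthread_mutex_unlock(&p->lock);
}

/* Parent side: scan the children without the parent lock, then sleep
 * only if no child has flagged a wakeup since the flag was last cleared. */
int parent_wait(struct parent *p)
{
    for (;;) {
        for (size_t i = 0; i < p->nchildren; i++) {
            struct child *c = &p->children[i];

            pthread_mutex_lock(&c->lock);
            if (c->has_status) {
                int status = c->status;

                c->has_status = false;
                pthread_mutex_unlock(&c->lock);
                return status;
            }
            pthread_mutex_unlock(&c->lock);
        }

        pthread_mutex_lock(&p->lock);
        while (!p->wakeup_pending)
            pthread_cond_wait(&p->cv, &p->lock);
        p->wakeup_pending = false;   /* consume the flag, then rescan */
        pthread_mutex_unlock(&p->lock);
    }
}
```

A stale flag only costs one extra pass over the children, which seems a
reasonable price for not having to hold the parent lock across the scan.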