Peter Edwards wrote:

>[Very late response: I just experienced the same problem and
>remembered the issue had been brought up before]
>
>On 2/14/05, Greg 'groggy' Lehey <grog_at_freebsd.org> wrote:
>
>>I'm having some problems with userland gdb on recent -CURRENT builds:
>>at some point it hangs.
>>
>>Specifically, I'm setting a conditional breakpoint like this:
>>
>>  b Minsert_blockletpointer if I->inode_num == 0x1f0bb
>>
>>inode_num increments for 1, so I hit this breakpoint about 100,000
>>times.  Or I should.  What happens is that the debugger hangs at some
>>point on the way.  ktrace shows multiple copies of:
>>
>> 12325 gdb      CALL  ptrace(12,0x3026,0xbfbfd5e0,0)
>> 12325 gdb      RET   ptrace 0
>> 12325 gdb      CALL  ptrace(PT_STEP,0x3026,0x1,0)
>> 12325 gdb      RET   ptrace 0
>> 12325 gdb      CALL  wait4(0xffffffff,0xbfbfd808,0,0)    <-- stops here
>> 12325 gdb      RET   wait4 12326/0x3026
>> 12325 gdb      CALL  kill(0x3026,0)
>> 12325 gdb      RET   kill 0
>> 12325 gdb      CALL  ptrace(PT_GETREGS,0x3026,0xbfbfd5c0,0)
>>
>>When it hangs, it's at the call to wait4, as shown.  It looks like the
>>completion of the ptrace request isn't being reported back.
>>
>
>I think I know what's going on with this, and I have a feeling that
>there's a couple of other wait()-related issues that were left open on
>the lists that might be explained by the issue.
>
>Here's my hypothesis: kern_wait() checks each child of the current
>process to see if they have exited, or should otherwise report status
>to wait/wait3/wait4/waitpid. If it finds that all candidate children
>have nothing to report, it goes to sleep, waiting to be awoken by a
>child reporting status, and repeats the process. It looks a bit like
>this:
>
>kern_wait()
>{
>loop:
>    foreach child of self {
>        if (child has status to report)
>            return status;
>    }
>    lock self
>    msleep(on "self")
>    unlock self
>    goto loop;
>}
>
>The problem is that there's no lock guaranteeing that the conditions
>checked in the inner loop still hold by the time the current process
>locks its own "struct proc" and invokes msleep(). (The race is probably
>most likely to happen on an SMP machine or with PREEMPTION, but the
>acquisition of curproc's lock could possibly cause the issue if it
>needed to sleep.) That is, you can miss the wakeup generated by a
>particular child between checking the process in the inner loop and
>going to sleep.
>
>I can at least reproduce this for the ptrace/gdb case, but AFAICT, it
>could happen for the standard wait()/exit() path, too. I worked up a
>patch to fix the problem by having those parts of the kernel that wake
>the process up flag the fact in the parent's flags and do the
>wakeup while holding the parent process lock, and by noticing if this
>flag has been set before sleeping. (A simpler solution would be to
>hold the parent lock across the bulk of kern_wait, but from what I can
>gather this will lead to at least one LOR.)
>
>I've been unable to reproduce the problem with a kernel with this
>patch, and using a nice sprinkling of printfs can show that when GDB
>hangs, the race has just occurred.
>
>Anyone got opinions on this?
>Cheers,
>Peadar.
>

If the parent has PS_NOCLDSTOP set, no SIGCHLD will be sent to the
parent, so there is a race in that case. But if PS_NOCLDSTOP is not set,
the signal will be sent to the parent, and the parent should resume from
msleep() immediately.
I don't know why there is still a race. I am looking at the code; I
think stop() should be merged into thread_stopped(), since there is no
other caller at all.

David Xu

Received on Mon Apr 18 2005 - 03:26:07 UTC
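
For readers following the thread, the fix Peter describes (the waker sets a flag and issues the wakeup while holding the parent's lock, and the waiter checks that flag under the same lock before sleeping) can be illustrated in userland. What follows is only a sketch using POSIX threads, not the actual kernel patch: struct parent, status_pending, child_report(), parent_wait() and run_demo() are all hypothetical names, and a pthread mutex and condition variable stand in for the process lock and msleep()/wakeup().

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userland stand-in for the parent's "struct proc". */
struct parent {
    pthread_mutex_t lock;   /* stands in for the parent process lock */
    pthread_cond_t wakeup;  /* stands in for the msleep()/wakeup() channel */
    int status_pending;     /* hypothetical flag: a child has status to report */
    int child_status;       /* the status the child posted */
};

/* Child side: set the flag and wake the parent while holding its lock. */
static void *child_report(void *arg)
{
    struct parent *p = arg;

    pthread_mutex_lock(&p->lock);
    p->child_status = 42;           /* arbitrary value for the demo */
    p->status_pending = 1;
    pthread_cond_signal(&p->wakeup);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/*
 * Parent side: the flag is tested under the same lock that is released
 * atomically by the wait, so a report arriving between the test and the
 * sleep cannot be missed.
 */
static int parent_wait(struct parent *p)
{
    int status;

    pthread_mutex_lock(&p->lock);
    while (!p->status_pending)
        pthread_cond_wait(&p->wakeup, &p->lock);
    p->status_pending = 0;
    status = p->child_status;
    pthread_mutex_unlock(&p->lock);
    return status;
}

/* Runs one child/parent round trip and returns the collected status. */
int run_demo(void)
{
    struct parent p;
    pthread_t child;
    int rc, status;

    p.status_pending = 0;
    p.child_status = 0;
    rc = pthread_mutex_init(&p.lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&p.wakeup, NULL);
    assert(rc == 0);
    rc = pthread_create(&child, NULL, child_report, &p);
    assert(rc == 0);
    (void)rc;

    status = parent_wait(&p);

    pthread_join(child, NULL);
    pthread_cond_destroy(&p.wakeup);
    pthread_mutex_destroy(&p.lock);
    return status;
}
```

Whether the child runs before or after the parent takes the lock, parent_wait() returns: if the child got there first, the flag is already set and the parent never sleeps; otherwise the parent is asleep with the lock released when the signal arrives.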