On Wed, Feb 15, 2012 at 09:22:10AM -0800, Dmitry Mikulin wrote:
>
>
> On 02/15/2012 08:32 AM, Konstantin Belousov wrote:
> >On Mon, Feb 13, 2012 at 02:50:45PM -0800, Dmitry Mikulin wrote:
> >>>>>It seems that now wait4(2) can be called from the real (non-debugger)
> >>>>>parent first and result in the call to proc_reap(), isn't it ? We would
> >>>>>then just reparent the child back to the caller, still leaving the
> >>>>>zombie and confusing debugger.
> >>>>When either gdb or the real parent gets to proc_reap() the process
> >>>>wouldn't get destroyed, it'll get caught by the following clause:
> >>>>        if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
> >>>>
> >>>>and the real parent will get the child back into the children's list
> >>>>while gdb will get it into the orphan list. The second time around when
> >>>>proc_reap() is entered, p->p_oppid will be 0 and the process will get
> >>>>really reaped. Does it make sense? And proc_reparent() attempts to keep
> >>>>the orphan list clean and not have the same entries and the list of
> >>>>siblings.
> >>>Right, this is what I figured. But I asked about some further implication
> >>>of this change:
> >>>
> >>>if real parent spuriously calls wait4(2) on the child pid after the child
> >>>exited, but before the debugger called the wait4(), then exactly the
> >>>code you noted above will be run. This results in the child being fully
> >>>returned to the original parent.
> >>>
> >>>Next, the wait4() call from debugger gets an error, and zombie will be
> >>>kept around until parent calls wait4() for this pid once more.
> >>>
> >>>Am I missing something ?
> >>In this case the process will move from gdb's child list to gdb's orphan
> >>list when the real parent does a wait4(). Next time around the wait loop
> >>in gdb it'll be caught by the orphan's proc_reap().
> >I do not see how the next debugger loop could find this process at all,
> >since the first wait4() call reparented it to the original parent.
>
> Not the debugger loop, the kern_wait() loop. The child gets re-parented to
> the original parent but moves to the orphan list of the debugger process.

Either the debugger loop which calls wait4/waitpid, or the kern_wait loop
resulting from the debugger calling wait*. Could you please describe how
the patched kernel moves the wait'ed zombie to the orphan list of the
debugger?

To me, it seems that there is another bug: the child appears both on the
children list and on the orphan list of the real parent.
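
To make the two-pass behaviour described in the quoted text concrete, here
is a toy userspace model. It is only a sketch of the behaviour as stated in
this thread, not the patch itself: struct toy_proc, toy_proc_reap() and
toy_calls_until_reaped() are hypothetical names, list membership is reduced
to two integer fields, and the pfind() check on the original parent is
left out.

```c
#include <assert.h>

/* Toy userspace model; field names echo the thread, nothing here is kernel code. */
struct toy_proc {
    int pid;
    int pptr;      /* pid of the current parent (whose children list holds us) */
    int oppid;     /* pid of the original parent while traced, else 0 */
    int orphan_of; /* pid of the process whose orphan list holds us, else 0 */
    int reaped;    /* 1 once the zombie is really gone */
};

/*
 * Models the clause "if (p->p_oppid && ...)" as described in the thread.
 * Returns 1 if the process was really reaped, 0 if it was only handed
 * back to its original parent.
 */
int toy_proc_reap(struct toy_proc *p)
{
    if (p->oppid != 0) {
        int debugger = p->pptr;

        p->pptr = p->oppid;      /* back on the real parent's children list */
        p->orphan_of = debugger; /* assumption: debugger keeps it as an orphan */
        p->oppid = 0;            /* the next pass takes the other branch */
        return 0;
    }
    p->orphan_of = 0;
    p->reaped = 1;
    return 1;
}

/* Calls toy_proc_reap() until the zombie is gone; returns the call count. */
int toy_calls_until_reaped(struct toy_proc *p)
{
    int calls = 0;

    while (!p->reaped) {
        (void)toy_proc_reap(p);
        calls++;
        assert(calls <= 2); /* this model never needs more than two passes */
    }
    return calls;
}
```

For a traced child (oppid set), toy_calls_until_reaped() returns 2 in this
model, and 1 for an untraced one. Which process's wait loop makes the second
call, and by what mechanism the child lands on an orphan list, is not
modelled here.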