Re: [ptrace] please review follow fork/exec changes

From: Dmitry Mikulin <dmitrym_at_juniper.net>
Date: Wed, 15 Feb 2012 09:54:44 -0800

On 02/15/2012 09:40 AM, Konstantin Belousov wrote:
> On Wed, Feb 15, 2012 at 09:22:10AM -0800, Dmitry Mikulin wrote:
>>
>> On 02/15/2012 08:32 AM, Konstantin Belousov wrote:
>>> On Mon, Feb 13, 2012 at 02:50:45PM -0800, Dmitry Mikulin wrote:
>>>>>>> It seems that now wait4(2) can be called from the real (non-debugger)
>>>>>>> parent first and result in the call to proc_reap(), isn't it ? We would
>>>>>>> then just reparent the child back to the caller, still leaving the
>>>>>>> zombie and confusing debugger.
>>>>>> When either gdb or the real parent gets to proc_reap() the process
>>>>>> wouldn't
>>>>>> get destroyed, it'll get caught by the following clause:
>>>>>>      if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
>>>>>>
>>>>>> and the real parent will get the child back into the children's list
>>>>>> while
>>>>>> gdb will get it into the orphan list. The second time around when
>>>>>> proc_reap() is entered, p->p_oppid will be 0 and the process will get
>>>>>> really reaped. Does it make sense? And proc_reparent() attempts to keep
>>>>>> the
>>>>>> orphan list clean and not have the same entries and the list of
>>>>>> siblings.
>>>>> Right, this is what I figured. But I asked about some further implication
>>>>> of this change:
>>>>>
>>>>> if real parent spuriously calls wait4(2) on the child pid after the child
>>>>> exited, but before the debugger called the wait4(), then exactly the
>>>>> code you noted above will be run. This results in the child being fully
>>>>> returned to the original parent.
>>>>>
>>>>> Next, the wait4() call from debugger gets an error, and zombie will be
>>>>> kept around until parent calls wait4() for this pid once more.
>>>>>
>>>>> Am I missing something?
>>>> In this case the process will move from gdb's child list to gdb's orphan
>>>> list when the real parent does a wait4(). Next time around the wait loop
>>>> in
>>>> gdb it'll be caught by the orphan's proc_reap().
>>> I do not see how the next debugger loop could find this process at all,
>>> since the first wait4() call reparented it to the original parent.
>> Not the debugger loop, the kern_wait() loop. The child gets re-parented to
>> the original parent but moves to the orphan list of the debugger process.
> Either the debugger loop which calls wait4/waitpid, or the kern_wait loop
> resulting from the debugger calling wait*.
>
> Could you, please, describe, how the patched kernel moves the wait'ed
> zombie to the orphan list of the debugger ?
> For me, it seems that there is another bug, the child appears both on
> the children list, and on the orphan list of the real parent.

The first attempt to reap the child will get into the
     if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
clause, which will re-parent it to the real parent. The child will not be destroyed at this point.

The following loop in proc_reparent() will make sure that the child does not stay in both lists:
     LIST_FOREACH(p, &parent->p_orphans, p_orphan) {
         if (p == child) {
             LIST_REMOVE(child, p_orphan);
             break;
         }
     }

Since the child's parent is still gdb at this point and the child is still being traced, the following will move it to gdb's orphan list:

     if (child->p_flag & P_TRACED)
         LIST_INSERT_HEAD(&child->p_pptr->p_orphans, child, p_orphan);

After this the real parent will get the exit status.

The next pass through the kern_wait() loop called from gdb will find the child on gdb's orphan list and reap it for real this time, since p->p_oppid was set to 0 by the previous reap attempt. Gdb gets the exit code and the child is destroyed.
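
To make the sequence concrete, here is a toy user-space model of it in C. It is only a sketch, not the kernel code: struct toy_proc and every toy_* function are made up for illustration, list membership is reduced to two pointers (pptr for the children list, orphan_of for the orphan list), and locking, signals and wakeups are left out. It also assumes that the child already sits on the real parent's orphan list when the debugger is its current parent, which is what the P_TRACED insert above would produce at attach time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct proc; only the fields discussed above. */
struct toy_proc {
    int pid;
    int oppid;                  /* original parent's pid, 0 once consumed (models p_oppid) */
    bool traced;                /* models P_TRACED in p_flag */
    bool destroyed;             /* set when the zombie is really reaped */
    struct toy_proc *pptr;      /* current parent; implies membership in its children list */
    struct toy_proc *orphan_of; /* process whose orphan list holds us, or NULL */
};

/* Hypothetical pfind(): linear search over a caller-supplied table. */
static struct toy_proc *toy_pfind(struct toy_proc **tab, size_t n, int pid)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i]->pid == pid && !tab[i]->destroyed)
            return tab[i];
    return NULL;
}

/* Models the proc_reparent() steps described above. */
static void toy_reparent(struct toy_proc *child, struct toy_proc *parent)
{
    /* Drop the child from the new parent's orphan list, if it is there. */
    if (child->orphan_of == parent)
        child->orphan_of = NULL;
    /* Still traced: park it on the old parent's (the debugger's) orphan list. */
    if (child->traced)
        child->orphan_of = child->pptr;
    child->pptr = parent;       /* now on the new parent's children list */
}

/* True if waiter would find p while scanning its children or its orphans. */
static bool toy_can_wait(const struct toy_proc *waiter, const struct toy_proc *p)
{
    return !p->destroyed && (p->pptr == waiter || p->orphan_of == waiter);
}

/* Models one reap attempt; returns true if waiter received the exit status. */
static bool toy_reap(struct toy_proc *waiter, struct toy_proc *p,
                     struct toy_proc **tab, size_t n)
{
    struct toy_proc *t;

    if (!toy_can_wait(waiter, p))
        return false;
    if (p->oppid != 0 && (t = toy_pfind(tab, n, p->oppid)) != NULL) {
        p->oppid = 0;           /* the next attempt will reap for real */
        toy_reparent(p, t);
        return true;            /* status delivered, zombie kept */
    }
    p->pptr = NULL;
    p->orphan_of = NULL;
    p->destroyed = true;        /* second attempt: the zombie goes away */
    return true;
}

/* Replays the scenario in question: the real parent waits first, then gdb. */
static int toy_demo(void)
{
    struct toy_proc parent = { .pid = 10 };
    struct toy_proc gdb = { .pid = 20 };
    struct toy_proc child = { .pid = 30, .oppid = 10, .traced = true,
                              .pptr = &gdb, .orphan_of = &parent };
    struct toy_proc *tab[] = { &parent, &gdb, &child };
    int delivered = 0;

    delivered += toy_reap(&parent, &child, tab, 3); /* status to real parent */
    assert(!child.destroyed && child.orphan_of == &gdb);
    delivered += toy_reap(&gdb, &child, tab, 3);    /* status to gdb, reaped */
    assert(child.destroyed);
    return delivered;           /* number of waiters that got the status */
}
```

In this model both waiters get the exit status and the child is destroyed only on the second attempt, which matches the sequence described above. Whether the patched kernel behaves the same way under real locking and signal delivery is not something the toy can show.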