On 3/10/2017 2:44 AM, Konstantin Belousov wrote: > On Fri, Mar 10, 2017 at 12:11:51AM +0100, Jilles Tjoelker wrote: >> This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo >> should always be the zombie's p_ksi; otherwise, the siginfo may be lost >> if there are too many signals pending for the target process or in the >> system. If the siginfo is lost and the reaper normally passes si_pid to >> waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie >> will remain until the reaper terminates. >> >> Conceptually the siginfo is sent to one process at a time only, so the >> bug is an artifact of the implementation. Perhaps the piece of code >> added in r309886 can be moved or the ksiginfo can be removed from the >> parent's queue. >> >> If such a fix is not possible, it may be better to send a bare SIGCHLD >> (si_code is SI_KERNEL or 0, depending on how many signals are pending) >> in this situation and document that reapers must use WAIT_ANY or P_ALL. >> (However, compared to the pre-r309886 situation they can still use >> SIGCHLD to get notified when to call waitpid() or similar.) >> > > IMO it is acceptable for reaper to know and handle the case of lost > siginfo. But also it seems to be not too hard to guarantee the queueing > of the SIGCHLD for zombie re-parerning. > > diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c > index c524fe5df37..1a1dcc3a4c7 100644 > --- a/sys/kern/kern_exit.c > +++ b/sys/kern/kern_exit.c > _at__at_ -189,6 +189,7 _at__at_ exit1(struct thread *td, int rval, int signo) > { > struct proc *p, *nq, *q, *t; > struct thread *tdt; > + ksiginfo_t *ksi, *ksi1; > > mtx_assert(&Giant, MA_NOTOWNED); > KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo)); > _at__at_ -449,14 +450,23 _at__at_ exit1(struct thread *td, int rval, int signo) > wakeup(q->p_reaper); > for (; q != NULL; q = nq) { > nq = LIST_NEXT(q, p_sibling); > + ksi = ksiginfo_alloc(TRUE); > PROC_LOCK(q); > q->p_sigparent = SIGCHLD; > > if (!(q->p_flag & P_TRACED)) { > proc_reparent(q, q->p_reaper); > if (q->p_state == PRS_ZOMBIE) { > + if (q->p_ksi == NULL) { > + ksi1 = NULL; > + } else { > + ksiginfo_copy(q->p_ksi, ksi); > + ksi->ksi_flags |= KSI_INS; > + ksi1 = ksi; > + ksi = NULL; > + } > PROC_LOCK(q->p_reaper); > - pksignal(q->p_reaper, SIGCHLD, q->p_ksi); > + pksignal(q->p_reaper, SIGCHLD, ksi1); > PROC_UNLOCK(q->p_reaper); > } > } else { > _at__at_ -489,6 +499,8 _at__at_ exit1(struct thread *td, int rval, int signo) > kern_psignal(q, SIGKILL); > } > PROC_UNLOCK(q); > + if (ksi != NULL) > + ksiginfo_free(ksi); > } > > /* Ping? Is this still progressing to be committed? -- Regards, Bryan Drewery
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:10 UTC