Re: r314708: panic: tdsendsignal: ksi on queue

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Fri, 10 Mar 2017 12:44:29 +0200
On Fri, Mar 10, 2017 at 12:11:51AM +0100, Jilles Tjoelker wrote:
> This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo
> should always be the zombie's p_ksi; otherwise, the siginfo may be lost
> if there are too many signals pending for the target process or in the
> system. If the siginfo is lost and the reaper normally passes si_pid to
> waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
> will remain until the reaper terminates.
> 
> Conceptually the siginfo is sent to one process at a time only, so the
> bug is an artifact of the implementation. Perhaps the piece of code
> added in r309886 can be moved or the ksiginfo can be removed from the
> parent's queue.
> 
> If such a fix is not possible, it may be better to send a bare SIGCHLD
> (si_code is SI_KERNEL or 0, depending on how many signals are pending)
> in this situation and document that reapers must use WAIT_ANY or P_ALL.
> (However, compared to the pre-r309886 situation they can still use
> SIGCHLD to get notified when to call waitpid() or similar.)
> 

IMO it is acceptable for reaper to know and handle the case of lost
siginfo. But also it seems to be not too hard to guarantee the queueing
of the SIGCHLD for zombie re-parerning.

diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
index c524fe5df37..1a1dcc3a4c7 100644
--- a/sys/kern/kern_exit.c
+++ b/sys/kern/kern_exit.c
_at__at_ -189,6 +189,7 _at__at_ exit1(struct thread *td, int rval, int signo)
 {
 	struct proc *p, *nq, *q, *t;
 	struct thread *tdt;
+	ksiginfo_t *ksi, *ksi1;
 
 	mtx_assert(&Giant, MA_NOTOWNED);
 	KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
_at__at_ -449,14 +450,23 _at__at_ exit1(struct thread *td, int rval, int signo)
 		wakeup(q->p_reaper);
 	for (; q != NULL; q = nq) {
 		nq = LIST_NEXT(q, p_sibling);
+		ksi = ksiginfo_alloc(TRUE);
 		PROC_LOCK(q);
 		q->p_sigparent = SIGCHLD;
 
 		if (!(q->p_flag & P_TRACED)) {
 			proc_reparent(q, q->p_reaper);
 			if (q->p_state == PRS_ZOMBIE) {
+				if (q->p_ksi == NULL) {
+					ksi1 = NULL;
+				} else {
+					ksiginfo_copy(q->p_ksi, ksi);
+					ksi->ksi_flags |= KSI_INS;
+					ksi1 = ksi;
+					ksi = NULL;
+				}
 				PROC_LOCK(q->p_reaper);
-				pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
+				pksignal(q->p_reaper, SIGCHLD, ksi1);
 				PROC_UNLOCK(q->p_reaper);
 			}
 		} else {
_at__at_ -489,6 +499,8 _at__at_ exit1(struct thread *td, int rval, int signo)
 			kern_psignal(q, SIGKILL);
 		}
 		PROC_UNLOCK(q);
+		if (ksi != NULL)
+			ksiginfo_free(ksi);
 	}
 
 	/*
Received on Fri Mar 10 2017 - 09:44:34 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:10 UTC