Re: r314708: panic: tdsendsignal: ksi on queue

From: Bryan Drewery <bdrewery_at_FreeBSD.org>
Date: Fri, 17 Mar 2017 11:26:43 -0700
On 3/10/2017 2:44 AM, Konstantin Belousov wrote:
> On Fri, Mar 10, 2017 at 12:11:51AM +0100, Jilles Tjoelker wrote:
>> This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo
>> should always be the zombie's p_ksi; otherwise, the siginfo may be lost
>> if there are too many signals pending for the target process or in the
>> system. If the siginfo is lost and the reaper normally passes si_pid to
>> waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
>> will remain until the reaper terminates.
>>
>> Conceptually the siginfo is sent to one process at a time only, so the
>> bug is an artifact of the implementation. Perhaps the piece of code
>> added in r309886 can be moved or the ksiginfo can be removed from the
>> parent's queue.
>>
>> If such a fix is not possible, it may be better to send a bare SIGCHLD
>> (si_code is SI_KERNEL or 0, depending on how many signals are pending)
>> in this situation and document that reapers must use WAIT_ANY or P_ALL.
>> (However, compared to the pre-r309886 situation they can still use
>> SIGCHLD to get notified when to call waitpid() or similar.)
>>
> 
> IMO it is acceptable for reaper to know and handle the case of lost
> siginfo. But also it seems to be not too hard to guarantee the queueing
> of the SIGCHLD for zombie re-parerning.
> 
> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
> index c524fe5df37..1a1dcc3a4c7 100644
> --- a/sys/kern/kern_exit.c
> +++ b/sys/kern/kern_exit.c
> _at__at_ -189,6 +189,7 _at__at_ exit1(struct thread *td, int rval, int signo)
>  {
>  	struct proc *p, *nq, *q, *t;
>  	struct thread *tdt;
> +	ksiginfo_t *ksi, *ksi1;
>  
>  	mtx_assert(&Giant, MA_NOTOWNED);
>  	KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
> _at__at_ -449,14 +450,23 _at__at_ exit1(struct thread *td, int rval, int signo)
>  		wakeup(q->p_reaper);
>  	for (; q != NULL; q = nq) {
>  		nq = LIST_NEXT(q, p_sibling);
> +		ksi = ksiginfo_alloc(TRUE);
>  		PROC_LOCK(q);
>  		q->p_sigparent = SIGCHLD;
>  
>  		if (!(q->p_flag & P_TRACED)) {
>  			proc_reparent(q, q->p_reaper);
>  			if (q->p_state == PRS_ZOMBIE) {
> +				if (q->p_ksi == NULL) {
> +					ksi1 = NULL;
> +				} else {
> +					ksiginfo_copy(q->p_ksi, ksi);
> +					ksi->ksi_flags |= KSI_INS;
> +					ksi1 = ksi;
> +					ksi = NULL;
> +				}
>  				PROC_LOCK(q->p_reaper);
> -				pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
> +				pksignal(q->p_reaper, SIGCHLD, ksi1);
>  				PROC_UNLOCK(q->p_reaper);
>  			}
>  		} else {
> _at__at_ -489,6 +499,8 _at__at_ exit1(struct thread *td, int rval, int signo)
>  			kern_psignal(q, SIGKILL);
>  		}
>  		PROC_UNLOCK(q);
> +		if (ksi != NULL)
> +			ksiginfo_free(ksi);
>  	}
>  
>  	/*


Ping?  Is this still progressing to be committed?


-- 
Regards,
Bryan Drewery


Received on Fri Mar 17 2017 - 17:27:07 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:10 UTC