Re: FreeBSD 11.0-ALPHA5 r302256 kernel panic in filt_proc()

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Wed, 29 Jun 2016 15:25:29 -0700 (PDT)

On 30 Jun, Konstantin Belousov wrote:
> On Wed, Jun 29, 2016 at 03:03:54PM -0700, Don Lewis wrote:
>> On 30 Jun, Konstantin Belousov wrote:
>> > On Wed, Jun 29, 2016 at 02:44:08PM -0700, Don Lewis wrote:
>> >> #10 0xffffffff80a02ddc in filt_proc (kn=0xfffff803c5679a80, 
>> >>     hint=<value optimized out>) at /usr/src/sys/kern/kern_event.c:473
>> >> #11 0xffffffff80a0173b in knote (list=<value optimized out>, hint=2147483648, 
>> >>     lockflags=<value optimized out>) at /usr/src/sys/kern/kern_event.c:2045
>> >> #12 0xffffffff80a0710e in exit1 (td=<value optimized out>, 
>> >>     rval=<value optimized out>, signo=<value optimized out>)
>> >>     at /usr/src/sys/kern/kern_exit.c:515
>> >> #13 0xffffffff80a0677d in sys_sys_exit (td=0xfffff803c5679a80, 
>> >>     uap=<value optimized out>) at /usr/src/sys/kern/kern_exit.c:178
>> >> #14 0xffffffff80eb8b2b in amd64_syscall (td=0xfffff80096b49500, traced=0)
>> >>     at subr_syscall.c:135
>> >> #15 0xffffffff80e98d9b in Xfast_syscall ()
>> >>     at /usr/src/sys/amd64/amd64/exception.S:396
>> >> #16 0x00000008009298ca in ?? ()
>> >> Previous frame inner to this frame (corrupt stack?)
>> >> Current language:  auto; currently minimal
>> >> (kgdb) 
>> >> 
>> >> 
>> >> The line numbers above seem to be off.  With kgdb from ports I see:
>> >> 
>> >> (kgdb) up
>> >> #12 filt_proc (kn=0xfffff803c5679a80, hint=<optimized out>)
>> >>     at /usr/src/sys/kern/kern_event.c:466
>> >> 466				kn->kn_data = KW_EXITCODE(p->p_xexit, p->p_xsig);
>> >> (kgdb) print kn
>> >> $1 = (struct knote *) 0xfffff803c5679a80
>> >> (kgdb) print p
>> >> $2 = (struct proc *) 0x0
>> >> 
>> > Please print out the knote, do 'p *kn'.  I am esp. interested in the
>> > kn->kn_status value.  It seems that the knote was already detached,
>> 
>> (kgdb) print *kn
>> $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, 
>>   kn_knlist = 0xfffff804a4770d40, kn_tqe = {tqe_next = 0x0, 
>>     tqe_prev = 0xfffff801b1581638}, kn_kq = 0xfffff801b1581600, kn_kevent = {
>>     ident = 70248, filter = -5, flags = 32816, fflags = 2147483648, data = 0, 
>>     udata = 0x0}, kn_status = 131, kn_sfflags = -2147483648, kn_sdata = 0, 
>>   kn_ptr = {p_fp = 0x0, p_proc = 0x0, p_aio = 0x0, p_lio = 0x0, 
>>     p_nexttime = 0x0, p_v = 0x0}, kn_fop = 0xffffffff818ed600 <proc_filtops>, 
>>   kn_hook = 0x0, kn_hookid = 0}
> 
> I probably have a plausible explanation. The knote is on the knlist, it is
> registered for NOTE_EXIT (kn_sfflags == NOTE_EXIT), and most likely it
> was registered when the corresponding process was already in exit1(), so
> that the P_WEXIT flag was set. In that case the attach filter activates the
> knote immediately, because it cannot know how far exit1() has progressed;
> exit1() might already have run past the KNOTE_LOCKED() call. The failure
> occurred because filt_proc() clears p_proc for the knote of an exiting
> process, so when exit1() later reached KNOTE_LOCKED(), the second
> filt_proc() call dereferenced a NULL p.
> 
> The note was activated for sure: EV_EOF | EV_ONESHOT are set in kn_flags,
> and KN_ACTIVE | KN_QUEUED are set in kn_status. I believe that a check
> for p_proc == NULL in the filter is all that is needed to correct the issue;
> it would avoid the double activation.
> 
> Sorry for the trouble.
> 
> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> index 84bef45..575a330 100644
> --- a/sys/kern/kern_event.c
> +++ b/sys/kern/kern_event.c
> @@ -451,6 +451,9 @@ filt_proc(struct knote *kn, long hint)
>  	u_int event;
>  
>  	p = kn->kn_ptr.p_proc;
> +	if (p == NULL) /* already activated, from attach filter */
> +		return (0);
> +
>  	/* Mask off extra data. */
>  	event = (u_int)hint & NOTE_PCTRLMASK;
>  
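
To make the effect of the quoted patch concrete, here is a minimal user-space model of the guarded filter. It is only a sketch: struct model_proc, struct model_knote and model_filt_proc() are hypothetical stand-ins, not the kernel's types, the exit-status handling is simplified, and the two NOTE_* values are assumed to match sys/sys/event.h.

```c
#include <stddef.h>

/* Assumed to match sys/sys/event.h; verify against the actual tree. */
#define NOTE_EXIT      0x80000000u
#define NOTE_PCTRLMASK 0xf0000000u

/* Hypothetical, heavily reduced stand-ins for struct proc / struct knote. */
struct model_proc {
	int p_xexit;                    /* exit status of the process */
};

struct model_knote {
	struct model_proc *p_proc;      /* cleared once NOTE_EXIT was delivered */
	unsigned int kn_sfflags;        /* events the user registered for */
	unsigned int kn_fflags;         /* events recorded so far */
	long kn_data;                   /* payload reported to the user */
};

/*
 * Simplified model of filt_proc() with the NULL check from the patch.
 * Returns nonzero when the knote should be activated.
 */
int
model_filt_proc(struct model_knote *kn, long hint)
{
	struct model_proc *p;
	unsigned int event;

	p = kn->p_proc;
	if (p == NULL)                  /* already activated, e.g. from attach */
		return (0);

	/* Mask off extra data. */
	event = (unsigned int)hint & NOTE_PCTRLMASK;

	/* Record the event if the user asked for it. */
	if (kn->kn_sfflags & event)
		kn->kn_fflags |= event;

	if (event == NOTE_EXIT) {
		/* The process is going away: drop the pointer after use. */
		kn->p_proc = NULL;
		if (kn->kn_fflags & NOTE_EXIT)
			kn->kn_data = p->p_xexit;
		return (1);
	}

	return (kn->kn_fflags != 0);
}
```

In this model, the first NOTE_EXIT delivery activates the knote and clears p_proc. A second delivery for the same knote returns 0 instead of reading through the NULL pointer, which is the read that faulted in the backtrace.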

I'll give this a try.  It seems to be a difficult bug to trigger.  The
machine was up and building ports for about 24 hours before it crashed.
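
For anyone re-reading the knote dump quoted earlier in this message, the numeric fields can be cross-checked with a small user-space helper. This is a sketch under assumptions: the constant values are taken to match sys/sys/event.h in FreeBSD 11 and should be verified against the tree, and decode_bits(), decode_kn_status() and decode_ev_flags() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Assumed to match sys/sys/event.h in FreeBSD 11; verify before relying on them. */
#define EV_ONESHOT   0x0010u
#define EV_CLEAR     0x0020u
#define EV_EOF       0x8000u
#define NOTE_EXIT    0x80000000u

#define KN_ACTIVE    0x01u
#define KN_QUEUED    0x02u
#define KN_DISABLED  0x04u
#define KN_DETACHED  0x08u
#define KN_INFLUX    0x10u
#define KN_MARKER    0x20u
#define KN_KQUEUE    0x40u
#define KN_HASKQLOCK 0x80u

struct bitname {
	unsigned int bit;
	const char *name;
};

static const struct bitname kn_status_bits[] = {
	{ KN_ACTIVE, "KN_ACTIVE" },     { KN_QUEUED, "KN_QUEUED" },
	{ KN_DISABLED, "KN_DISABLED" }, { KN_DETACHED, "KN_DETACHED" },
	{ KN_INFLUX, "KN_INFLUX" },     { KN_MARKER, "KN_MARKER" },
	{ KN_KQUEUE, "KN_KQUEUE" },     { KN_HASKQLOCK, "KN_HASKQLOCK" },
};

static const struct bitname ev_flag_bits[] = {
	{ EV_ONESHOT, "EV_ONESHOT" }, { EV_CLEAR, "EV_CLEAR" },
	{ EV_EOF, "EV_EOF" },
};

/* Write the names of all bits set in value into buf, joined by " | ". */
static void
decode_bits(unsigned int value, const struct bitname *tbl, size_t n,
    char *buf, size_t len)
{
	size_t i;

	if (buf == NULL || len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		if ((value & tbl[i].bit) == 0)
			continue;
		if (buf[0] != '\0')
			strncat(buf, " | ", len - strlen(buf) - 1);
		strncat(buf, tbl[i].name, len - strlen(buf) - 1);
	}
}

/* Decode a kn_status value as printed by kgdb. */
void
decode_kn_status(unsigned int value, char *buf, size_t len)
{
	decode_bits(value, kn_status_bits,
	    sizeof(kn_status_bits) / sizeof(kn_status_bits[0]), buf, len);
}

/* Decode the kevent flags field (only the bits listed above are known). */
void
decode_ev_flags(unsigned int value, char *buf, size_t len)
{
	decode_bits(value, ev_flag_bits,
	    sizeof(ev_flag_bits) / sizeof(ev_flag_bits[0]), buf, len);
}
```

With these assumed values, kn_status = 131 (0x83) decodes to KN_ACTIVE | KN_QUEUED | KN_HASKQLOCK and flags = 32816 (0x8030) to EV_ONESHOT | EV_CLEAR | EV_EOF. fflags = 2147483648 is 0x80000000, i.e. NOTE_EXIT. That is consistent with the analysis quoted above.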
Received on Wed Jun 29 2016 - 20:25:38 UTC
