Re: kevent has bug?

From: Kohji Okuno <okuno.kohji_at_jp.panasonic.com>
Date: Wed, 02 Apr 2014 21:30:01 +0900 (JST)
From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Wed, 2 Apr 2014 15:07:45 +0300
> On Wed, Apr 02, 2014 at 04:06:16PM +0900, Kohji Okuno wrote:
>> From: John-Mark Gurney <jmg_at_funkthat.com>
>> Date: Tue, 1 Apr 2014 23:15:51 -0700
>> > Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900:
>> >> I think, kevent() has a bug.
>> >> I tested sample programs by attached sources.
>> >> This sample tests about EVFILT_SIGNAL.
>> >> 
>> >> I build sample programs by the following commands.
>> >> % gcc -O2 -o child child.c
>> >> % gcc -O2 -o parent parent.c
>> >> 
>> >> The expected result is the following.
>> >> % ./parent
>> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
>> >> OK
>> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
>> >> OK
>> >> 
>> >> But, sometimes the result was the following.
>> >> % ./parent
>> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
>> >> 
>> >> This result means the number of times the signal has occured was
>> >> incorrect.
>> > 
>> > I was able to reproduce this...
>> > 
>> >> In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the
>> >> number of times the signal has occurred since the last call to
>> >> kevent(). This `data' is recorded by filt_signal() (This is f_event in
>> >> struct filterops).
>> >> 
>> >> The system call kevent()'s events are processed by kqueue_scan() in
>> >> kern_event.c. In kqueue_scan(), kn->kn_fop->f_event() is allways
>> >> called after KN_INFLUX is set to kn->kn_status.
>> >> 
>> >> On the other hand, kernel events are occured by knote() in
>> >> kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal()
>> >> in kern_sig.c.) In knote(), kn->kn_fop->f_event() is called only when
>> >> KN_INFLUX is not set in kn->kn_status.
>> >> 
>> >> In race condition between kqueue_scan() and knote(),
>> >> kn->kn_fop->f_event() from knote() may not be called, I think.
>> > 
>> > Considering that both are called w/ a lock, that cannot happen..
>> > KN_LIST_LOCK(kn) locks the same lock that is asserted that is held
>> > by knote...
>> > 
>> >> In knote(), because the context holds knlist's lock, the context can
>> >> not sleep. So, KN_INFLUX should not be set on calling
>> >> kn->kn_fop->f_event() in kqueue_scan(), I think.
>> > 
>> > No, it needs to be set:
>> >  * Setting the KN_INFLUX flag enables you to unlock the kq that this knote
>> >  * is on, and modify kn_status as if you had the KQ lock.
>> > 
>> > As this comment says, _INFLUX allows you to unlock the KQ w/o fear that
>> > the knote will disappear out from under you causing you to dereference
>> > possibly free'd memory..
>> > 
>> > If you just tried to lock the list lock w/o unlocking the KQ lock, you
>> > could end up w/ a dead lock, as you aren't maintaining lock order
>> > properly..  The correct lock order if knlist -> kq...
>> > 
>> >> What do you think about this issue?
>> > 
>> > This is a real issue, but not due to the race you described
>> > above...
>> 
>> I beleave it's the result of the race.
>> 
>> Could you try to add printf() in knote()?
>> Please refer to attached patch.
>> 
>> 
>> > I have verified on my machine that it isn't because there is a knote
>> > waiting that isn't getting woken up, and the knote on my hung process
>> > has data == 0, so it definately lost one of the signals:
>> > (kgdb) print $14.kq_knhash[20].slh_first[0]
>> > $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0},
>> >   kn_knlist = 0xfffff8005a9c5840, kn_tqe = {tqe_next = 0xfffff801fdab4500,
>> >     tqe_prev = 0xfffff8004bb10038}, kn_kq = 0xfffff8004bb10000, kn_kevent = {
>> >     ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 0x0},
>> >   kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = {
>> >     p_fp = 0xfffff8005a9c54b8, p_proc = 0xfffff8005a9c54b8,
>> >     p_aio = 0xfffff8005a9c54b8, p_lio = 0xfffff8005a9c54b8,
>> >     p_v = 0xfffff8005a9c54b8}, kn_fop = 0xffffffff81405ef0, kn_hook = 0x0,
>> >   kn_hookid = 0}
>> > 
>> > If you want to find this yourself, you can run kgdb on a live system,
>> > switch to the thread of the parent (info threads, thread XXX), and
>> > do:
>> > frame 7
>> > 
>> > (or a frame that has td, which is struct thread *), then:
>> > print *(struct kqueue *)td->td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data
>> > 
>> > This will give you the struct kqueue * of the parent, and then:
>> > print $XX.kq_knhash[0]_at_63
>> > 
>> > to figure out where the knote is in the hash, and then you can print
>> > it out yourself...
>> > 
>> > I'm going to take a look at this a bit more later... I'm thinking of
>> > using dtrace to collect the stacks where filt_signal is called, and
>> > match them up...  dtrace might even be able to get us the note's data
>> > upon return helping to make sure things got tracked properly...
>> > 
>> > Thanks for finding this bug!  Hopefully we can find a solution to it..
>> > 
>> > -- 
>> >   John-Mark Gurney				Voice: +1 415 225 5579
>> > 
>> >      "All that I will do, has been done, All that I have, has not."
> 
>> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
>> index b3fb23d..7791447 100644
>> --- a/sys/kern/kern_event.c
>> +++ b/sys/kern/kern_event.c
>> _at__at_ -1868,6 +1868,8 _at__at_ knote(struct knlist *list, long hint, int lockflags)
>>  		if ((kn->kn_status & KN_INFLUX) != KN_INFLUX) {
>>  			KQ_LOCK(kq);
>>  			if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) {
>> +				if (hint & NOTE_SIGNAL)
>> +					printf("Aee2\n");
>>  				KQ_UNLOCK(kq);
>>  			} else if ((lockflags & KNF_NOKQLOCK) != 0) {
>>  				kn->kn_status |= KN_INFLUX;
>> _at__at_ -1886,6 +1888,10 _at__at_ knote(struct knlist *list, long hint, int lockflags)
>>  				KQ_UNLOCK(kq);
>>  			}
>>  		}
>> +		else {
>> +			if (hint & NOTE_SIGNAL)
>> +				printf("Aee1\n");
>> +		}
>>  		kq = NULL;
>>  	}
>>  	if ((lockflags & KNF_LISTLOCKED) == 0)
> 
> I think you are right, there is a race between setting influx before the
> knote lock is owned in the kqueue_scan(), and knote() itself.  The issue
> is that the influx from scan must not prevent knote() from waking up
> the waiters.
> 
> Patch below fixed your test case for me, also tools/regression/kqueue did
> not noticed a breakage.  I tried to describe the situation in the
> comment in knote().  Also, I removed unlocked check for the KN_INFLUX
> in knote, since it seems to be an optimization for rare case, and is
> the race on its own.
> 
> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> index b3fb23d..380f1ff 100644
> --- a/sys/kern/kern_event.c
> +++ b/sys/kern/kern_event.c
> _at__at_ -474,7 +474,7 _at__at_ knote_fork(struct knlist *list, int pid)
>  			continue;
>  		kq = kn->kn_kq;
>  		KQ_LOCK(kq);
> -		if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) {
> +		if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
>  			KQ_UNLOCK(kq);
>  			continue;
>  		}
> _at__at_ -1506,7 +1506,7 _at__at_ retry:
>  			KQ_LOCK(kq);
>  			kn = NULL;
>  		} else {
> -			kn->kn_status |= KN_INFLUX;
> +			kn->kn_status |= KN_INFLUX | KN_SCAN;
>  			KQ_UNLOCK(kq);
>  			if ((kn->kn_status & KN_KQUEUE) == KN_KQUEUE)
>  				KQ_GLOBAL_LOCK(&kq_global, haskqglobal);
> _at__at_ -1515,7 +1515,8 _at__at_ retry:
>  				KQ_LOCK(kq);
>  				KQ_GLOBAL_UNLOCK(&kq_global, haskqglobal);
>  				kn->kn_status &=
> -				    ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX);
> +				    ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX |
> +				    KN_SCAN);
>  				kq->kq_count--;
>  				KN_LIST_UNLOCK(kn);
>  				influx = 1;
> _at__at_ -1545,7 +1546,7 _at__at_ retry:
>  			} else
>  				TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
>  			
> -			kn->kn_status &= ~(KN_INFLUX);
> +			kn->kn_status &= ~(KN_INFLUX | KN_SCAN);
>  			KN_LIST_UNLOCK(kn);
>  			influx = 1;
>  		}
> _at__at_ -1865,28 +1866,33 _at__at_ knote(struct knlist *list, long hint, int lockflags)
>  	 */
>  	SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
>  		kq = kn->kn_kq;
> -		if ((kn->kn_status & KN_INFLUX) != KN_INFLUX) {
> +		KQ_LOCK(kq);
> +		if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
> +			/*
> +			 * Do not process the influx notes, except for
> +			 * the influx coming from the kq unlock in the
> +			 * kqueue_scan().  In the later case, we do
> +			 * not interfere with the scan, since the code
> +			 * fragment in kqueue_scan() locks the knlist,
> +			 * and cannot proceed until we finished.
> +			 */
> +			KQ_UNLOCK(kq);
> +		} else if ((lockflags & KNF_NOKQLOCK) != 0) {
> +			kn->kn_status |= KN_INFLUX;
> +			KQ_UNLOCK(kq);
> +			error = kn->kn_fop->f_event(kn, hint);
>  			KQ_LOCK(kq);
> -			if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) {
> -				KQ_UNLOCK(kq);
> -			} else if ((lockflags & KNF_NOKQLOCK) != 0) {
> -				kn->kn_status |= KN_INFLUX;
> -				KQ_UNLOCK(kq);
> -				error = kn->kn_fop->f_event(kn, hint);
> -				KQ_LOCK(kq);
> -				kn->kn_status &= ~KN_INFLUX;
> -				if (error)
> -					KNOTE_ACTIVATE(kn, 1);
> -				KQ_UNLOCK_FLUX(kq);
> -			} else {
> -				kn->kn_status |= KN_HASKQLOCK;
> -				if (kn->kn_fop->f_event(kn, hint))
> -					KNOTE_ACTIVATE(kn, 1);
> -				kn->kn_status &= ~KN_HASKQLOCK;
> -				KQ_UNLOCK(kq);
> -			}
> +			kn->kn_status &= ~KN_INFLUX;
> +			if (error)
> +				KNOTE_ACTIVATE(kn, 1);
> +			KQ_UNLOCK_FLUX(kq);
> +		} else {
> +			kn->kn_status |= KN_HASKQLOCK;
> +			if (kn->kn_fop->f_event(kn, hint))
> +				KNOTE_ACTIVATE(kn, 1);
> +			kn->kn_status &= ~KN_HASKQLOCK;
> +			KQ_UNLOCK(kq);
>  		}
> -		kq = NULL;
>  	}
>  	if ((lockflags & KNF_LISTLOCKED) == 0)
>  		list->kl_unlock(list->kl_lockarg); 
> diff --git a/sys/sys/event.h b/sys/sys/event.h
> index bad8c9e..3b765c0 100644
> --- a/sys/sys/event.h
> +++ b/sys/sys/event.h
> _at__at_ -207,6 +207,7 _at__at_ struct knote {
>  #define KN_MARKER	0x20			/* ignore this knote */
>  #define KN_KQUEUE	0x40			/* this knote belongs to a kq */
>  #define KN_HASKQLOCK	0x80			/* for _inevent */
> +#define	KN_SCAN		0x100			/* flux set in kqueue_scan() */
>  	int			kn_sfflags;	/* saved filter flags */
>  	intptr_t		kn_sdata;	/* saved data field */
>  	union {

Hi,

I tried Konstantin's patch, too. This patch is OK, I think.

Many thanks,
 Kohji Okuno
Received on Wed Apr 02 2014 - 10:30:05 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:48 UTC