From: Konstantin Belousov <kostikbel_at_gmail.com> Date: Wed, 2 Apr 2014 15:07:45 +0300 > On Wed, Apr 02, 2014 at 04:06:16PM +0900, Kohji Okuno wrote: >> From: John-Mark Gurney <jmg_at_funkthat.com> >> Date: Tue, 1 Apr 2014 23:15:51 -0700 >> > Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900: >> >> I think, kevent() has a bug. >> >> I tested sample programs by attached sources. >> >> This sample tests about EVFILT_SIGNAL. >> >> >> >> I build sample programs by the following commands. >> >> % gcc -O2 -o child child.c >> >> % gcc -O2 -o parent parent.c >> >> >> >> The expected result is the following. >> >> % ./parent >> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 >> >> OK >> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 >> >> OK >> >> >> >> But, sometimes the result was the following. >> >> % ./parent >> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 >> >> >> >> This result means the number of times the signal has occured was >> >> incorrect. >> > >> > I was able to reproduce this... >> > >> >> In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the >> >> number of times the signal has occurred since the last call to >> >> kevent(). This `data' is recorded by filt_signal() (This is f_event in >> >> struct filterops). >> >> >> >> The system call kevent()'s events are processed by kqueue_scan() in >> >> kern_event.c. In kqueue_scan(), kn->kn_fop->f_event() is allways >> >> called after KN_INFLUX is set to kn->kn_status. >> >> >> >> On the other hand, kernel events are occured by knote() in >> >> kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal() >> >> in kern_sig.c.) In knote(), kn->kn_fop->f_event() is called only when >> >> KN_INFLUX is not set in kn->kn_status. >> >> >> >> In race condition between kqueue_scan() and knote(), >> >> kn->kn_fop->f_event() from knote() may not be called, I think. >> > >> > Considering that both are called w/ a lock, that cannot happen.. >> > KN_LIST_LOCK(kn) locks the same lock that is asserted that is held >> > by knote... >> > >> >> In knote(), because the context holds knlist's lock, the context can >> >> not sleep. So, KN_INFLUX should not be set on calling >> >> kn->kn_fop->f_event() in kqueue_scan(), I think. >> > >> > No, it needs to be set: >> > * Setting the KN_INFLUX flag enables you to unlock the kq that this knote >> > * is on, and modify kn_status as if you had the KQ lock. >> > >> > As this comment says, _INFLUX allows you to unlock the KQ w/o fear that >> > the knote will disappear out from under you causing you to dereference >> > possibly free'd memory.. >> > >> > If you just tried to lock the list lock w/o unlocking the KQ lock, you >> > could end up w/ a dead lock, as you aren't maintaining lock order >> > properly.. The correct lock order if knlist -> kq... >> > >> >> What do you think about this issue? >> > >> > This is a real issue, but not due to the race you described >> > above... >> >> I beleave it's the result of the race. >> >> Could you try to add printf() in knote()? >> Please refer to attached patch. >> >> >> > I have verified on my machine that it isn't because there is a knote >> > waiting that isn't getting woken up, and the knote on my hung process >> > has data == 0, so it definately lost one of the signals: >> > (kgdb) print $14.kq_knhash[20].slh_first[0] >> > $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0}, >> > kn_knlist = 0xfffff8005a9c5840, kn_tqe = {tqe_next = 0xfffff801fdab4500, >> > tqe_prev = 0xfffff8004bb10038}, kn_kq = 0xfffff8004bb10000, kn_kevent = { >> > ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 0x0}, >> > kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = { >> > p_fp = 0xfffff8005a9c54b8, p_proc = 0xfffff8005a9c54b8, >> > p_aio = 0xfffff8005a9c54b8, p_lio = 0xfffff8005a9c54b8, >> > p_v = 0xfffff8005a9c54b8}, kn_fop = 0xffffffff81405ef0, kn_hook = 0x0, >> > kn_hookid = 0} >> > >> > If you want to find this yourself, you can run kgdb on a live system, >> > switch to the thread of the parent (info threads, thread XXX), and >> > do: >> > frame 7 >> > >> > (or a frame that has td, which is struct thread *), then: >> > print *(struct kqueue *)td->td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data >> > >> > This will give you the struct kqueue * of the parent, and then: >> > print $XX.kq_knhash[0]_at_63 >> > >> > to figure out where the knote is in the hash, and then you can print >> > it out yourself... >> > >> > I'm going to take a look at this a bit more later... I'm thinking of >> > using dtrace to collect the stacks where filt_signal is called, and >> > match them up... dtrace might even be able to get us the note's data >> > upon return helping to make sure things got tracked properly... >> > >> > Thanks for finding this bug! Hopefully we can find a solution to it.. >> > >> > -- >> > John-Mark Gurney Voice: +1 415 225 5579 >> > >> > "All that I will do, has been done, All that I have, has not." > >> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c >> index b3fb23d..7791447 100644 >> --- a/sys/kern/kern_event.c >> +++ b/sys/kern/kern_event.c >> _at__at_ -1868,6 +1868,8 _at__at_ knote(struct knlist *list, long hint, int lockflags) >> if ((kn->kn_status & KN_INFLUX) != KN_INFLUX) { >> KQ_LOCK(kq); >> if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) { >> + if (hint & NOTE_SIGNAL) >> + printf("Aee2\n"); >> KQ_UNLOCK(kq); >> } else if ((lockflags & KNF_NOKQLOCK) != 0) { >> kn->kn_status |= KN_INFLUX; >> _at__at_ -1886,6 +1888,10 _at__at_ knote(struct knlist *list, long hint, int lockflags) >> KQ_UNLOCK(kq); >> } >> } >> + else { >> + if (hint & NOTE_SIGNAL) >> + printf("Aee1\n"); >> + } >> kq = NULL; >> } >> if ((lockflags & KNF_LISTLOCKED) == 0) > > I think you are right, there is a race between setting influx before the > knote lock is owned in the kqueue_scan(), and knote() itself. The issue > is that the influx from scan must not prevent knote() from waking up > the waiters. > > Patch below fixed your test case for me, also tools/regression/kqueue did > not noticed a breakage. I tried to describe the situation in the > comment in knote(). Also, I removed unlocked check for the KN_INFLUX > in knote, since it seems to be an optimization for rare case, and is > the race on its own. > > diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c > index b3fb23d..380f1ff 100644 > --- a/sys/kern/kern_event.c > +++ b/sys/kern/kern_event.c > _at__at_ -474,7 +474,7 _at__at_ knote_fork(struct knlist *list, int pid) > continue; > kq = kn->kn_kq; > KQ_LOCK(kq); > - if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) { > + if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) { > KQ_UNLOCK(kq); > continue; > } > _at__at_ -1506,7 +1506,7 _at__at_ retry: > KQ_LOCK(kq); > kn = NULL; > } else { > - kn->kn_status |= KN_INFLUX; > + kn->kn_status |= KN_INFLUX | KN_SCAN; > KQ_UNLOCK(kq); > if ((kn->kn_status & KN_KQUEUE) == KN_KQUEUE) > KQ_GLOBAL_LOCK(&kq_global, haskqglobal); > _at__at_ -1515,7 +1515,8 _at__at_ retry: > KQ_LOCK(kq); > KQ_GLOBAL_UNLOCK(&kq_global, haskqglobal); > kn->kn_status &= > - ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX); > + ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX | > + KN_SCAN); > kq->kq_count--; > KN_LIST_UNLOCK(kn); > influx = 1; > _at__at_ -1545,7 +1546,7 _at__at_ retry: > } else > TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); > > - kn->kn_status &= ~(KN_INFLUX); > + kn->kn_status &= ~(KN_INFLUX | KN_SCAN); > KN_LIST_UNLOCK(kn); > influx = 1; > } > _at__at_ -1865,28 +1866,33 _at__at_ knote(struct knlist *list, long hint, int lockflags) > */ > SLIST_FOREACH(kn, &list->kl_list, kn_selnext) { > kq = kn->kn_kq; > - if ((kn->kn_status & KN_INFLUX) != KN_INFLUX) { > + KQ_LOCK(kq); > + if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) { > + /* > + * Do not process the influx notes, except for > + * the influx coming from the kq unlock in the > + * kqueue_scan(). In the later case, we do > + * not interfere with the scan, since the code > + * fragment in kqueue_scan() locks the knlist, > + * and cannot proceed until we finished. > + */ > + KQ_UNLOCK(kq); > + } else if ((lockflags & KNF_NOKQLOCK) != 0) { > + kn->kn_status |= KN_INFLUX; > + KQ_UNLOCK(kq); > + error = kn->kn_fop->f_event(kn, hint); > KQ_LOCK(kq); > - if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) { > - KQ_UNLOCK(kq); > - } else if ((lockflags & KNF_NOKQLOCK) != 0) { > - kn->kn_status |= KN_INFLUX; > - KQ_UNLOCK(kq); > - error = kn->kn_fop->f_event(kn, hint); > - KQ_LOCK(kq); > - kn->kn_status &= ~KN_INFLUX; > - if (error) > - KNOTE_ACTIVATE(kn, 1); > - KQ_UNLOCK_FLUX(kq); > - } else { > - kn->kn_status |= KN_HASKQLOCK; > - if (kn->kn_fop->f_event(kn, hint)) > - KNOTE_ACTIVATE(kn, 1); > - kn->kn_status &= ~KN_HASKQLOCK; > - KQ_UNLOCK(kq); > - } > + kn->kn_status &= ~KN_INFLUX; > + if (error) > + KNOTE_ACTIVATE(kn, 1); > + KQ_UNLOCK_FLUX(kq); > + } else { > + kn->kn_status |= KN_HASKQLOCK; > + if (kn->kn_fop->f_event(kn, hint)) > + KNOTE_ACTIVATE(kn, 1); > + kn->kn_status &= ~KN_HASKQLOCK; > + KQ_UNLOCK(kq); > } > - kq = NULL; > } > if ((lockflags & KNF_LISTLOCKED) == 0) > list->kl_unlock(list->kl_lockarg); > diff --git a/sys/sys/event.h b/sys/sys/event.h > index bad8c9e..3b765c0 100644 > --- a/sys/sys/event.h > +++ b/sys/sys/event.h > _at__at_ -207,6 +207,7 _at__at_ struct knote { > #define KN_MARKER 0x20 /* ignore this knote */ > #define KN_KQUEUE 0x40 /* this knote belongs to a kq */ > #define KN_HASKQLOCK 0x80 /* for _inevent */ > +#define KN_SCAN 0x100 /* flux set in kqueue_scan() */ > int kn_sfflags; /* saved filter flags */ > intptr_t kn_sdata; /* saved data field */ > union { Hi, I tried Konstantin's patch, too. This patch is OK, I think. Many thanks, Kohji OkunoReceived on Wed Apr 02 2014 - 10:30:05 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:48 UTC