Re: panic caused by EVFILT_SIGNAL detaching in rfork()ed thread

From: John-Mark Gurney <gurney_j_at_resnet.uoregon.edu>
Date: Wed, 1 Sep 2004 10:37:57 -0700
Igor Sysoev wrote this message on Wed, Sep 01, 2004 at 20:17 +0400:
> On Wed, 1 Sep 2004, John-Mark Gurney wrote:
> 
> > Igor Sysoev wrote this message on Wed, Sep 01, 2004 at 14:47 +0400:
> > > #14 0xc05e87b3 in knlist_remove (knl=0xc39724f4, kn=0xc3e1d154, islocked=0)
> > >     at /usr/src/sys/kern/kern_event.c:1527
> > > 1527            knlist_remove_kq(knl, kn, islocked, 0);
> > > (kgdb) p *knl
> > > $1 = {kl_lock = 0x0, kl_list = {slh_first = 0x0}}
> > >
> > >
> > > However, I do not know is it safe to test !SLIST_EMPTY(&p->p_klist) in
> >
> > It is possible to call SLIST_EMPTY, but you need to make you have proper
> > locks held between the time you call SLIST_EMPTY, and knlist_remove...
> > But I don't think that's the problem, the problem is else where...
> 
> The problem is to test the empty p->p_klist before the calling
> knlist_remove().

Nope..  The problem is that I was not testing the KN_DETACHED flag..
I haven't verified this fix for other processes, but I believe this
should fix the problem...

The KN_DETACHED flag implies if it is on the knlist of the object
or not.. if it is not set, then the object is on the object, and needs
to be removed...  This fix should also mean that we can get rid of
the test for KN_DETACHED in filt_procdetach...  (which wasn't in
filt_sigdetach)..

Another "proper" fix to this panic, would be to add the
if (kn->kn_status & KN_DETACHED)
	return;
check to filt_sigdetach...

There can be a situation where the list is not empty, but the knote is
not on the list (already detached), and then we'd get another panic by
falling off the end of the list...

> > > filt_sigdetach() because in 5.3-BETA2 kqueue uses own mutex. Unfortunately,
> > > I could not just now to write a small test case to allow everyone to
> > > reproduce the panic but my user-level server always causes panic on exit on
> > > unpatched 5.x and sometimes on unpatched 4.10.
> >
> > Could you print *kn?
> 
> (kgdb) fr 14
> #14 0xc05e87b3 in knlist_remove (knl=0xc39724f4, kn=0xc3e1d154, islocked=0)
>     at /usr/src/sys/kern/kern_event.c:1527
> 1527            knlist_remove_kq(knl, kn, islocked, 0);
> (kgdb) p *kn
> $1 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0},
>   kn_knlist = 0x0, kn_tqe = {tqe_next = 0x0, tqe_prev = 0xc3c780ac},
>   kn_kq = 0xc3c78080, kn_kevent = {ident = 64, filter = -6, flags = 32817,
>     fflags = 0, data = 1, udata = 0x0}, kn_status = 27, kn_sfflags = 0,

27 == 0x1b == 0b11011: KN_ACTIVE|KN_QUEUED|KN_DETACHED|KN_INFLUX

[...]

> > The problem is some how that the knote is being removed from the list
> > (or was never on the list), but not being marked detached...
> >
> > Hmmm. what are the options you are using for rfork?
> 
> The worker process starts two worker threads created by
> rfork(RFPROC|RFTHREAD|RFMEM). Each thread opens kqueue and
> adds the EVFILT_SIGNAL event.
> 
> If you like I can send to you the source tarball (I do not distribute
> the server right now, because it has not the documentation). The build
> process is simple. Then you need to press ^C and you will get the panic.

I believe this panic may be possible w/o rfork, but I'm not possitive..
It's probably an artifact of the fact that the kq was living longer
than the proc that had the signal kevent associated with it, which
normally does not happen...

Attached is a patch..  And let me know if it fixes your panic...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

Received on Wed Sep 01 2004 - 15:38:01 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:10 UTC