Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Sun, 18 Feb 2018 23:25:13 +0100
On Sun, Feb 18, 2018 at 10:50 PM, Mark Millard <marklmi26-fbsd_at_yahoo.com>
wrote:

>
>
> On 2018-Feb-18, at 1:46 PM, Mark Millard <marklmi26-fbsd_at_yahoo.com> wrote:
>
> > On 2018-Feb-18, at 1:33 PM, Mateusz Guzik <mjguzik_at_gmail.com> wrote:
> >
> >> On Sun, Feb 18, 2018 at 9:38 PM, Trond Endrestøl <
> >> Trond.Endrestol_at_fagskolen.gjovik.no> wrote:
> >>
> >>> On Sun, 18 Feb 2018 11:51-0800, Mark Millard wrote:
> >>>
> >>>> Note: -r329448 was reverted in -r329461 : racy.
> >>>
> >>> True. I got a crash when compiling r329451 while running r329449.
> >>> I've now booted the r329422 ZFS BE and I'm attempting to build
> >>> r329529.
> >>>
> >>
> >> Looking around strongly suggests r329448 is the culprit. If you can
> >> verify 329447 works fine we are mostly done here.
> >>
> >> Note the revision got reverted and a different variant went in as r329531.
> >>
> >> That said, if r329447 works then the issue should be already fixed and
> >> in particular fresh head should work fine.
> >
> > My initial problem was with -r329465, which is after -r329461 reverted
> > -r329488 . Trond reported in one note that he had problems with
> > -r329464 , also after -r329488 was reverted. Trond has also reported
> > -r329449 failed.
>
> Dumb typos above: I meant -r329448 instead of -r329488 both times.
>
>
Ok, I think I see the bug:

exit1 does:
        PROC_SLOCK(p);
        p->p_state = PRS_ZOMBIE;
        /* work continues */

pre-patch proc_to_reap does an equivalent of:
        if (p->p_state == PRS_ZOMBIE) {
                PROC_SLOCK(p);
                PROC_SUNLOCK(p);
                .... reap;
        }

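It is possible for the reaping thread to catch the exiting thread just
after it set the state to PRS_ZOMBIE, while it still holds the proc spin
lock and has exit work left to do.

With the slock/sunlock cycle we guarantee the reaping thread will
wait for the exiting thread to drop the lock, i.e. to finish that work.

Without the cycle we can end up reaping the still exiting thread.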
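
To illustrate the point, here is a minimal userspace sketch of the same
pattern. It is not kernel code: struct fake_proc, fake_exit1, fake_reap and
run_once are hypothetical names made up for this example, and a pthread
mutex stands in for the proc spin lock. The only thing it is meant to show
is that an empty lock/unlock pair makes the reaper wait for the exiting
thread to leave its locked section.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
enum fake_state { FAKE_NORMAL, FAKE_ZOMBIE };

struct fake_proc {
	pthread_mutex_t slock;   /* plays the role of the proc spin lock */
	atomic_int state;        /* plays the role of p->p_state */
	atomic_bool exit_done;   /* set when the locked exit work has finished */
};

/* Mimics the exit1 fragment: mark zombie, then keep working under the lock. */
void *fake_exit1(void *arg)
{
	struct fake_proc *p = arg;

	pthread_mutex_lock(&p->slock);            /* PROC_SLOCK(p) */
	atomic_store(&p->state, FAKE_ZOMBIE);     /* p->p_state = PRS_ZOMBIE */
	for (int i = 0; i < 1000; i++)            /* "work continues" */
		sched_yield();
	atomic_store(&p->exit_done, true);
	pthread_mutex_unlock(&p->slock);          /* PROC_SUNLOCK(p) */
	return NULL;
}

/*
 * Mimics the reaper. Returns true if the exit work had finished by the
 * time reaping would start.
 */
bool fake_reap(struct fake_proc *p, bool lock_cycle)
{
	/* Wait until the zombie state becomes visible. */
	while (atomic_load(&p->state) != FAKE_ZOMBIE)
		sched_yield();
	if (lock_cycle) {
		/* Empty critical section: blocks until the exiter unlocks. */
		pthread_mutex_lock(&p->slock);
		pthread_mutex_unlock(&p->slock);
	}
	return atomic_load(&p->exit_done);
}

/* Runs one exiter/reaper pair and reports what the reaper observed. */
bool run_once(bool lock_cycle)
{
	struct fake_proc p;
	pthread_t t;
	bool finished;
	int error;

	pthread_mutex_init(&p.slock, NULL);
	atomic_init(&p.state, FAKE_NORMAL);
	atomic_init(&p.exit_done, false);
	error = pthread_create(&t, NULL, fake_exit1, &p);
	assert(error == 0);
	finished = fake_reap(&p, lock_cycle);
	pthread_join(t, NULL);
	pthread_mutex_destroy(&p.slock);
	return finished;
}
```

With lock_cycle set, run_once() always returns true, because the zombie
state only becomes visible while the lock is held and the reaper cannot
take the lock until the exiter drops it. With lock_cycle cleared the result
depends on scheduling, and the reaper may well see exit_done still false.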

I'll fix it soon(tm).

-- 
Mateusz Guzik <mjguzik gmail.com>
Received on Sun Feb 18 2018 - 21:25:15 UTC
