Re: [BUG] I think sleepqueue need to be protected in sleepq_broadcast

From: John Baldwin <jhb_at_freebsd.org>
Date: Mon, 8 Sep 2008 15:56:02 -0400
On Tuesday 02 September 2008 09:40:49 pm Peter Wemm wrote:
> On Tue, Sep 2, 2008 at 1:08 PM, John Baldwin <jhb_at_freebsd.org> wrote:
> > On Sunday 31 August 2008 09:31:17 pm Tor Egge wrote:
> >> sleepq_resume_thread() contains an ownership handover of sq if the
> >> resumed thread is the last one blocked on the wait channel.  After the
> >> handover, sq
> >
> > is
> >
> >> no longer protected by the sleep queue chain lock and should no longer
> >> be accessed by sleepq_broadcast().
> >>
> >> Normally, when sleepq_broadcast() incorrectly accesses sq after the
> >
> > handover,
> >
> >> it will find the sq->sq_blocked queue to be empty, and the code appears
> >> to work.
> >>
> >> If the last correctly woken thread manages to go to sleep again very
> >> quickly
> >
> > on
> >
> >> another wait channel, sleepq_broadcast() might incorrectly determine
> >> that
> >
> > the
> >
> >> sq->sq_blocked queue isn't empty, and start doing the wrong thing.
> >
> > So disregard my earlier e-mail.  Here is a simple fix for the sleepq
> > case:
> >
> > Index: subr_sleepqueue.c
> > ===================================================================
> > --- subr_sleepqueue.c   (revision 182679)
> > +++ subr_sleepqueue.c   (working copy)
> > _at__at_ -779,7 +779,7 _at__at_
> >  sleepq_broadcast(void *wchan, int flags, int pri, int queue)
> >  {
> >        struct sleepqueue *sq;
> > -       struct thread *td;
> > +       struct thread *td, *tdn;
> >        int wakeup_swapper;
> >
> >        CTR2(KTR_PROC, "sleepq_broadcast(%p, %d)", wchan, flags);
> > _at__at_ -793,8 +793,7 _at__at_
> >
> >        /* Resume all blocked threads on the sleep queue. */
> >        wakeup_swapper = 0;
> > -       while (!TAILQ_EMPTY(&sq->sq_blocked[queue])) {
> > -               td = TAILQ_FIRST(&sq->sq_blocked[queue]);
> > +       TAILQ_FOREACH_SAFE(td, &sq->sq_blocked[queue], td_slpq, tdn) {
> >                thread_lock(td);
> >                if (sleepq_resume_thread(sq, td, pri))
> >                        wakeup_swapper = 1;
> >
> > This only uses 'sq' to fetch the head of the queue once up front.  It
> > won't use it again once it has started waking up threads.
>
> I don't know if it is the same problem, but mx2.freebsd.org, running
> today's 6.4-PRERELEASE just died with:
> Sep  3 00:20:11 mx2 sshd[15333]: fatal: Read from socket failed: Connection
> resr panic: Assertion td->td_flags & TDF_SINTR failed at
> ../../../kern/subr_sleepque5 cpuid = 2
> KDB: enter: panic
> FreeBSD 6.4-PRERELEASE #7: Tue Sep  2 19:43:27 UTC 2008
> This was after about 3 hours of uptime.  It has previously run happily
> for months at a time before today's rebuild.

So I think what happened is that the thread was woken up while the sleepq 
chain was unlocked while the thread unlocks the sx lock.  The code handles 
this fine already since the same race can happen when dropping the lock while 
checking for signals.  However, in this case TDF_SINTR won't be true anymore.  
The assertion just needs to be updated.  Try this:

Index: subr_sleepqueue.c
===================================================================
--- subr_sleepqueue.c	(revision 182874)
+++ subr_sleepqueue.c	(working copy)
_at__at_ -382,7 +382,7 _at__at_
 	CTR3(KTR_PROC, "sleepq catching signals: thread %p (pid %ld, %s)",
 		(void *)td, (long)p->p_pid, p->p_comm);
 
-	MPASS(td->td_flags & TDF_SINTR);
+	MPASS((td->td_sleepqueue != NULL) ^ (td->td_flags & TDF_SINTR));
 	mtx_unlock_spin(&sc->sc_lock);
 
 	/* See if there are any pending signals for this thread. */


-- 
John Baldwin
Received on Mon Sep 08 2008 - 18:13:05 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:35 UTC