Re: [BUG] I think sleepqueue need to be protected in sleepq_broadcast

From: Peter Wemm <peter_at_wemm.org>
Date: Tue, 2 Sep 2008 18:40:49 -0700
On Tue, Sep 2, 2008 at 1:08 PM, John Baldwin <jhb_at_freebsd.org> wrote:
> On Sunday 31 August 2008 09:31:17 pm Tor Egge wrote:
>>
>> sleepq_resume_thread() contains an ownership handover of sq if the resumed
>> thread is the last one blocked on the wait channel.  After the handover, sq
> is
>> no longer protected by the sleep queue chain lock and should no longer be
>> accessed by sleepq_broadcast().
>>
>> Normally, when sleepq_broadcast() incorrectly accesses sq after the
> handover,
>> it will find the sq->sq_blocked queue to be empty, and the code appears to
>> work.
>>
>> If the last correctly woken thread manages to go to sleep again very quickly
> on
>> another wait channel, sleepq_broadcast() might incorrectly determine that
> the
>> sq->sq_blocked queue isn't empty, and start doing the wrong thing.
>
> So disregard my earlier e-mail.  Here is a simple fix for the sleepq case:
>
> Index: subr_sleepqueue.c
> ===================================================================
> --- subr_sleepqueue.c   (revision 182679)
> +++ subr_sleepqueue.c   (working copy)
> _at__at_ -779,7 +779,7 _at__at_
>  sleepq_broadcast(void *wchan, int flags, int pri, int queue)
>  {
>        struct sleepqueue *sq;
> -       struct thread *td;
> +       struct thread *td, *tdn;
>        int wakeup_swapper;
>
>        CTR2(KTR_PROC, "sleepq_broadcast(%p, %d)", wchan, flags);
> _at__at_ -793,8 +793,7 _at__at_
>
>        /* Resume all blocked threads on the sleep queue. */
>        wakeup_swapper = 0;
> -       while (!TAILQ_EMPTY(&sq->sq_blocked[queue])) {
> -               td = TAILQ_FIRST(&sq->sq_blocked[queue]);
> +       TAILQ_FOREACH_SAFE(td, &sq->sq_blocked[queue], td_slpq, tdn) {
>                thread_lock(td);
>                if (sleepq_resume_thread(sq, td, pri))
>                        wakeup_swapper = 1;
>
> This only uses 'sq' to fetch the head of the queue once up front.  It won't
> use it again once it has started waking up threads.

I don't know if it is the same problem, but mx2.freebsd.org, running
today's 6.4-PRERELEASE just died with:
Sep  3 00:20:11 mx2 sshd[15333]: fatal: Read from socket failed: Connection resr
panic: Assertion td->td_flags & TDF_SINTR failed at ../../../kern/subr_sleepque5
cpuid = 2
KDB: enter: panic
FreeBSD 6.4-PRERELEASE #7: Tue Sep  2 19:43:27 UTC 2008
This was after about 3 hours of uptime.  It has previously run happily
for months at a time before today's rebuild.

-- 
Peter Wemm - peter_at_wemm.org; peter_at_FreeBSD.org; peter_at_yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell
Received on Tue Sep 02 2008 - 23:40:51 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:34 UTC