John Baldwin wrote: > On Sunday 31 August 2008 09:31:17 pm Tor Egge wrote: > >> sleepq_resume_thread() contains an ownership handover of sq if the resumed >> thread is the last one blocked on the wait channel. After the handover, sq >> > is > >> no longer protected by the sleep queue chain lock and should no longer be >> accessed by sleepq_broadcast(). >> >> Normally, when sleepq_broadcast() incorrectly accesses sq after the >> > handover, > >> it will find the sq->sq_blocked queue to be empty, and the code appears to >> work. >> >> If the last correctly woken thread manages to go to sleep again very quickly >> > on > >> another wait channel, sleepq_broadcast() might incorrectly determine that >> > the > >> sq->sq_blocked queue isn't empty, and start doing the wrong thing. >> > > So disregard my earlier e-mail. Here is a simple fix for the sleepq case: > > Index: subr_sleepqueue.c > =================================================================== > --- subr_sleepqueue.c (revision 182679) > +++ subr_sleepqueue.c (working copy) > _at__at_ -779,7 +779,7 _at__at_ > sleepq_broadcast(void *wchan, int flags, int pri, int queue) > { > struct sleepqueue *sq; > - struct thread *td; > + struct thread *td, *tdn; > int wakeup_swapper; > > CTR2(KTR_PROC, "sleepq_broadcast(%p, %d)", wchan, flags); > _at__at_ -793,8 +793,7 _at__at_ > > /* Resume all blocked threads on the sleep queue. */ > wakeup_swapper = 0; > - while (!TAILQ_EMPTY(&sq->sq_blocked[queue])) { > - td = TAILQ_FIRST(&sq->sq_blocked[queue]); > + TAILQ_FOREACH_SAFE(td, &sq->sq_blocked[queue], td_slpq, tdn) { > thread_lock(td); > if (sleepq_resume_thread(sq, td, pri)) > wakeup_swapper = 1; > > This only uses 'sq' to fetch the head of the queue once up front. It won't > use it again once it has started waking up threads. > > >> A similar (but probably much more difficult to trigger) issue is present >> > with > >> regards to thread_lock() and turnstiles. >> >> The caller of thread_lock() might have performed sufficient locking to >> > ensure > >> that the thread to be locked doesn't go away, but any turnstile spin lock >> pointed to by td->td_lock isn't protected. Making turnstiles type stable >> (setting UMA_ZONE_NOFREE flag for turnstile_zone) should fix that issue. >> > > Note that unlike the sleepq case, turnstiles are not made runnable until all > of them are dequeued from the turnstile and assigned a new turnstile. Only > after all that is settled are the threads made runnable in turnstile_unpend(). > > However, that doesn't fix this specific race (though it means the turnstile > code is not subject to the same exact race as the sleepq code above). Making > turnstiles type-stable is indeed probably the only fix for this. :-/ > > I can confirm this patch fixes the repeatable panic which was causing: panic: mutex sleepq chain (0xffffffff8102c460) not owned .... type messages for me when DDB was enabled, deadlocks without it. It's the first time I've been able run a complete zpool scrub in multiuser mode in since 7.0 was released. Tor, nice catch! Jb, thanks for the fix. This also I believe solves amd64/124200 Cheers, BenjaminReceived on Thu Sep 04 2008 - 10:50:19 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:34 UTC