On Monday 01 March 2004 03:14 pm, Danny J. Zerkel wrote: > Looks like John's sys/kern/kern_thread.c 1.171 broke Mozilla (and > threading in general). He appears to have left out a test that > the sleeping thread is interruptable in one spot: > > --- kern_thread.c.orig Mon Mar 1 15:08:01 2004 > +++ kern_thread.c Mon Mar 1 14:26:30 2004 > _at__at_ -646,7 +646,8 _at__at_ > } else if (TD_ON_SLEEPQ(td2) && > ((td2->td_wchan == &kg->kg_completed) || > (td2->td_wchan == &p->p_siglist && > - (ku->ku_mflags & KMF_WAITSIGEVENT)))) { > + (ku->ku_mflags & KMF_WAITSIGEVENT))) && > + (td2->td_flags & TDF_SINTR)) { > sleepq_abort(td2); > } else { > ku->ku_flags |= KUF_DOUPCALL; > > This change fixes my Mozilla hangs and other panics on current. This should not matter. This is the only place that doesn't check TDF_SINTR, true, but it should always be set in these cases. First, look in kern_thread.c where we msleep() on these addresses (p_siglist and kg_completed): if (!(p->p_flag & P_SIGEVENT) && !(ku->ku_flags & KUF_DOUPCALL)) error = msleep(&p->p_siglist, &p->p_mtx, PPAUSE| PCATCH, "ksesigwait", (uap->timeout ? tvtohz(&tv) : 0)); p->p_flag &= ~P_SIGEVENT; Note that PCATCH is set. error = msleep(&kg->kg_completed, &p->p_mtx, PPAUSE|PCATCH, "kserel", (uap->timeout ? tvtohz(&tv) : 0)); kg->kg_upsleeps--; Again, PCATCH is set. Now onto kern_synch.c and msleep(): catch = priority & PCATCH; ... sq = sleepq_lookup(ident); ... sleepq_add(sq, ident, mtx, wmesg, 0); if (timo) sleepq_set_timeout(sq, ident, timo); if (catch) { sig = sleepq_catch_signals(ident); if (sig == 0 && !TD_ON_SLEEPQ(td)) { mtx_lock_spin(&sched_lock); td->td_flags &= ~TDF_SINTR; mtx_unlock_spin(&sched_lock); catch = 0; } } else sig = 0; ... sleepq_wait_foo(ident); Thus, we see that if PCATCH is set, we call sleepq_add() and then sleepq_catch_signals(). Bah, I see. Only the sleep queue lock is atomic for the whole operation and not sched_lock. While your patch might fix the assertion for this KSE case, there might be a larger set or problems with the race here, which is that sched_lock can be dropped in between td_wchan being set and TDF_SINTR being set. I.e, if you get an abort in between the sleepq_add() and sleepq_catch_signals() and just use sched_lock it will get lost when it shouldn't. For signals, I think the call to cursig() will still work ok. I'm not sure if sleepq_catch_signals() calls anything that checks for KUF_DOUPCALL. Let me check. Ugh, this code is too hard to follow. I have no idea if this is actually safe or not. :( I can commit the above as a band-aid I guess. I don't know why the KSE code doesn't just use a wakeup here, the sleeps are in top-level functions and not buried deep. -- John Baldwin <jhb_at_FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.orgReceived on Mon Mar 01 2004 - 14:09:38 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:45 UTC