2010/7/1 Bryan Venteicher <bryanv_at_daemoninthecloset.org>:
> On a recent -current, I got the following panic from deadlkres:
>
> Assertion wchan != NULL failed at /usr/src-nfs/sys/kern/subr_sleepqueue.c:680
>
> Tracing pid 0 tid 100058 td 0xffffff00024bf7a0
> kdb_enter() at kdb_enter+0x3d
> panic() at panic+0x176
> sleepq_type() at sleepq_type+0x56
> deadlkres() at deadlkres+0x224
> fork_exit() at fork_exit+0x12a
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff8074976d30, rbp = 0 ---
> (Hand transcribed, doadump() hung)
>
> deadlkres() came across a TD_IS_SLEEPING()'ing thread that was not on a
> sleepqueue (i.e., td->td_wchan == NULL).
>
> I don't think this is an invalid state for a thread to be in: After adding
> itself to a sleepq and setting a timeout, the thread calls
> sleepq_timedwait_sig(). sleepq_catch_signals() determines there is a signal
> pending so it removes the thread from the sleepq via sleepq_resume_thread().
> Returning to sleepq_timedwait_sig(), in the call to sleepq_check_timeout(),
> the thread is unable to cancel the timeout because it is already firing
> (likely waiting on thread_lock()). So the thread calls TD_SET_SLEEPING()
> followed by mi_switch(). deadlkres() then picks up thread_lock(), finding
> td is TD_IS_SLEEPING() && !TD_ON_SLEEPQ().
>
> The attached patch takes care of the panic for me.

I think your analysis and patch are both fine. They are committed, along
with a small cleanup, as r209761.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein

Received on Wed Jul 07 2010 - 10:01:14 UTC
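
The state described in the quoted analysis (a thread that is TD_IS_SLEEPING() but has td->td_wchan == NULL) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is not the attached patch and not the change committed as r209761, neither of which is reproduced in this message. Every name prefixed with model_ is hypothetical and exists only for illustration, and the tick threshold is an invented parameter. The idea shown is that a scanner like deadlkres() would need to confirm the thread is still on a sleepqueue before asking for the queue type, because that lookup asserts a non-NULL wait channel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct thread. */
struct model_thread {
	bool  sleeping;      /* models TD_IS_SLEEPING(td) */
	void *wchan;         /* models td->td_wchan; NULL once off the sleepqueue */
	int   ticks_asleep;  /* invented field: how long the thread has slept */
};

/* Models TD_ON_SLEEPQ(td): the thread is queued iff it has a wait channel. */
bool
model_on_sleepq(const struct model_thread *td)
{
	return (td->wchan != NULL);
}

/*
 * Models only the precondition of sleepq_type(): a NULL wait channel
 * trips the assertion, mirroring the reported panic.
 */
int
model_sleepq_type(const void *wchan)
{
	assert(wchan != NULL);
	return (0);
}

/*
 * Hypothetical scanner check. A sleeping thread that has already been
 * removed from its sleepqueue (signal pending, timeout already firing)
 * is skipped instead of being passed to model_sleepq_type().
 */
bool
model_should_report(const struct model_thread *td, int threshold)
{
	if (!td->sleeping || !model_on_sleepq(td))
		return (false);
	(void)model_sleepq_type(td->wchan);
	return (td->ticks_asleep > threshold);
}
```

With this guard in place, a model thread with sleeping set and wchan == NULL is skipped rather than tripping the assertion, while a thread that is still queued is evaluated against the threshold as before.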