Re: Assertion td->td_sleepqueue != NULL failed at kern/subr_sleepqueue.c:270

From: Peter Holm <peter_at_holm.cc>
Date: Tue, 8 Feb 2005 21:24:33 +0100
On Tue, Feb 08, 2005 at 02:40:49PM -0500, John Baldwin wrote:
> On Thursday 06 January 2005 04:45 pm, Peter Holm wrote:
> > On Thu, Jan 06, 2005 at 04:17:49PM -0500, John Baldwin wrote:
> > > On Wednesday 05 January 2005 07:26 am, Peter Holm wrote:
> > > > With GENERIC HEAD from Dec 31 09:28 UTC + bmilekic_at_'s uma_core
> > > > patch + alc's patch I got the following strange assert:
> > > >
> > > > panic(c0827c46,c082dd18,c082dc8d,10e,c08f4660) at panic+0x190
> > > > sleepq_add(c08eec90,c08ee6e8,c082a9bf,1,c08ee6e8,0,c0827ca9,7d)
> > > >    at sleepq_add+0x156
> > > > cv_wait(c08eec90,c08ee6e8,c151de30,0,ffffffff) at cv_wait+0x100
> > > > _sx_xlock(c08eec60,c0828867,247,0,c151ddc8) at _sx_xlock+0x59
> > > > kern_wait(c151e450,ffffffff,cbc67c90,0,0) at kern_wait+0x4b
> > > > wait4(c151e450,cbc67d14,4,3f8,282) at wait4+0x29
> > > > syscall(2f,2f,bfbf002f,2,0) at syscall+0x128
> > > > Xint0x80_syscall() at Xint0x80_syscall+0x1f
> > > > --- syscall (7, FreeBSD ELF32, wait4), eip = 0x805170b, esp =
> > > > 0xbfbfedbc, ebp = 0xbfbfedd8 ---
> > > >
> > > > Looks like td->td_sleepqueue is NULL!
> > > >
> > > > Details at http://www.holm.cc/stress/log/cons100.html
> > >
> > > This is a truly odd panic.  The basic theory of operation with sleep
> > > queues is that every thread that is not already queued on a sleep queue
> > > carries a sleep queue structure around that it donates to a wait channel
> > > when it blocks on it.  Once it is resumed, it reclaims a sleep queue
> > > from the wait channel.  This resuming bit happens in sleepq_remove_thread()
> > > in subr_sleepqueue.c.  As you can see, in addition to assigning a
> > > sleep queue to the thread being removed from a queue, it also clears
> > > td_wchan and td_wmesg.  The thread in question has both fields set (as if
> > > it were asleep on "proctree", which is what it is trying to go back to
> > > sleep on now).  However, it is not on a sleep queue (td_slpq.tqe_next is
> > > NULL).  So it appears that a thread was removed from the sleep queue and
> > > resumed (made runnable) without sleepq_remove_thread() being called.  Do
> > > you have any local patches that might affect this, btw?  I notice you get
> > > a lot of trap 9s in your dmesg, which is somewhat unsettling.
> >
> > These are the modifications:
> > http://www.holm.cc/stress/log/mods.html
> >
> > The trap 9s are not uncommon for the test suite.
> 
> FYI, I'm still thinking about this, as I've seen it at least once or twice,
> but I still don't understand how it happens.  In the other case I've
> looked at, it is as if the thread has been awakened by someone outside of the 
> sleep queue code because td_wchan and td_wmesg are still set 
> (sleepq_remove_thread() clears them) and the associated wait channel 
> (proctree, which is another common theme) has a sleep queue with no waiters 
> attached to it.  That is, the sleep queue that curthread should have is still 
> sitting on a sleep queue chain, which is consistent with the thread being 
> made runnable without going through sleepq_remove_thread().  Are you able to 
> reproduce this at all?  If so, can you do it with KTR enabled and KTR_PROC 
> tracing turned on perhaps?  Thanks.
> 

No, I have only seen this problem once. But if I manage to provoke
it again, I will try it with KTR enabled. Thank you for your
reply.

> -- 
> John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

-- 
Peter Holm
Received on Tue Feb 08 2005 - 19:24:36 UTC
