Re: panic: uma_zone_slab is looping

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Tue, 28 Dec 2004 15:38:15 -0500
On Sunday 26 December 2004 08:37 pm, Bosko Milekic wrote:
> On Sun, Dec 26, 2004 at 11:56:51PM +0100, Peter Holm wrote:
> > On Sun, Dec 26, 2004 at 01:17:38PM -0500, Bosko Milekic wrote:
> > > On Sun, Dec 26, 2004 at 05:11:53PM +0100, Peter Holm wrote:
> > > > Yes, I think that I have verified your excellent analysis of the
> > > > problem: http://www.holm.cc/stress/log/freeze04.html
> > > >
> > > > So, do you have any fix suggestions? :-)
> > >
> > >   Not yet, because the problem is non-obvious from the trace.
> > >
> > >   I need to know exactly when the UMA RCntSlabs zone recurses _first_,
> > >   and I need to confirm that it is an actual recursion.  I've looked at
> > >   the VM code and I don't see how/why recursion on the RCntSlabs zone
> > >   would happen.
> > >
> > >   Please modify the printf code to look exactly like this:
> > >
> > >    if ((keg->uk_flags & UMA_ZFLAG_INTERNAL) && keg->uk_recurse != 0) {
> > >            if (zone == slabzone || zone == slabrefzone)
> > >                    panic("Zone %s forced to fail due to recurse non-null: %d\n",
> > >                        zone->uz_name, keg->uk_recurse);
> > >            return (NULL);
> > >    }
> > >
> > >   (You don't need to check any global counter -- the counter is
> > > imperfect anyway -- because even a single recursion on slabzone or
> > > slabrefzone should be illegal).
> > >
> > >   I'd like to see the trace from the above panic, if possible.
> >
> > Here it is: http://www.holm.cc/stress/log/freeze05.html
>
>   I have checked the code here and looked at possible code paths and
>   have unfortunately resorted to reguessing, and now I believe I have
>   identified a problematic scenario.
>
>   Consider this particular timeline (time moves downward):
>   [I hope you can handle ASCII art]
>
>   By the way, the stack trace you show would correspond to that of
>   thread 2. I refer to a frame number below.
>
>   thread 1 (t1)                      thread 2 (t2)
> -------------------------------------------------------------------------
>
>   t1.a) Allocating from a zone,
>   needs slab header from one of
>   the slab header zones (either
>   slabzone or slabrefzone). Let's
>   assume it is slabzone, as in
>   your trace above. The allocation
>   is performed with M_WAITOK.
>
>                                     t2.a) Needs to allocate from
>                                     a zone, and it needs a
>                                     slab header too.  The allocation will
>                                     be performed with M_WAITOK.  Let's
>                                     assume that the slab header zone
>                                     we're allocating is also slabzone.
>
>   t1.b) in uma_zone_slab(), has
>   slabzone's keg lock, increments
>   keg's uk_recurse.
>   Enters slab_zalloc().
>
>                                     t2.b) Blocks on zone lock.
>
>   t1.c) Drops zone lock to
>   allocate from VM, uk_recurse
>   for the slabzone is currently
>   1 (we incremented it in t1.b).
>
>                                    t2.c) Takes zone lock for slabzone,
>                                    now in uma_zone_slab() (Frame 11),
>                                    and since uk_recurse is 1, it
>                                    decides recursion happened.  Wants
>                                    to return NULL even though
>                                    allocation was done with M_WAITOK.
>                                    Our panic is triggered.
>
>   I'll have to reserve some more time to think about this.  One way I
>   think it might be solvable would be to change the check that
>   triggers the NULL return so that it explicitly checks for the
>   bucketzone, and not for all UMA_ZFLAG_INTERNAL zones; I need to think
>   this through a little more.
>
>   Does the scenario seem likely to you?
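
To make the timeline above concrete, here is a minimal user-space sketch
that replays it sequentially.  It is not UMA code: struct toy_keg and the
toy_* functions are hypothetical stand-ins, and the only thing modelled is
a recursion counter that lives in the keg and is therefore shared by every
thread using the zone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a keg; only the counter under discussion. */
struct toy_keg {
	int recurse;		/* models keg->uk_recurse, shared by all threads */
};

static int toy_slab;		/* dummy object handed out on success */

/* Models the check on entry: refuse if the shared counter is non-zero. */
static void *
toy_zone_slab_enter(struct toy_keg *keg)
{
	if (keg->recurse != 0)
		return (NULL);	/* "recursion" assumed, allocation refused */
	keg->recurse++;
	return (&toy_slab);
}

/* Models the caller leaving the allocation path again. */
static void
toy_zone_slab_exit(struct toy_keg *keg)
{
	keg->recurse--;
}

/* Replays t1.b, t1.c and t2.c; returns 1 if t2 is wrongly refused. */
static int
toy_replay_timeline(void)
{
	struct toy_keg slabkeg = { 0 };
	void *t1, *t2;

	t1 = toy_zone_slab_enter(&slabkeg);	/* t1.b: counter becomes 1 */
	assert(t1 != NULL);
	/* t1.c: t1 drops the lock here; the counter is still 1. */
	t2 = toy_zone_slab_enter(&slabkeg);	/* t2.c: sees t1's count */
	toy_zone_slab_exit(&slabkeg);		/* t1 eventually finishes */
	return (t2 == NULL);
}
```

In this model toy_replay_timeline() returns 1: the second caller never
recursed, yet it is refused purely because of the first caller's count.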

This is what I wondered about earlier in the thread.  The problem is that
recursion needs to be a per-allocation (or per-thread) state, not per-zone,
since a zone may be used by multiple threads at the same time.  Is it really
the desired behavior of the bucket zone that if two threads alloc at the
same time, one gets NULL because it sees the other's use and thinks it is
recursing?
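
As a rough illustration of what I mean by per-thread state, here is a
user-space sketch, not a patch.  struct pt_thread, its alloc_depth field
and the pt_* functions are names made up for this example; none of them
exist in the tree.  The counter is kept in the caller's own context rather
than in the zone, so only a genuine re-entry by the same thread is refused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread context; stands in for per-thread state. */
struct pt_thread {
	int alloc_depth;	/* how deep this thread is in the allocator */
};

static int pt_item;		/* dummy object handed out on success */

/* Refuse only if *this* thread is already inside the allocator. */
static void *
pt_alloc(struct pt_thread *td)
{
	if (td->alloc_depth != 0)
		return (NULL);	/* real recursion by the same thread */
	td->alloc_depth++;
	return (&pt_item);
}

/* Called when the thread leaves the allocation path. */
static void
pt_alloc_done(struct pt_thread *td)
{
	assert(td->alloc_depth > 0);
	td->alloc_depth--;
}
```

With two separate contexts, both threads can be inside pt_alloc() at once
and each gets an item; calling pt_alloc() a second time on the same context
before pt_alloc_done() is the only case that returns NULL.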

-- 
John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
Received on Tue Dec 28 2004 - 20:50:51 UTC
