Re: panic: uma_zone_slab is looping

From: Bosko Milekic <bmilekic_at_technokratis.com>
Date: Wed, 22 Dec 2004 17:15:40 -0500

On Wed, Dec 22, 2004 at 10:05:53PM +0100, Peter Holm wrote:
> On Mon, Dec 20, 2004 at 06:41:04PM -0500, Bosko Milekic wrote:
> > 
> >   I realize it's been a while.
> > 
> >   Anyway, what I *think* is going on here is that slab_zalloc() is
> >   actually returning NULL even when called with M_WAITOK.  Further
> >   inspection in slab_zalloc() reveals that this could come from several
> >   places.  One of them is kmem_malloc() itself, which I doubt will ever
> >   return NULL if called with M_WAITOK.  If this assumption is indeed
> >   correct, then the NULL must be being returned by slab_zalloc() itself,
> >   or due to a failed uma_zalloc_internal() call.  It is also possible
> >   that slab_zalloc() returns NULL if the init that gets called for the
> >   zone fails.  However, judging from the stack trace you provided, the
> >   init in question is mb_init_pack() (kern_mbuf.c).  This particular
> >   init DOES perform an allocation and CAN in theory fail, but I believe
> >   it should be called with M_WAITOK as well, and so it should also never
> >   fail in theory.
> > 
> >   Have you gotten any further with the analysis of this particular
> >   trace?  If not, I would suggest adding some more printf()s and
> >   analysis into slab_zalloc() itself, to see if that is indeed what is
> >   causing the infinite looping in uma_zone_slab() and, if so, attempt to
> >   figure out what part of slab_zalloc() is returning the NULL.
> 
> OK, did that: http://www.holm.cc/stress/log/freeze03.html

  OK, well, I think I know what's happening.  See if you can confirm
  this with me.

  I'll start with your trace and describe the analysis.  Please bear
  with me, because it's long and painful.

  Your trace indicates that the NULL allocation failure, despite a call
  with M_WAITOK, is coming from slab_zalloc().  What should also be
  mentioned about this trace, and your previous one, is that they both
  show a call path that goes through an init which performs an
  allocation, also with M_WAITOK.  Currently, only the "packet zone"
  does this.  It looks something like this:

  1. UMA allocation is performed for a "packet."  A "packet" is an mbuf
     with a pre-attached cluster.

  2. UMA dips into the packet zone and finds it empty.  Additionally, it
     determines that it is unable to get a bucket to fill up the zone
     (presumably there is a lot of memory request load).  So it calls
     uma_zalloc_internal on the packet zone (frame 18).

  3. Perhaps after some blocking, a slab is obtained from the packet
     zone's backing keg (which coincidentally is the same keg as the
     mbuf zone's backing keg -- let's call it the MBUF KEG).  So now
     that an mbuf item is taken from the freshly allocated slab obtained
     from the MBUF KEG, uma_zalloc_internal() needs to init and ctor it,
     since it is about to return it to the top (calling) layer.  It
     calls the initializer on it for the packet zone, mb_init_pack().
     This corresponds to frame 17.

  4. The packet zone's initializer needs to call into UMA again to get
     and attach an mbuf cluster to the mbuf being allocated, because mbufs
     residing within the packet zone (or obtained from the packet zone)
     MUST have clusters attached to them.  It attempts to perform this
     allocation with M_WAITOK, because that's what the initial caller
     was calling with.  This is frame 16.

  5. Now the cluster zone is also completely empty and we can't get a
     bucket (surprise, surprise, the system is under high memory-request
     load). UMA calls uma_zalloc_internal() on the cluster zone as well.
     This is frame 15.

  6. uma_zalloc_internal() calls uma_zone_slab().  Its job is to find a
     slab from the cluster zone's backing keg (a separate CLUSTER KEG)
     and return it.  Unfortunately, memory-request load is high, so it's
     going to have a difficult time.  The uma_zone_slab() call is frame
     14.

  7. uma_zone_slab() can't find a locally cached slab (hardly
     surprising, due to load) and calls slab_zalloc() to actually go to
     VM and get one.  Before calling, it increments a special "recurse"
     flag so that we do not recurse on calling into the VM.  This is
     because the VM itself might call back into UMA when it attempts to
     allocate vm_map_entries which could cause it to recurse on
     allocating buckets.  This recurse flag is PER zone, and really only
     exists to protect the bucket zone. Crazy, crazy shit indeed.
     Pardon the language.  This is frame 13.

  8. Now slab_zalloc(), called for the CLUSTER zone, determines that the
     cluster zone (for space efficiency reasons) is in fact an OFFPAGE
     zone, so it needs to grab a slab header structure from a separate
     UMA "slab header" zone.  It calls uma_zalloc_internal() from
     slab_zalloc(), but it calls it on the SLAB HEADER zone.  It passes
     M_WAITOK down to it, but for some reason IT returns NULL, and the
     failure is propagated back up, which causes uma_zone_slab() to
     keep looping.  Go back to step 7.

  This is the infinite loop 7 -> 8 -> 7 -> 8 -> ...  which you seem to
  have caught.
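
  To make the 7 -> 8 loop concrete, here is a minimal userland sketch.
  It is NOT the UMA code: toy_slab_zalloc(), toy_zone_slab(), the flag
  values and the max_spins bound are all made up for illustration.  The
  only point is that a retry loop which trusts M_WAITOK never exits if
  the thing it retries keeps returning NULL.

```c
#include <assert.h>
#include <stddef.h>

#define M_NOWAIT 0x0001		/* toy values, not the kernel's */
#define M_WAITOK 0x0002

struct toy_slab { int unused; };

/* How many more times the toy slab header allocation will fail. */
static int hdr_failures_left;

/* Stand-in for slab_zalloc(): fails while the header alloc fails (step 8). */
static struct toy_slab *
toy_slab_zalloc(int flags)
{
	static struct toy_slab slab;

	(void)flags;
	if (hdr_failures_left > 0) {
		hdr_failures_left--;
		return (NULL);
	}
	return (&slab);
}

/*
 * Stand-in for the retry loop of step 7.  max_spins is an artificial
 * bound so that this demo terminates; *spins reports the retry count.
 */
static struct toy_slab *
toy_zone_slab(int flags, int max_spins, int *spins)
{
	struct toy_slab *slab = NULL;

	assert(max_spins > 0);
	*spins = 0;
	while (slab == NULL) {
		slab = toy_slab_zalloc(flags);
		if (slab != NULL)
			break;
		if (flags & M_NOWAIT)
			break;		/* caller cannot wait: give up */
		if (++(*spins) >= max_spins)
			break;		/* demo-only escape hatch */
	}
	return (slab);
}
```

  With hdr_failures_left set high and M_WAITOK passed in, the loop only
  ends because of the demo bound, which is the situation described in
  steps 7 and 8.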

  The question now is: why does uma_zalloc_internal() fail on the
  SLAB HEADER zone, even though it is called with M_WAITOK?
  Unfortunately, your stack trace does not provide enough depth to be
  able to continue an accurate deductive analysis from this point on
  (you would need to sprinkle MORE KASSERTs).
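
  As a rough idea of the kind of check I mean, here is a userland
  sketch of a helper that flags a NULL return under M_WAITOK and names
  the zone.  toy_check_waitok(), TOY_M_WAITOK and the counter are
  hypothetical; in the kernel you would use KASSERT or printf at the
  call sites instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_M_WAITOK 0x0002	/* made-up value, not the kernel's */

/* Number of times a NULL came back although the caller could wait. */
static int toy_waitok_violations;

/*
 * Returns 1 (and reports the zone) if item is NULL even though the
 * request was allowed to wait; returns 0 otherwise.
 */
static int
toy_check_waitok(const char *zone_name, const void *item, int flags)
{
	assert(zone_name != NULL);
	if (item != NULL || (flags & TOY_M_WAITOK) == 0)
		return (0);
	toy_waitok_violations++;
	fprintf(stderr, "zone %s: NULL despite M_WAITOK\n", zone_name);
	return (1);
}
```

  Placing one such check after each nested allocation would show which
  layer is the first to hand back NULL.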

  However, here are some hypotheses. 

  The uma_zalloc_internal() which ends up getting called also ends up
  calling uma_zone_slab(), but uma_zone_slab() eventually fails.  (This
  is a fact: it is the only way that uma_zalloc_internal() could in
  turn fail for the SLAB HEADER zone, which doesn't have an init or a
  ctor.)

  So why does the uma_zone_slab() fail with M_WAITOK on the slab header
  zone?  Possibilities:

  1. The recurse flag is at some point determined non-zero FOR THE SLAB
     HEADER backing keg.  If the VM ends up getting called from the
     subsequent slab_zalloc() and ends up calling back into UMA for
     whatever allocations, and "whatever allocations" are also
     potentially offpage, and a slab header is ALSO required, then we
     could also be recursing on the slab header zone from VM, so this
     could cause the failure.

     if ((keg->uk_flags & UMA_ZFLAG_INTERNAL) && keg->uk_recurse != 0) {
         /* ADD PRINTF HERE: report which zone the recurse guard rejects. */
         printf("This zone: %s, forced fail due to recurse non-zero\n",
             zone->uz_name);
         return (NULL);
     }

     If you get the print to trigger right before the panic (last one
     before the panic), see if it is on the SLAB HEADER zone.  In
     theory, it should only happen for the BUCKET ZONE.

  2. M_WAITOK really isn't set.  Unlikely.

  If (1) is really happening, we'll need to think about it a little more
  before deciding how to fix it.  As you can see, due to the recursive
  nature of UMA/VM, things can get really tough when resources are
  scarce.
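
  Possibility (1) can be illustrated with a toy userland model of a
  per-keg recurse counter.  Everything here (struct toy_keg,
  toy_keg_alloc(), toy_vm_alloc()) is invented for the illustration and
  only mimics the shape of the check quoted above: while an allocation
  for a keg is inside the "VM", any nested request for the same keg is
  failed outright, whether or not the caller could have waited.

```c
#include <assert.h>
#include <stddef.h>

struct toy_keg {
	const char *name;
	int internal;		/* stands in for UMA_ZFLAG_INTERNAL */
	int recurse;		/* stands in for uk_recurse */
	int vm_reenters;	/* does the toy "VM" call back into this keg? */
	int forced_fails;	/* how often the guard fired */
};

static int toy_storage;

static void *toy_keg_alloc(struct toy_keg *keg);

/* Toy "VM": may need memory from the same keg while serving it. */
static void *
toy_vm_alloc(struct toy_keg *keg)
{
	if (keg->vm_reenters)
		(void)toy_keg_alloc(keg);	/* nested call hits the guard */
	return (&toy_storage);
}

static void *
toy_keg_alloc(struct toy_keg *keg)
{
	void *mem;

	/* Same shape as the check in the mail: fail on recursion. */
	if (keg->internal && keg->recurse != 0) {
		keg->forced_fails++;
		return (NULL);
	}
	keg->recurse++;
	mem = toy_vm_alloc(keg);
	keg->recurse--;
	assert(keg->recurse >= 0);
	return (mem);
}
```

  In this model the outer allocation still succeeds, but the nested
  one returns NULL.  If the nested caller is itself sitting in a retry
  loop, that NULL is exactly what would keep it spinning.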

Regards,
-- 
Bosko Milekic
bmilekic_at_technokratis.com
bmilekic_at_FreeBSD.org
Received on Wed Dec 22 2004 - 21:15:44 UTC
