Re: panic: uma_zone_slab is looping

From: Bosko Milekic <bmilekic_at_technokratis.com>
Date: Sun, 26 Dec 2004 13:17:38 -0500

On Sun, Dec 26, 2004 at 05:11:53PM +0100, Peter Holm wrote:
> 
> Yes, I think that I have verified your excellent analysis of the
> problem: http://www.holm.cc/stress/log/freeze04.html
> 
> So, do you have any fix suggestions? :-)

  Not yet, because the problem is non-obvious from the trace.

  I need to know exactly when the UMA RCntSlabs zone recurses _first_,
  and I need to confirm that it is an actual recursion.  I've looked at
  the VM code and I don't see how/why recursion on the RCntSlabs zone
  would happen.

  Please modify the printf code to look exactly like this:

	if ((keg->uk_flags & UMA_ZFLAG_INTERNAL) && keg->uk_recurse != 0) {
		if ((zone == slabzone) || (zone == slabrefzone))
			panic("Zone %s forced to fail due to recurse non-null: %d\n",
			    zone->uz_name, keg->uk_recurse);
		return (NULL);
	}

  (You don't need to check any global counter -- the counter is imperfect
  anyway -- because even a single recursion on slabzone or slabrefzone
  should be illegal).
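
  To illustrate why no global counter is needed, here is a minimal
  userland sketch of the check, not kernel code.  Every mock_* and MOCK_*
  name is a hypothetical stand-in for the real UMA structures, and the
  sketch assumes the recursion counter is bumped on entry and dropped on
  exit of the slab allocation path.  The first nested entry on a zone
  that must never recurse is reported immediately, which is the point at
  which the kernel would panic and give us the trace we want.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userland stand-ins; NOT the real struct uma_keg / uma_zone. */
#define MOCK_ZFLAG_INTERNAL	0x1

struct mock_keg {
	int flags;		/* models keg->uk_flags */
	int recurse;		/* models keg->uk_recurse */
};

struct mock_zone {
	const char *name;	/* models zone->uz_name */
	struct mock_keg *keg;
	bool must_not_recurse;	/* true for slabzone/slabrefzone analogues */
};

enum mock_result { MOCK_OK, MOCK_FAIL, MOCK_PANIC };

/*
 * Models the check from this message: an internal keg entered while it
 * is already in use either fails the allocation or, for a zone that
 * must never recurse, reports where the kernel would panic.
 */
static enum mock_result
mock_slab_enter(struct mock_zone *zone)
{
	struct mock_keg *keg = zone->keg;

	if ((keg->flags & MOCK_ZFLAG_INTERNAL) && keg->recurse != 0) {
		if (zone->must_not_recurse) {
			fprintf(stderr,
			    "Zone %s forced to fail due to recurse non-null: %d\n",
			    zone->name, keg->recurse);
			return (MOCK_PANIC);
		}
		return (MOCK_FAIL);
	}
	keg->recurse++;		/* assumption: counter is raised on entry */
	return (MOCK_OK);
}

/* Leaves the allocation path and drops the recursion counter again. */
static void
mock_slab_exit(struct mock_zone *zone)
{
	assert(zone->keg->recurse > 0);
	zone->keg->recurse--;
}
```

  Entering the same mock zone twice without an intervening exit returns
  MOCK_PANIC on the second call, with the counter still at 1.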

  I'd like to see the trace from the above panic, if possible.

  Also, from your current crash dump, see if you can print the value of
  keg->uk_recurse (from frame 11, pid 74804).

  It appears that the other KASSERT being triggered from
  propagate_priority() is due to some weird side-effect of process
  74804 looping with the UMA RCntSlabs zone lock held (without it
  ever being dropped).  We'll have to see.

  The point is: the trace is useless unless it shows where/when the
  recursion on slabrefzone _begins_ to happen (not merely that it has
  already happened; that part is obvious now).

Happy holidays,
-- 
Bosko Milekic
bmilekic_at_technokratis.com
bmilekic_at_FreeBSD.org
Received on Sun Dec 26 2004 - 17:17:41 UTC
