Re: Freeze

From: Peter Holm <peter_at_holm.cc>
Date: Mon, 6 Dec 2004 14:59:34 +0100
On Fri, Nov 19, 2004 at 05:10:19PM -0500, John Baldwin wrote:
> On Friday 19 November 2004 02:59 am, Peter Holm wrote:
> > On Mon, Nov 15, 2004 at 03:46:15PM -0500, John Baldwin wrote:
> > > On Friday 12 November 2004 07:33 am, Peter Holm wrote:
> > > > GENERIC HEAD from Nov 11 08:05 UTC
> > > >
> > > > The following stack traces etc. were taken before my first
> > > > cup of coffee, so they are not as informative as they could have been :-(
> > > >
> > > > The test box appeared to have been frozen for more than 6 hours,
> > > > but was pingable.
> > > >
> > > > http://www.holm.cc/stress/log/cons86.html
> > >
> > > A weak guess is that you have the system in some sort of livelock due to
> > > fork()?  Have you tried running with 'debug.mpsafevm=1' set from the
> > > loader?
> > >
> > > --
> > > John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> > > "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
> >
> > OK, I've got some more info:
> >
> > http://www.holm.cc/stress/log/cons88.html
> >
> > Looks like a spin in uma_zone_slab() when slab_zalloc() fails?
> 
> Yes, I think if you specify M_WAITOK, then that might happen.  slab_zalloc() 
> can fail if any of the init functions fail for example, in which case it 
> would loop forever.  You can try this hack (though it may very well be wrong) 
> to return failure if that is what is triggering:
> 
> Index: uma_core.c
> ===================================================================
> RCS file: /usr/cvs/src/sys/vm/uma_core.c,v
> retrieving revision 1.110
> diff -u -r1.110 uma_core.c
> --- uma_core.c	6 Nov 2004 11:43:30 -0000	1.110
> +++ uma_core.c	19 Nov 2004 22:08:26 -0000
> @@ -1998,6 +1998,10 @@
>  		 */
>  		if (flags & M_NOWAIT)
>  			flags |= M_NOVM;
> +
> +		/* XXXHACK */
> +		if (flags & M_WAITOK)
> +			break;
>  	}
>  	return (slab);
>  }
> 
> -- 
> John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org

I instrumented the code with this:
$ cvs diff -u
cvs diff: Diffing .
Index: uma_core.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/uma_core.c,v
retrieving revision 1.110
diff -u -r1.110 uma_core.c
--- uma_core.c  6 Nov 2004 11:43:30 -0000       1.110
+++ uma_core.c  6 Dec 2004 13:49:36 -0000
@@ -1926,6 +1926,7 @@
 {
        uma_slab_t slab;
        uma_keg_t keg;
+       int i;
 
        keg = zone->uz_keg;
 
@@ -1943,7 +1944,8 @@
 
        slab = NULL;
 
-       for (;;) {
+       for (i = 0;;i++) {
+               KASSERT(i < 10000, ("uma_zone_slab is looping"));
                /*
                 * Find a slab with some space.  Prefer slabs that are partially
                 * used over those that are totally full.  This helps to reduce

and now, while testing Jeff Roberson's "SMP FFS" patch, the assert triggered:
http://www.holm.cc/stress/log/cons92.html
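
For anyone who wants to play with the same idea outside the kernel, here is
a minimal userland sketch of a bounded retry loop. It is not the UMA code:
bounded_fetch(), alloc_fn, always_fail() and always_succeed() are
hypothetical names made up for illustration, and the only thing taken from
the diff above is the 10000-iteration bound. It just shows how counting
iterations turns a silent spin into something you can detect.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SPINS 10000	/* same bound as the KASSERT in the diff above */

/* Hypothetical allocator callback; stands in for a slab allocation attempt. */
typedef void *(*alloc_fn)(void);

/* Simulates an allocation that fails every time (the suspected spin case). */
static void *
always_fail(void)
{
	return (NULL);
}

/* Simulates an allocation that succeeds on the first attempt. */
static int dummy_slab;

static void *
always_succeed(void)
{
	return (&dummy_slab);
}

/*
 * Call alloc() until it returns non-NULL or MAX_SPINS attempts have been
 * made.  The number of attempts is stored in *attempts, so the caller can
 * tell a give-up (NULL result, *attempts == MAX_SPINS) from a success.
 */
static void *
bounded_fetch(alloc_fn alloc, int *attempts)
{
	void *slab = NULL;
	int i;

	assert(alloc != NULL && attempts != NULL);
	for (i = 0; i < MAX_SPINS && slab == NULL; i++)
		slab = alloc();
	*attempts = i;
	return (slab);
}
```

With always_fail() the loop gives up after MAX_SPINS attempts and returns
NULL; with always_succeed() it returns after a single attempt. The kernel
patch above asserts instead of giving up, which is the better choice when
the goal is to get a stack trace at the point of the spin.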
-- 
Peter Holm
Received on Mon Dec 06 2004 - 12:59:39 UTC
