On Thu, Dec 16, 2004 at 03:21:44PM -0500, John Baldwin wrote:
> On Monday 06 December 2004 08:59 am, Peter Holm wrote:
> > On Fri, Nov 19, 2004 at 05:10:19PM -0500, John Baldwin wrote:
> > > On Friday 19 November 2004 02:59 am, Peter Holm wrote:
> > > > On Mon, Nov 15, 2004 at 03:46:15PM -0500, John Baldwin wrote:
> > > > > On Friday 12 November 2004 07:33 am, Peter Holm wrote:
> > > > > > GENERIC HEAD from Nov 11 08:05 UTC
> > > > > >
> > > > > > The following stack traces etc. was done before my first
> > > > > > cup of coffee, so it's not so informative as it could have been :-(
> > > > > >
> > > > > > The test box appeared to have been frozen for more than 6 hours,
> > > > > > but was pingable.
> > > > > >
> > > > > > http://www.holm.cc/stress/log/cons86.html
> > > > >
> > > > > A weak guess is that you have the system in some sort of livelock due
> > > > > to fork()? Have you tried running with 'debug.mpsafevm=1' set from
> > > > > the loader?
> > > > >
> > > > > --
> > > > > John Baldwin <jhb_at_FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
> > > > > "Power Users Use the Power to Serve" = http://www.FreeBSD.org
> > > >
> > > > OK, I've got some more info:
> > > >
> > > > http://www.holm.cc/stress/log/cons88.html
> > > >
> > > > Looks like a spin in uma_zone_slab() when slab_zalloc() fails?
> > >
> > > Yes, I think if you specify M_WAITOK, then that might happen.
> > > slab_zalloc() can fail if any of the init functions fail for example, in
> > > which case it would loop forever.
> > > You can try this hack (though it may
> > > very well be wrong) to return failure if that is what is triggering:
> > >
> > > Index: uma_core.c
> > > ===================================================================
> > > RCS file: /usr/cvs/src/sys/vm/uma_core.c,v
> > > retrieving revision 1.110
> > > diff -u -r1.110 uma_core.c
> > > --- uma_core.c 6 Nov 2004 11:43:30 -0000 1.110
> > > +++ uma_core.c 19 Nov 2004 22:08:26 -0000
> > > @@ -1998,6 +1998,10 @@
> > >                   */
> > >                  if (flags & M_NOWAIT)
> > >                          flags |= M_NOVM;
> > > +
> > > +                /* XXXHACK */
> > > +                if (flags & M_WAITOK)
> > > +                        break;
> > >          }
> > >          return (slab);
> > >  }
> > >
> > > --
> > > John Baldwin <jhb_at_FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
> > > "Power Users Use the Power to Serve" = http://www.FreeBSD.org
> >
> > I instrumented the code with this:
> > $ cvs diff -u
> > cvs diff: Diffing .
> > Index: uma_core.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/vm/uma_core.c,v
> > retrieving revision 1.110
> > diff -u -r1.110 uma_core.c
> > --- uma_core.c 6 Nov 2004 11:43:30 -0000 1.110
> > +++ uma_core.c 6 Dec 2004 13:49:36 -0000
> > @@ -1926,6 +1926,7 @@
> >  {
> >          uma_slab_t slab;
> >          uma_keg_t keg;
> > +        int i;
> >
> >          keg = zone->uz_keg;
> >
> > @@ -1943,7 +1944,8 @@
> >
> >          slab = NULL;
> >
> > -        for (;;) {
> > +        for (i = 0;;i++) {
> > +                KASSERT(i < 10000, ("uma_zone_slab is looping"));
> >                  /*
> >                   * Find a slab with some space. Prefer slabs that are partially
> >                   * used over those that are totally full. This helps to reduce
> >
> > and now during test of Jeff Roberson's "SMP FFS" patch the assert
> > triggered: http://www.holm.cc/stress/log/cons92.html
>
> Hmm. Does the hack patch above make the hang go away or does it just break
> things worse?
>
> --
> John Baldwin <jhb_at_FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve" = http://www.FreeBSD.org

I've uploaded two different freeze incidents to
http://www.holm.cc/stress/log/freeze01.html and
http://www.holm.cc/stress/log/freeze02.html just in case
there should be any new clues in there.

The first is switching threads, whereas the second isn't:

freeze01:curthread = 0xc301f8a0: pid 65444 "net"
freeze01:curthread = 0xc302f000: pid 65452 "net"
freeze02:curthread = 0xc25eb2e0: pid 73508 "fork"
freeze02:curthread = 0xc25eb2e0: pid 73508 "fork"
freeze02:curthread = 0xc25eb2e0: pid 73508 "fork"
freeze02:curthread = 0xc25eb2e0: pid 73508 "fork"
freeze02:curthread = 0xc25eb2e0: pid 73508 "fork"

I'm testing your patch right now, but I guess it will be days
before we know for sure.

--
Peter Holm

Received on Fri Dec 17 2004 - 09:07:18 UTC
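
For readers following the thread, here is a minimal userland sketch of the
failure mode being discussed: a retry loop that never exits if the underlying
slab allocation keeps failing and the caller is allowed to wait. It is not the
uma_core.c code. Every name in it (sk_alloc, sk_slab_alloc, sk_zone_slab,
SK_WAITOK, SK_NOWAIT, SK_MAX_SPINS) is hypothetical, and the allocator is a
simulation that fails a configurable number of times. The bound of 10000
mirrors the KASSERT in the instrumentation patch, except that the sketch
returns NULL to the caller instead of panicking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real UMA flags. */
#define SK_WAITOK    0x1
#define SK_NOWAIT    0x2
#define SK_MAX_SPINS 10000  /* same bound as the KASSERT in the thread */

struct sk_slab {
    int id;
};

/* Simulated allocator state. */
struct sk_alloc {
    int failures_left;      /* > 0: fail that many times; < 0: always fail */
    struct sk_slab slab;    /* the slab handed out once allocation succeeds */
};

/* Stand-in for the slab allocation step: returns NULL on failure. */
static struct sk_slab *
sk_slab_alloc(struct sk_alloc *a)
{
    if (a->failures_left != 0) {
        if (a->failures_left > 0)
            a->failures_left--;
        return NULL;
    }
    return &a->slab;
}

/*
 * Retry loop. With SK_NOWAIT it gives up after the first failed attempt.
 * With SK_WAITOK it keeps retrying, but stops after SK_MAX_SPINS attempts
 * so that a persistently failing allocator cannot spin forever.
 * The number of attempts made is stored in *spins.
 */
static struct sk_slab *
sk_zone_slab(struct sk_alloc *a, int flags, int *spins)
{
    struct sk_slab *slab = NULL;
    int attempts = 0;

    assert(a != NULL && spins != NULL);

    while (attempts < SK_MAX_SPINS) {
        attempts++;
        slab = sk_slab_alloc(a);
        if (slab != NULL || (flags & SK_NOWAIT) != 0)
            break;
    }
    *spins = attempts;
    return slab;
}
```

A transient failure is retried and succeeds; a permanent failure with
SK_WAITOK hits the bound and returns NULL, which is the situation the
"uma_zone_slab is looping" assertion was added to detect. Whether returning
failure to a waiting caller is acceptable in the kernel is exactly the open
question in the thread.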