Re: Cdiff patch for kernel gdb and mi_switch panic in freebsd 5.4 STABLE

From: Lonnie VanZandt <lonnie.vanzandt_at_ngc.com> Date: Thu, 1 Dec 2005 10:33:29 -0700 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:48 UTC

Well, a patch in this area remains needed for 6.0 STABLE/REL_ENG. I just 
completed our 6.0 upgrade and got back to doing some kgdb debugging on our 
SMP box and blip! immediately encountered this kernel panic.

So, now motivated, I'm applying your alternative patch and will report back 
should it not suffice.

Lonnie.

On Thursday 03 November 2005 12:29 pm, Lonnie VanZandt wrote:
> I think I follow the proposal. Sure, I'll apply your patch and run with it
> on my SMP box. It may take a while to reach a conclusion on its merits due
> to the racy nature of the crash.
>
> On Thursday 03 November 2005 11:27 am, John Baldwin wrote:
> > On Sunday 09 October 2005 05:49 pm, Lonnie VanZandt wrote:
> > > Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4
> > > STABLE. (the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v
> > > 1.5.2.2.2.1 2005/05/01 05:38:14 dwhite Exp $"); )
> >
> > I've looked at this, but I think t could maybe be done slightly
> > differently. Here's a suggested patch that would close the race you are
> > seeing I think while allowing semantics such that if two CPUs try to
> > enter KDB at the same time, they would serialize and the second CPU would
> > enter kdb after the first had exited.  Could you at least test it to see
> > if it addresses your race condition?
> >
> > --- //depot/projects/smpng/sys/kern/subr_kdb.c	2005/10/27 19:51:50
> > +++ //depot/user/jhb/ktrace/kern/subr_kdb.c	2005/11/03 18:24:38
> > _at__at_ -39,6 +39,7 _at__at_
> >  #include <sys/smp.h>
> >  #include <sys/sysctl.h>
> >
> > +#include <machine/cpu.h>
> >  #include <machine/kdb.h>
> >  #include <machine/pcb.h>
> >
> > _at__at_ -462,12 +463,21 _at__at_
> >  		return (0);
> >
> >  	/* We reenter the debugger through kdb_reenter(). */
> > -	if (kdb_active)
> > +	if (kdb_active == PCPU_GET(cpuid) + 1)
> >  		return (0);
> >
> >  	critical_enter();
> >
> > -	kdb_active++;
> > +	/*
> > +	 * If more than one CPU tries to enter KDB at the same time
> > +	 * then force them to serialize and go one at a time.
> > +	 */
> > +	while (!atomic_cmpset_int(&kdb_active, 0, PCPU_GET(cpuid) + 1)) {
> > +		critical_exit();
> > +		while (kdb_active)
> > +			cpu_spinwait();
> > +		critical_enter();
> > +	}
> >
> >  #ifdef SMP
> >  	if ((did_stop_cpus = kdb_stop_cpus) != 0)
> > _at__at_ -484,13 +494,17 _at__at_
> >
> >  	handled = kdb_dbbe->dbbe_trap(type, code);
> >
> > +	/*
> > +	 * We have to exit KDB before resuming the other CPUs so that they
> > +	 * may run in a debugger-less context.
> > +	 */
> > +	kdb_active = 0;
> > +
> >  #ifdef SMP
> >  	if (did_stop_cpus)
> >  		restart_cpus(stopped_cpus);
> >  #endif
> >
> > -	kdb_active--;
> > -
> >  	critical_exit();
> >
> >  	return (handled);