Re: Cdiff patch for kernel gdb and mi_switch panic in freebsd 5.4 STABLE

From: Lonnie VanZandt <lonnie.vanzandt_at_ngc.com>
Date: Thu, 3 Nov 2005 12:29:52 -0700
I think I follow the proposal. Sure, I'll apply your patch and run with it on 
my SMP box. It may take a while to reach a conclusion on its merits due to 
the racy nature of the crash.

On Thursday 03 November 2005 11:27 am, John Baldwin wrote:
> On Sunday 09 October 2005 05:49 pm, Lonnie VanZandt wrote:
> > Attached is the patch for the revised subr_kdb.c from FreeBSD 5.4 STABLE.
> > (the rcsid is __FBSDID("$FreeBSD: src/sys/kern/subr_kdb.c,v 1.5.2.2.2.1
> > 2005/05/01 05:38:14 dwhite Exp $"); )
>
> I've looked at this, but I think it could be done slightly
> differently. Here's a suggested patch that I think would close the race
> you are seeing, while allowing semantics such that if two CPUs try to enter
> KDB at the same time, they serialize and the second CPU enters
> kdb after the first has exited.  Could you at least test it to see if it
> addresses your race condition?
>
> --- //depot/projects/smpng/sys/kern/subr_kdb.c	2005/10/27 19:51:50
> +++ //depot/user/jhb/ktrace/kern/subr_kdb.c	2005/11/03 18:24:38
> @@ -39,6 +39,7 @@
>  #include <sys/smp.h>
>  #include <sys/sysctl.h>
>
> +#include <machine/cpu.h>
>  #include <machine/kdb.h>
>  #include <machine/pcb.h>
>
> @@ -462,12 +463,21 @@
>  		return (0);
>
>  	/* We reenter the debugger through kdb_reenter(). */
> -	if (kdb_active)
> +	if (kdb_active == PCPU_GET(cpuid) + 1)
>  		return (0);
>
>  	critical_enter();
>
> -	kdb_active++;
> +	/*
> +	 * If more than one CPU tries to enter KDB at the same time
> +	 * then force them to serialize and go one at a time.
> +	 */
> +	while (!atomic_cmpset_int(&kdb_active, 0, PCPU_GET(cpuid) + 1)) {
> +		critical_exit();
> +		while (kdb_active)
> +			cpu_spinwait();
> +		critical_enter();
> +	}
>
>  #ifdef SMP
>  	if ((did_stop_cpus = kdb_stop_cpus) != 0)
> @@ -484,13 +494,17 @@
>
>  	handled = kdb_dbbe->dbbe_trap(type, code);
>
> +	/*
> +	 * We have to exit KDB before resuming the other CPUs so that they
> +	 * may run in a debugger-less context.
> +	 */
> +	kdb_active = 0;
> +
>  #ifdef SMP
>  	if (did_stop_cpus)
>  		restart_cpus(stopped_cpus);
>  #endif
>
> -	kdb_active--;
> -
>  	critical_exit();
>
>  	return (handled);
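
The entry/exit protocol in the quoted patch can be modeled in userspace to see why it serializes concurrent entries. What follows is only a sketch under stated assumptions, not the kernel code: C11 atomics stand in for atomic_cmpset_int(), sched_yield() stands in for cpu_spinwait(), pthreads stand in for CPUs, and the names dbg_active, dbg_enter, dbg_exit, worker and run_demo are hypothetical. There is no equivalent of critical_enter()/critical_exit() here.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NCPU   4        /* number of simulated CPUs (threads) */
#define ROUNDS 10000    /* debugger entries attempted per simulated CPU */

/* 0 means "nobody is in the debugger"; otherwise it holds owner id + 1. */
static atomic_int dbg_active;

/* Plain counter, touched only while dbg_active is owned. */
static long entries;

/*
 * Returns 0 if this simulated CPU already owns the debugger (re-entry),
 * or 1 once ownership has been acquired.
 */
static int dbg_enter(int cpuid)
{
    if (atomic_load(&dbg_active) == cpuid + 1)
        return 0;

    for (;;) {
        int expected = 0;

        /* Claim ownership only if the debugger is currently free. */
        if (atomic_compare_exchange_strong(&dbg_active, &expected, cpuid + 1))
            return 1;

        /* Someone else is inside: wait until they leave, then retry. */
        while (atomic_load(&dbg_active) != 0)
            sched_yield();  /* assumption: stand-in for cpu_spinwait() */
    }
}

/* Release ownership so a waiting CPU can enter. */
static void dbg_exit(void)
{
    atomic_store(&dbg_active, 0);
}

static void *worker(void *arg)
{
    int cpuid = *(int *)arg;

    for (int i = 0; i < ROUNDS; i++) {
        if (dbg_enter(cpuid)) {
            /* A nested attempt by the owner is reported as re-entry. */
            int again = dbg_enter(cpuid);
            assert(again == 0);
            (void)again;

            entries++;      /* safe only if entries are serialized */
            dbg_exit();
        }
    }
    return NULL;
}

/* Runs NCPU workers and returns the number of serialized entries seen. */
long run_demo(void)
{
    pthread_t threads[NCPU];
    int ids[NCPU];

    entries = 0;
    atomic_store(&dbg_active, 0);

    for (int i = 0; i < NCPU; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NCPU; i++)
        pthread_join(threads[i], NULL);

    return entries;
}
```

Calling run_demo() from a small main() and comparing the result with NCPU * ROUNDS checks that no increment was lost, which is what serialized entry should guarantee. Storing the owner id plus one, rather than a bare counter, is what lets the owner tell its own re-entry apart from another CPU waiting to get in.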
Received on Thu Nov 03 2005 - 18:31:58 UTC