Re: Stop scheduler on panic

From: Attilio Rao <attilio_at_freebsd.org>
Date: Thu, 17 Nov 2011 22:05:29 +0100

2011/11/17  <mdf_at_freebsd.org>:
> On Thu, Nov 17, 2011 at 12:54 PM, Attilio Rao <attilio_at_freebsd.org> wrote:
>> 2011/11/17 Andriy Gapon <avg_at_freebsd.org>:
>>> BTW, it is my opinion that we really should not let the debugger code call
>>> mi_switch for any reason.
>>
>> Yes, I agree with this; this is why the sched_bind() in boot() is
>> broken (imagine calling things like doadump from KDB. KDB right now
>> can be thought of as a first cut of this patch because it does disable
>> the CPUs when entering the context; thus, the bug here is that if you
>> stop all CPUs including CPU0 and later on you want to bind to it, you
>> are dead).
>
> Another patch related to this area we have at $WORK:
>
>  #if defined(SMP)
> -       /*
> -        * Bind us to CPU 0 so that all shutdown code runs there.  Some
> -        * systems don't shutdown properly (i.e., ACPI power off) if we
> -        * run on another processor.
> -        */
> -       thread_lock(curthread);
> -       sched_bind(curthread, 0);
> -       thread_unlock(curthread);
> -       KASSERT(PCPU_GET(cpuid) == 0, ("%s: not running on cpu 0", __func__));
> +       /*
> +        * sched_bind can't be done reliably inside of panic.  cpu_reset() will
> +        * rebind us in any case, more reliably.
> +        */
> +       if (panicstr == NULL) {
> +               /*
> +                * Bind us to CPU 0 so that all shutdown code runs there.  Some
> +                * systems don't shutdown properly (i.e., ACPI power off) if we
> +                * run on another processor.
> +                */
> +               thread_lock(curthread);
> +               sched_bind(curthread, 0);
> +               thread_unlock(curthread);
> +               KASSERT(PCPU_GET(cpuid) == 0, ("boot: not running on cpu 0"));
> +       }
>  #endif
>        /* We're in the process of rebooting. */
>        rebooting = 1;

This doesn't cover the KDB case, which is the most broken one here.
(I'm a bit unsure about the names of the functions and I cannot check now,
but in short):
- you enter KDB via debug.kdb.enter=1 (for example)
- kdb_enter() stops the CPUs, and if it is on CPU1 it stops CPU0
- you call functions entering boot() from the KDB prompt (IIRC "call
doadump" should do it)
- boot() wants to bind to CPU0, which is stopped

This patch only takes care of the panic case, which is not enough.
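
To make the sequence above concrete, here is a rough userland model of it,
not kernel code. Every name in it (model_state, model_kdb_enter, and the
bind_* helpers) is made up for illustration. The second check, which also
tests a debugger-active flag, is only an assumption about what a wider
guard could look like, not a proposed patch. The model assumes fewer than
32 CPUs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; none of these are the real kernel symbols. */
struct model_state {
	unsigned stopped_cpus;	/* bit n set: CPU n is stopped */
	bool kdb_active;	/* debugger context has been entered */
	const char *panicstr;	/* non-NULL once a panic is in progress */
};

/* Model of entering the debugger on `cpu`: every other CPU is stopped. */
static void
model_kdb_enter(struct model_state *st, unsigned cpu, unsigned ncpus)
{
	unsigned all = (1u << ncpus) - 1u;

	st->kdb_active = true;
	st->stopped_cpus = all & ~(1u << cpu);
}

/* Check as in the quoted patch: the bind is skipped only during a panic. */
static bool
bind_attempted_panic_only(const struct model_state *st)
{
	return (st->panicstr == NULL);
}

/* Assumed wider check: the bind is also skipped while in the debugger. */
static bool
bind_attempted_panic_or_kdb(const struct model_state *st)
{
	return (st->panicstr == NULL && !st->kdb_active);
}

/* A bind to CPU 0 cannot complete if it is attempted while CPU 0 is stopped. */
static bool
bind_would_hang(const struct model_state *st, bool attempted)
{
	return (attempted && (st->stopped_cpus & 1u) != 0);
}
```

With two CPUs and the debugger entered on CPU1, the panic-only check still
attempts the bind against a stopped CPU0, while the wider check skips it.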

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
Received on Thu Nov 17 2011 - 20:05:31 UTC