2011/12/6 Andriy Gapon <avg_at_freebsd.org>: > on 06/12/2011 20:34 Attilio Rao said the following: > [snip] >> - I'm not entirely sure, why we want to disable interrupts at this >> moment (before to stop other CPUs)?: > > Because I believe that stop_cpus_hard() should run in a context with interrupts > and preemption disabled. Also, I believe that the whole panic handling code > should run in the same context. So it was only natural for me to enter that > context at this point. I'm not against that, I don't see anything wrong with having interrupts disabled at that point. >> _at__at_ -547,13 +555,18 _at__at_ panic(const char *fmt, ...) >> { >> #ifdef SMP >> static volatile u_int panic_cpu = NOCPU; >> + cpuset_t other_cpus; >> #endif >> struct thread *td = curthread; >> int bootopt, newpanic; >> va_list ap; >> static char buf[256]; >> >> - critical_enter(); >> + if (stop_scheduler_on_panic) >> + spinlock_enter(); >> + else >> + critical_enter(); >> + >> >> - In this chunk I don't entirely understand the kdb_active check: >> _at__at_ -566,11 +579,18 _at__at_ panic(const char *fmt, ...) >> PCPU_GET(cpuid)) == 0) >> while (panic_cpu != NOCPU) >> ; /* nothing */ >> + if (stop_scheduler_on_panic) { >> + if (panicstr == NULL && !kdb_active) { >> + other_cpus = all_cpus; >> + CPU_CLR(PCPU_GET(cpuid), &other_cpus); >> + stop_cpus_hard(other_cpus); >> + } >> + } >> #endif >> >> bootopt = RB_AUTOBOOT; >> newpanic = 0; >> - if (panicstr) >> + if (panicstr != NULL) >> bootopt |= RB_NOSYNC; >> else { >> bootopt |= RB_DUMP; >> >> Is it for avoiding to pass an empty mask to stop_cpus() in kdb_trap() >> (I saw you changed the policy there)? > > Yes. You know my position about elimination of the cpuset parameter to > stop_cpus_* and my intention to do so. This is related to that. Right now that > check is not strictly necessary, but it doesn't do any harm either. We know > that all other CPUs are already stopped when kdb_active (ditto for panicstr != > NULL). I see there could be races with disabiling interrupts and having 2 different stopping mechanisms that want to stop cpus, even using stop_cpus_hard(), on architectures that don't use a privileged channel (NMI) as mostly of our !x86. They are mostly very rare races (requiring kdb_trap() and panic() to happen in parallel on different CPUs). >> Maybe we can find a better integration among the two. > > What kind of integration? > Right now I have simple model - both stop all other CPUs. Well, there is no synchronization atm between panic stopping cpus and the kdb_trap(). When kdb_trap() stop cpus it has interrupts disabled and if panic already started they will both deadlock because IPI_STOP won't be properly delivered. However, I see all this as a problem with other arches not having/not implementing a real dedicated channel for cpu_stop_hard(), so we should not think about it now. I think we may need some sort of control as panic already does with panic_cpu before to disable interrupts, but don't worry about it now. >> I'd also move the setting of stop_scheduler variable in the "if", it >> seems a bug to me to have it set otherwise. > > Can you please explain what bug do you suspect here? > I do not see any. I just see more natural to move it within the above if (panicstr == NULL ...) condition. > [snip] >> - I'm not sure I like to change the policies on cpu stopping for KDB >> with this patchset. > > I am not sure what exactly you mean by change in policies. I do not see any > such change, entering kdb always stops all other CPUs, with or without the patch. Yes, I was confused by older code did just stopped CPUs before kdb_trap() manually, I think what kdb_trap() does now is ok (and you just retain it). I'd just change this check on panicstr: _at__at_ -606,9 +603,13 _at__at_ kdb_trap(int type, int code, struct trapframe *tf) intr = intr_disable(); #ifdef SMP - other_cpus = all_cpus; - CPU_CLR(PCPU_GET(cpuid), &other_cpus); - stop_cpus_hard(other_cpus); + if (panicstr == NULL) { + other_cpus = all_cpus; + CPU_CLR(PCPU_GET(cpuid), &other_cpus); + stop_cpus_hard(other_cpus); + did_stop_cpus = 1; + } else + did_stop_cpus = 0; to be SCHEDULER_STOPPED(). If you agree I can fix the kern_mutex, kern_sx and kern_rwlock parts and it should be done. Attilio -- Peace can only be achieved by understanding - A. EinsteinReceived on Tue Dec 06 2011 - 21:11:03 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:21 UTC