Re: [poll / rfc] kdb_stop_cpus

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Fri, 03 Jun 2011 19:12:37 +0300

on 03/06/2011 18:28 Nathan Whitehorn said the following:
> On 06/03/11 10:13, Andriy Gapon wrote:
>>
>> I wonder if anybody uses kdb_stop_cpus with a non-default value.
>> If yes, I am very interested to learn about your use case for it.
>>
>> I think that the default kdb behavior is the correct one, so it doesn't make sense
>> to have a knob to turn on incorrect behavior.
>> But I may be missing something obvious.
>>
>> The comment in the code doesn't really satisfy me:
>> /*
>>   * Flag indicating whether or not to IPI the other CPUs to stop them on
>>   * entering the debugger.  Sometimes, this will result in a deadlock as
>>   * stop_cpus() waits for the other cpus to stop, so we allow it to be
>>   * disabled.  In order to maximize the chances of success, use a hard
>>   * stop for that.
>>   */
>>
>> The hard stop should be sufficiently mighty.
>> Yes, I am aware of supposedly extremely rare situations where a deadlock could
>> happen even when using hard stop.  But I'd rather fix that than have this switch.
>>
>> Oh, the commit message (from 2004) explains it:
>>> Add a new sysctl, debug.kdb.stop_cpus, which controls whether or not we
>>> attempt to IPI other cpus when entering the debugger in order to stop
>>> them while in the debugger.  The default remains to issue the stop;
>>> however, that can result in a hang if another cpu has interrupts disabled
>>> and is spinning, since the IPI won't be received and the KDB will wait
>>> indefinitely.  We probably need to add a timeout, but this is a useful
>>> stopgap in the mean time.
>>
>> But that was before we started using hard stop in this context (in 2009).
> 
> Some non-x86 platforms (e.g. PPC) don't support real NMIs, and so this still applies.

Well, even if it does, there are two things that can be done about that (and, IMO,
both are better than the manually controlled knob):

- quick and dirty: just let stop_cpus[_hard] time out; this way the good CPUs are
stopped and the bad ones are no worse off than with kdb_stop_cpus=0.
- have a special reserved high-priority interrupt and change 'disabling of
interrupts' to 'disabling of all interrupts except the special one' by employing
various kinds of interrupt priority registers (as was done for the splX stuff);
then use the special interrupt like an IPI+NMI.
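
To make the "quick and dirty" option concrete, here is a rough userland sketch
of a bounded wait. It is only a model under stated assumptions, not the kernel
code: stop_cpus_with_timeout(), poll_stopped_fn, struct sim_cpus and sim_poll()
are hypothetical names invented for illustration, the mask type is a plain
32-bit integer, and the timeout is counted in polls rather than in real time.

```c
#include <stdint.h>

/* Hypothetical userland model; this is NOT the kernel's stop_cpus() API. */
typedef uint32_t cpumask_t;

/* Returns the set of CPUs that have acknowledged the stop request so far. */
typedef cpumask_t (*poll_stopped_fn)(void *arg);

/*
 * Wait for the CPUs in 'targets' to stop, polling at most 'max_polls' times.
 * Returns the mask of CPUs that actually stopped.  CPUs that never
 * acknowledge are simply left running, which is no worse than what
 * kdb_stop_cpus=0 gives for every CPU.
 */
static cpumask_t
stop_cpus_with_timeout(cpumask_t targets, unsigned max_polls,
    poll_stopped_fn poll, void *arg)
{
	cpumask_t stopped = 0;

	for (unsigned i = 0; i < max_polls; i++) {
		stopped |= poll(arg) & targets;
		if (stopped == targets)
			break;		/* everybody stopped, no need to wait */
	}
	return (stopped);
}

/* Toy model of the other CPUs, used only to exercise the loop above. */
struct sim_cpus {
	cpumask_t responsive;	/* CPUs that will take the stop IPI */
	unsigned latency;	/* polls needed before they acknowledge */
	unsigned polls;		/* polls seen so far */
};

static cpumask_t
sim_poll(void *arg)
{
	struct sim_cpus *s = arg;

	s->polls++;
	return (s->polls >= s->latency ? s->responsive : 0);
}
```

In this model a CPU that spins with interrupts disabled is simply absent from
'responsive': the caller gets back a mask without it after max_polls
iterations instead of waiting forever, and can then enter the debugger with
whatever CPUs did stop.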

What do you think?

P.S. I think that the first "quick and dirty" thing should be done anyway,
regardless of any other changes and plans.

-- 
Andriy Gapon
Received on Fri Jun 03 2011 - 14:12:40 UTC
