Re: Stop scheduler on panic

From: Andriy Gapon <avg_at_FreeBSD.org> Date: Thu, 01 Dec 2011 18:59:10 +0200 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:21 UTC

[cc list trimmed]

on 21/11/2011 18:32 John Baldwin said the following:
> On Friday, November 18, 2011 4:59:32 pm Andriy Gapon wrote:
>> on 17/11/2011 23:38 John Baldwin said the following:
>>> On Thursday, November 17, 2011 4:35:07 pm John Baldwin wrote:
>>>> Hmmm, you could also make critical_exit() not perform deferred preemptions
>>>> if SCHEDULER_STOPPED?  That would fix the recursion and still let the
>>>> preemption "work" when resuming from the debugger?

Just to clarify, probably you are actually suggesting to not perform deferred
preemptions if kdb_active == TRUE.  Because that's where we get the recursion (via
kdb_switch).

I think that if we get into the mi_switch in a state where !kdb_active &&
SCHEDULER_STOPPED(), then we probably should just - I don't know - panic again?

[the following is preserved for context]

>> Yes, that's a good solution, I think.  I just didn't want to touch such a "low
>> level" code, but I guess there is no rational reason for that.
>>
>>> Or you could let the debugger run in a permament critical section (though
>>> perhaps that breaks too many other things like debugger routines that try
>>> to use locks like the 'kill' command (which is useful!)).  Effectively what you
>>> are trying to do is having the debugger run in a critical section until the
>>> debugger is exited.  It would be cleanest to let it run that way explicitly
>>> if possible since then you don't have to catch as many edge cases.
>>
>> I like this idea, but likely it would take some effort to get done.
> 
> Yes, it would take some effort, so checking SCHEDULER_STOPPED in
> critical_exit() is probably good for the short term.  Would be nice to fix
> it properly some day.
> 
>> Related to this is something that I attempted to discuss before.  I think that
>> because the debugger acts on a frozen system image (the debugger is a sole actor
>> and observer), the debugger should try to minimize its interaction with the
>> debugged system.  In this vein I think that the debugger should also bypass any
>> locks just like with SCHEDULER_STOPPED.  The debugger should also be careful to
>> note a state of any subsystems that it uses (e.g. the keyboard) and return them
>> to the initial state if it had to be changed.  But that's a bit different story.
>>  And I really get beyond my knowledge zone when I try to think about things like
>> handling 'call func_xxxx' in the debugger where func_xxxx may want to acquire
>> some locks or noticeably change some state within a system.
> 
> I think to some extent, I think if a user calls a function from the debugger
> they better know what they are doing.  However, I think it can also be useful
> to have the debugger modify the system in some cases if it can safely do so
> (e.g. the 'kill' command from DDB can be very useful, and IIRC, it is careful
> to only use try locks and just fail if it can't acquire the needed locks).
> 
>> But to continue about the locks... I have this idea to re-implement
>> SCHEDULER_STOPPED as some more general check that could be abstractly denoted as
>> LOCKING_POLICY_CHECK(context).  Where the context could be defined by flags like
>> normal, in-panic, in-debugger, etc.  And the locking policies could be: normal,
>> bypass, warn, panic, etc.
>>
>> However, I am not sure if this could be useful (and doable properly) in
>> practice.  I am just concerned with the interaction between the debugger and the
>> locks.  It still seems to me inconsistent that we are going with
>> SCHEDULER_STOPPED for panic, but we are continuing to use "if (!kdb_active)"
>> around some locks that could be problematic under kdb (e.g. in USB).  In my
>> opinion the amount of code shared between normal context and kdb context is
>> about the same as amount of code shared between normal context and panic
>> context.  But I haven't really quantified those.
> 
> I think you need to keep the 'kill' case in mind.  In that case you don't want
> to ignore locks, but the code is carefully written to use try locks instead (or
> should be).
> 

-- 
Andriy Gapon