Re: Stop scheduler on panic

From: John Baldwin <jhb_at_freebsd.org> Date: Mon, 21 Nov 2011 11:32:41 -0500

On Friday, November 18, 2011 4:59:32 pm Andriy Gapon wrote:
> on 17/11/2011 23:38 John Baldwin said the following:
> > On Thursday, November 17, 2011 4:35:07 pm John Baldwin wrote:
> >> Hmmm, you could also make critical_exit() not perform deferred preemptions
> >> if SCHEDULER_STOPPED?  That would fix the recursion and still let the
> >> preemption "work" when resuming from the debugger?
> 
> Yes, that's a good solution, I think.  I just didn't want to touch such "low
> level" code, but I guess there is no rational reason for that.
> 
> > Or you could let the debugger run in a permanent critical section (though
> > perhaps that breaks too many other things like debugger routines that try
> > to use locks like the 'kill' command (which is useful!)).  Effectively what you
> > are trying to do is having the debugger run in a critical section until the
> > debugger is exited.  It would be cleanest to let it run that way explicitly
> > if possible since then you don't have to catch as many edge cases.
> 
> I like this idea, but likely it would take some effort to get done.

Yes, it would take some effort, so checking SCHEDULER_STOPPED in
critical_exit() is probably good for the short term.  Would be nice to fix
it properly some day.
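For illustration, the short-term fix (skip deferred preemptions in critical_exit() while SCHEDULER_STOPPED) might look roughly like the userspace model below.  This is a hypothetical sketch, not FreeBSD code: struct thread_model, do_deferred_preemption() and the scheduler_stopped flag are stand-ins invented here, and only the field names td_critnest and td_owepreempt mirror struct thread.  The pending preemption is left set rather than discarded, so it can still take effect once the scheduler runs again (e.g. after resuming from the debugger).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of per-thread critical section state (not kernel code). */
struct thread_model {
	int  td_critnest;	/* critical section nesting depth */
	bool td_owepreempt;	/* a preemption was deferred while nested */
};

static bool scheduler_stopped;	/* stand-in for the real stopped state */
static int  preemptions_run;	/* counts calls to the stub below */

#define	SCHEDULER_STOPPED()	(scheduler_stopped)

/* Stub for the real context switch; just records that it happened. */
static void
do_deferred_preemption(struct thread_model *td)
{
	td->td_owepreempt = false;
	preemptions_run++;
}

static void
critical_enter(struct thread_model *td)
{
	td->td_critnest++;
}

static void
critical_exit(struct thread_model *td)
{
	assert(td->td_critnest > 0);
	if (td->td_critnest == 1) {
		td->td_critnest = 0;
		/*
		 * Only act on a deferred preemption while the scheduler is
		 * running; otherwise keep td_owepreempt set so it is honored
		 * on a later critical_exit() after the scheduler resumes.
		 */
		if (td->td_owepreempt && !SCHEDULER_STOPPED())
			do_deferred_preemption(td);
	} else
		td->td_critnest--;
}
```

This also avoids the recursion: with the scheduler stopped, leaving the outermost critical section never calls back into the switch path.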

> Related to this is something that I attempted to discuss before.  I think that
> because the debugger acts on a frozen system image (the debugger is a sole actor
> and observer), the debugger should try to minimize its interaction with the
> debugged system.  In this vein I think that the debugger should also bypass any
> locks just like with SCHEDULER_STOPPED.  The debugger should also be careful to
> note the state of any subsystems that it uses (e.g. the keyboard) and return them
> to their initial state if it had to change them.  But that's a bit of a different story.
> And I really go beyond my knowledge zone when I try to think about things like
> handling 'call func_xxxx' in the debugger where func_xxxx may want to acquire
> some locks or noticeably change some state within a system.

To some extent, I think if a user calls a function from the debugger
they better know what they are doing.  However, I think it can also be useful
to have the debugger modify the system in some cases if it can safely do so
(e.g. the 'kill' command from DDB can be very useful, and IIRC, it is careful
to only use try locks and just fail if it can't acquire the needed locks).
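The try-lock discipline described for 'kill' can be sketched in userspace C as below.  This is a model under stated assumptions, not the DDB implementation: struct proc_model and debugger_kill() are hypothetical, and pthread_mutex_trylock() stands in for the kernel's mtx_trylock(9).  The point is that code running while the rest of the system is frozen must never block on a lock, because the owner cannot run to release it.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a process whose state is protected by a lock. */
struct proc_model {
	pthread_mutex_t p_mtx;
	int             p_pid;
	bool            p_signalled;
};

/*
 * Hypothetical debugger-context "kill": never wait for the lock, since its
 * owner is frozen and would never release it.  Returns 0 on success or the
 * error from pthread_mutex_trylock() (EBUSY if the lock is already held).
 */
static int
debugger_kill(struct proc_model *p)
{
	int error;

	assert(p != NULL);
	error = pthread_mutex_trylock(&p->p_mtx);
	if (error != 0) {
		printf("kill: pid %d is locked, giving up\n", p->p_pid);
		return (error);
	}
	p->p_signalled = true;	/* a real command would post the signal here */
	pthread_mutex_unlock(&p->p_mtx);
	return (0);
}
```

Failing cleanly leaves the system image untouched, so the user can simply retry or pick another target.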

> But to continue about the locks... I have this idea to re-implement
> SCHEDULER_STOPPED as some more general check that could be abstractly denoted as
> LOCKING_POLICY_CHECK(context).  Where the context could be defined by flags like
> normal, in-panic, in-debugger, etc.  And the locking policies could be: normal,
> bypass, warn, panic, etc.
> 
> However, I am not sure if this could be useful (and doable properly) in
> practice.  I am just concerned with the interaction between the debugger and the
> locks.  It still seems to me inconsistent that we are going with
> SCHEDULER_STOPPED for panic, but we are continuing to use "if (!kdb_active)"
> around some locks that could be problematic under kdb (e.g. in USB).  In my
> opinion the amount of code shared between normal context and kdb context is
> about the same as amount of code shared between normal context and panic
> context.  But I haven't really quantified those.
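Andriy's LOCKING_POLICY_CHECK(context) idea could be modeled as a lookup from execution context to policy.  The sketch below is purely hypothetical: the enum names, the table contents, and the choice that "warn" reports and then acquires the lock normally are assumptions made here, since the proposal only names the contexts (normal, in-panic, in-debugger) and the policies (normal, bypass, warn, panic).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical execution contexts named in the proposal. */
enum lock_context { CTX_NORMAL, CTX_PANIC, CTX_DEBUGGER, CTX_COUNT };

/* Hypothetical policies a lock operation could follow. */
enum lock_policy { POLICY_NORMAL, POLICY_BYPASS, POLICY_WARN, POLICY_PANIC };

/* Illustrative mapping only; the real choices would need discussion. */
static const enum lock_policy policy_table[CTX_COUNT] = {
	[CTX_NORMAL]   = POLICY_NORMAL,
	[CTX_PANIC]    = POLICY_BYPASS,	/* mirrors the SCHEDULER_STOPPED approach */
	[CTX_DEBUGGER] = POLICY_WARN,
};

static enum lock_context current_context = CTX_NORMAL;

/*
 * Hypothetical LOCKING_POLICY_CHECK(): returns true if the caller should
 * really acquire the lock, false if the lock operation should be skipped.
 */
static bool
locking_policy_check(const char *lockname)
{
	assert(current_context < CTX_COUNT);
	switch (policy_table[current_context]) {
	case POLICY_NORMAL:
		return (true);
	case POLICY_BYPASS:
		return (false);
	case POLICY_WARN:
		/* Assumption: "warn" reports, then locks as usual. */
		fprintf(stderr, "warning: lock \"%s\" taken in special context\n",
		    lockname);
		return (true);
	case POLICY_PANIC:
		fprintf(stderr, "panic: lock \"%s\" taken in forbidden context\n",
		    lockname);
		abort();
	}
	return (true);
}
```

A single table like this would cover both the panic and debugger cases, instead of separate SCHEDULER_STOPPED checks and ad hoc "if (!kdb_active)" guards.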

I think you need to keep the 'kill' case in mind.  In that case you don't want
to ignore locks, but the code is carefully written to use try locks instead (or
should be).

-- 
John Baldwin