Re: RELENG_7 and HEAD: bge causes system hang

From: Cristian KLEIN <cristi_at_net.utcluj.ro>
Date: Mon, 26 Nov 2007 20:55:11 +0200
Robert Watson wrote:
> 
> On Mon, 26 Nov 2007, Cristian KLEIN wrote:
> 
>> Great to hear this problem was solved. I still have one big fat
>> question. Why did the system hang and not allow the kernel debugger
>> to show up? I strongly believe that this bug would have been easily
>> spotted had KDB responded. Is it perhaps possible to "harden" KDB,
>> so that such issues are easier to find and fix in the future?
> 
> I don't know the details of this particular situation, but I can speak
> to at least one known issue in DDB: right now, getting into DDB from a
> serial console is a very quick and straightforward path, requiring only
> the delivery of the serial interrupt and execution of its fast handler. 
> The regular video console keypresses take a much more circuitous route,
> as syscons isn't MPSAFE, and so involve the scheduling of an ithread and
> acquisition of Giant.  As such, I've found breaking into the debugger
> much easier from a serial console for several years.  As Giant has been
> pushed off larger and larger parts of the kernel, the syscons break path
> has gotten a lot more reliable.  

That is very unfortunate. Newer laptops don't come with a serial port anymore.
As far as I know, using USB-to-serial converters won't work.

> There will always be certain cases
> where a console break (serial or video) will not work, and those include
> cases where interrupts are disabled on all CPUs (such as if spinlocks
> are held on all CPUs, perhaps due to one being leaked and then a
> cascading deadlock).  In that situation, there's nothing like a nice NMI
> button or IPMI NMI to get into the debugger :-).

IIRC, spinlocks are not an issue anymore. The kernel will throw a message like
"spinlock held too long in file, line", and the issue can easily be spotted.
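
As a rough sketch of the idea only (this is not the actual FreeBSD mutex code;
the names, the spin limit and the message format below are all made up for
illustration), a spin lock can bound its wait and report the call site when
the bound is exceeded:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names throughout; this is not FreeBSD's mutex code. */
struct demo_spinlock {
    atomic_flag locked;
};

/* Arbitrary bound chosen for illustration. */
#define DEMO_SPIN_LIMIT 1000000UL

/*
 * Try to take the lock, giving up after DEMO_SPIN_LIMIT failed attempts.
 * Returns true on success. On failure, reports the call site on stderr
 * and returns false instead of spinning forever.
 */
static bool
demo_spin_acquire(struct demo_spinlock *l, const char *file, int line)
{
    unsigned long spins = 0;

    while (atomic_flag_test_and_set_explicit(&l->locked,
        memory_order_acquire)) {
        if (++spins >= DEMO_SPIN_LIMIT) {
            fprintf(stderr, "spin lock held too long at %s:%d\n",
                file, line);
            return false;
        }
    }
    return true;
}

/* Release the lock so that another caller can take it. */
static void
demo_spin_release(struct demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A caller would pass __FILE__ and __LINE__ so that the report points at the
acquisition that got stuck, which is what makes this kind of hang easy to
locate.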

Is there any way to forcibly enter DDB on a laptop without a serial port, so
that future problems like this can be spotted faster? Perhaps MPSAFEing syscons
should get more attention?
Received on Mon Nov 26 2007 - 17:55:22 UTC
