Re: CURRENT freezes on Latitude D520

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Mon, 11 Dec 2006 14:08:04 +0000 (GMT)
On Mon, 11 Dec 2006, Tai-hwa Liang wrote:

>> WITNESS is available in RELENG_6, and should be used in combination with 
>> INVARIANTS, DDB, KDB, and BREAK_TO_DEBUGGER to debug deadlocks.
>
>  Would a kernel with WITNESS/[KD]DB/BREAK_TO_DEBUGGER enabled but compiled 
> w/o INVARIANTS be adequate to dump useful information through a remote 
> serial console?

It depends a lot on the deadlock.  The warnings you've attached below provide 
a lot of information, however.
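
For reference, a minimal sketch of the kernel config lines under discussion. 
The option names are the standard ones; INVARIANT_SUPPORT is listed because 
INVARIANTS depends on it.  Whether the overhead is acceptable on your machine 
is something you'll have to judge:

    options         KDB                     # kernel debugger framework
    options         DDB                     # interactive debugger backend
    options         BREAK_TO_DEBUGGER       # console break enters the debugger
    options         INVARIANTS              # extra run-time consistency checks
    options         INVARIANT_SUPPORT       # required by INVARIANTS
    options         WITNESS                 # lock order verification

Leaving out INVARIANTS still gives you WITNESS's lock order reports and DDB; 
what you lose are the assertions that can catch inconsistent state before it 
turns into a hang.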

>> It sounds like you need to follow the instructions for kernel debugging. 
>> Depending on your tolerance of performance loss, downtime, etc, a good 
>> starting point is to configure the kernel with INVARIANTS and WITNESS. 
>> WITNESS is particularly important, if you can tolerate the performance hit, 
>> as it warns of potential deadlocks, not just actual deadlocks.  Also, 
>> compile
>
> With WITNESS enabled (debug.mpsafenet=0), I got another three pf-related 
> warnings in the last 8 hours:

Are you using uid/gid credential rules with pf?

>> the kernel with KDB, DDB, and BREAK_TO_DEBUGGER, and use a serial or 
>> firewire console.  If the hang occurs, see if you can get into the 
>> debugger, in which case the logged output from DDB for the following 
>> commands would be very useful:
>> 
>> show pcpu
>> show allpcpu
>> trace
>> alltrace
>> ps
>> show locks
>> show alllocks
>> show lockedvnods
>> show uma
>> show malloc
>> 
>> Please open a PR that describes your configuration, includes your kernel 
>> config (since it seems quite customized), any loader.conf settings, a 
>> detailed description of the problem, and the output.  I'd be quite 
>> interested
>
>  Okay, I'll file a PR once I can collect more information with the serial 
> console (probably this weekend).  For now our system administrator is pretty 
> nervous about my suggestion on turning debug.mpsafenet back to 1. ;)

Thanks.

>> in knowing, once the machine is in a hung state, whether the numlock light 
>> goes on and off when you hit the numlock key on the keyboard.
>
> The numlock light doesn't respond to the key when the machine is hanging; 
> hence Ctrl-Alt-Esc wouldn't break to debugger.

Serial break is significantly more reliable for getting into the debugger on 
the system as it stands, as syscons requires the Giant lock while the serial 
interrupt handler doesn't.  As a result, serial break can often get you into 
the debugger even when Giant has been leaked.  The numlock light not going on 
and off is a reasonable test of whether Giant has been leaked and/or 
interrupts have been left disabled on all CPUs, as it means that the syscons 
interrupt handler is unable to run, hence my inquiring.
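
A rough sketch of the serial console side, assuming the first serial port on 
the machine being debugged and a null-modem cable to a second FreeBSD machine 
whose port shows up as /dev/cuad0 (the device name and the 9600 speed are 
assumptions; adjust to your hardware):

    # /boot/loader.conf on the machine being debugged
    console="comconsole"

    # on the second machine
    cu -l /dev/cuad0 -s 9600

With BREAK_TO_DEBUGGER compiled in, sending a break from cu (~#) should drop 
the target into DDB, and the cu session can be logged to capture the output of 
the commands listed above.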

Robert N M Watson
Computer Laboratory
University of Cambridge
Received on Mon Dec 11 2006 - 13:32:11 UTC
