Re: CURRENT freezes on Laitude D520

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Sat, 9 Dec 2006 21:57:57 +0000 (GMT)
On Sun, 10 Dec 2006, Andrew Pantyukhin wrote:

> On 12/9/06, Robert Watson <rwatson_at_freebsd.org> wrote:
>> While this may be useful in doing some initial debugging and working around 
>> the problem, I'd really appreciate it if people didn't run with 
>> debug.mpsafenet="0" for any extended period, as it masks bugs rather than 
>> fixing them, and results in them not getting fixed.  It also leads to a 
>> significant performance hit, and we really don't want people running with 
>> debugging features like this turned on by default; I'd rather they used the 
>> cycles on INVARIANTS and WITNESS.
>
> Do we have to forget about IPSEC+IPv6?

I'm proposing removing the loader.conf/loader tunable, not the setting. 
Right now the debug_mpsafenet variable is set if one of the following two 
conditions holds:

(1) A component declaring NET_NEEDS_GIANT is compiled into the kernel.

(2) The administrator sets debug.mpsafenet to a non-zero value in loader.conf
     or manually in the loader.

The suggestion is that we remove (2), but not (1): specifically, that we allow 
sections of the kernel to declare dependence on Giant over the network stack, 
but we no longer allow administrators to force Giant over the stack using 
debug.mpsafenet (in -CURRENT).  A suggestion I had from Kris was to cause the 
machine to beep for two minutes on boot if debug.mpsafenet=1 was set, in order 
to produce a deterrent without removing it for use in debugging.

Right now, setting debug.mpsafenet=1 has three effects:

(1) Place Giant over the network stack, creating a single lock that spans the
     entire stack, preventing parallelism, as well as acting as a "master" lock
     which implicitly prevents lock order-related deadlocks in the stack.

(2) Effectively disabling preemption in the network stack, as ithreads and the
     netisr will be unable to start running until user threads exit the stack,
     regardless of priority.

(3) Effectively disable direct dispatch, as non-MPSAFE netisr handlers are
     always deferred rather than executing in the ithread context.

I suspect that many of the people setting debug.mpsafenet=1 and declaring the 
problem fixed are seeing the change due to (2) and (3), indirect rather than 
direct effects of (1).  I would much rather people experimented with:

- Disabling direct dispatch (net.isr.direct=0)

- Disabling preemption (compiling out options PREEMPTION)

- Running with WITNESS, which reports lock order reversals.

which get a bit more to the heart of most problems.  debug.mpsafenet=1 really 
exists for the purposes of supporting components which are not sufficiently 
locked to allow the stack to run MPSAFE, rather than as a means of disabling 
direct dispatch and preemption, which speak to different types of problems. 
The main reason that I haven't removed the administrator tunable to date is 
that I suspect it will be quite helpful when KAME IPSEC locking happens, but 
since that appears not to have happened yet, debug.mpsafenet as an option is 
likely causing more harm than good by being available as a stand-in sysctl 
masking other problems, causing people to not get to the point of properly 
identifying the actual cause (device driver bugs, etc).

Robert N M Watson
Computer Laboratory
University of Cambridge
Received on Sat Dec 09 2006 - 20:57:58 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:03 UTC