Re: fun with WITNESS and "pool mutex"

From: Don Lewis <truckman_at_FreeBSD.org> Date: Wed, 18 Jun 2003 21:39:20 -0700 (PDT)

On 18 Jun, I wrote:
> When I was attempting to debug a system deadlock problem where the
> culprit process was sleeping on a "pool mutex", I noticed that "show
> witness" in ddb doesn't report anything about this particular mutex
> flavor.  I discovered that witness doesn't monitor these mutexes because
> mtx_pool_setup() calls mtx_init with the MTX_NOWITNESS flag.
> 
> These mutexes are a bit special, because they are supposed to be leaf
> mutexes and no other mutexes should be grabbed after them.  The deadlock
> in question caused me to discover a violation of this restriction, so I
> wondered if there were more problems of this type in the code.  I
> suspected there would be, since there haven't been any automatic checks
> to verify that these mutexes are being used correctly.
> 
> Just for grins, I removed the MTX_NOWITNESS flag from mtx_pool_setup()
> and quickly found the first violation during the boot sequence:

[ snip - I committed a patch ]

> Any bets on how many other potential deadlock problems there are in the
> tree?

The only problems I've found so far are in fdrop_locked() and
kern_open(), so things might not be as bleak as I initially feared.

I also got this LOR message from witness about the sx lock code:

lock order reversal
 1st 0xc05e1020 pool mutex (pool mutex) @ /usr/src/sys/kern/kern_sx.c:111
 2nd 0xc05dfa00 module subsystem sx lock (module subsystem sx lock) @ /usr/src/sys/kern/kern_module.c:126

I *think* this is actually a safe use of pool mutex.  What would be the
best way to quiet the complaint?  The two possibilities that I can think
of are to handle this as a special case in the witness code or to
slightly rearrange the code in kern_sx.c to swap the order of the
WITNESS_LOCK() and mtx_unlock() calls in _sx*_lock().
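
To illustrate the second possibility, here is a rough userspace sketch of why
the ordering matters.  It is not the kern_sx.c or witness code: every name in
it (toy_lock, toy_witness_lock(), toy_sx_lock_current(), and so on) is
hypothetical, and the "checker" only models the leaf rule, i.e. it complains
whenever a lock is recorded while a leaf lock is still held.

```c
/* Userspace toy model of the leaf-mutex rule.  Not FreeBSD code. */
#include <assert.h>
#include <stdio.h>

#define MAX_HELD 8

struct toy_lock {
	const char *name;
	int leaf;	/* nonzero: nothing may be acquired while this is held */
};

static const struct toy_lock *held[MAX_HELD];
static int nheld;
static int reversals;	/* number of leaf-rule complaints so far */

/* Record an acquisition; complain if any held lock is a leaf. */
static void
toy_witness_lock(const struct toy_lock *l)
{
	for (int i = 0; i < nheld; i++) {
		if (held[i]->leaf) {
			printf("lock order reversal: %s after leaf %s\n",
			    l->name, held[i]->name);
			reversals++;
		}
	}
	assert(nheld < MAX_HELD);
	held[nheld++] = l;
}

/* Forget a lock that is being released. */
static void
toy_witness_unlock(const struct toy_lock *l)
{
	for (int i = 0; i < nheld; i++) {
		if (held[i] == l) {
			held[i] = held[--nheld];
			return;
		}
	}
}

/* Assumed current order: sx is recorded while the pool mutex is held. */
int
toy_sx_lock_current(const struct toy_lock *pool, const struct toy_lock *sx)
{
	int before = reversals;

	toy_witness_lock(pool);		/* stands in for mtx_lock() */
	toy_witness_lock(sx);		/* stands in for WITNESS_LOCK() */
	toy_witness_unlock(pool);	/* stands in for mtx_unlock() */
	toy_witness_unlock(sx);		/* reset state for the next call */
	return (reversals - before);
}

/* Swapped order: the pool mutex is dropped before sx is recorded. */
int
toy_sx_lock_swapped(const struct toy_lock *pool, const struct toy_lock *sx)
{
	int before = reversals;

	toy_witness_lock(pool);		/* stands in for mtx_lock() */
	toy_witness_unlock(pool);	/* stands in for mtx_unlock() */
	toy_witness_lock(sx);		/* stands in for WITNESS_LOCK() */
	toy_witness_unlock(sx);		/* reset state for the next call */
	return (reversals - before);
}
```

With a leaf "pool mutex" and a non-leaf "module subsystem sx lock",
toy_sx_lock_current() reports one complaint and toy_sx_lock_swapped() reports
none.  Whether the swap is actually safe in the real _sx*_lock() code is a
separate question that this toy does not answer.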