On 18 Jun, I wrote:
> When I was attempting to debug a system deadlock problem where the
> culprit process was sleeping on a "pool mutex", I noticed that "show
> witness" in ddb doesn't report anything about this particular mutex
> flavor. I discovered that witness doesn't monitor these mutexes because
> mtx_pool_setup() calls mtx_init() with the MTX_NOWITNESS flag.
>
> These mutexes are a bit special, because they are supposed to be leaf
> mutexes and no other mutexes should be grabbed after them. The deadlock
> in question caused me to discover a violation of this restriction, so I
> wondered if there were more problems of this type in the code. I
> suspected there would be, since there haven't been any automatic checks
> to verify that these mutexes are being used correctly.
>
> Just for grins, I removed the MTX_NOWITNESS flag from mtx_pool_setup()
> and quickly found the first violation during the boot sequence:

[ snip - I committed a patch ]

> Any bets on how many other potential deadlock problems there are in the
> tree?

The only problems I've found so far are in fdrop_locked() and
kern_open(), so things might not be as bleak as I initially feared.

I also got this LOR message from witness about the sx lock code:

lock order reversal
 1st 0xc05e1020 pool mutex (pool mutex) @ /usr/src/sys/kern/kern_sx.c:111
 2nd 0xc05dfa00 module subsystem sx lock (module subsystem sx lock) @ /usr/src/sys/kern/kern_module.c:126

I *think* this is actually a safe use of a pool mutex. What would be the
best way to quiet the complaint? The two possibilities I can think of
are to handle this as a special case in the witness code, or to slightly
rearrange the code in kern_sx.c to swap the order of the WITNESS_LOCK()
and mtx_unlock() calls in _sx*_lock().

Received on Wed Jun 18 2003 - 19:39:33 UTC
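To illustrate the second option, here is a toy user-space model in C (not kernel code) of how a witness-style checker sees the sx interlock. Everything in it is hypothetical: toy_witness_lock(), toy_witness_unlock(), model_sx_slock(), model_sx_sunlock(), run_scenario() and the lock class names are invented for illustration. It assumes, as the LOR above suggests, that all pool mutexes fall into one witness class, and that the sx shared-lock and unlock paths each take the pool-mutex interlock. If the sx lock is recorded while the interlock is still held, the checker learns "pool mutex before sx lock"; once some sx lock is held while another sx lock's interlock is taken, it sees the opposite order and reports a reversal. Recording the sx acquisition after dropping the interlock removes the first edge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy lock classes. Witness-style checkers track classes, not instances,
 * so every pool mutex shares the single POOL_MTX class here. */
enum { POOL_MTX, MODULE_SX, OTHER_SX, NCLASSES };

static const char *class_name[NCLASSES] = {
    "pool mutex", "module subsystem sx lock", "other sx lock"
};

#define MAX_HELD 8

static bool order[NCLASSES][NCLASSES]; /* order[a][b]: b was taken while a was held */
static int held[MAX_HELD];
static int nheld;
static int reversals;

static void toy_witness_reset(void)
{
    for (int a = 0; a < NCLASSES; a++)
        for (int b = 0; b < NCLASSES; b++)
            order[a][b] = false;
    nheld = 0;
    reversals = 0;
}

/* Record acquisition of class c; report the first time an order flips. */
static void toy_witness_lock(int c)
{
    for (int i = 0; i < nheld; i++) {
        int h = held[i];
        if (h == c)
            continue;
        if (order[c][h] && !order[h][c]) {
            printf("lock order reversal: 1st %s, 2nd %s\n",
                   class_name[h], class_name[c]);
            reversals++;
        }
        order[h][c] = true;
    }
    assert(nheld < MAX_HELD);
    held[nheld++] = c;
}

/* Drop the most recent hold of class c, keeping the others in order. */
static void toy_witness_unlock(int c)
{
    for (int i = nheld - 1; i >= 0; i--) {
        if (held[i] == c) {
            for (int j = i; j < nheld - 1; j++)
                held[j] = held[j + 1];
            nheld--;
            return;
        }
    }
}

/* Hypothetical shared-lock path: take the pool-mutex interlock and record
 * the sx lock either before or after dropping the interlock. */
static void model_sx_slock(int sx, bool record_after_unlock)
{
    toy_witness_lock(POOL_MTX);        /* mtx_lock(interlock) */
    if (!record_after_unlock)
        toy_witness_lock(sx);          /* WITNESS_LOCK() under the interlock */
    toy_witness_unlock(POOL_MTX);      /* mtx_unlock(interlock) */
    if (record_after_unlock)
        toy_witness_lock(sx);          /* WITNESS_LOCK() after the interlock */
}

/* Hypothetical release path: the interlock is taken to release, too. */
static void model_sx_sunlock(int sx)
{
    toy_witness_lock(POOL_MTX);
    toy_witness_unlock(sx);
    toy_witness_unlock(POOL_MTX);
}

/* Hold the module sx lock while taking a second sx lock, then release both.
 * Returns the number of reversals the toy checker reported. */
static int run_scenario(bool record_after_unlock)
{
    toy_witness_reset();
    model_sx_slock(MODULE_SX, record_after_unlock);
    model_sx_slock(OTHER_SX, record_after_unlock);
    model_sx_sunlock(OTHER_SX);
    model_sx_sunlock(MODULE_SX);
    return reversals;
}
```

In this model, run_scenario(false) reports at least one reversal and run_scenario(true) reports none. The swap only changes what the checker observes; whether the original order is truly safe still depends on nothing blocking or acquiring another lock while the pool mutex is held, which the model does not check.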