fun with WITNESS and "pool mutex"

From: Don Lewis <truckman@FreeBSD.org>
Date: Wed, 18 Jun 2003 01:32:17 -0700 (PDT)
When I was attempting to debug a system deadlock problem where the
culprit process was sleeping on a "pool mutex", I noticed that "show
witness" in ddb doesn't report anything about this particular mutex
flavor.  I discovered that witness doesn't monitor these mutexes because
mtx_pool_setup() calls mtx_init with the MTX_NOWITNESS flag.
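For reference, the setup code in question looks roughly like this (a sketch from memory; the exact loop and array names in sys/kern/kern_mtxpool.c may differ):

```c
static void
mtx_pool_setup(void *dummy __unused)
{
	int i;

	for (i = 0; i < MTX_POOL_SIZE; i++)
		/*
		 * MTX_NOWITNESS is what hides these mutexes from
		 * witness; dropping it from the flags turns the
		 * checking back on.
		 */
		mtx_init(&mtx_pool_ary[i], "pool mutex", NULL,
		    MTX_DEF | MTX_NOWITNESS);
}
```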

These mutexes are a bit special, because they are supposed to be leaf
mutexes: no other mutexes should be grabbed while one of them is held.
The deadlock in question caused me to discover a violation of this
restriction, so I wondered if there were more problems of this type in
the code.  I suspected there would be, since there haven't been any
automatic checks to verify that these mutexes are being used correctly.

Just for grins, I removed the MTX_NOWITNESS flag from mtx_pool_setup()
and quickly found the first violation during the boot sequence:

Mounting root from ufs:/dev/da0s1a
acquiring duplicate lock of same type: "pool mutex"
 1st pool mutex @ /usr/src/sys/kern/vfs_syscalls.c:736
 2nd pool mutex @ /usr/src/sys/kern/kern_lock.c:598
Stack backtrace:
backtrace(c051f4df,c051c041,c051adcc,256,c05e4808) at backtrace+0x17
witness_lock(c05e0cf0,8,c051adcc,256,c05e2ae0) at witness_lock+0x697
_mtx_lock_flags(c05e0cf0,0,c051adcc,256,c641a248) at _mtx_lock_flags+0xb1
lockstatus(c641a304,0,e4b1ab24,c036f2e8,e4b1ab44) at lockstatus+0x3c
vop_stdislocked(e4b1ab44,e4b1ab30,c0464f78,e4b1ab44,e4b1ab58) at vop_stdislocked+0x21
vop_defaultop(e4b1ab44,e4b1ab58,c0375a67,e4b1ab44,c05e4808) at vop_defaultop+0x18
ufs_vnoperate(e4b1ab44,c05e4808,c05740ac,c05b67e0,c641a248) at ufs_vnoperate+0x18
assert_vop_locked(c641a248,c05197fc,c05b74e0,c641a248,0) at assert_vop_locked+0x47
VOP_GETVOBJECT(c641a248,0,c0524224,2e0,c05740ac) at VOP_GETVOBJECT+0x3f
kern_open(c61c64c0,bfbff6b0,0,1,0) at kern_open+0x44f
open(c61c64c0,e4b1ad10,c0537dc3,3fd,3) at open+0x30
syscall(2f,2f,2f,1,0) at syscall+0x26e
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (5), eip = 0x80529ef, esp = 0xbfbff16c, ebp = 0xbfbff208 ---


The code in question:

        FILEDESC_LOCK(fdp);
        FILE_LOCK(fp);
        if (fp->f_count == 1) {
                KASSERT(fdp->fd_ofiles[indx] != fp,
                    ("Open file descriptor lost all refs"));
                FILEDESC_UNLOCK(fdp);
                FILE_UNLOCK(fp);
                VOP_UNLOCK(vp, 0, td);
                vn_close(vp, flags & FMASK, fp->f_cred, td);   
                fdrop(fp, td);
                td->td_retval[0] = indx;
                return 0;
        }
                      
        /* assert that vn_open created a backing object if one is needed */
        KASSERT(!vn_canvmio(vp) || VOP_GETVOBJECT(vp, NULL) == 0,
                ("open: vmio vnode has no backing object after vn_open"));
                
        fp->f_data = vp;
        fp->f_flag = flags & FMASK;
        fp->f_ops = &vnops;
        fp->f_type = (vp->v_type == VFIFO ? DTYPE_FIFO : DTYPE_VNODE);
        FILEDESC_UNLOCK(fdp);
        FILE_UNLOCK(fp);

This one appears to be easily fixable by moving the second KASSERT down
a few lines to below the FILE_UNLOCK() call.
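Concretely, the reordering would look something like this (a sketch of the
proposed fix only; the surrounding code is left as in the excerpt above).
With both pool-mutex-backed locks dropped first, the lockstatus() call
buried under VOP_GETVOBJECT() no longer takes a second pool mutex while
one is held:

```c
        fp->f_data = vp;
        fp->f_flag = flags & FMASK;
        fp->f_ops = &vnops;
        fp->f_type = (vp->v_type == VFIFO ? DTYPE_FIFO : DTYPE_VNODE);
        FILEDESC_UNLOCK(fdp);
        FILE_UNLOCK(fp);

        /* assert that vn_open created a backing object if one is needed */
        KASSERT(!vn_canvmio(vp) || VOP_GETVOBJECT(vp, NULL) == 0,
                ("open: vmio vnode has no backing object after vn_open"));
```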

Any bets on how many other potential deadlock problems there are in the
tree?
Received on Tue Jun 17 2003 - 23:32:25 UTC