While I was attempting to debug a system deadlock in which the culprit process was sleeping on a "pool mutex", I noticed that "show witness" in ddb doesn't report anything about this particular mutex flavor. It turns out that witness doesn't monitor these mutexes because mtx_pool_setup() calls mtx_init() with the MTX_NOWITNESS flag. These mutexes are a bit special: they are supposed to be leaf mutexes, so no other mutex should be acquired while one of them is held. The deadlock in question led me to a violation of this restriction, so I wondered whether there were more problems of this type in the code. I suspected there would be, since there haven't been any automatic checks to verify that these mutexes are being used correctly.

Just for grins, I removed the MTX_NOWITNESS flag from mtx_pool_setup() and quickly found the first violation during the boot sequence:

    Mounting root from ufs:/dev/da0s1a
    acquiring duplicate lock of same type: "pool mutex"
     1st pool mutex @ /usr/src/sys/kern/vfs_syscalls.c:736
     2nd pool mutex @ /usr/src/sys/kern/kern_lock.c:598
    Stack backtrace:
    backtrace(c051f4df,c051c041,c051adcc,256,c05e4808) at backtrace+0x17
    witness_lock(c05e0cf0,8,c051adcc,256,c05e2ae0) at witness_lock+0x697
    _mtx_lock_flags(c05e0cf0,0,c051adcc,256,c641a248) at _mtx_lock_flags+0xb1
    lockstatus(c641a304,0,e4b1ab24,c036f2e8,e4b1ab44) at lockstatus+0x3c
    vop_stdislocked(e4b1ab44,e4b1ab30,c0464f78,e4b1ab44,e4b1ab58) at vop_stdislocked+0x21
    vop_defaultop(e4b1ab44,e4b1ab58,c0375a67,e4b1ab44,c05e4808) at vop_defaultop+0x18
    ufs_vnoperate(e4b1ab44,c05e4808,c05740ac,c05b67e0,c641a248) at ufs_vnoperate+0x18
    assert_vop_locked(c641a248,c05197fc,c05b74e0,c641a248,0) at assert_vop_locked+0x47
    VOP_GETVOBJECT(c641a248,0,c0524224,2e0,c05740ac) at VOP_GETVOBJECT+0x3f
    kern_open(c61c64c0,bfbff6b0,0,1,0) at kern_open+0x44f
    open(c61c64c0,e4b1ad10,c0537dc3,3fd,3) at open+0x30
    syscall(2f,2f,2f,1,0) at syscall+0x26e
    Xint0x80_syscall() at Xint0x80_syscall+0x1d
    --- syscall (5), eip = 0x80529ef, esp = 0xbfbff16c, ebp = 0xbfbff208 ---

The code in question:

    FILEDESC_LOCK(fdp);
    FILE_LOCK(fp);
    if (fp->f_count == 1) {
        KASSERT(fdp->fd_ofiles[indx] != fp,
            ("Open file descriptor lost all refs"));
        FILEDESC_UNLOCK(fdp);
        FILE_UNLOCK(fp);
        VOP_UNLOCK(vp, 0, td);
        vn_close(vp, flags & FMASK, fp->f_cred, td);
        fdrop(fp, td);
        td->td_retval[0] = indx;
        return 0;
    }

    /* assert that vn_open created a backing object if one is needed */
    KASSERT(!vn_canvmio(vp) || VOP_GETVOBJECT(vp, NULL) == 0,
        ("open: vmio vnode has no backing object after vn_open"));

    fp->f_data = vp;
    fp->f_flag = flags & FMASK;
    fp->f_ops = &vnops;
    fp->f_type = (vp->v_type == VFIFO ? DTYPE_FIFO : DTYPE_VNODE);
    FILEDESC_UNLOCK(fdp);
    FILE_UNLOCK(fp);

Here the second KASSERT calls VOP_GETVOBJECT() while the pool mutex taken by FILE_LOCK(fp) is still held. As the backtrace shows, that call reaches lockstatus(), which acquires a second pool mutex. This one appears to be easily fixable by moving the second KASSERT down a few lines, below the FILE_UNLOCK() call. Any bets on how many other potential deadlock problems there are in the tree?

Received on Tue Jun 17 2003 - 23:32:25 UTC
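The leaf-mutex rule can be illustrated with a minimal, single-threaded userspace sketch in C. It models the kind of check witness performs once MTX_NOWITNESS is removed, applied to the tail of the kern_open() code quoted above. Everything in it is hypothetical: struct model_mtx, model_lock(), model_getvobject(), MODEL_KASSERT() and the other names are illustrative stand-ins, not FreeBSD kernel APIs. The real witness(4) tracks lock-order relationships rather than a single counter. The vn_canvmio() test and the f_count == 1 early return are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of a mutex (not FreeBSD's struct mtx). */
struct model_mtx {
    const char *name;
    bool leaf;   /* true for "pool mutex" style leaf locks */
    bool held;
};

static int leaf_depth;   /* leaf mutexes currently held (single-threaded model) */
static int violations;   /* leaf-order violations reported so far */

/* Acquire m, complaining if any leaf mutex is already held. */
static void model_lock(struct model_mtx *m, const char *file, int line)
{
    if (leaf_depth > 0) {
        fprintf(stderr, "leaf violation: \"%s\" acquired at %s:%d "
            "while a leaf mutex is held\n", m->name, file, line);
        violations++;
    }
    if (m->held) {
        /* A real non-recursive mutex would deadlock here. */
        fprintf(stderr, "self-deadlock on \"%s\" at %s:%d\n",
            m->name, file, line);
        abort();
    }
    m->held = true;
    if (m->leaf)
        leaf_depth++;
}

static void model_unlock(struct model_mtx *m)
{
    assert(m->held);
    m->held = false;
    if (m->leaf)
        leaf_depth--;
}

#define MODEL_LOCK(m) model_lock((m), __FILE__, __LINE__)

/* Stand-in for KASSERT(): always evaluates its expression. */
#define MODEL_KASSERT(expr, msg) \
    do { if (!(expr)) { fprintf(stderr, "panic: %s\n", (msg)); abort(); } } while (0)

/* Hypothetical stand-ins for the locks involved in the kern_open() tail. */
static struct model_mtx filedesc_mtx = { "filedesc", false, false };
static struct model_mtx file_mtx = { "pool mutex (file)", true, false };
static struct model_mtx interlock = { "pool mutex (lockmgr interlock)", true, false };

/* Stand-in for VOP_GETVOBJECT(): in the backtrace its lock assertion
 * reaches lockstatus(), which takes another pool mutex. */
static int model_getvobject(void)
{
    MODEL_LOCK(&interlock);
    model_unlock(&interlock);
    return 0;
}

/* Order as quoted: the check runs while the file's pool mutex is held. */
static void open_tail_original(void)
{
    MODEL_LOCK(&filedesc_mtx);
    MODEL_LOCK(&file_mtx);
    MODEL_KASSERT(model_getvobject() == 0, "open: vmio vnode has no backing object");
    /* ... fp->f_data, f_flag, f_ops and f_type are set here ... */
    model_unlock(&filedesc_mtx);
    model_unlock(&file_mtx);
}

/* Proposed order: the check is moved below FILE_UNLOCK(). */
static void open_tail_fixed(void)
{
    MODEL_LOCK(&filedesc_mtx);
    MODEL_LOCK(&file_mtx);
    /* ... fp->f_data, f_flag, f_ops and f_type are set here ... */
    model_unlock(&filedesc_mtx);
    model_unlock(&file_mtx);
    MODEL_KASSERT(model_getvobject() == 0, "open: vmio vnode has no backing object");
}
```

In this model, calling open_tail_original() reports one violation: the interlock is taken while the file's pool mutex is held. Calling open_tail_fixed() reports none. The held check is there because pool mutexes come from a small shared pool, so the "second" pool mutex may turn out to be the very one already held.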