On 13 Oct, John Baldwin wrote: > On Tuesday 11 October 2005 05:43 am, Don Lewis wrote: >> On 6 Oct, Kris Kennaway wrote: [ snip ] >> >> This problem appears to be some sort of vnode lock leak. >> > >> > leaked lockmgr locks usually panic when the thread exits. >> >> There's a KASSERT to in userret() to catch this in the syscall return >> path, but that check is only enabled if the INVARIANTS kernel option is >> enabled, [ snip ] >> The code referenced by the vnode filename and line members is this block >> in ufs_lookup(): >> >> if (flags & ISDOTDOT) { >> VOP_UNLOCK(pdp, 0, td); /* race to get the inode */ >> error = VFS_VGET(pdp->v_mount, dp->i_ino, >> cnp->cn_lkflags, &tdp); >> --> vn_lock(pdp, LK_EXCLUSIVE | LK_RETRY, td); >> if (error) >> return (error); >> *vpp = tdp; >> } else ... >> >> >> I don't see any obvious lock leaks in the code. > > Looks like the function returns with the desired leaf vnode locked if I > grokked it correctly in my brief perusal, so the bug could be in the caller. I think I might have found *the* bug, but if even if it is not the cause of the lock leak, I did find *a* bug. I just had my latest build hang. It didn't panic in userret(), dmake and another process both deadlocked waiting for vnode locks. Here's the list of locked vnodes: db> show lockedvnod Locked vnodes 0xc44c56cc: tag ufs, type VDIR usecount 2, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xc4a61b58 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xc32b2180 (pid 44849) with 1 pendi ng#0 0xc0639d5e at lockmgr+0x566 #1 0xc0690ccd at vop_stdlock+0x21 #2 0xc081744f at VOP_LOCK_APV+0x87 #3 0xc078ac24 at ffs_lock+0x10 #4 0xc081744f at VOP_LOCK_APV+0x87 #5 0xc06a47e4 at vn_lock+0xac #6 0xc0698b52 at vget+0x8e #7 0xc06919e1 at vfs_hash_get+0x8d #8 0xc0789adf at ffs_vget+0x27 #9 0xc0790539 at ufs_lookup+0xab9 #10 0xc08155e3 at VOP_CACHEDLOOKUP_APV+0x9b #11 0xc068eafd at vfs_cache_lookup+0xb5 #12 0xc081550f at VOP_LOOKUP_APV+0x87 #13 0xc0692a9a at lookup+0x3e2 #14 0xc0692456 at namei+0x37e #15 0xc069e922 at kern_access+0x6a #16 0xc069e8b5 at access+0x15 #17 0xc08058df at syscall+0x22f ino 1224704, on dev da0s2a 0xc66fd6cc: tag ufs, type VDIR usecount 2, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xc4a59108 ref 0 pages 2 lock type ufs: EXCL (count 1) by thread 0xc2761480 (pid 44639) with 1 pendi ng#0 0xc0639d5e at lockmgr+0x566 #1 0xc0690ccd at vop_stdlock+0x21 #2 0xc081744f at VOP_LOCK_APV+0x87 #3 0xc078ac24 at ffs_lock+0x10 #4 0xc081744f at VOP_LOCK_APV+0x87 #5 0xc06a47e4 at vn_lock+0xac #6 0xc0698b52 at vget+0x8e #7 0xc06919e1 at vfs_hash_get+0x8d #8 0xc0789adf at ffs_vget+0x27 #9 0xc07904ce at ufs_lookup+0xa4e #10 0xc08155e3 at VOP_CACHEDLOOKUP_APV+0x9b #11 0xc068eafd at vfs_cache_lookup+0xb5 #12 0xc081550f at VOP_LOOKUP_APV+0x87 #13 0xc0692a9a at lookup+0x3e2 #14 0xc0692456 at namei+0x37e #15 0xc069ecd7 at kern_lstat+0x47 #16 0xc069ec73 at lstat+0x1b #17 0xc08058df at syscall+0x22f ino 1224721, on dev da0s2a This is the stack trace for process 44849: (kgdb) where #0 0xc06551b7 in sched_switch (td=0xc32b2180, newtd=0xc2373180, flags=1) at /usr/src/sys/kern/sched_4bsd.c:973 #1 0xc064aa14 in mi_switch (flags=1, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:356 #2 0xc0662100 in sleepq_switch (wchan=0x0) at /usr/src/sys/kern/subr_sleepqueue.c:427 #3 0xc06622e4 in sleepq_wait (wchan=0xc66fd724) at /usr/src/sys/kern/subr_sleepqueue.c:539 #4 0xc064a685 in msleep (ident=0xc66fd724, mtx=0xc09477e8, priority=80, wmesg=0xc087909e "ufs", timo=0) at /usr/src/sys/kern/kern_synch.c:227 #5 0xc0639759 in acquire (lkpp=0xeb6818d8, extflags=64, wanted=393216) at /usr/src/sys/kern/kern_lock.c:115 #6 0xc0639cba in lockmgr (lkp=0xc66fd724, flags=8194, interlkp=0x40, td=0xc32b2180) at /usr/src/sys/kern/kern_lock.c:340 #7 0xc0690ccd in vop_stdlock (ap=0x0) at /usr/src/sys/kern/vfs_default.c:263 #8 0xc081744f in VOP_LOCK_APV (vop=0xc08e0420, a=0xeb681940) at vnode_if.c:1642 #9 0xc078ac24 in ffs_lock (ap=0xeb681940) at /usr/src/sys/ufs/ffs/ffs_vnops.c:341 #10 0xc081744f in VOP_LOCK_APV (vop=0xc0912ce0, a=0xeb681940) at vnode_if.c:1642 #11 0xc06a47e4 in vn_lock (vp=0xc66fd6cc, flags=8194, td=0xc32b2180) at vnode_if.h:844 ---Type <return> to continue, or q <return> to quit--- #12 0xc0698b52 in vget (vp=0xc66fd6cc, flags=8194, td=0xc32b2180) at /usr/src/sys/kern/vfs_subr.c:1953 #13 0xc06919e1 in vfs_hash_get (mp=0xc25af800, hash=1224721, flags=2, td=0xc32b2180, vpp=0xeb681a94, fn=0, arg=0x0) at /usr/src/sys/kern/vfs_hash.c:81 #14 0xc0789adf in ffs_vget (mp=0xc25af800, ino=1224721, flags=2, vpp=0xeb681a94) at pcpu.h:162 #15 0xc0790539 in ufs_lookup (ap=0xeb681b3c) at /usr/src/sys/ufs/ufs/ufs_lookup.c:571 #16 0xc08155e3 in VOP_CACHEDLOOKUP_APV (vop=0xc0913220, a=0xeb681b3c) at vnode_if.c:150 #17 0xc068eafd in vfs_cache_lookup (ap=0x0) at vnode_if.h:82 #18 0xc081550f in VOP_LOOKUP_APV (vop=0xc0913220, a=0xeb681bd4) at vnode_if.c:99 #19 0xc0692a9a in lookup (ndp=0xeb681c68) at vnode_if.h:56 #20 0xc0692456 in namei (ndp=0xeb681c68) at /usr/src/sys/kern/vfs_lookup.c:203 #21 0xc069e922 in kern_access (td=0xc32b2180, path=0x0, pathseg=UIO_USERSPACE, flags=0) at /usr/src/sys/kern/vfs_syscalls.c:1879 #22 0xc069e8b5 in access (td=0xc32b2180, uap=0x0) at /usr/src/sys/kern/vfs_syscalls.c:1856 #23 0xc08058df in syscall (frame= {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 65, tf_esi = 1208418304, tf_ ebp = -1077950920, tf_isp = -345498268, tf_ebx = 1208382424, tf_edx = 134513381, ---Type <return> to continue, or q <return> to quit--- tf_ecx = 1208418370, tf_eax = 33, tf_trapno = 12, tf_err = 2, tf_eip = 12082991 75, tf_cs = 51, tf_eflags = 582, tf_esp = -1077950964, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:976 #24 0xc07f7bdf in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200 #25 0x00000033 in ?? () Frame 15 and some of its variables might be interesting: (kgdb) frame 15 #15 0xc0790539 in ufs_lookup (ap=0xeb681b3c) at /usr/src/sys/ufs/ufs/ufs_lookup.c:571 571 error = VFS_VGET(pdp->v_mount, dp->i_ino, (kgdb) print /x flags $2 = 0x1000044 LOCKLEAF | FOLLOW | MPSAFE (kgdb) print *cnp $2 = {cn_nameiop = 0, cn_flags = 16777284, cn_thread = 0xc32b2180, cn_cred = 0xc3655000, cn_lkflags = 2, cn_pnbuf = 0xc25ed400 "/mnt/ports/editors/openoffice.org-2.0/work/solenv/unxfb sd.pro/lib/libc.so.6", cn_nameptr = 0xc25ed40b "editors/openoffice.org-2.0/work/solenv/unxfbsd.pro/li b/libc.so.6", cn_namelen = 7, cn_consume = 0} Process 44639 is dmake, and this is where things really start to get interesting: (kgdb) thread 99 [Switching to thread 99 (Thread 100082)]#0 0xc06551b7 in sched_switch ( td=0xc2761480, newtd=0xc32b2180, flags=1) at /usr/src/sys/kern/sched_4bsd.c:973 973 cpu_switch(td, newtd); (kgdb) where #0 0xc06551b7 in sched_switch (td=0xc2761480, newtd=0xc32b2180, flags=1) at /usr/src/sys/kern/sched_4bsd.c:973 #1 0xc064aa14 in mi_switch (flags=1, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:356 #2 0xc0662100 in sleepq_switch (wchan=0x0) at /usr/src/sys/kern/subr_sleepqueue.c:427 #3 0xc06622e4 in sleepq_wait (wchan=0xc44c5724) at /usr/src/sys/kern/subr_sleepqueue.c:539 #4 0xc064a685 in msleep (ident=0xc44c5724, mtx=0xc09477c4, priority=80, wmesg=0xc087909e "ufs", timo=0) at /usr/src/sys/kern/kern_synch.c:227 #5 0xc0639759 in acquire (lkpp=0xeb5b08f0, extflags=64, wanted=393216) at /usr/src/sys/kern/kern_lock.c:115 #6 0xc0639cba in lockmgr (lkp=0xc44c5724, flags=12290, interlkp=0x40, td=0xc2761480) at /usr/src/sys/kern/kern_lock.c:340 #7 0xc0690ccd in vop_stdlock (ap=0x0) at /usr/src/sys/kern/vfs_default.c:263 #8 0xc081744f in VOP_LOCK_APV (vop=0xc08e0420, a=0xeb5b0958) at vnode_if.c:1642 #9 0xc078ac24 in ffs_lock (ap=0xeb5b0958) at /usr/src/sys/ufs/ffs/ffs_vnops.c:341 #10 0xc081744f in VOP_LOCK_APV (vop=0xc0912ce0, a=0xeb5b0958) at vnode_if.c:1642 #11 0xc06a47e4 in vn_lock (vp=0xc44c56cc, flags=4098, td=0xc2761480) at vnode_if.h:844 ---Type <return> to continue, or q <return> to quit--- #12 0xc07904e3 in ufs_lookup (ap=0xeb5b0a7c) at /usr/src/sys/ufs/ufs/ufs_lookup.c:563 #13 0xc08155e3 in VOP_CACHEDLOOKUP_APV (vop=0xc0913220, a=0xeb5b0a7c) at vnode_if.c:150 #14 0xc068eafd in vfs_cache_lookup (ap=0x0) at vnode_if.h:82 #15 0xc081550f in VOP_LOOKUP_APV (vop=0xc0913220, a=0xeb5b0b14) at vnode_if.c:99 #16 0xc0692a9a in lookup (ndp=0xeb5b0ba0) at vnode_if.h:56 #17 0xc0692456 in namei (ndp=0xeb5b0ba0) at /usr/src/sys/kern/vfs_lookup.c:203 #18 0xc069ecd7 in kern_lstat (td=0xc2761480, path=0x0, pathseg=UIO_USERSPACE, sbp=0xeb5b0c74) at /usr/src/sys/kern/vfs_syscalls.c:2102 #19 0xc069ec73 in lstat (td=0xc2761480, uap=0xeb5b0d04) at /usr/src/sys/kern/vfs_syscalls.c:2086 #20 0xc08058df in syscall (frame= {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134600610, tf_esi = 13473007 6, tf_ebp = -1077951800, tf_isp = -346354332, tf_ebx = 1209300040, tf_edx = 0, t f_ecx = 0, tf_eax = 190, tf_trapno = 12, tf_err = 2, tf_eip = 1209192707, tf_cs = 51, tf_eflags = 658, tf_esp = -1077952004, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:976 #21 0xc07f7bdf in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200 #22 0x00000033 in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) frame 12 #12 0xc07904e3 in ufs_lookup (ap=0xeb5b0a7c) at /usr/src/sys/ufs/ufs/ufs_lookup.c:563 563 vn_lock(pdp, LK_EXCLUSIVE | LK_RETRY, td); (kgdb) print /x flags $3 = 0x100a004 LOCKLEAF | ISDOTDOT | ISLASTCN | MPSAFE (kgdb) print *cnp $1 = {cn_nameiop = 0, cn_flags = 16818180, cn_thread = 0xc2761480, cn_cred = 0xc2912680, cn_lkflags = 2, cn_pnbuf = 0xc25ed000 "../../../../../..", cn_nameptr = 0xc25ed00f "..", cn_namelen = 2, cn_consume = 0} Process 44849 has directory vnode 0xc44c56cc / inode 1224704 locked and is trying to lock vnode 0xc66fd6cc, and dmake (process 44639) has directory vnode 0xc66fd6cc / inode 1224721 locked and is trying to lock vnode 0xc44c56cc, hence the deadlock. But this isn't supposed to happen because ufs only locks directories starting from the root when doing lookups, and takes special care to get the lock order right when doing a lookup on "..". This is where things really get interesting. The dmake process is doing a ".." lookup and is calling vn_lock() here: (kgdb) frame 12 #12 0xc07904e3 in ufs_lookup (ap=0xeb5b0a7c) at /usr/src/sys/ufs/ufs/ufs_lookup.c:563 563 vn_lock(pdp, LK_EXCLUSIVE | LK_RETRY, td); (kgdb) list 558 pdp = vdp; 559 if (flags & ISDOTDOT) { 560 VOP_UNLOCK(pdp, 0, td); /* race to get the inode */ 561 error = VFS_VGET(pdp->v_mount, dp->i_ino, 562 cnp->cn_lkflags, &tdp); 563 vn_lock(pdp, LK_EXCLUSIVE | LK_RETRY, td); 564 if (error) 565 return (error); 566 *vpp = tdp; 567 } else if (dp->i_number == dp->i_ino) { It has unlocked the current directory in the path, it has called VFS_VGET() to lock the ".." directory (inode 1224721), and is attempted to relock the current directory, which is a child of the ".." directory. But, if we peek at the directory that dmake has locked ... fsdb -r /dev/da0s2a ** /dev/da0s2a (NO WRITE) Examining file system `/dev/da0s2a' Last Mounted on /mnt current inode: directory I=2 MODE=40755 SIZE=512 MTIME=Oct 11 02:10:20 2005 [0 nsec] CTIME=Oct 11 02:10:20 2005 [0 nsec] ATIME=Oct 13 12:16:02 2005 [0 nsec] OWNER=root GRP=wheel LINKCNT=8 FLAGS=0 BLKCNT=4 GEN=53557245 fsdb (inum: 2)> inode 1224721 current inode: directory I=1224721 MODE=40755 SIZE=4608 MTIME=Oct 5 17:56:57 2005 [0 nsec] CTIME=Oct 11 02:39:39 2005 [0 nsec] ATIME=Oct 13 12:16:02 2005 [0 nsec] OWNER=root GRP=wheel LINKCNT=233 FLAGS=0 BLKCNT=c GEN=33427862 fsdb (inum: 1224721)> ls slot 0 ino 1224721 reclen 12: directory, `.' slot 1 ino 1224704 reclen 12: directory, `..' slot 2 ino 125495 reclen 12: directory, `CVS' slot 3 ino 125496 reclen 16: directory, `tpad' [snip] it looks like we are trying to lock the parent of the directory that we have locked! The bug is that once we have unlocked pdp, another thread can do a lookup and overwrite dp->i_ino, so instead of getting the vnode for the ".." directory entry, VFS_VGET() will return the vnode for a subdirectory of the current directory, and when we relock the current directory we'll have a lock order reversal. Even if this doesn't result in a deadlock, it looks like it has the potential for mucking up lookups that involve "..". I also don't currently see a way for this to become a vnode lock leak. The fix is to preserve a copy of dp->d_ino before unlocking pdp, and pass the saved value to VFS_VGET(). Index: sys/ufs/ufs/ufs_lookup.c =================================================================== RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_lookup.c,v retrieving revision 1.77 diff -u -r1.77 ufs_lookup.c --- sys/ufs/ufs/ufs_lookup.c 13 Apr 2005 10:59:09 -0000 1.77 +++ sys/ufs/ufs/ufs_lookup.c 13 Oct 2005 23:20:59 -0000 _at__at_ -153,6 +153,7 _at__at_ int flags = cnp->cn_flags; int nameiop = cnp->cn_nameiop; struct thread *td = cnp->cn_thread; + u_int32_t saved_ino; bp = NULL; slotoffset = -1; _at__at_ -557,8 +558,9 _at__at_ */ pdp = vdp; if (flags & ISDOTDOT) { + saved_ino = dp->i_ino; VOP_UNLOCK(pdp, 0, td); /* race to get the inode */ - error = VFS_VGET(pdp->v_mount, dp->i_ino, + error = VFS_VGET(pdp->v_mount, saved_ino, cnp->cn_lkflags, &tdp); vn_lock(pdp, LK_EXCLUSIVE | LK_RETRY, td); if (error)Received on Thu Oct 13 2005 - 21:25:35 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:45 UTC