Re: 6.0 hangs (while building OOo)

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Thu, 13 Oct 2005 16:25:11 -0700 (PDT)
On 13 Oct, John Baldwin wrote:
> On Tuesday 11 October 2005 05:43 am, Don Lewis wrote:
>> On  6 Oct, Kris Kennaway wrote:

[ snip ]

>> >> This problem appears to be some sort of vnode lock leak.
>> >
>> > leaked lockmgr locks usually panic when the thread exits.
>>
>> There's a KASSERT to in userret() to catch this in the syscall return
>> path, but that check is only enabled if the INVARIANTS kernel option is
>> enabled,

[ snip ]

>> The code referenced by the vnode filename and line members is this block
>> in ufs_lookup():
>>
>>         if (flags & ISDOTDOT) {
>>                 VOP_UNLOCK(pdp, 0, td); /* race to get the inode */
>>                 error = VFS_VGET(pdp->v_mount, dp->i_ino,
>>                     cnp->cn_lkflags, &tdp);
>> -->             vn_lock(pdp, LK_EXCLUSIVE | LK_RETRY, td);
>>                 if (error)
>>                         return (error);
>>                 *vpp = tdp;
>> 	} else ...
>>
>>
>> I don't see any obvious lock leaks in the code.
> 
> Looks like the function returns with the desired leaf vnode locked if I 
> grokked it correctly in my brief perusal, so the bug could be in the caller.

I think I might have found *the* bug, but if even if it is not the cause
of the lock leak, I did find *a* bug.  I just had my latest build hang.
It didn't panic in userret(), dmake and another process both deadlocked
waiting for vnode locks.

Here's the list of locked vnodes:

db> show lockedvnod
Locked vnodes

0xc44c56cc: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xc4a61b58 ref 0 pages 1
     lock type ufs: EXCL (count 1) by thread 0xc32b2180 (pid 44849) with 1 pendi
ng#0 0xc0639d5e at lockmgr+0x566
#1 0xc0690ccd at vop_stdlock+0x21
#2 0xc081744f at VOP_LOCK_APV+0x87
#3 0xc078ac24 at ffs_lock+0x10
#4 0xc081744f at VOP_LOCK_APV+0x87
#5 0xc06a47e4 at vn_lock+0xac
#6 0xc0698b52 at vget+0x8e
#7 0xc06919e1 at vfs_hash_get+0x8d
#8 0xc0789adf at ffs_vget+0x27
#9 0xc0790539 at ufs_lookup+0xab9
#10 0xc08155e3 at VOP_CACHEDLOOKUP_APV+0x9b
#11 0xc068eafd at vfs_cache_lookup+0xb5
#12 0xc081550f at VOP_LOOKUP_APV+0x87
#13 0xc0692a9a at lookup+0x3e2
#14 0xc0692456 at namei+0x37e
#15 0xc069e922 at kern_access+0x6a
#16 0xc069e8b5 at access+0x15
#17 0xc08058df at syscall+0x22f

        ino 1224704, on dev da0s2a

0xc66fd6cc: tag ufs, type VDIR
    usecount 2, writecount 0, refcount 3 mountedhere 0
    flags ()
    v_object 0xc4a59108 ref 0 pages 2
     lock type ufs: EXCL (count 1) by thread 0xc2761480 (pid 44639) with 1 pendi
ng#0 0xc0639d5e at lockmgr+0x566
#1 0xc0690ccd at vop_stdlock+0x21
#2 0xc081744f at VOP_LOCK_APV+0x87
#3 0xc078ac24 at ffs_lock+0x10
#4 0xc081744f at VOP_LOCK_APV+0x87
#5 0xc06a47e4 at vn_lock+0xac
#6 0xc0698b52 at vget+0x8e
#7 0xc06919e1 at vfs_hash_get+0x8d
#8 0xc0789adf at ffs_vget+0x27
#9 0xc07904ce at ufs_lookup+0xa4e
#10 0xc08155e3 at VOP_CACHEDLOOKUP_APV+0x9b
#11 0xc068eafd at vfs_cache_lookup+0xb5
#12 0xc081550f at VOP_LOOKUP_APV+0x87
#13 0xc0692a9a at lookup+0x3e2
#14 0xc0692456 at namei+0x37e
#15 0xc069ecd7 at kern_lstat+0x47
#16 0xc069ec73 at lstat+0x1b
#17 0xc08058df at syscall+0x22f

        ino 1224721, on dev da0s2a


This is the stack trace for process 44849:

(kgdb) where
#0  0xc06551b7 in sched_switch (td=0xc32b2180, newtd=0xc2373180, flags=1)
    at /usr/src/sys/kern/sched_4bsd.c:973
#1  0xc064aa14 in mi_switch (flags=1, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:356
#2  0xc0662100 in sleepq_switch (wchan=0x0)
    at /usr/src/sys/kern/subr_sleepqueue.c:427
#3  0xc06622e4 in sleepq_wait (wchan=0xc66fd724)
    at /usr/src/sys/kern/subr_sleepqueue.c:539
#4  0xc064a685 in msleep (ident=0xc66fd724, mtx=0xc09477e8, priority=80,
    wmesg=0xc087909e "ufs", timo=0) at /usr/src/sys/kern/kern_synch.c:227
#5  0xc0639759 in acquire (lkpp=0xeb6818d8, extflags=64, wanted=393216)
    at /usr/src/sys/kern/kern_lock.c:115
#6  0xc0639cba in lockmgr (lkp=0xc66fd724, flags=8194, interlkp=0x40,
    td=0xc32b2180) at /usr/src/sys/kern/kern_lock.c:340
#7  0xc0690ccd in vop_stdlock (ap=0x0) at /usr/src/sys/kern/vfs_default.c:263
#8  0xc081744f in VOP_LOCK_APV (vop=0xc08e0420, a=0xeb681940)
    at vnode_if.c:1642
#9  0xc078ac24 in ffs_lock (ap=0xeb681940)
    at /usr/src/sys/ufs/ffs/ffs_vnops.c:341  
#10 0xc081744f in VOP_LOCK_APV (vop=0xc0912ce0, a=0xeb681940)
    at vnode_if.c:1642
#11 0xc06a47e4 in vn_lock (vp=0xc66fd6cc, flags=8194, td=0xc32b2180)
    at vnode_if.h:844
---Type <return> to continue, or q <return> to quit---
#12 0xc0698b52 in vget (vp=0xc66fd6cc, flags=8194, td=0xc32b2180)
    at /usr/src/sys/kern/vfs_subr.c:1953
#13 0xc06919e1 in vfs_hash_get (mp=0xc25af800, hash=1224721, flags=2,
    td=0xc32b2180, vpp=0xeb681a94, fn=0, arg=0x0)
    at /usr/src/sys/kern/vfs_hash.c:81
#14 0xc0789adf in ffs_vget (mp=0xc25af800, ino=1224721, flags=2,
    vpp=0xeb681a94) at pcpu.h:162
#15 0xc0790539 in ufs_lookup (ap=0xeb681b3c)
    at /usr/src/sys/ufs/ufs/ufs_lookup.c:571
#16 0xc08155e3 in VOP_CACHEDLOOKUP_APV (vop=0xc0913220, a=0xeb681b3c)
    at vnode_if.c:150
#17 0xc068eafd in vfs_cache_lookup (ap=0x0) at vnode_if.h:82
#18 0xc081550f in VOP_LOOKUP_APV (vop=0xc0913220, a=0xeb681bd4)
    at vnode_if.c:99
#19 0xc0692a9a in lookup (ndp=0xeb681c68) at vnode_if.h:56
#20 0xc0692456 in namei (ndp=0xeb681c68) at /usr/src/sys/kern/vfs_lookup.c:203
#21 0xc069e922 in kern_access (td=0xc32b2180, path=0x0, pathseg=UIO_USERSPACE,
    flags=0) at /usr/src/sys/kern/vfs_syscalls.c:1879  
#22 0xc069e8b5 in access (td=0xc32b2180, uap=0x0)
    at /usr/src/sys/kern/vfs_syscalls.c:1856
#23 0xc08058df in syscall (frame=
      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 65, tf_esi = 1208418304, tf_
ebp = -1077950920, tf_isp = -345498268, tf_ebx = 1208382424, tf_edx = 134513381,
---Type <return> to continue, or q <return> to quit---
 tf_ecx = 1208418370, tf_eax = 33, tf_trapno = 12, tf_err = 2, tf_eip = 12082991
75, tf_cs = 51, tf_eflags = 582, tf_esp = -1077950964, tf_ss = 59})
    at /usr/src/sys/i386/i386/trap.c:976
#24 0xc07f7bdf in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#25 0x00000033 in ?? ()

Frame 15 and some of its variables might be interesting:

(kgdb) frame 15
#15 0xc0790539 in ufs_lookup (ap=0xeb681b3c)
    at /usr/src/sys/ufs/ufs/ufs_lookup.c:571
571                     error = VFS_VGET(pdp->v_mount, dp->i_ino,
(kgdb) print /x flags
$2 = 0x1000044
LOCKLEAF | FOLLOW | MPSAFE
(kgdb) print *cnp
$2 = {cn_nameiop = 0, cn_flags = 16777284, cn_thread = 0xc32b2180,
  cn_cred = 0xc3655000, cn_lkflags = 2,
  cn_pnbuf = 0xc25ed400 "/mnt/ports/editors/openoffice.org-2.0/work/solenv/unxfb
sd.pro/lib/libc.so.6",
  cn_nameptr = 0xc25ed40b "editors/openoffice.org-2.0/work/solenv/unxfbsd.pro/li
b/libc.so.6", cn_namelen = 7, cn_consume = 0}


Process 44639 is dmake, and this is where things really start to get
interesting:

(kgdb) thread 99
[Switching to thread 99 (Thread 100082)]#0  0xc06551b7 in sched_switch (
    td=0xc2761480, newtd=0xc32b2180, flags=1)
    at /usr/src/sys/kern/sched_4bsd.c:973   
973                     cpu_switch(td, newtd);
(kgdb) where
#0  0xc06551b7 in sched_switch (td=0xc2761480, newtd=0xc32b2180, flags=1)
    at /usr/src/sys/kern/sched_4bsd.c:973
#1  0xc064aa14 in mi_switch (flags=1, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:356
#2  0xc0662100 in sleepq_switch (wchan=0x0)
    at /usr/src/sys/kern/subr_sleepqueue.c:427
#3  0xc06622e4 in sleepq_wait (wchan=0xc44c5724)
    at /usr/src/sys/kern/subr_sleepqueue.c:539
#4  0xc064a685 in msleep (ident=0xc44c5724, mtx=0xc09477c4, priority=80,
    wmesg=0xc087909e "ufs", timo=0) at /usr/src/sys/kern/kern_synch.c:227
#5  0xc0639759 in acquire (lkpp=0xeb5b08f0, extflags=64, wanted=393216)
    at /usr/src/sys/kern/kern_lock.c:115
#6  0xc0639cba in lockmgr (lkp=0xc44c5724, flags=12290, interlkp=0x40,
    td=0xc2761480) at /usr/src/sys/kern/kern_lock.c:340
#7  0xc0690ccd in vop_stdlock (ap=0x0) at /usr/src/sys/kern/vfs_default.c:263
#8  0xc081744f in VOP_LOCK_APV (vop=0xc08e0420, a=0xeb5b0958)
    at vnode_if.c:1642
#9  0xc078ac24 in ffs_lock (ap=0xeb5b0958)
    at /usr/src/sys/ufs/ffs/ffs_vnops.c:341
#10 0xc081744f in VOP_LOCK_APV (vop=0xc0912ce0, a=0xeb5b0958)
    at vnode_if.c:1642
#11 0xc06a47e4 in vn_lock (vp=0xc44c56cc, flags=4098, td=0xc2761480)
    at vnode_if.h:844
---Type <return> to continue, or q <return> to quit---
#12 0xc07904e3 in ufs_lookup (ap=0xeb5b0a7c)
    at /usr/src/sys/ufs/ufs/ufs_lookup.c:563
#13 0xc08155e3 in VOP_CACHEDLOOKUP_APV (vop=0xc0913220, a=0xeb5b0a7c)
    at vnode_if.c:150
#14 0xc068eafd in vfs_cache_lookup (ap=0x0) at vnode_if.h:82
#15 0xc081550f in VOP_LOOKUP_APV (vop=0xc0913220, a=0xeb5b0b14)
    at vnode_if.c:99
#16 0xc0692a9a in lookup (ndp=0xeb5b0ba0) at vnode_if.h:56
#17 0xc0692456 in namei (ndp=0xeb5b0ba0) at /usr/src/sys/kern/vfs_lookup.c:203
#18 0xc069ecd7 in kern_lstat (td=0xc2761480, path=0x0, pathseg=UIO_USERSPACE,
    sbp=0xeb5b0c74) at /usr/src/sys/kern/vfs_syscalls.c:2102
#19 0xc069ec73 in lstat (td=0xc2761480, uap=0xeb5b0d04)
    at /usr/src/sys/kern/vfs_syscalls.c:2086
#20 0xc08058df in syscall (frame=
      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134600610, tf_esi = 13473007
6, tf_ebp = -1077951800, tf_isp = -346354332, tf_ebx = 1209300040, tf_edx = 0, t
f_ecx = 0, tf_eax = 190, tf_trapno = 12, tf_err = 2, tf_eip = 1209192707, tf_cs
= 51, tf_eflags = 658, tf_esp = -1077952004, tf_ss = 59})
    at /usr/src/sys/i386/i386/trap.c:976
#21 0xc07f7bdf in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#22 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 12
#12 0xc07904e3 in ufs_lookup (ap=0xeb5b0a7c)
    at /usr/src/sys/ufs/ufs/ufs_lookup.c:563
563                     vn_lock(pdp, LK_EXCLUSIVE | LK_RETRY, td);
(kgdb) print /x flags
$3 = 0x100a004
LOCKLEAF | ISDOTDOT | ISLASTCN | MPSAFE
(kgdb) print *cnp
$1 = {cn_nameiop = 0, cn_flags = 16818180, cn_thread = 0xc2761480,
  cn_cred = 0xc2912680, cn_lkflags = 2,
  cn_pnbuf = 0xc25ed000 "../../../../../..", cn_nameptr = 0xc25ed00f "..",
  cn_namelen = 2, cn_consume = 0}


Process 44849 has directory vnode 0xc44c56cc / inode 1224704 locked and
is trying to lock vnode 0xc66fd6cc, and dmake (process 44639) has
directory vnode 0xc66fd6cc / inode 1224721 locked and is trying to lock
vnode 0xc44c56cc, hence the deadlock.  But this isn't supposed to happen
because ufs only locks directories starting from the root when doing
lookups, and takes special care to get the lock order right when doing a
lookup on "..".

This is where things really get interesting.  The dmake process is doing
a ".." lookup and is calling vn_lock() here:

(kgdb) frame 12
#12 0xc07904e3 in ufs_lookup (ap=0xeb5b0a7c)
    at /usr/src/sys/ufs/ufs/ufs_lookup.c:563
563                     vn_lock(pdp, LK_EXCLUSIVE | LK_RETRY, td);
(kgdb) list
558             pdp = vdp;
559             if (flags & ISDOTDOT) {
560                     VOP_UNLOCK(pdp, 0, td); /* race to get the inode */
561                     error = VFS_VGET(pdp->v_mount, dp->i_ino,
562                         cnp->cn_lkflags, &tdp);
563                     vn_lock(pdp, LK_EXCLUSIVE | LK_RETRY, td);
564                     if (error)
565                             return (error);
566                     *vpp = tdp;
567             } else if (dp->i_number == dp->i_ino) {

It has unlocked the current directory in the path, it has called
VFS_VGET() to lock the ".." directory (inode 1224721), and is attempted
to relock the current directory, which is a child of the ".." directory.
But, if we peek at the directory that dmake has locked ...

fsdb -r /dev/da0s2a
** /dev/da0s2a (NO WRITE)
Examining file system `/dev/da0s2a'
Last Mounted on /mnt
current inode: directory
I=2 MODE=40755 SIZE=512
        MTIME=Oct 11 02:10:20 2005 [0 nsec]
        CTIME=Oct 11 02:10:20 2005 [0 nsec]
        ATIME=Oct 13 12:16:02 2005 [0 nsec]
OWNER=root GRP=wheel LINKCNT=8 FLAGS=0 BLKCNT=4 GEN=53557245
fsdb (inum: 2)> inode 1224721
current inode: directory
I=1224721 MODE=40755 SIZE=4608
        MTIME=Oct  5 17:56:57 2005 [0 nsec]
        CTIME=Oct 11 02:39:39 2005 [0 nsec]
        ATIME=Oct 13 12:16:02 2005 [0 nsec]
OWNER=root GRP=wheel LINKCNT=233 FLAGS=0 BLKCNT=c GEN=33427862
fsdb (inum: 1224721)> ls
slot 0 ino 1224721 reclen 12: directory, `.'
slot 1 ino 1224704 reclen 12: directory, `..'
slot 2 ino 125495 reclen 12: directory, `CVS'
slot 3 ino 125496 reclen 16: directory, `tpad'
[snip]

it looks like we are trying to lock the parent of the directory that we
have locked!

The bug is that once we have unlocked pdp, another thread can do a
lookup and overwrite dp->i_ino, so instead of getting the vnode for the
".." directory entry, VFS_VGET() will return the vnode for a
subdirectory of the current directory, and when we relock the current
directory we'll have a lock order reversal.

Even if this doesn't result in a deadlock, it looks like it has the
potential for mucking up lookups that involve "..".  I also don't
currently see a way for this to become a vnode lock leak.

The fix is to preserve a copy of dp->d_ino before unlocking pdp,
and pass the saved value to VFS_VGET().

Index: sys/ufs/ufs/ufs_lookup.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_lookup.c,v
retrieving revision 1.77
diff -u -r1.77 ufs_lookup.c
--- sys/ufs/ufs/ufs_lookup.c	13 Apr 2005 10:59:09 -0000	1.77
+++ sys/ufs/ufs/ufs_lookup.c	13 Oct 2005 23:20:59 -0000
_at__at_ -153,6 +153,7 _at__at_
 	int flags = cnp->cn_flags;
 	int nameiop = cnp->cn_nameiop;
 	struct thread *td = cnp->cn_thread;
+	u_int32_t saved_ino;
 
 	bp = NULL;
 	slotoffset = -1;
_at__at_ -557,8 +558,9 _at__at_
 	 */
 	pdp = vdp;
 	if (flags & ISDOTDOT) {
+		saved_ino = dp->i_ino;
 		VOP_UNLOCK(pdp, 0, td);	/* race to get the inode */
-		error = VFS_VGET(pdp->v_mount, dp->i_ino,
+		error = VFS_VGET(pdp->v_mount, saved_ino,
 		    cnp->cn_lkflags, &tdp);
 		vn_lock(pdp, LK_EXCLUSIVE | LK_RETRY, td);
 		if (error)
Received on Thu Oct 13 2005 - 21:25:35 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:45 UTC