Re: vlruwk deadlock (was: NFS troubles on recent -current.)

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Fri, 16 May 2003 23:57:58 -0700 (PDT)
On 16 May, Robert Watson wrote:
> 
> On Fri, 16 May 2003, Bernd Walter wrote:
> 
>> I had an io deadlook on the same server today while doing a make release
>> on an alpha nfs client.  I don't know if it is related to the given
>> patch or not.  It happened shortly after the client checked out the
>> ports.  Sorry - I forgot to take a dump. 
> 
> I've been running into similar sorts of deadlocks on my diskless crash
> boxes, and have dropped some information to Don on them.  Try the
> following also from ddb: 
> 
> print numvnodes
> print desiredvnodes
> print vnlruproc_sig
> print vnlru_nowhere
> 
> This will print some information on the number of active vnodes; one of
> the characteristics of my nfs client/server box in its deadlocked state is
> that it has exceeded the maximum number of vnodes, presumably by
> necessity.

I've been unable to reproduce this problem so far in my environment.
Since I've only got one box running -current, I've been doing testing by
NFS mounting the local file system back to itself.

I've run
	make -j10 buildworld
	make DESTDIR=/mnt installworld
where both /usr/obj and /mnt were NFS mount points.

I also just tried NFS mounting / on /mnt and running simultaneous
	find -x . -type f -print0 | xargs -0 cat >/dev/null
on both / and /mnt.  While this was running, which took well over an
hour, I monitored the vnode-related sysctl variables.  The value of
vfs.freevnodes had an interesting oscillatory behaviour.  There was a
short-period oscillation of about 3000-4000, and a long-term oscillation
of 30000-40000.  It got as low as a few hundred.  I didn't see any sign
of vnode reference leaks on the client side and was able to umount the
NFS file system without error.  Here are the final sysctl values:

kern.maxvnodes: 70112
kern.minvnodes: 17528
vfs.numvnodes: 63719
vfs.wantfreevnodes: 25
vfs.freevnodes: 55379
debug.vnlru_nowhere: 0

and gdb -k shows that vnlruproc_sig is also 0.
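
For comparison with the deadlocked state Robert describes (more vnodes
allocated than the configured maximum), here is a minimal sketch that
applies that check to the final values above.  The struct and function
names are made up for illustration.  It assumes that kern.maxvnodes is
the sysctl view of desiredvnodes, and that numvnodes minus freevnodes is
a fair estimate of the vnodes actually in use.

```c
#include <stdbool.h>

/* Hypothetical container for one sample of the vnode counters. */
struct vnode_stats {
    long maxvnodes;   /* kern.maxvnodes (assumed to mirror desiredvnodes) */
    long numvnodes;   /* vfs.numvnodes */
    long freevnodes;  /* vfs.freevnodes */
};

/* True if more vnodes are allocated than the configured maximum. */
static bool vnodes_over_limit(const struct vnode_stats *s)
{
    return s->numvnodes > s->maxvnodes;
}

/* Rough count of vnodes that are allocated and not on the free list. */
static long vnodes_in_use(const struct vnode_stats *s)
{
    return s->numvnodes - s->freevnodes;
}

/* The final sysctl values reported in this message. */
static const struct vnode_stats final_sample = { 70112, 63719, 55379 };
```

With these numbers vnodes_over_limit(&final_sample) is false and
vnodes_in_use(&final_sample) comes to 8340, so this run ended well under
the limit, unlike the deadlocked boxes.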

BTW, the first time I tried this, I left off the -x option to find and
got this vnode lock assertion failure.

VOP_UNLOCK: 0xc748c000 is not locked but should be
Debugger("Lock violation.
")
Stopped at      Debugger+0x54:  xchgl   %ebx,in_Debugger.0
db> tr
Debugger(c0516f98,c051716f,c748c000,c0516fd9,e6e02988) at Debugger+0x54
vfs_badlock(c0516fd9,c051716f,c748c000,c0582880,c748c000) at vfs_badlock+0x45
assert_vop_locked(c748c000,c051716f,c748c000,e6e029fc,c02dabed) at assert_vop_locked+0x62
vop_unlock_pre(e6e029dc,e6e02bec,c6195b00,186a0,e6e029c0) at vop_unlock_pre+0x38
pfs_lookup(e6e02a38,c0523d3b,c68e7d10,e6e02a38,c68e7d10) at pfs_lookup+0x2ed
lookup(e6e02bd8,0,c0516a45,a4,c68e7d10) at lookup+0x366
namei(e6e02bd8,e6e02ae4,c0316d5d,c05ae8c0,1) at namei+0x24e
vn_open_cred(e6e02bd8,e6e02cd8,0,c671b000,e6e02cc4) at vn_open_cred+0x237
vn_open(e6e02bd8,e6e02cd8,0,2ab,c034476b) at vn_open+0x29
kern_open(c68e7d10,4812785b,0,1,0) at kern_open+0x13a
open(c68e7d10,e6e02d10,c052a91d,3fb,3) at open+0x30
syscall(2f,2f,2f,ffffffff,8055a00) at syscall+0x26e
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (5, FreeBSD ELF32, open), eip = 0x480bc6f3, esp = 0xbfbff9fc, ebp = 0xbfbffa98 ---


This seems to be the call to VOP_UNLOCK() at line 415 in
pseudofs_vnops.c.  The vnode should be locked at this point unless the
ISDOTDOT "if" block is getting triggered, which unlocks the vnode
and doesn't relock it before jumping to the code which may try to unlock
the vnode a second time.
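
To make the suspected control flow concrete, here is a small userspace
model of it.  This is not the pseudofs code: toy_vnode, toy_lock(),
toy_unlock() and both lookup functions are hypothetical stand-ins, and
the "relock" variant is only one way such a path could be made
consistent, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; only the lock state is modelled. */
struct toy_vnode {
    bool locked;
    int  bad_unlocks;  /* unlocks attempted while the vnode was not locked */
};

static void toy_lock(struct toy_vnode *vp)
{
    assert(vp != NULL);
    vp->locked = true;
}

/* Records, rather than panics on, an unlock of an unlocked vnode. */
static void toy_unlock(struct toy_vnode *vp)
{
    assert(vp != NULL);
    if (!vp->locked)
        vp->bad_unlocks++;
    vp->locked = false;
}

/* Suspected pattern: the ".." branch unlocks, then the shared exit
 * code unlocks again. */
static void toy_lookup_suspect(struct toy_vnode *dvp, bool isdotdot)
{
    if (isdotdot) {
        toy_unlock(dvp);   /* dropped for the parent lookup */
        goto out;          /* ...but never relocked */
    }
    /* ordinary lookup work would happen here with dvp locked */
out:
    toy_unlock(dvp);       /* second unlock on the ".." path */
}

/* Variant that relocks before reaching the shared exit code. */
static void toy_lookup_relock(struct toy_vnode *dvp, bool isdotdot)
{
    if (isdotdot) {
        toy_unlock(dvp);
        toy_lock(dvp);     /* restore the state the exit code expects */
        goto out;
    }
    /* ordinary lookup work would happen here with dvp locked */
out:
    toy_unlock(dvp);
}
```

Calling toy_lookup_suspect() on a locked toy_vnode with isdotdot set
leaves bad_unlocks at 1, which corresponds to the condition that
assert_vop_locked() complained about in the trace above.  The same call
through toy_lookup_relock(), or either function with isdotdot clear,
leaves it at 0.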
Received on Fri May 16 2003 - 21:58:17 UTC