NFS deadlock and status of nfs locking (rpc.lockd)

From: Martin Blapp <mb_at_imp.ch>
Date: Mon, 7 May 2007 17:00:23 +0200 (CEST)
Hi all,

We see an NFS deadlock once or twice per day on a busy 6.2-STABLE (1 week
old) server, and we suspect rpc.lockd to be the problem. Unfortunately we depend
on a working rpc.lockd :-( . The problems did not occur on a FreeBSD 5.4
server; they appeared only after upgrading.

This is an excerpt from 'ps -auxwww' taken while the deadlock was in progress.
But as I said, we only suspect that rpc.lockd is the real problem.

root     693  0.0  0.1  3248  2040  ??  Ss   11:08AM   0:00.05 rpc.lockd: serve     0     1   0  96  0 select
daemon   700  0.0  0.1  3200  1948  ??  I    11:08AM   0:00.00 rpc.lockd: clien     1   693  38   4  0 nfsloc
root     677  0.0  0.1  2968  1696  ??  Is   11:08AM   0:00.04 nfsd: master (nf     0     1   0  96  0 select
root     678  0.0  0.0  1324   716  ??  D    11:08AM   0:01.02 nfsd: server (nf     0   677   0  -4  0 ufs
root     679  0.0  0.0  1324   716  ??  D    11:08AM   0:00.12 nfsd: server (nf     0   677   0  -8  0 biord
root     680  0.0  0.0  1324   716  ??  D    11:08AM   0:00.15 nfsd: server (nf     0   677   0  -4  0 ufs
root     681  0.0  0.0  1324   716  ??  D    11:08AM   0:00.42 nfsd: server (nf     0   677   0  -4  0 ufs

The nfsd instances waiting on 'ufs' cannot be killed, and neither can the
master nfsd process. Stopping and restarting rpc.lockd sometimes helps.
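
For picking the blocked processes out of a longer listing, here is a minimal
sketch in Python. It assumes the column layout of the excerpt above: the
process state is the 8th field, the command may contain spaces, and the wait
channel is the last field after five numeric columns. The function name
blocked_processes is made up for illustration, and the layout may differ with
other ps options.

```python
# Sketch: list processes in uninterruptible sleep ("D" state) from ps output.
# Assumed layout (taken from the excerpt, not from ps documentation):
#   10 leading fields, then the command (may contain spaces),
#   then 6 trailing fields, the last of which is the wait channel.

SAMPLE = """\
root     693  0.0  0.1  3248  2040  ??  Ss   11:08AM   0:00.05 rpc.lockd: serve     0     1   0  96  0 select
daemon   700  0.0  0.1  3200  1948  ??  I    11:08AM   0:00.00 rpc.lockd: clien     1   693  38   4  0 nfsloc
root     677  0.0  0.1  2968  1696  ??  Is   11:08AM   0:00.04 nfsd: master (nf     0     1   0  96  0 select
root     678  0.0  0.0  1324   716  ??  D    11:08AM   0:01.02 nfsd: server (nf     0   677   0  -4  0 ufs
root     679  0.0  0.0  1324   716  ??  D    11:08AM   0:00.12 nfsd: server (nf     0   677   0  -8  0 biord
root     680  0.0  0.0  1324   716  ??  D    11:08AM   0:00.15 nfsd: server (nf     0   677   0  -4  0 ufs
root     681  0.0  0.0  1324   716  ??  D    11:08AM   0:00.42 nfsd: server (nf     0   677   0  -4  0 ufs
"""


def blocked_processes(ps_text):
    """Return (pid, command, wait_channel) for every process in state D."""
    hits = []
    for line in ps_text.splitlines():
        tokens = line.split()
        # 10 leading fields + at least 1 command word + 6 trailing fields
        if len(tokens) < 17:
            continue
        stat = tokens[7]
        if not stat.startswith("D"):
            continue
        command = " ".join(tokens[10:-6])
        hits.append((int(tokens[1]), command, tokens[-1]))
    return hits


if __name__ == "__main__":
    for pid, command, wchan in blocked_processes(SAMPLE):
        print(pid, command, wchan)
```

Run against the excerpt, this reports the four nfsd server processes together
with the wait channel each one is sleeping on.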

The server is an SMP machine with HTT enabled.

Now I have some questions:

- Can rpc.lockd be the underlying cause of such an nfsd hang?

- Does anybody know of a fix that has not been MFCed yet and whose absence
   could cause this?

- Is there anything I can do to get more debugging information? Is it safe
   to turn on rpc.lockd debug output (run rpc.lockd with -d)?

- Who is currently working on rpc.lockd? What is its current status, in case
   I am interested in working on it?

- One of the exported file systems is mounted via iSCSI. What happens if such
   an export goes away for a few seconds, gets reconnected and then appears
   again? How are NFS timeouts handled in such a case? Could that be the
   problem? Unfortunately we have seen such hangs both with and without this
   particular filesystem mounted, but they definitely happen a lot more often
   with the iSCSI filesystem mounted.

--
Martin

Martin Blapp, <mb_at_imp.ch> <mbr_at_FreeBSD.org>
------------------------------------------------------------------
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP: <finger -l mbr_at_freebsd.org>
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
------------------------------------------------------------------
Received on Mon May 07 2007 - 13:35:40 UTC
