Re: link() not increasing link count on NFS server

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Thu, 15 Nov 2007 12:39:22 +0000 (GMT)
On Thu, 15 Nov 2007, Adam McDougall wrote:

> Hi, lately I've been trying to work out some NFS multiple access issues 
> relating to the Dovecot IMAP server software.  One symptom seems to be an 
> unusual behavior of FreeBSD NFS clients that I cannot reproduce with Linux 
> or Solaris NFS clients. Basically, Timo (cc'ed) came up with a small test 
> case that seems to indicate sometimes a link() call can succeed while the 
> link count of the file will not increase.  If this is run on two FreeBSD 
> clients from the same NFS directory, you will occasionally see "link() 
> succeeded, but link count=1".  I've tried both a Netapp and a FreeBSD NFS 
> server.  I've tried FreeBSD 7_RELENG clients as well as FreeBSD 6.2-stable 
> from this summer.  I've run it on 32-bit and 64-bit clients. I've turned 
> rpc.lockd on and off, tried tcp vs. udp mounts, nothing so far seems to make 
> a difference, except perhaps FreeBSD 7.0 seems to produce the error less 
> often.  If one of the processes is run on a non-FreeBSD NFS client, only the 
> FreeBSD NFS client gives the link error.  Anyone have any input?  Thanks.

The usual next step in debugging an NFS client problem, if you have managed to 
identify a nice test case, is to analyze the wire RPCs to see what's actually 
going on.  In this case, using NFS over UDP is actually a bit easier to deal 
with.  Wireshark has an excellent NFS RPC decoder, so if you grab the packets 
directly with Wireshark, or with tcpdump and then load them in Wireshark, it 
may shed some light.  Ideally, we'd get the test case down to maybe four to 
eight RPCs and their replies -- a GETATTR at the start (stat the file to check 
the link count), LINK and its reply, and a GETATTR at the end (stat the file 
to check the link count).  You will probably end up with a smattering of 
LOOKUP and possibly ACCESS calls mixed in.
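
To keep the capture that small, the client-side sequence can be reduced to 
stat, link, stat.  The following is a minimal sketch of that sequence, not 
Timo's locktest.c (which is not reproduced in this message); the function 
names link_and_count() and report_link() are made up for illustration, and 
the RPC names in the comments are only what one would roughly expect to see 
on the wire, not a guarantee.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from locktest.c): stat src, hard-link it to dst,
 * then stat src again.  Returns 0 on success, -1 on error with errno set.
 */
int
link_and_count(const char *src, const char *dst, nlink_t *before,
    nlink_t *after)
{
	struct stat st;

	if (stat(src, &st) == -1)	/* roughly: GETATTR, maybe after LOOKUP */
		return (-1);
	*before = st.st_nlink;

	if (link(src, dst) == -1)	/* roughly: LINK */
		return (-1);

	if (stat(src, &st) == -1)	/* roughly: GETATTR, unless served from cache */
		return (-1);
	*after = st.st_nlink;
	return (0);
}

/*
 * Run one round and complain in the style of the output quoted in this
 * thread.  Returns 0 if the count went up, 1 if it did not, -1 on error.
 */
int
report_link(const char *src, const char *dst)
{
	nlink_t before, after;

	if (link_and_count(src, dst, &before, &after) == -1) {
		perror("link_and_count");
		return (-1);
	}
	if (after <= before) {
		fprintf(stderr, "link() succeeded, but link count=%lu\n",
		    (unsigned long)after);
		return (1);
	}
	return (0);
}
```

Calling report_link("temp1", "temp1.link") once from a small main() while 
the capture is running, with both names in the NFS-mounted directory, should 
give a trace short enough to read RPC by RPC.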

My guess, and this is just a hand-wave, is that the attribute cache in the NFS 
client isn't being forced to refresh, and hence you're getting the old stat 
data back (and perhaps there's no GETATTR on the wire, which might hint at 
this).  If you'd like, you can post a link to the pcap capture file and one of 
us can take a look, but I've found NFS RPCs to be surprisingly readable in 
Wireshark so you might find it sheds quite a bit of light.

I assume, btw, that if you stat the file directly on the server, or from 
another client, both links show the right link count?
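
One way to make that check, sketched under the assumption that both names 
are reachable from the machine doing the check: stat both paths, confirm 
they refer to the same file, and read the link count.  same_file_nlink() is 
a hypothetical name chosen for this example.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: check that paths a and b name the same file and
 * fetch its link count.
 * Returns 0 and fills *count if both names refer to the same file,
 * 1 if they refer to different files, -1 if either stat() fails.
 */
int
same_file_nlink(const char *a, const char *b, nlink_t *count)
{
	struct stat sa, sb;

	if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
		return (-1);

	/* Same device and inode number means the two names are hard links. */
	if (sa.st_dev != sb.st_dev || sa.st_ino != sb.st_ino)
		return (1);

	*count = sa.st_nlink;
	return (0);
}
```

Run on the server itself this bypasses the client's attribute cache 
entirely, so a count of 2 there alongside a count of 1 on the client would 
point at stale cached attributes rather than at the LINK operation.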

Robert N M Watson
Computer Laboratory
University of Cambridge

>
>
> How to reproduce (local binary is fine too, may be required if different arch):
> ------------------
>
> cp locktest.c /nfsserver
> cd /nfsserver
> gcc locktest.c -o locktest -Wall -g
>
> On host 1:
> cd /nfsserver
> ./locktest temp1
>
> On host 2: (easiest to reproduce when starting just a few seconds after 1)
> cd /nfsserver
> ./locktest temp2
>
>
> Typical output (timing may vary):
> ----------------------------------
>
> Host 1:
>
>> /tmp/locktest temp1
> 5 successes
> 15 successes
> unlink(): No such file or directory         (not a problem indication, happens
> 19 successes                                 when second process starts)
> 20 successes
> link() succeeded, but link count=1
> 20 successes
> link() succeeded, but link count=1
> 20 successes
> 33 successes
> 33 successes
> link() succeeded, but link count=1
> 33 successes
> 45 successes
> link() succeeded, but link count=1
> 45 successes
> 45 successes
> link() succeeded, but link count=1
> ^C
>
> Host 2:
>
>> /tmp/locktest temp2
> 6 successes
> 15 successes
> 25 successes
> 38 successes
> 39 successes
> 50 successes
> 59 successes
> link() succeeded, but link count=1
> 59 successes
> 69 successes
> 79 successes
> 91 successes
> 99 successes
> 109 successes
> ^C
>
>
Received on Thu Nov 15 2007 - 11:39:27 UTC
