Re: Data corruption over NFS in -current

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Thu, 19 Jan 2012 19:28:59 -0500 (EST)

Martin Cracauer wrote:
> More findings.
> 
> Reminder, with the original report I found:
> - files for no reason changing ownership and group to
> root/<owngroupname>
> - data corruption as in inserting binary junk obviously from ports
> - data corruption as in malformed ascii text that might be a bug I
> have in my code that is only exposed in FreeBSD
> 
> I ran the script on a Linux machine in the same situation against the
> same
> NFS server, it worked fine. I haven't looked at blocksizes, NFS
> versions etc in play yet.
> 
> I ran with oldnfs (reboot), which showed only the third problem.
> 
> I re-ran with newnfs (reboot) which worked (all three problems absent).
> 
> I then started building ports/lang/gcc47 at the same time as I
> re-started my crazy script and it took only a few seconds for an
> unexpected ownership to root to occur.
> 
> My next steps are:
> - trying block sizes and other parameters, maybe use a different NFS
> version with the Linux client. My NFS server is newly upgraded to
> Linux kernel 3.1.5
> - running my script on a FreeBSD host with local disk to see whether
> problem #3 is a general problem that appears or is exposed only on
> FreeBSD
> - capture tcpdump as mentioned earlier
> 
> I will probably have to turn debug off since this script run is
> dominated by system time now and gets 10x slower as it is now.
> 
While poking around (partly related to this and partly related to
the NFSv4.1 pNFS client work), I came across an ugly bug in the
way the new NFS client handled "system operations". ("system operations"
are mainly NFSv4 Ops that manage state, such as Renew, which renews
a lease for the open/lock state. Another case of this was the NFSv3 StatFS
when it did a Getattr because the server did not provide post-operation
attributes in the reply.) It turns out that at least some Linux NFSv3
servers are in this category, and the fact that Martin was doing a large
number of StatFS RPCs was indeed relevant.

Anyhow, the patch to fix the above seems to have resolved Martin's
problem. The patch is needed for the new NFS client if you are using
NFSv4 mounts or NFSv3 mounts against non-FreeBSD servers that don't
provide post-op attributes in the Statfs RPC reply. (FreeBSD servers
do provide post-op attributes; at least some Linux servers do not, and
I don't know about others. You could check by capturing the packets
for a "df" and then looking at the Statfs RPC reply in wireshark.) Without
the patch, there will be intermittent permission failures, since the
wrong credentials get used for an RPC.
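
For anyone who would rather check a raw capture programmatically than by
eye, here is a minimal Python sketch of the same test. It assumes you have
already extracted the NFS payload of the reply (the bytes that follow the
RPC reply header), and it relies on the RFC 1813 layout of FSSTAT3res (the
NFSv3 name for the Statfs reply): a 32-bit status followed by the
post_op_attr "attributes_follow" flag. The function name is made up for
illustration and is not part of any FreeBSD tool.

```python
import struct


def fsstat_has_post_op_attrs(body: bytes) -> bool:
    """Return True if an NFSv3 FSSTAT reply carries post-op attributes.

    `body` is assumed to be the XDR-encoded FSSTAT3res, i.e. the reply
    payload with the RPC reply header already stripped (an assumption of
    this sketch; locating that offset in a capture is not shown here).
    """
    if len(body) < 8:
        raise ValueError("truncated FSSTAT3res")
    # Both the OK and the failure arm of FSSTAT3res begin with a
    # post_op_attr, so the flag sits right after the status either way.
    _status, attributes_follow = struct.unpack_from(">II", body, 0)
    return attributes_follow != 0
```

A False result for a server's reply would mean the client has to do the
extra Getattr described above, which is the case the patch matters for on
NFSv3 mounts.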

The patch is here and should be in head soon:
   http://people.freebsd.org/~rmacklem/authcred.patch

Thanks go to Martin for pursuing this.

rick
Received on Thu Jan 19 2012 - 23:29:01 UTC
