Re: Data corruption over NFS in -current

From: Dan Nelson <dnelson_at_allantgroup.com>
Date: Thu, 12 Jan 2012 00:06:03 -0600

In the last episode (Jan 11), Martin Cracauer said:
> Rick Macklem wrote on Wed, Jan 11, 2012 at 08:42:25PM -0500: 
> > Also, if you can reproduce the problem fairly easily, capture a packet
> > trace via
> > # tcpdump -s 0 -w xxx host <server> 
> > running on the client (or similar). Then email me "xxx" as an attachment
> > and I can look at it in wireshark.  (If you choose to look at it in
> > wireshark, I would suggest you look for Create RPCs to see if they are
> > Exclusive Creates, plus try and see where the data for the corrupt file
> > is written.)
> > 
> > Even if the capture is pretty large, it should be easy to find the
> > interesting part, so long as you know the name of the corrupt file and
> > search for that.
> 
> That's probably not practical, we are talking about hammering the NFS
> server with several CPU hours worth of parallel activity in a shellscript
> but I'll do my best :-)

The tcpdump options -C and -W can help here.  For example, -C 1000 -W 10
will keep roughly the most recent 10 GB of traffic by writing circularly to
ten 1 GB capture files (-C counts in units of 1,000,000 bytes).  All you
need to do is kill the tcpdump when you discover the corruption, and work
backwards through the capture files until you find your file.
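
Combined with Rick's command line, that could look something like the rough
sketch below.  The "nfs.pcap" prefix and the two function names are just
placeholders I made up, and the search helper assumes the file name shows up
as plain bytes inside a single packet, which may not hold for every mount
setup, so treat it as a first pass rather than a guaranteed hit.

```shell
# Sketch only: "nfs.pcap" and both function names are placeholders.

# Run on the client as root, passing the server name as $1.
# Stop it (Ctrl-C or kill) once the corruption has been spotted.
#   -s 0     capture whole packets
#   -C 1000  start a new file after about 1000 million bytes
#   -W 10    keep at most 10 files, overwriting the oldest
start_capture() {
    tcpdump -s 0 -C 1000 -W 10 -w nfs.pcap host "$1"
}

# Print the capture files that contain the given file name, newest first.
# The name is matched as a fixed string against the raw capture bytes.
find_in_captures() {
    for f in $(ls -t nfs.pcap* 2>/dev/null); do
        if grep -q -F -- "$1" "$f"; then
            echo "$f"
        fi
    done
}
```

With -W 10 the files end up named nfs.pcap0 through nfs.pcap9, so after
killing tcpdump something like "find_in_captures corrupt.o" should narrow
things down to the one or two files worth opening in wireshark.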

-- 
	Dan Nelson
	dnelson_at_allantgroup.com