Re: NFSv4 performance degradation with 12.0-CURRENT client

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Thu, 24 Nov 2016 12:53:43 +0000
On Wed, Nov 23, 2016 at 10:17:25PM -0700, Alan Somers wrote:
> I have a FreeBSD 10.3-RELEASE-p12 server exporting its home
> directories over both NFSv3 and NFSv4.  I have a TrueOS client (based
> on 12.0-CURRENT on the drm-next-4.7 branch, built on 28-October)
> mounting the home directories over NFSv4.  At first, everything is
> fine and performance is good.  But if the client does a buildworld
> using sources on NFS and locally stored objects, performance slowly
> degrades.  The degradation is most noticeable with metadata-heavy
> operations.  For example, "ls -l" in a directory with 153 files takes
> less than 0.1 seconds right after booting.  But the longer the
> buildworld goes on, the slower it gets.  Eventually that same "ls -l"
> takes 19 seconds.  When the home directories are mounted over NFSv3
> instead, I see no degradation.
>
> top shows negligible CPU consumption on the server, and very high
> consumption on the client when using NFSv4 (nearly 100%).  The
> NFS-using process is spending almost all of its time in system mode,
> and dtrace shows that almost all of its time is spent in
> ncl_getpages().
>
A couple of things you could do when it is slow (as well as what Kostik suggested):
- nfsstat -c -e on the client and nfsstat -e -s on the server, to see what RPCs are being done
  and how quickly. (nfsstat -s -e will also show you how big the DRC, the duplicate request
  cache, is, although a large DRC should show up as increased CPU consumption on the server.)
- capture packets with tcpdump -s 0 -w test.pcap host <other-one>
  - then you can email me test.pcap as an attachment. I can look at it in wireshark
    and see if there seem to be protocol and/or TCP issues. (You can look at it in wireshark
    yourself, then look for NFS4ERR_xxx, TCP segment retransmits...)
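
One way to make the nfsstat output easier to read is to take two snapshots some
seconds apart and compare them, since the counters are cumulative. The following
is only a sketch of that idea, not something from the original advice: the
function name counters_delta, the NFSSTAT and INTERVAL variables, and the
60-second default are all assumptions made for illustration.

```shell
#!/bin/sh
# counters_delta FLAGS...
# Hypothetical helper: run nfsstat twice, INTERVAL seconds apart, and print
# only the lines that changed between the two snapshots.
# NFSSTAT and INTERVAL are illustrative knobs, not nfsstat features.
counters_delta() {
    before=$(mktemp)
    after=$(mktemp)
    ${NFSSTAT:-nfsstat} "$@" > "$before"
    sleep "${INTERVAL:-60}"
    ${NFSSTAT:-nfsstat} "$@" > "$after"
    # diff exits 1 when the snapshots differ, which is the expected case here.
    diff "$before" "$after" || true
    rm -f "$before" "$after"
}
```

On the client this would be called as "counters_delta -c -e" while the slow
"ls -l" is running, and on the server as "counters_delta -e -s". Lines that
show up in the diff are the counters that moved during the interval.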

If you are using either "intr" or "soft" on the mounts, try without those mount options.
(The Bugs section of mount_nfs recommends against using them. If an RPC fails due to
 these options, something called a seqid# can be "out of sync" between client/server and
 that causes serious problems.)
--> These seqid#s are not used by NFSv4.1, so you could try that by adding
      "minorversion=1" to your mount options.
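
As an illustration only, an /etc/fstab entry on the client for such a mount
might look like the line below. The server name "server" and the /home paths
are placeholders, not taken from the report; the point is that neither "intr"
nor "soft" appears and that "minorversion=1" is added alongside "nfsv4".

```
# Hypothetical example: NFSv4.1 mount without "intr" or "soft"
server:/home    /home    nfs    rw,nfsv4,minorversion=1    0    0
```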

Good luck with it, rick
Received on Thu Nov 24 2016 - 11:53:47 UTC
