Re: NFS issues since upgrading to 13-RELEASE

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Mon, 19 Apr 2021 15:03:35 +0000
Olav Gjerde wrote:
>I have tried D29690 patch and reverting back to r367492 this weekend. Neither made any difference for my system.
Just to clarify, I meant "revert the patch in r367492" and not
"revert to revision r367492". I've attached the patch that
backs out the changes made by the patch in r367492, which
should apply to a fairly recent main/13 kernel.

This should be done instead of applying D29690, not combined with it.
My testing of D29690 has suggested it is not yet mature, so
I would not recommend choosing that alternative yet.

If you have tried a kernel with the attached patch applied, but without
D29690, then please let us know if you still have Linux clients "hanging"
with this kernel.
If they are still "hanging", try the following to see if it helps:
- Use the "minorversion=1" mount option on the Linux clients,
   to ensure that they are not using NFSv4.2, to see if it is a
   NFSv4.2 specific issue.
- Try disabling TSO and LRO, and avoid jumbo frames for drivers
   that use jumbo mbufs when handling jumbo frames.
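
A minimal sketch of how the two workarounds above might be applied, assuming a Linux client for the mount and a FreeBSD server for ifconfig. The helper names, the DRY_RUN switch, and the example arguments (ix0, server:/export, /mnt) are illustrative assumptions, and setting the MTU to 1500 is only one way to avoid jumbo frames.

```shell
#!/bin/sh
# Sketch only: helper names, DRY_RUN, and example arguments are assumptions.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Linux client: mount with NFSv4.1 so that NFSv4.2 is not used.
# $1 = server:/export, $2 = local mount point
client_mount_v41() {
    run mount -t nfs -o vers=4,minorversion=1 "$1" "$2"
}

# FreeBSD server: turn off TSO and LRO and use the standard 1500-byte MTU.
# $1 = network interface name
server_disable_offload() {
    run ifconfig "$1" -tso -lro
    run ifconfig "$1" mtu 1500
}
```

With DRY_RUN=1 set, "server_disable_offload ix0" only prints the two ifconfig commands, so they can be reviewed before being run as root.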
Collect the following info when it happens:
- "netstat -a", to see what the TCP connection is up to.
- "tcpdump -s 0 -w hang.pcap host <nfs-client>"
   run for several minutes on the server, to see what is going on on the
   wire. I use Wireshark to look at hang.pcap, since it
   knows NFS as well as TCP.
   You can also do the above with "host <nfs-server>" instead
   of "host <nfs-client>" run on the client.
- "ps axHl" on the server, to see what the nfsd threads
   are up to.
If none of the above contains confidential info, please
send it to me, if not to the list.
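
The three collection steps above could be wrapped in one helper so that nothing is forgotten while a hang is in progress. This is a sketch under assumptions: the function names, output file names, and the DRY_RUN switch are hypothetical, and timeout(1) is used only as one way to stop tcpdump after a few minutes. It is meant to be run as root on the FreeBSD server.

```shell
#!/bin/sh
# Sketch only: function names, file names, and DRY_RUN are assumptions.

# Run a command with its output saved to a file; with DRY_RUN=1 just print it.
run_to() {
    out="$1"; shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s > %s\n' "$*" "$out"
    else
        "$@" > "$out" 2>&1
    fi
}

# $1 = NFS client address, $2 = output directory, $3 = capture time in seconds
collect_nfs_hang_info() {
    client="$1"
    outdir="${2:-.}"
    secs="${3:-300}"

    # Only create the output directory when commands will really run.
    [ "${DRY_RUN:-0}" = "1" ] || mkdir -p "$outdir" || return 1

    # State of the TCP connections
    run_to "$outdir/netstat.txt" netstat -a
    # What the nfsd threads are doing
    run_to "$outdir/ps.txt" ps axHl
    # Packet capture, stopped by timeout(1) after $secs seconds
    run_to "$outdir/tcpdump.log" \
        timeout "$secs" tcpdump -s 0 -w "$outdir/hang.pcap" host "$client"
}
```

For example, "collect_nfs_hang_info 192.0.2.10 /var/tmp/nfs-hang 300" would leave netstat.txt, ps.txt, and hang.pcap in /var/tmp/nfs-hang. The address is a placeholder for the hung client.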

Good luck with it, rick
ps: Yea, I started this post and then realized I had hit
     reply instead of reply all.



There is also a Reddit thread about this: https://www.reddit.com/r/freebsd/comments/mqol4o/nfs_issues_since_upgrading_to_13release/

On Sat, Apr 17, 2021 at 1:10 AM Rick Macklem <rmacklem_at_uoguelph.ca> wrote:
Just fyi, I just got a "recursed on non-recursed mutex" panic in
socantrcvmore() with the D29690 patch, so you might not
want to test with that one yet.

rick

________________________________________
From: owner-freebsd-current_at_freebsd.org <owner-freebsd-current_at_freebsd.org> on behalf of Olav Gjerde <olav_at_backupbay.com>
Sent: Thursday, April 15, 2021 3:21 PM
To: Allan Jude
Cc: freebsd-current_at_freebsd.org
Subject: Re: NFS issues since upgrading to 13-RELEASE


Well, something does happen if I restart the NFS service on FreeBSD: it works
for about 10 seconds, then it becomes unresponsive again.

This is my output from `nfsstat -d 1`:

0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
8.00  1025    8.00  8.02 17170  134.54  2.01 72716  142.54  0.07  51  34
8.00  2273   17.76  7.99 31273  244.07  2.01 133267  261.83  0.14  20  82
8.03  4889   38.33  7.99 25885  202.07  2.06 119340  240.40  0.13  21  81
[===== Read =====]  [===== Write ====]  [=========== Total ============]
KB/t   tps    MB/s  KB/t   tps    MB/s  KB/t   tps    MB/s    ms  ql  %b
7.98  8811   68.64  8.00 12997  101.54  2.22 78396  170.18  0.15   1  80
7.99   922    7.20  8.00  3798   29.68  2.10 17965   36.87  0.09   0  11
8.07  2959   23.31  0.00     0    0.00  2.67  8938   23.31  0.86  32  72
7.97  7088   55.18  0.00     0    0.00  2.66 21233   55.18  1.05  16  98
7.98  4666   36.38  0.00     0    0.00  2.66 13986   36.38  0.36   9  29
8.00  4513   35.24  8.00  7662   59.86  2.20 44188   95.10  0.27  10  49
7.98  4799   37.40  8.00 11422   89.23  2.16 60076  126.63  0.19   0  51
8.00  4322   33.76  0.00     0    0.00  2.67 12967   33.76  0.89   0  42
8.02  4839   37.91  0.00     0    0.00  2.67 14550   37.91  0.54  17  41
8.01  4516   35.32  0.00     0    0.00  2.67 13569   35.32  0.57  27  38
7.95  4459   34.62  8.00  1195    9.34  2.49 18109   43.96  0.55   0  45
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
0.00     0    0.00  0.00     0    0.00  0.00     0    0.00  0.00   0   0
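
A saved copy of this output can be summarized mechanically to see whether the queue grows or traffic simply stops. The sketch below assumes the column layout shown in the header above (field 8 is the Total tps, field 11 is ql) and counts a sample as idle when Total tps is zero. The function name and the log file name are hypothetical.

```shell
#!/bin/sh
# Sketch only: the function name is an assumption; column positions follow
# the "Read / Write / Total" header printed by nfsstat -d.

summarize_nfsstat() {
    awk '
        # Data rows have 12 fields and start with a number; this skips
        # the two header rows that nfsstat reprints periodically.
        NF == 12 && $1 ~ /^[0-9.]+$/ {
            if ($8 + 0 == 0) {
                idle++                      # no requests in this interval
            } else {
                busy++
                if ($11 + 0 > maxql) maxql = $11 + 0
            }
        }
        END { printf "idle=%d busy=%d max_ql=%d\n", idle, busy, maxql }
    '
}
```

Since `nfsstat -d 1` runs until interrupted, one option is to capture it with "nfsstat -d 1 | tee nfsstat.log" during a stall and afterwards run "summarize_nfsstat < nfsstat.log".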



On Thu, Apr 15, 2021 at 9:07 PM Olav Gjerde <olav_at_backupbay.com> wrote:

> I have the same issue, using Ubuntu 20.10 with Linux 5.8 kernel. The Linux
> NFS client will get unresponsive and it does not recover in my case, even
> if I restart NFS on FreeBSD. I upgraded from FreeBSD 12.1-RELEASE though.
>
> On Thu, Apr 15, 2021 at 8:36 PM Allan Jude <allanjude_at_freebsd.org> wrote:
>
>> On 4/15/2021 9:22 AM, Chris Roose wrote:
>> > I posted this in -questions and someone suggested I post here as well.
>> >
>> > I'm having NFS availability issues between my Proxmox client and
>> > FreeBSD server (10G link) since upgrading to 13-RELEASE. And unfortunately
>> > I upgraded my ZFS pool to v2.0.0 before I noticed the issue, so I'm kind of
>> > stuck.
>> >
>> > Periodically, the NFS server (I've tried both v3 and v4.2 clients) will
>> > go unresponsive for several minutes. I never had this problem on 12.2, and
>> > as far as I can tell it's not a disk or network I/O issue. I'll get several
>> > "nfs: server not responding, still trying" messages on the client and a few
>> > minutes later it usually recovers. It's not clear to me yet what's causing
>> > the block. Restarting nfsd on the server will resolve the issue if it
>> > doesn't clear itself.
>> >
>> > Any pointers for troubleshooting this? I've been looking through
>> > vmstat, gstat, top, etc. when the problem occurs, but I haven't been able
>> > to pinpoint the issue. I can get pcap, but it would be from the hosts,
>> > because I don't have a 10G tap or managed switch.
>> >
>>
>> Run `nfsstat -d 1` and try to capture a few lines from before, during,
>> and after the stall; that may provide some insight.
>>
>> Specifically, does the queue length grow, suggesting it is waiting on
>> the I/O subsystem, or does it just stop getting traffic altogether?
>>
>>
>> --
>> Allan Jude
>> _______________________________________________
>> freebsd-current_at_freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>>
>
>
> --
> Kind Regards / Med Vennlig Hilsen
>
> Olav Grønås Gjerde
>
> BackupBay Gjerde
> Madlaforen 35
> 4042 HAFRSFJORD
> Norway
> Phone: +47 918 000 59
>



Received on Mon Apr 19 2021 - 13:03:38 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:28 UTC