Re: FreeBSD 6.0 TCP NFS client stuck

From: Mohan Srinivasan <mohan_srinivasan_at_yahoo.com>
Date: Mon, 7 Nov 2005 23:40:11 -0800 (PST)
Great.

I'll commit the fix to current shortly and it will get
MFC'ed to the 6.x branch soon.

thanks !

mohan

--- TOMITA Yoshinori <yoshint_at_flab.fujitsu.co.jp> wrote:

> Thanks, Mohan.
> 
> With your patch, it seems working fine.
> 
> I could confirm FreeBSD6 NFS client will create a new TCP connection
> and old session was cleaned up.
> 
> 
> [tcpdump log]
> 
> 15:36:48.138599 IP CLIENT.3523215360 > SERVER.nfs: 40 null
> 15:36:48.138832 IP SERVER.nfs > CLIENT.3523215360: reply ok 24 null
> 15:36:49.188099 IP SERVER.nfsd > CLIENT.706: F 86953281:86953281(0) ack 61240098 win 10136
> <nop,nop,timestamp 797285900 125036>
> 15:36:49.188169 IP CLIENT.706 > SERVER.nfsd: . ack 1 win 65535 <nop,nop,timestamp 484984
> 797285900>
> 
>   *closed*
> 
> 15:37:18.148158 IP CLIENT.3791650816 > SERVER.nfs: 40 null
> 15:37:18.148393 IP SERVER.nfs > CLIENT.3791650816: reply ok 24 null
> 15:37:48.157804 IP CLIENT.4060086272 > SERVER.nfs: 40 null
> 15:37:48.158054 IP SERVER.nfs > CLIENT.4060086272: reply ok 24 null
> 
>   *try NFS access*
> 
> 15:38:02.734963 IP CLIENT.2121656486 > SERVER.nfs: 132 access [|nfs]
> 15:38:02.735001 IP CLIENT.706 > SERVER.nfsd: F 133:133(0) ack 1 win 65535 <nop,nop,timestamp
> 558510 797285900>
> 15:38:02.735127 IP CLIENT.3946253056 > SERVER.nfs: 0 proc-215011
> 15:38:02.735211 IP SERVER.nfsd > CLIENT.706: . ack 134 win 10136 <nop,nop,timestamp 797293252
> 558509>
> 15:38:02.735354 IP SERVER.nfs > CLIENT.3560442624: reply ERR 0
> 15:38:02.735444 IP CLIENT.980 > SERVER.nfsd: . ack 169844227 win 33304 <nop,nop,timestamp 558510
> 797293252>
> 15:38:02.735508 IP CLIENT.2121656486 > SERVER.nfs: 132 access [|nfs]
> 15:38:02.735698 IP SERVER.nfsd > CLIENT.980: . ack 132 win 10004 <nop,nop,timestamp 797293252
> 558510>
> 15:38:02.737331 IP SERVER.nfs > CLIENT.2121656486: reply ok 124 access [|nfs]
> 15:38:02.739255 IP CLIENT.2121656493 > SERVER.nfs: 132 access [|nfs]
> 15:38:02.739477 IP SERVER.nfsd > CLIENT.980: . ack 264 win 10136 <nop,nop,timestamp 797293252
> 558514>
> 15:38:02.739612 IP SERVER.nfs > CLIENT.2121656493: reply ok 124 access [|nfs]
> 15:38:02.739971 IP CLIENT.2121656494 > SERVER.nfs: 128 fsstat [|nfs]
> 15:38:02.740322 IP SERVER.nfs > CLIENT.2121656494: reply ok 172 fsstat [|nfs]
> 15:38:02.740553 IP CLIENT.2121656495 > SERVER.nfs: 148 readdir [|nfs]
> 15:38:02.741950 IP SERVER.nfs > CLIENT.2121656495: reply ok 1448 readdir [|nfs]
> 15:38:02.742072 IP SERVER.nfs > CLIENT.0: reply ERR 1448
> 15:38:02.742132 IP CLIENT.980 > SERVER.nfsd: . ack 3317 win 64906 <nop,nop,timestamp 558517
> 797293252>
> 15:38:02.742192 IP SERVER.nfs > CLIENT.1: reply ERR 1448
> 15:38:02.742315 IP SERVER.nfs > CLIENT.2316: reply ok 1448
> 15:38:02.742370 IP CLIENT.980 > SERVER.nfsd: . ack 6213 win 63458 <nop,nop,timestamp 558517
> 797293252>
> 15:38:02.742639 IP SERVER.nfs > CLIENT.779122034: reply ERR 1448
> 15:38:02.742724 IP SERVER.nfs > CLIENT.4020: reply ok 952
> 15:38:02.742787 IP CLIENT.980 > SERVER.nfsd: . ack 8613 win 65535 <nop,nop,timestamp 558517
> 797293252>
> 15:38:02.743621 IP CLIENT.2121656496 > SERVER.nfs: 148 readdir [|nfs]
> 15:38:02.744928 IP SERVER.nfs > CLIENT.2121656496: reply ok 1448 readdir [|nfs]
> 15:38:02.745043 IP SERVER.nfs > CLIENT.777151841: reply ERR 1448
> 15:38:02.745103 IP CLIENT.980 > SERVER.nfsd: . ack 11509 win 64906 <nop,nop,timestamp 558520
> 797293253>
>   ...snip...
> 
> 
> 
> [netstat at NFS server]
> 
> 1. first time
> 
>    SERVER.nfsd      CLIENT.706          131070      0 10136      0 ESTABLISHED
> 
> 2. closed
> 
>    SERVER.nfsd      CLIENT.706          131070      0 10136      0 FIN_WAIT_2
> 
> 3. after reconnect
> 
>    SERVER.nfsd      CLIENT.706          131070      0 10136      0 TIME_WAIT
>    SERVER.nfsd      CLIENT.980          131070      0 10136      0 ESTABLISHED
> 
> 
> >> On Mon, 7 Nov 2005 21:41:22 -0800 (PST), Mohan Srinivasan <mohan_srinivasan_at_yahoo.com> said:
> 
> M> Hi,
> M> Can you try the change attached below and tell me if it 
> M> fixes your problem ? Sorry, I haven't tested this change
> M> (I don't have a Solaris server handy to test against and
> M> would have to hack the FreeBSD server to close down the 
> M> connection after a period of inactivity, like Solaris seems
> M> to do).
> 
> M> The change should cause the client to detect that the 
> M> server has FIN'ed, and should force a reconnect. Let me 
> M> know if this works for you and I will commit.
> 
> M> thanks
> 
> M> mohan
> 
> M> ==== //depot/yahoo/ybsd_6/src/sys/nfsclient/nfs_socket.c#10 - /homes/mohans/p4/yahoo/ybsd
> M> _6/src/sys/nfsclient/nfs_socket.c ====
> M> _at__at_ -521,6 +521,17 _at__at_
> M>         return (error);
> M>  }
>  
> M> +static inline int
> M> +nfs_cantrecvmore(struct socket *so)
> M> +{
> M> +       int ret;
> M> +       
> M> +       SOCKBUF_LOCK(&so->so_rcv);
> M> +       ret = (so->so_rcv.sb_state & SBS_CANTRCVMORE);
> M> +       SOCKBUF_UNLOCK(&so->so_rcv);
> M> +       return ret;
> M> +}
> M> +
> M>  int
> M>  nfs_reply(struct nfsreq *rep)
> M>  {
> M> _at__at_ -550,7 +561,7 _at__at_
> M>                 }
> M>                 so = rep->r_nmp->nm_so;
> M>                 mtx_lock(&rep->r_nmp->nm_nfstcpstate.mtx);
> M> -               if (!so || 
> M> +               if (!so || nfs_cantrecvmore(so) ||
> M>                     (rep->r_nmp->nm_nfstcpstate.flags & NFS_TCP_FORCE_RECONNECT)) {
> M>                         mtx_unlock(&rep->r_nmp->nm_nfstcpstate.mtx);
> M>                         error = nfs_reconnect(rep);
> 
> 
> M> --- TOMITA Yoshinori <yoshint_at_flab.fujitsu.co.jp> wrote:
> 
> >> I saw a NFS client of FreeBSD 6.0-RC1 gets NFS stuck using TCP NFS and
> >> amd -w LARGE_NUMBER, where -w specifies dismount interval such as
> >> 3600.
> >> 
> >> 6.0-STABLE cvsuped 2005-11-7 has same problem.
> >> It has always been OK on FreeBSD 4.X to 5.4-STABLE.
> >> 
> >> FreeBSD 6.0 NFS client seems keep trying NFS access forever after NFS
> >> TCP connection is (half?) closed.
> >> 
> >> I am not sure amd and -w argument are the key of this problem, but they
> >> raise probability from my experience.
> >> 
> >> 
> >> 
> >> NFS Server:
> >> Solaris 2.6
> >> 
> >> NFS Client:
> >> FreeBSD 6.0-RC1 (cvsuped 2005-11-02)
> >> 
> >> Starting amd at FreeBSD:
> >> /usr/sbin/amd -p -w 3600 -a /.amd_mnt -l syslog /host /etc/amd.map /net /etc/amd.map
> >> 
> >> /etc/amd.map:
> >> # changed proto=udp to proto=tcp
> >> ------
> >> /defaults       type:=host;fs:=${autodir}/${rhost}/host;rhost:=${key}
> >> *               opts:=rw,grpid,resvport,vers=3,proto=tcp,nosuid,nodev
> >> ------
> >> 
> >> 
> >> 
> >> tcpdump log:
> >> --------------------------------------------------
> >> 
> >> [several minutes passed after NFS auto-mounted]
> >> 
> >> 11:09:38.852928 IP CLIENT.3254779904 > SERVER.nfs: 40 null
> >> 11:09:38.853261 IP SERVER.nfs > CLIENT.3254779904: reply ok 24 null
> >> 11:10:08.851290 IP CLIENT.3523215360 > SERVER.nfs: 40 null
> >> 11:10:08.851529 IP SERVER.nfs > CLIENT.3523215360: reply ok 24 null
> >> 
> >> [FIN from NFS server, FreeBSD NFS client sends ACK]
> >> 
> >> 11:10:08.854469 IP SERVER.nfsd > CLIENT.820: F 341:341(0) ack 200 win 10136
> <nop,nop,timestamp
> >> 787046144 183809>
> >> 11:10:08.854555 IP CLIENT.820 > SERVER.nfsd: . ack 342 win 65535 <nop,nop,timestamp 543746
> 787046144> 
> >> 11:10:08.864321 IP SERVER.nfsd > CLIENT.679: F 341:341(0) ack 200 win 10136
> <nop,nop,timestamp
> >> 787046145 183817>
> >> 11:10:08.864383 IP CLIENT.679 > SERVER.nfsd: . ack 342 win 65535 <nop,nop,timestamp 543756
> 787046145> 
> >> 11:10:08.895133 IP SERVER.nfsd > CLIENT.735: F 40141:40141(0) ack 1900 win 10136
> >> <nop,nop,timestamp 787046148 183839>
> >> 11:10:08.895203 IP CLIENT.735 > SERVER.nfsd: . ack 40142 win 65535 <nop,nop,timestamp 543787
> 787046148> 
> >> 
> >> [still NFS NULL continues from FreeBSD NFS client]
> >> 
> >> 11:10:38.849614 IP CLIENT.3791650816 > SERVER.nfs: 40 null
> >> 11:10:38.849906 IP SERVER.nfs > CLIENT.3791650816: reply ok 24 null
> >> 11:11:08.847947 IP CLIENT.4060086272 > SERVER.nfs: 40 null
> >> 11:11:08.848289 IP SERVER.nfs > CLIENT.4060086272: reply ok 24 null
> >> 11:11:38.846275 IP CLIENT.33619968 > SERVER.nfs: 40 null
> >> 11:11:38.846555 IP SERVER.nfs > CLIENT.33619968: reply ok 24 null
> >> 
> >> [do some NFS access, such as /bin/ls /host/SERVER/SOMEWHERE on FreeBSD]
> >> 
> >> 11:12:08.413414 IP CLIENT.302055424 > SERVER.nfs: 40 null
> >> 11:12:08.413797 IP SERVER.nfs > CLIENT.302055424: reply ok 24 null
> >> 11:12:08.414096 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs]
> >> 11:12:08.507632 IP SERVER.nfsd > CLIENT.735: . ack 2032 win 10136 <nop,nop,timestamp
> 787058109
> 663316> 
> >> 11:12:10.083443 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs]
> >> 11:12:10.177666 IP SERVER.nfsd > CLIENT.735: . ack 2164 win 10136 <nop,nop,timestamp
> 787058276
> 664986> 
> >> 11:12:11.743299 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs]
> >> 11:12:11.837696 IP SERVER.nfsd > CLIENT.735: . ack 2296 win 10136 <nop,nop,timestamp
> 787058442
> 666646> 
> >> 11:12:15.063018 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs]
> >> 11:12:15.157790 IP SERVER.nfsd > CLIENT.735: . ack 2428 win 10136 <nop,nop,timestamp
> 787058774
> 669966> 
> >> 11:12:21.702417 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs]
> >> 11:12:21.798028 IP SERVER.nfsd > CLIENT.735: . ack 2560 win 10136 <nop,nop,timestamp
> 787059438
> 676606> 
> >> 11:12:34.981251 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs]
> >> 11:12:35.078429 IP SERVER.nfsd > CLIENT.735: . ack 2692 win 10136 <nop,nop,timestamp
> 787060766
> 689886> 
> >> 11:12:38.411967 IP CLIENT.570490880 > SERVER.nfs: 40 null
> >> 11:12:38.412267 IP SERVER.nfs > CLIENT.570490880: reply ok 24 null
> >> 11:13:01.538891 IP CLIENT.1563723127 > SERVER.nfs: 132 access [|nfs]
> >> 
> >> [----> message repeats on FreeBSD NFS client]
> >> nfs server SERVER:/export/home: not responding
> >> 
> >> --------------------------------------------------
> >> 
> >> [netstat _at_ Solaris NFS server, when NFS is OK]
> >> 
> >> SERVER.nfsd        CLIENT.820          131070      0 10136      0 ESTABLISHED
> >> SERVER.nfsd        CLIENT.735          131070      0 10136      0 ESTABLISHED
> >> SERVER.nfsd        CLIENT.679          131070      0 10136      0 ESTABLISHED
> >> 
> >> [after NFS stuck]
> >> 
> >> SERVER.nfsd        CLIENT.820          131070      0 10136      0 FIN_WAIT_2
> >> SERVER.nfsd        CLIENT.735          131070      0 10136      0 FIN_WAIT_2
> >> SERVER.nfsd        CLIENT.679          131070      0 10136      0 FIN_WAIT_2
> >> 
> >> [netstat _at_ FreeBSD NFS client, after NFS stuck]
> >> 
> >> tcp4       0      0  CLIENT.679            SERVER.nfsd  CLOSE_WAIT
> >> tcp4       0      0  CLIENT.735            SERVER.nfsd  CLOSE_WAIT
> >> tcp4       0      0  CLIENT.820            SERVER.nfsd  CLOSE_WAIT
> >> 
> >> 
> >> -- 
> >> ---
> >> TOMITA Yoshinori
> >> (Fujitsu Laboratories Ltd., Kawasaki, Japan)
> >> _______________________________________________
> >> freebsd-current_at_freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> >> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> >> 
> 
> 
> -- 
> ---
> TOMITA Yoshinori
> (Fujitsu Laboratories Ltd., Kawasaki, Japan)
> 
Received on Tue Nov 08 2005 - 06:40:13 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:47 UTC