I think the following patch fixes the problem reported by O. Seibert
w.r.t. NFS over TCP taking 5min to reconnect to a server after a period
of inactivity. (I think there have been others bit by this, but they
were vague reports of trouble with NFS over TCP.)

I didn't see the problem, because I was mainly testing against a FreeBSD
server and/or using NFSv4 (NFSv4 does a Renew every 30sec, so the TCP
connection isn't inactive for long enough for a Solaris server to
disconnect it.)

clnt_vc_call() in sys/rpc/clnt_vc.c checks for the server closing down
the connection while the RPC is in progress, but doesn't check to see if
it has already happened. If it has already happened, there would be no
upcall to prompt a wakeup of the msleep() waiting for a reply, etc.

This patch adds a check for the connection being closed by the server,
just before queuing the request and sending it. (I think this fixes the
problem.)

What I really need is some people to test NFS over TCP with the patch
applied to their kernel. It doesn't matter if you aren't seeing the
problem (ie. using a FreeBSD server), since I am more concerned with the
patch breaking something else than fixing the problem.
(This seems serious enough that I'd like to try and get a fix into 8.0,
which is why I'm hoping some folks can test this quickly?)

Thanks in advance for help with this, rick

--- patch for sys/rpc/clnt_vc.c ---
--- rpc/clnt_vc.c.sav	2009-10-28 15:44:20.000000000 -0400
+++ rpc/clnt_vc.c	2009-10-29 15:40:37.000000000 -0400
@@ -413,6 +413,22 @@
 	cr->cr_xid = xid;
 	mtx_lock(&ct->ct_lock);
+	/*
+	 * Check to see if the other end has already started to close down
+	 * the connection. The upcall will have set ct_error.re_status
+	 * to RPC_CANTRECV if this is the case.
+	 * If the other end starts to close down the connection after this
+	 * point, it will be detected later when cr_error is checked,
+	 * since the request is in the ct_pending queue.
+	 */
+	if (ct->ct_error.re_status == RPC_CANTRECV) {
+		if (errp != &ct->ct_error) {
+			errp->re_errno = ct->ct_error.re_errno;
+			errp->re_status = RPC_CANTRECV;
+		}
+		stat = RPC_CANTRECV;
+		goto out;
+	}
 	TAILQ_INSERT_TAIL(&ct->ct_pending, cr, cr_link);
 	mtx_unlock(&ct->ct_lock);

Received on Thu Oct 29 2009 - 19:03:46 UTC
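
For anyone who wants to see the idea outside the kernel, the check-before-queue logic described above can be modelled in user space roughly as follows. This is only a sketch under assumptions: struct conn, conn_peer_closed() and conn_enqueue() are made-up names for illustration, not anything in sys/rpc, and the locking that the real code does with ct_lock is only mentioned in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the RPC_* values. */
enum call_status { CALL_SUCCESS, CALL_CANTRECV };

/* Toy model of the per-connection state; not the real kernel structure. */
struct conn {
	enum call_status error_status;	/* set when the peer closes */
	int error_errno;		/* errno recorded at close time */
	size_t npending;		/* requests waiting for a reply */
};

/* Models what the socket upcall records when the peer closes. */
static void
conn_peer_closed(struct conn *c, int err)
{
	assert(c != NULL);
	c->error_status = CALL_CANTRECV;
	c->error_errno = err;
}

/*
 * Try to queue a request. If a close has already been recorded, fail
 * right away instead of queuing a request that nothing will wake up.
 * In the kernel this would run with the connection lock held.
 */
static enum call_status
conn_enqueue(struct conn *c, int *errno_out)
{
	assert(c != NULL && errno_out != NULL);
	if (c->error_status == CALL_CANTRECV) {
		*errno_out = c->error_errno;
		return (CALL_CANTRECV);
	}
	c->npending++;
	return (CALL_SUCCESS);
}
```

Calling conn_enqueue() on a fresh connection queues the request; calling it again after conn_peer_closed() returns CALL_CANTRECV with the saved errno and leaves the pending count unchanged, which is the behaviour the patch is after.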