On Mon, 30 Nov 2009, Robert N. M. Watson wrote:

> On 30 Nov 2009, at 05:36, Eirik Øverby wrote:
>
>> Short follow-up: Making OpenBSD use TCP mounts (it defaults to UDP) seems to solve the issue.
>>
>> So this is a UDP-NFS-related problem, it would seem?
>
> Could well be. Let's try another debugging tactic -- there are two possible things going on here: resource leak, and resource exhaustion leading to deadlock. If you shut down to single user mode from multi-user, and let the system quiesce for a few minutes, then run netstat -m, what does it look like? Do vast numbers of mbufs+clusters get freed, or do they remain accounted for as allocated?
>
> (If they remain allocated, they were likely leaked, since most/all sockets will have been closed, releasing their resources on shutdown to single user when all processes are killed)
>
> The theory of an mbuf leak in NFS isn't an unlikely theory -- the socket code there continues to change, and rare edge cases frequently lead to leaks (per my earlier e-mail). Perhaps there's a case the OpenBSD client is triggering that other NFS clients normally don't. If we think that's the case, the next step is usually to narrow down what causes the leak to trigger a lot (i.e., the backup starting), and then grab a packet trace that we can analyze with wireshark. We'll want to look at the types of errors being returned for RPCs and, in particular, if there's one that happens about the same number of times as the resource has leaked over the same window, look at the code and see if that error case is handled properly.
>
> If this is definitely an NFS leak bug, we should get the NFS folks attention by sticking "NFS mbuf leak" in the subject line and CC'ing rmacklem/dfr. :-)

It's a bit of a shot in the dark, but could you please test the following patch? It patches for a possible mbuf leak + a possible M_SONAME leak (I have no idea if these ever occur in practice?).
It also fixes a case where the return value for svc_reply_dg() would have been TRUE for failure. It was all I could see from a quick look.

rick
--- rpc/svc_dg.c.sav 2009-12-07 15:37:45.000000000 -0500
+++ rpc/svc_dg.c 2009-12-07 15:48:50.000000000 -0500
@@ -221,6 +221,8 @@
 	xdrmbuf_create(&xdrs, mreq, XDR_DECODE);
 	if (! xdr_callmsg(&xdrs, msg)) {
 		XDR_DESTROY(&xdrs);
+		if (raddr != NULL)
+			free(raddr, M_SONAME);
 		return (FALSE);
 	}
@@ -259,11 +261,13 @@
 		m_fixhdr(mrep);
 		error = sosend(xprt->xp_socket, addr, NULL, mrep, NULL, 0, curthread);
-		if (!error) {
-			stat = TRUE;
+		if (error) {
+			stat = FALSE;
 		}
 	} else {
 		m_freem(mrep);
+		if (m != NULL)
+			m_freem(m);
 	}
 	XDR_DESTROY(&xdrs);

Received on Mon Dec 07 2009 - 20:20:12 UTC
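
For readers who want to see the two reply-path fixes in isolation, here is a minimal userland sketch of the same ownership pattern. It is not the kernel code: buf_alloc(), buf_free(), fake_send(), send_reply() and the live_bufs counter are all hypothetical stand-ins invented for illustration, and the assumption that the transport consumes its buffers even when it fails is part of the model. It shows a reply routine that frees the caller's buffer on the failure path and reports a send error as failure rather than success.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userland stand-ins; none of these are FreeBSD kernel APIs. */
static int live_bufs;			/* buffers allocated but not yet freed */

struct buf { int unused; };

static struct buf *buf_alloc(void)
{
	struct buf *b = malloc(sizeof(*b));

	if (b != NULL)
		live_bufs++;
	return (b);
}

static void buf_free(struct buf *b)
{
	if (b != NULL) {
		free(b);
		live_bufs--;
	}
}

/*
 * Pretend transport. Assumption of this model: it takes ownership of
 * both buffers and releases them whether or not the send succeeds.
 * Returns 0 on success, a nonzero error code otherwise.
 */
static int fake_send(struct buf *hdr, struct buf *body, int fail)
{
	buf_free(hdr);
	buf_free(body);
	return (fail ? 5 : 0);
}

/*
 * Build and send a reply. The callee owns `body` on every path.
 * `encode_ok` models whether encoding the reply header succeeded.
 */
static bool send_reply(struct buf *body, bool encode_ok, int send_fails)
{
	struct buf *hdr = buf_alloc();
	bool ok = encode_ok && hdr != NULL;

	if (ok) {
		/* Hand both buffers to the transport; a send error is a failure. */
		if (fake_send(hdr, body, send_fails) != 0)
			ok = false;
	} else {
		buf_free(hdr);
		/* body was never handed off, so it must be freed here too. */
		buf_free(body);
	}
	return (ok);
}

/* Helper for callers: every allocated buffer should have been released. */
static void assert_no_leak(void)
{
	assert(live_bufs == 0);
}
```

Calling send_reply() once for each of the three paths (success, send error, encode failure) and then assert_no_leak() is a quick way to confirm that no path leaves a buffer accounted for as allocated, which is the same question netstat -m answers on the real system.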