Re: Processes stuck in nfsreq

From: Mohan Srinivasan <mohan_srinivasan_at_yahoo.com>
Date: Thu, 3 Feb 2005 10:29:16 -0800 (PST)
Hi,

In the info you have below, the NFS request was not sent (R_SENT
is not set). We didn't send out the request because possibly nm_cwnd 
didn't allow us to send the request. If nfs_request() could not send 
the request, then typically, the request would be retransmitted from 
nfs_timer(). 

If nm_cwnd is full, then there must be a lot of outstanding requests 
that haven't been replied to yet (on this mount), which is the fundamental
issue.

Can you force a core dump and send me a pointer to it ?

Also, after you force a core, can you also try a quick workaround - someone
else also reported NFS client hangs and said that things were fine
after they set mpsafenet to 0. It would be good to see if there's a
correlation there.

mohan

--- Kris Kennaway <kris_at_obsecurity.org> wrote:

> On Sat, Dec 25, 2004 at 09:35:24AM -0800, Mohan Srinivasan wrote:
> > Hi,
> > 
> > If the box is still in that state, can you force a core ?
> > 
> > The trace below shows the process blocked on nfs_request()->msleep.
> > If the process shows up as waiting on "nfsreq", it would be msleep'ing
> > from nfs_reply(), not from nfs_request(). There's only one msleep()
> > call in nfs_request() (nfs_socket.c:1125), but that is for "nqnfstry".
> > The msleep("nfsreq") only happens from nfs_reply() (unless nfs_reply() 
> > has gotten inlined).
> 
> OK, I now have working online kgdb.  Here's the trace from one stuck
> cvs process:
> 
> #0  sched_switch (td=0xc3add2e0, newtd=0xc81c2a10, flags=1) at
> /usr/src/sys/kern/sched_4bsd.c:964
> #1  0xc0517246 in mi_switch (flags=1, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:356
> #2  0xc0534624 in sleepq_switch (wchan=0x0) at /usr/src/sys/kern/subr_sleepqueue.c:431
> #3  0xc05348f2 in sleepq_wait_sig (wchan=0x0) at /usr/src/sys/kern/subr_sleepqueue.c:556
> #4  0xc0516db6 in msleep (ident=0xc57ff500, mtx=0xc07a1040, priority=339, wmesg=0xc0700e35
> "nfsreq",
>     timo=0) at /usr/src/sys/kern/kern_synch.c:225
> #5  0xc05f186a in nfs_reply (rep=0xc57ff500) at /usr/src/sys/nfsclient/nfs_socket.c:600
> #6  0xc05f2583 in nfs_request (vp=0xc8b3eac8, mrest=0xc3917200, procnum=4, td=0xc3add2e0,
>     cred=0xc512c380, mrp=0xee1a397c, mdp=0xee1a3980, dposp=0xee1a3984)
>     at /usr/src/sys/nfsclient/nfs_socket.c:1043
> #7  0xc05f7681 in nfs3_access_otw (vp=0xc8b3eac8, wmode=-300271216, td=0xc3add2e0,
> cred=0xc512c380)
>     at /usr/src/sys/nfsclient/nfs_vnops.c:264
> #8  0xc05f7f34 in nfs_getattr (ap=0xee1a3a88) at /usr/src/sys/nfsclient/nfs_vnops.c:592
> #9  0xc06ce1a0 in VOP_GETATTR_AP (a=0x0) at vnode_if.c:484
> #10 0xc05f89dc in nfs_lookup (ap=0x0) at vnode_if.h:275
> #11 0xc06cdca0 in VOP_LOOKUP_AP (a=0x0) at vnode_if.c:89
> #12 0xc0570489 in lookup (ndp=0xee1a3c28) at vnode_if.h:54
> #13 0xc056fdb8 in namei (ndp=0xee1a3c28) at /usr/src/sys/kern/vfs_lookup.c:189
> #14 0xc057e102 in stat (td=0xc3add2e0, uap=0xee1a3d14) at /usr/src/sys/kern/vfs_syscalls.c:2088
> #15 0xc06b32d0 in syscall (frame=
>       {tf_fs = 47, tf_es = 674955311, tf_ds = -1078001617, tf_edi = 3, tf_esi = 3, tf_ebp =
> -1077943176, tf_isp = -300270220, tf_ebx = 674917772, tf_edx = 135074816, tf_ecx = 2, tf_eax =
> 188, tf_trapno = 12, tf_err = 2, tf_eip = 674368127, tf_cs = 31, tf_eflags = 514, tf_esp =
> -1077943860, tf_ss = 47})
>     at /usr/src/sys/i386/i386/trap.c:951
> #16 0xc069dfcf in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
> #17 0x0000002f in ?? ()
> (kgdb) frame 6
> #6  0xc05f2583 in nfs_request (vp=0xc8b3eac8, mrest=0xc3917200, procnum=4, td=0xc3add2e0,
>     cred=0xc512c380, mrp=0xee1a397c, mdp=0xee1a3980, dposp=0xee1a3984)
>     at /usr/src/sys/nfsclient/nfs_socket.c:1043
> 1043                    error = nfs_reply(rep);
> (kgdb) print rep
> $3 = (struct nfsreq *) 0xc57ff500
> (kgdb) print *rep
> $4 = {r_chain = {tqe_next = 0xc7b23200, tqe_prev = 0xc07a0ff0}, r_mreq = 0xc37d8700, r_mrep =
> 0x0,
>   r_md = 0x0, r_dpos = 0xdeadc0de "", r_nmp = 0xc39d0000, r_vp = 0xc8b3eac8, r_xid = 2829996295,
>   r_flags = 1, r_retry = 10, r_rexmit = 0, r_timer = -559038242, r_procnum = 4, r_rtt = -1,
>   r_lastmsg = 4245, r_td = 0xc3add2e0}
> (kgdb)
> 
> The deadc0de is interesting.  Let me know if there's more debugging
> info I can get for you.
> 
> Kris
> 

> ATTACHMENT part 2 application/pgp-signature 
Received on Thu Feb 03 2005 - 17:29:17 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:27 UTC