On 04/03/2008, Kostik Belousov <kostikbel_at_gmail.com> wrote:
> On Mon, Mar 03, 2008 at 11:16:23PM +0300, pluknet wrote:
> > On 03/03/2008, Kostik Belousov <kostikbel_at_gmail.com> wrote:
> > > On Mon, Mar 03, 2008 at 09:27:15PM +0300, pluknet wrote:
> > > > On 03/03/2008, Kostik Belousov <kostikbel_at_gmail.com> wrote:
> > > > [snip]
> > > > > To summarize, I need both the tcpdump and kernel/witness messages from
> > > > > the panic.
> > > > >
> > > >
> > > > I'm sorry. Here it is.
> > > > http://pluknet.nm.ru/dev/tcpdump-nfsserver-full.raw
> > > >
> > > > The messages (same as unread msgbuf in initial posting, hand-scribed):
> > > > panic: mutex Giant owned at
> > > > /usr/src/sys/modules/nfsserver/../../nfsserver/nfs_syscalls.c:556
> > > > KDB: enter: panic
> > > > [thread pid 601 tid 100055 ]
> > > > Stopped at kdb_enter+0x3a: movl $0,kdb_why
> > > > db> show locks
> > > > exclusive sleep mutex nfsd_mtx r = 0 (0xc2e0af40) locked @
> > > > /usr/src/sys/modules/nfsserver/../../nfsserver/nfs_syscalls.c:501
> > > > exclusive sleep mutex Giant r = 0 (0xc07e6410) locked @
> > > > /usr/src/sys/kern/vfs_lookup.c:663
> > > >
> > > > > Nevertheless, the patch below might help with the panic during
> > > > > the unlinking (not tested).
> > > > >
> > > > > diff --git a/sys/nfsserver/nfs_serv.c b/sys/nfsserver/nfs_serv.c
> > > > > index 446651d..87e1aaa 100644
> > > > > --- a/sys/nfsserver/nfs_serv.c
> > > > > +++ b/sys/nfsserver/nfs_serv.c
> > > > > @@ -2146,7 +2146,7 @@ nfsrv_remove(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
> > > > >      nfsfh_t nfh;
> > > > >      fhandle_t *fhp;
> > > > >      struct mount *mp = NULL;
> > > > > -    int vfslocked;
> > > > > +    int vfslocked, vfslocked1;
> > > > >
> > > > >      nfsdbprintf(("%s %d\n", __FILE__, __LINE__));
> > > > >      ndclear(&nd);
> > > > > @@ -2168,7 +2168,11 @@ nfsrv_remove(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
> > > > >      nd.ni_cnd.cn_flags = LOCKPARENT | LOCKLEAF | MPSAFE;
> > > > >      error = nfs_namei(&nd, fhp, len, slp, nam, &md, &dpos,
> > > > >          &dirp, v3, &dirfor, &dirfor_ret, td, FALSE);
> > > > > -    vfslocked = NDHASGIANT(&nd);
> > > > > +    vfslocked1 = NDHASGIANT(&nd);
> > > > > +    if (vfslocked && vfslocked1)
> > > > > +        VFS_UNLOCK_GIANT(vfslocked1);
> > > > > +    if (vfslocked || vfslocked1)
> > > > > +        vfslocked = 1;
> > > > >      if (dirp && !v3) {
> > > > >          vrele(dirp);
> > > > >          dirp = NULL;
> > > > >
> > > > >
> > > >
> > > > Now the last lock triplex looks like:
> > > > vfslocked lock in
> > > > /usr/src/sys/modules/nfsserver/../../nfsserver/nfs_serv.c, 2161
> > > > vfslocked lock in
> > > > /usr/src/sys/modules/nfsserver/../../nfsserver/nfs_srvsubs.c, 1106
> > > > vfslocked lock in
> > > > /usr/src/sys/modules/nfsserver/../../nfsserver/nfs_srvsubs.c, 673
> > > > vfslocked unlock in
> > > > /usr/src/sys/modules/nfsserver/../../nfsserver/nfs_srvsubs.c, 916
> > > > vfslocked1 unlock in
> > > > /usr/src/sys/modules/nfsserver/../../nfsserver/nfs_serv.c, 2173
> > > > ^^^
> > > > vfslocked unlock in
> > > > /usr/src/sys/modules/nfsserver/../../nfsserver/nfs_serv.c, 2238
> > > >
> > > > And no panic. Thanks.
> > >
> > >
> > > Could you, please, clarify.
> > > As I read your mail, the patch fixed at least
> > > one of your panics. Are there any other situations where the nfs server over
> > > a non-MPSAFE fs panics for you? It is possible that what you reported
> > > before actually contains several different reasons for the Giant leak.
> >
> > Of course.
> > Another situation is while performing /etc/rc.d/nfsd stop
> > > System call nfssvc returning with the following locks held:
> > > exclusive sleep mutex Giant r = 2 (0xc07e6410) locked @
> > > /usr/src/sys/modules/nfsserver/../../nfsserver/nfs_srvsubs.c:1106
> > > panic: witness_warn
> >
> > I got no panic with this patch:
>
> I lost you again. What patch are you referring to there? Is that the
> patch I sent you, or some _other_ patch?

I am referring to the patch you sent me.

> >
> > # /etc/rc.d/nfsd stop
> > Stopping nfsd.
> > kill: 1737: No such process
> > kill: 1738: No such process
> > kill: 1739: No such process
> > kill: 1740: No such process
> > #
> >
> > wbr,
> > pluknet
> >
Received on Tue Mar 04 2008 - 09:13:04 UTC
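
The flag handling in the patch quoted earlier in this thread can be tried outside the kernel. What follows is only a minimal userland sketch, not FreeBSD code: `struct fake_giant`, `fake_lock`, `fake_unlock` and the two `leftover_*` functions are hypothetical names made up for illustration, and the lock is modelled as nothing more than a recursion counter. It assumes the same recursive lock is taken twice on one code path and compares overwriting the first "held" flag with merging the two flags the way the patch does. Whether the unpatched nfsrv_remove() reached exactly this state is an assumption.

```c
#include <assert.h>

/* Hypothetical stand-in for a recursive lock such as Giant:
 * only the recursion depth is tracked. */
struct fake_giant {
    int depth;
};

/* Take the lock when the (hypothetical) filesystem needs it.
 * Returns 1 if the lock was taken, 0 otherwise. */
static int
fake_lock(struct fake_giant *g, int needs_giant)
{
    if (!needs_giant)
        return 0;
    g->depth++;
    return 1;
}

/* Drop one recursion level if the flag says the lock was taken. */
static void
fake_unlock(struct fake_giant *g, int held)
{
    if (held) {
        assert(g->depth > 0);
        g->depth--;
    }
}

/* Second acquisition overwrites the first flag, so only one
 * release happens at the end. Returns the depth left behind. */
static int
leftover_when_overwritten(int needs_giant)
{
    struct fake_giant g = { 0 };
    int vfslocked;

    vfslocked = fake_lock(&g, needs_giant);
    vfslocked = fake_lock(&g, needs_giant);  /* first flag is lost */
    fake_unlock(&g, vfslocked);
    return g.depth;
}

/* Flags are merged as in the quoted patch: if both acquisitions
 * succeeded, one is released at once and a single flag survives. */
static int
leftover_when_merged(int needs_giant)
{
    struct fake_giant g = { 0 };
    int vfslocked, vfslocked1;

    vfslocked = fake_lock(&g, needs_giant);
    vfslocked1 = fake_lock(&g, needs_giant);
    if (vfslocked && vfslocked1)
        fake_unlock(&g, vfslocked1);
    if (vfslocked || vfslocked1)
        vfslocked = 1;
    fake_unlock(&g, vfslocked);
    return g.depth;
}
```

In this model, leftover_when_overwritten(1) leaves one recursion level behind, while leftover_when_merged(1) returns with the counter back at zero. With needs_giant set to 0 neither variant touches the counter.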