Re: NFS regression.

From: Pawel Jakub Dawidek <pjd_at_FreeBSD.org>
Date: Tue, 18 Nov 2008 19:18:24 +0100
On Tue, Nov 18, 2008 at 09:13:26AM +0000, Doug Rabson wrote:
> 
> On 17 Nov 2008, at 18:37, Pawel Jakub Dawidek wrote:
> 
> >On Mon, Nov 17, 2008 at 06:07:52PM +0000, Doug Rabson wrote:
> >>
> >>On 17 Nov 2008, at 18:02, Pawel Jakub Dawidek wrote:
> >>
> >>>On Mon, Nov 17, 2008 at 05:54:02PM +0000, Doug Rabson wrote:
> >>>>
> >>>>On 17 Nov 2008, at 17:10, Pawel Jakub Dawidek wrote:
> >>>>
> >>>>>Hi.
> >>>>>
> >>>>>I'm seeing this panic very often now with a few-days-old HEAD:
> >>>>>
> >>>>>
> >>>>>Any ideas?
> >>>>
> >>>>Can you reproduce this with INVARIANTS turned on? That should trigger
> >>>>a KASSERT a bit earlier and give me a chance to fix the thing.
> >>>
> >>>I have INVARIANTS on... Is there some recently added assertion you are
> >>>expecting?
> >>
> >>Hmm. I added an assert in r184921 which ought to have caught this.
> >>Could you try this patch and see if it changes anything:
> >>
> >>Index: rpc/clnt_dg.c
> >>===================================================================
> >>--- rpc/clnt_dg.c	(revision 184968)
> >>+++ rpc/clnt_dg.c	(working copy)
> >>@@ -543,7 +543,7 @@
> >>
> >>		if (tv > 0) {
> >>			if (cu->cu_closing || cu->cu_closed)
> >>-				error = 0;
> >>+				error = ESHUTDOWN;
> >>			else
> >>				error = msleep(cr, &cs->cs_lock,
> >>				    cu->cu_waitflag, cu->cu_waitchan, tv);
> >>
> >
> >OK, my source is older and doesn't contain the assertion you added. I
> >applied the patch above and also added the assertion by hand (I'm not
> >set up right now to upgrade the entire system). This is the panic I get
> >with the new kernel:
> >
> >...
> >
> >If you want me to convert some of those to file:line, just let me know.
> 
> Don't worry about line numbers - I can see where it's being called from.
> Do you have a recipe for reproducing this? Also, could you try this patch
> instead of the previous one:
> 
> Index: rpc/clnt_dg.c
> ===================================================================
> --- rpc/clnt_dg.c	(revision 184968)
> +++ rpc/clnt_dg.c	(working copy)
[...]

With this patch it still panics here:

panic: xdrmbuf_create with NULL mbuf chain
cpuid = 0
KDB: enter: panic
[thread pid 8305 tid 100055 ]
Stopped at      kdb_enter+0x3a: movl    $0,kdb_why
db> tr
Tracing pid 8305 tid 100055 td 0x840f3b40
kdb_enter(80686620,80686620,806a1861,83ac78b4,0,...) at kdb_enter+0x3a
panic(806a1861,83ac7988,805c6746,83ac7954,0,...) at panic+0x136
xdrmbuf_create(83ac7954,0,1,2a3,bb9,...) at xdrmbuf_create+0x1f
clnt_dg_call(83f9b5c0,83ac7a1c,e,84111900,83ac7a58,...) at clnt_dg_call+0xca6
clnt_reconnect_call(83f9b540,83ac7a1c,e,84111900,83ac7a58,...) at clnt_reconnect_call+0x5a0
nfs_request(84218d9c,84111900,e,840f3b40,841fbe00,...) at nfs_request+0x1dd
nfs_renamerpc(84218d9c,83e23610,15,841fbe00,840f3b40,...) at nfs_renamerpc+0x1ab
nfs_sillyrename(84c0a430,8,0,0,84218d9c,...) at nfs_sillyrename+0x10a
nfs_remove(83ac7c30,83ac7c30,0,83ac7c30,84c0a430,...) at nfs_remove+0x12f
VOP_REMOVE_APV(806cfea0,83ac7c30,2,841c429c,7fbfdd34,...) at VOP_REMOVE_APV+0xa5
kern_unlinkat(840f3b40,ffffff9c,7fbfdd34,0,83ac7c80,...) at kern_unlinkat+0x187
kern_unlink(840f3b40,7fbfdd34,0,83ac7d2c,8065a4c3,...) at kern_unlink+0x27
unlink(840f3b40,83ac7cf8,4,840f3b40,806bab90,...) at unlink+0x22
syscall(83ac7d38) at syscall+0x283
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (10, FreeBSD ELF32, unlink), eip = 0x807d5d3, esp = 0x7fbfdc7c, ebp = 0x7fbfdcf8 ---
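The panic message says xdrmbuf_create() was handed a NULL mbuf chain from clnt_dg_call(). A minimal user-space sketch of the failure mode the first patch targets, assuming the decode step is reached whenever the wait step reports error == 0: if a closing client skips msleep() and still reports 0, the caller goes on to decode a reply that was never received. The struct and function names (fake_client, fake_reply, wait_for_reply(), try_decode()) are hypothetical and not taken from rpc/clnt_dg.c, and the sketch does not explain why the panic persists with the second patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-client and per-call state. */
struct fake_client {
    bool closing;   /* modelled on cu->cu_closing */
    bool closed;    /* modelled on cu->cu_closed */
};

struct fake_reply {
    const void *mrep;   /* reply mbuf chain; NULL until a reply arrives */
};

/*
 * Wait step modelled on the patched hunk. A closing client never
 * sleeps; the old code reported 0 here, the patch reports ESHUTDOWN.
 */
int wait_for_reply(const struct fake_client *cu, bool patched)
{
    if (cu->closing || cu->closed)
        return patched ? ESHUTDOWN : 0;
    /* The kernel would msleep() here; the sketch pretends it timed out. */
    return EWOULDBLOCK;
}

/*
 * Returns the error to hand back to the caller, 0 if the reply can be
 * decoded, or -1 where the kernel would instead call xdrmbuf_create()
 * with a NULL chain and trip its assertion.
 */
int try_decode(const struct fake_client *cu, const struct fake_reply *cr,
    bool patched)
{
    int error = wait_for_reply(cu, patched);

    if (error != 0)
        return error;
    if (cr->mrep == NULL)
        return -1;  /* "xdrmbuf_create with NULL mbuf chain" */
    return 0;
}
```

With patched == false a closing client falls through to the decode step with no reply and try_decode() returns -1 (the spot where the kernel panics); with patched == true the caller gets ESHUTDOWN instead.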

I can reproduce it easily. I have a netbooted system where I start
'make -ssj4 buildworld', but both the src/ and obj/ directories are on a
local ZFS file system, so only the system tools and libraries are on NFS.
I'm using UDP for NFS, BTW; sorry for not mentioning it earlier:

/boot/loader.conf:

boot.nfsroot.options="nolockd,udp"

/etc/fstab:

# Device                Mountpoint      FStype  Options                                 Dump    Pass#
192.168.5.1:/zoo/camel  /               nfs     rw,noatime,nolockd,mntudp,intr,-3       0       0
192.168.5.1:/zoo/pjd    /zoo/pjd        nfs     rw,noatime,nolockd,mntudp,intr,-3       0       0
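The trace enters the RPC layer through unlink(), nfs_remove() and nfs_sillyrename(), i.e. a file was removed while it was still in use. A hypothetical C helper that exercises that same path from userland, assuming the target directory lives on the UDP NFS mount: it creates a file, unlinks it while a descriptor is still open (which typically makes the NFS client silly-rename it rather than remove it outright), then closes it. This is not the buildworld recipe above and may or may not trigger the race; the function name and the silly.N file names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical stress helper: for each iteration, create dir/silly.N,
 * unlink it while the descriptor is still open, then close it. On an
 * NFS client the unlink of an open file typically goes through
 * nfs_sillyrename() instead of a plain remove RPC.
 * Returns the number of completed iterations, or -1 on the first error.
 */
int unlink_open_files(const char *dir, int iterations)
{
    char path[1024];

    for (int i = 0; i < iterations; i++) {
        int n = snprintf(path, sizeof(path), "%s/silly.%d", dir, i);
        if (n < 0 || (size_t)n >= sizeof(path))
            return -1;

        int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
        if (fd == -1)
            return -1;

        /* Remove the name while fd still references the file. */
        if (unlink(path) == -1) {
            close(fd);
            return -1;
        }
        if (close(fd) == -1)
            return -1;
    }
    return iterations;
}
```

For example, a small main() calling unlink_open_files() on a hypothetical scratch directory such as /zoo/pjd/tmp with a large iteration count would loop over the NFS mount; running it on a local directory merely checks that the helper itself works.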

If you can't reproduce it, I can give you access to this machine; it sits
in the netperf cluster.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd_at_FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

Received on Tue Nov 18 2008 - 17:18:53 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:37 UTC