Deadlock in nullfs/zfs somewhere

From: Adrian Chadd <adrian_at_freebsd.org>
Date: Tue, 9 Jul 2013 06:03:58 -0700
Hi all,

I'm doing some -10 i386/amd64 package builds on a 32-core build server running:

FreeBSD vm0.freebsd.org 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r252897: Sat Jul  6 23:16:03 UTC 2013 sbruno_at_vm0.freebsd.org:/usr/obj/usr/src/sys/VM0  amd64

And I hit a deadlock:

Unread portion of the kernel message buffer:
panic: deadlkres: possible deadlock detected for 0xfffffe00adc2a920, blocked for 1800101 ticks
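For scale: deadlkres reports the blocked time in ticks, so the real duration depends on kern.hz. Assuming the usual amd64 default of 1000 (an assumption; check `sysctl kern.hz` on the box), the conversion is just a division:

```python
# Convert the deadlkres "blocked for N ticks" figure into wall-clock time.
# ASSUMPTION: kern.hz = 1000 (the common amd64 default); verify with
# `sysctl kern.hz` on the crashed machine before trusting the result.
HZ = 1000


def ticks_to_seconds(ticks: int, hz: int = HZ) -> float:
    """Return how long a thread was blocked, in seconds."""
    return ticks / hz


blocked = ticks_to_seconds(1800101)
print(f"{blocked:.3f} s (~{blocked / 60:.1f} min)")  # 1800.101 s (~30.0 min)
```

So under that assumption the thread had been asleep on the lock for roughly half an hour before deadlkres fired.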

(kgdb) tid 100874
[Switching to thread 799 (Thread 100874)]#0  sched_switch (td=0xfffffe00adc2a920, newtd=<value optimized out>, flags=<value optimized out>)
    at /usr/src/sys/kern/sched_ule.c:1954
1954                    cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xfffffe00adc2a920, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1954
#1  0xffffffff804e70ee in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:487
#2  0xffffffff8052150a in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:620
#3  0xffffffff804c2abc in sleeplk (lk=<value optimized out>, flags=524544, ilk=<value optimized out>, wmesg=0xffffffff80f1b89a "zfs", pri=<value optimized out>,
    timo=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:226
#4  0xffffffff804c22f5 in __lockmgr_args (lk=0xfffffe00ad56a068, flags=<value optimized out>, ilk=0xfffffe00ad56a098, wmesg=0xffffffff80f1b89a "zfs", pri=96, timo=51,
    line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:919
#5  0xffffffff8056a26c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:97
#6  0xffffffff80790ded in VOP_LOCK1_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2084
#7  0xffffffff805891a3 in _vn_lock (vp=0xfffffe00ad56a000, flags=<value optimized out>, file=0xffffffff807fb89e "/usr/src/sys/kern/vfs_subr.c", line=2099)
    at vnode_if.h:859
#8  0xffffffff805791aa in vget (vp=0xfffffe00ad56a000, flags=524544, td=0xfffffe00adc2a920) at /usr/src/sys/kern/vfs_subr.c:2099
#9  0xffffffff805664b2 in cache_lookup (dvp=0xfffffe00ad4e1588, vpp=0xffffff9049b29188, cnp=0xffffff9049b295a0, tsp=0x0, ticksp=0x0) at /usr/src/sys/kern/vfs_cache.c:674
#10 0xffffffff80567651 in vfs_cache_lookup (ap=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:1033
#11 0xffffffff8078efa2 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:129
#12 0xffffffff8126714b in null_lookup (ap=0xffffff9049b29248) at vnode_if.h:54
#13 0xffffffff8078efa2 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:129
#14 0xffffffff8056f6eb in lookup (ndp=0xffffff9049b29520) at vnode_if.h:54
#15 0xffffffff8056ee84 in namei (ndp=0xffffff9049b29520) at /usr/src/sys/kern/vfs_lookup.c:292
#16 0xffffffff80588952 in vn_open_cred (ndp=0xffffff9049b29520, flagp=0xffffff9049b296a0, cmode=0, vn_open_flags=<value optimized out>, cred=0xfffffe071c32a900, fp=0x0)
    at /usr/src/sys/kern/vfs_vnops.c:202
#17 0xffffffff8056a774 in vop_stdvptocnp (ap=<value optimized out>) at /usr/src/sys/kern/vfs_default.c:797
#18 0xffffffff81267a1b in null_vptocnp (ap=0xffffff9049b29878) at /usr/src/sys/modules/nullfs/../../fs/nullfs/null_vnops.c:824
#19 0xffffffff80792628 in VOP_VPTOCNP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:3649
#20 0xffffffff80567ee3 in vn_vptocnp_locked (vp=0xffffff9049b29900, cred=0xfffffe071c32a900, buf=0xfffffe00ad708800 "", buflen=0xffffff9049b298fc) at vnode_if.h:1564
#21 0xffffffff80567a02 in vn_fullpath1 (td=0xfffffe00adc2a920, vp=0xfffffe03ec1d5ce8, rdir=0xfffffe071b898760, buf=0xfffffe00ad708800 "", retbuf=0xffffff9049b29960,
    buflen=1004) at /usr/src/sys/kern/vfs_cache.c:1325
#22 0xffffffff805677b5 in kern___getcwd (td=0xfffffe00adc2a920, buf=0x80dd3d4 <Address 0x80dd3d4 out of bounds>, bufseg=UIO_USERSPACE, buflen=Cannot access memory at address 0x400
)
    at /usr/src/sys/kern/vfs_cache.c:1089
#23 0xffffffff8076554c in ia32_syscall (frame=0xffffff9049b29ac0) at subr_syscall.c:134
#24 0xffffffff807227a5 in Xint0x80_syscall () at ia32_exception.S:73
#25 0x0000000008072c33 in ?? ()
Previous frame inner to this frame (corrupt stack?)
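For context on frames #17 to #22: under nullfs, getcwd falls back to vop_stdvptocnp, which (roughly) recovers a directory's name by opening ".." and scanning its entries for one whose file number matches the child. That ".." lookup is what passes through null_lookup and the name cache into the vget() now sleeping on the "zfs" vnode lock. A rough userland analogue, assuming only POSIX stat/readdir semantics; `name_in_parent` is a hypothetical helper, not a kernel or libc API:

```python
import os
import stat
from typing import Optional


def name_in_parent(path: str) -> Optional[str]:
    """Hypothetical helper: return the name under which directory `path`
    appears in its parent, by scanning ".." for a matching (dev, inode).

    Loosely mirrors the idea behind vop_stdvptocnp (open "..", read the
    entries, compare file numbers); it is not the kernel's code.
    """
    target = os.stat(path)
    # The ".." lookup here is the userland counterpart of frames #14-#16.
    with os.scandir(os.path.join(path, "..")) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            if (stat.S_ISDIR(st.st_mode)
                    and st.st_ino == target.st_ino
                    and st.st_dev == target.st_dev):
                return entry.name
    # No matching entry, e.g. when `path` is "/".
    return None


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "child"))
        print(name_in_parent(os.path.join(tmp, "child")))  # child
```

The kernel version compares directory-entry file numbers instead of calling stat on every entry, but the shape is the same: the name of a vnode is found by going up to its parent, which means a lookup of "..".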

... and here's where it's stuck:

(kgdb) sleepchain 100874
 thread 100874 (pid 75371, make) blocked on lk "zfs" SHARED (count 2)

Now, this system doesn't have WITNESS (yet!), so I'll have to jump
through a bunch more hoops to figure out what else is blocked on, or
holding, that particular lock.
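Why the sleep chain stops right at that lock: as far as lockmgr goes, an exclusive holder is recorded in the lock word, but shared holders are only counted, so a chain walk has no owner thread to follow once it reaches a lock held SHARED. A toy model of that walk (the `Lock`, `Thread` and `sleepchain` names are hypothetical, not the kernel structures or the actual kgdb macro):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lock:
    name: str
    exclusive_owner: Optional["Thread"] = None  # recorded only for exclusive holds
    shared_count: int = 0                       # shared holders are counted, not named


@dataclass
class Thread:
    tid: int
    blocked_on: Optional[Lock] = None


def sleepchain(td: Thread) -> List[str]:
    """Follow thread -> lock -> owner until the chain ends, loops, or
    reaches a shared lock whose holders are unknown."""
    steps: List[str] = []
    seen = set()
    while td.blocked_on is not None:
        if td.tid in seen:
            steps.append(f"cycle back to thread {td.tid}: deadlock")
            break
        seen.add(td.tid)
        lk = td.blocked_on
        if lk.exclusive_owner is None:
            # Shared hold: only a count exists, so there is nobody to follow.
            steps.append(f"thread {td.tid} blocked on {lk.name!r} "
                         f"SHARED (count {lk.shared_count})")
            break
        steps.append(f"thread {td.tid} blocked on {lk.name!r} "
                     f"EXCL owned by {lk.exclusive_owner.tid}")
        td = lk.exclusive_owner
    return steps


if __name__ == "__main__":
    zfs_vnode_lock = Lock("zfs", shared_count=2)
    make = Thread(100874, blocked_on=zfs_vnode_lock)
    for line in sleepchain(make):
        print(line)  # thread 100874 blocked on 'zfs' SHARED (count 2)
```

With only exclusive locks in the chain, the same walk would either end at a running thread or loop back on itself; the shared hold is what forces the manual digging (or WITNESS) to find the other parties.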

Does anyone have any ideas as to what's going on? Or has it been fixed
over the last couple of days and I haven't noticed?

Thanks!


-adrian
Received on Tue Jul 09 2013 - 11:03:58 UTC
