Re: Reproducible ZFS panic, w/ script (Was: "New" ZFS crash on FS (pool?) unmount/export)

From: Thomas Backman <serenity@exscape.org>
Date: Sat, 11 Jul 2009 16:08:26 +0200
On Jul 10, 2009, at 21:27, Kip Macy wrote:

> "zfs export" does a forced unmount. We may not be properly handling
> dangling references.
>
> -Kip

A bit more digging:

[root@chaos ~]# bash zfs_crash.sh initial
[root@chaos ~]# bash zfs_crash.sh stress   ## with the unmount part (line 107) **commented out**
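
For context, the stress part boils down to a snapshot loop roughly like this (a sketch from memory; the dataset name is a placeholder, and the $RANDOM suffix is the collision workaround mentioned below):

while true; do
    # bash's $RANDOM gives unique-ish snapshot names, so repeated
    # snapshots don't fail with "dataset already exists"
    zfs snapshot crashtestmaster/test_orig@stress_$RANDOM
done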
I then let the above run for about 20 seconds to create a bunch of snapshots (ignoring errors; in my own script I added a random number to each snapshot name to avoid collisions), and then:

[root@chaos ~]# zpool export crashtestmaster
[root@chaos ~]# zfs list
NAME                             USED  AVAIL  REFER  MOUNTPOINT
crashtestslave                  20.3M  40.7M    20K  /crashtestslave/crashtestslave
crashtestslave/test_cloned      19.8M  40.7M  19.8M  /crashtestslave/crashtestslave/test_cloned
crashtestslave/test_orig            0  40.7M  19.8M  /crashtestslave/crashtestslave/test_orig
tank                            5.67G  59.3G    18K  none
tank/root                        616M  59.3G   224M  /
tank/...
[root@chaos ~]# zfs unmount crashtestslave/test_orig
[root@chaos ~]# zfs unmount crashtestslave/test_cloned
[root@chaos ~]# zfs unmount crashtestslave
... panic here.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xc
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff803a5682
stack pointer           = 0x28:0xffffff803ea09980
frame pointer           = 0x28:0xffffff803ea099b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 5099 (zfs)

0xffffff002ac4a938: tag zfs, type VDIR
    usecount 1, writecount 0, refcount 1 mountedhere 0xffffff00068be8d0
    flags ()
    lock type zfs: EXCL by thread 0xffffff0006f13390 (pid 5099)

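For reference, the backtraces come from kgdb on the resulting crash dumps, roughly like so (kernel and dump paths assumed):

kgdb /boot/kernel/kernel /var/crash/vmcore.0
(kgdb) bt
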
BT:
...
#9  0xffffffff805edc42 in trap (frame=0xffffff803ea098d0) at /usr/src/sys/amd64/amd64/trap.c:345
#10 0xffffffff805d36a7 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
#11 0xffffffff803a5682 in propagate_priority (td=0xffffff0027174ab0) at /usr/src/sys/kern/subr_turnstile.c:194
#12 0xffffffff803a64ec in turnstile_wait (ts=Variable "ts" is not available.) at /usr/src/sys/kern/subr_turnstile.c:738
#13 0xffffffff80355101 in _mtx_lock_sleep (m=0xffffff002ca6d9f8, tid=18446742974314394512, opts=Variable "opts" is not available.)
     at /usr/src/sys/kern/kern_mutex.c:447
#14 0xffffffff803f7893 in vfs_msync (mp=0xffffff00068be8d0, flags=1) at /usr/src/sys/kern/vfs_subr.c:3179
#15 0xffffffff803f0c7e in dounmount (mp=0xffffff00068be8d0, flags=0, td=Variable "td" is not available.) at /usr/src/sys/kern/vfs_mount.c:1263
#16 0xffffffff803f1568 in unmount (td=0xffffff0006f13390, uap=0xffffff803ea09c00)
     at /usr/src/sys/kern/vfs_mount.c:1174
#17 0xffffffff805ed4cf in syscall (frame=0xffffff803ea09c90) at /usr/src/sys/amd64/amd64/trap.c:984
#18 0xffffffff805d3930 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:364
#19 0x0000000800f4b9ac in ?? ()
Previous frame inner to this frame (corrupt stack?)

This is NOT the same backtrace as before (nothing after dounmount() matches the zpool export panic), and this time it came from zfs unmount, not zpool export.
I tried it again and got yet another backtrace(!), though both "end" (or begin, depending on your view) with propagate_priority(), turnstile_wait() and _mtx_lock_sleep(). The fault address is 0xc both times, i.e. a read a few bytes past a NULL pointer, which would fit the dangling-reference theory. Here's the second one, which happened while doing the same as above: initial, stress, and then manually zfs unmounting them. "zfs unmount crashtestslave" (the root fs) is what panics yet again:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xc
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff803aa722
stack pointer           = 0x28:0xffffff8000025a60
frame pointer           = 0x28:0xffffff8000025a90
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 12 (swi4: clock)

...
#8  0xffffffff805f1fcd in trap_fatal (frame=0xffffff80000259b0, eva=Variable "eva" is not available.) at /usr/src/sys/amd64/amd64/trap.c:847
#9  0xffffffff805f2e22 in trap (frame=0xffffff80000259b0) at /usr/src/sys/amd64/amd64/trap.c:345
#10 0xffffffff805d87c7 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
#11 0xffffffff803aa722 in propagate_priority (td=0xffffff00296ce390) at /usr/src/sys/kern/subr_turnstile.c:194
#12 0xffffffff803ab58c in turnstile_wait (ts=Variable "ts" is not available.) at /usr/src/sys/kern/subr_turnstile.c:738
#13 0xffffffff8035a1c1 in _mtx_lock_sleep (m=0xffffffff808a1de0, tid=18446742974234830624, opts=Variable "opts" is not available.)
     at /usr/src/sys/kern/kern_mutex.c:447
#14 0xffffffff8037ea92 in softclock (arg=Variable "arg" is not available.) at /usr/src/sys/kern/kern_timeout.c:376
#15 0xffffffff803417b0 in intr_event_execute_handlers (p=Variable "p" is not available.) at /usr/src/sys/kern/kern_intr.c:1165
#16 0xffffffff80342d1e in ithread_loop (arg=0xffffff000231e6a0) at /usr/src/sys/kern/kern_intr.c:1178
#17 0xffffffff8033ebb8 in fork_exit (callout=0xffffffff80342c90 <ithread_loop>, arg=0xffffff000231e6a0,
     frame=0xffffff8000025c80) at /usr/src/sys/kern/kern_fork.c:842
#18 0xffffffff805d8c9e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:561
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000001 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()


Note that the active process is *not* zfs this time, but swi4: clock; judging by frame #14, the softclock thread blocks on the same (presumably freed) mutex while running a callout.

Regards,
Thomas