"New" ZFS crash on FS (pool?) unmount/export

From: Thomas Backman <serenity@exscape.org>
Date: Sat, 20 Jun 2009 09:11:34 +0200
I just ran into this tonight. I'm not sure exactly what triggered it - the
box stopped responding to pings at 02:07 AM, and it runs a cron backup
job using zfs send/recv at 02:00, so I'm guessing the two are related, even
though the backup should probably have finished well before then. Anyway.
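
For reference, the backup job is essentially the standard recursive
incremental send/recv pattern. A rough sketch (pool names and snapshot
labels here are made up, and the flags are from memory, so details may
differ from the actual script):

#!/bin/sh
# Rough sketch of the nightly backup (hypothetical names throughout).
TODAY=$(date +%Y%m%d)
YESTERDAY=$(date -v-1d +%Y%m%d)
# Snapshot the whole source pool recursively.
zfs snapshot -r tank@${TODAY}
# Send the changes since yesterday's snapshot and receive them into the
# backup pool; -F rolls the target back to the last common snapshot,
# -d preserves the dataset layout under the target pool.
zfs send -R -i tank@${YESTERDAY} tank@${TODAY} | zfs recv -Fd backup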

r194478.

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x288
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff805a4989
stack pointer           = 0x28:0xffffff803e8b57e0
frame pointer           = 0x28:0xffffff803e8b5840
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 57514 (zpool)
panic: from debugger
cpuid = 0
Uptime: 10h22m13s
Physical memory: 2027 MB

(kgdb) bt
#0  doadump () at pcpu.h:223
#1  0xffffffff8059c409 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
#2  0xffffffff8059c85c in panic (fmt=Variable "fmt" is not available.) at /usr/src/sys/kern/kern_shutdown.c:575
#3  0xffffffff801f1377 in db_panic (addr=Variable "addr" is not available.) at /usr/src/sys/ddb/db_command.c:478
#4  0xffffffff801f1781 in db_command (last_cmdp=0xffffffff80c38620, cmd_table=Variable "cmd_table" is not available.) at /usr/src/sys/ddb/db_command.c:445
#5  0xffffffff801f19d0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#6  0xffffffff801f3969 in db_trap (type=Variable "type" is not available.) at /usr/src/sys/ddb/db_main.c:229
#7  0xffffffff805ce465 in kdb_trap (type=12, code=0, tf=0xffffff803e8b5730) at /usr/src/sys/kern/subr_kdb.c:534
#8  0xffffffff8088715d in trap_fatal (frame=0xffffff803e8b5730, eva=Variable "eva" is not available.) at /usr/src/sys/amd64/amd64/trap.c:847
#9  0xffffffff80887fb2 in trap (frame=0xffffff803e8b5730) at /usr/src/sys/amd64/amd64/trap.c:345
#10 0xffffffff8086e007 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
#11 0xffffffff805a4989 in _sx_xlock_hard (sx=0xffffff0043557d50, tid=18446742975830720512, opts=Variable "opts" is not available.) at /usr/src/sys/kern/kern_sx.c:575
#12 0xffffffff805a52fe in _sx_xlock (sx=Variable "sx" is not available.) at sx.h:155
#13 0xffffffff80fe2995 in zfs_freebsd_reclaim () from /boot/kernel/zfs.ko
#14 0xffffffff808cefca in VOP_RECLAIM_APV (vop=0xffffff0043557d38, a=0xffffff0043557d50) at vnode_if.c:1926
#15 0xffffffff80626f6e in vgonel (vp=0xffffff00437a7938) at vnode_if.h:830
#16 0xffffffff8062b528 in vflush (mp=0xffffff0060f2a000, rootrefs=0, flags=0, td=0xffffff0061528000) at /usr/src/sys/kern/vfs_subr.c:2450
#17 0xffffffff80fdd3a8 in zfs_umount () from /boot/kernel/zfs.ko
#18 0xffffffff8062420a in dounmount (mp=0xffffff0060f2a000, flags=1626513408, td=Variable "td" is not available.) at /usr/src/sys/kern/vfs_mount.c:1287
#19 0xffffffff80624975 in unmount (td=0xffffff0061528000, uap=0xffffff803e8b5c00) at /usr/src/sys/kern/vfs_mount.c:1172
#20 0xffffffff8088783f in syscall (frame=0xffffff803e8b5c90) at /usr/src/sys/amd64/amd64/trap.c:984
#21 0xffffffff8086e290 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:364
#22 0x000000080104e49c in ?? ()
Previous frame inner to this frame (corrupt stack?)
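
FWIW, the fault address 0x288 is suspiciously small - it looks more like
a member read through a NULL (or already-freed/garbage) pointer somewhere
inside _sx_xlock_hard than a wild pointer. If anyone wants the exact
source line, it can be pulled out of the dump with kgdb, along these
lines (the vmcore number is whatever savecore assigned, and you may need
kernel.symbols instead of kernel):

kgdb /boot/kernel/kernel /var/crash/vmcore.0
(kgdb) list *0xffffffff805a4989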

BTW, I got a single "force unmount is experimental" message on the
console. On a regular shutdown I usually get one per filesystem (at
least 10, it seems), and this pool should contain exactly as many
filesystems as the root pool, since it's a copy of it. When I ran the
backup script manually after the crash, though, I didn't get any.

Also worth noting: I was running DTrace all night to test its
stability. I'm pretty sure the script was
dtrace -n 'syscall::open:entry { @a[copyinstr(arg0)] = count(); }'
(it just counts open(2) calls per pathname).

According to the core.txt, no swap was in use and 277700 pages of RAM
(277700 x 4 KB = ~1084 MB, roughly half of the 2027 MB) were free.

Regards,
Thomas