Re: "New" ZFS crash on FS (pool?) unmount/export

From: Kip Macy <kip.macy_at_gmail.com>
Date: Sat, 20 Jun 2009 01:23:02 -0700
A force unmount causes all vnodes dirty data to be flushed and then
the vnodes to be released. A regular unmount will not complete if
there are open files referencing it (hence the use of force unmount on
shutdown).  Pawel had previously disabled forced unmount because of
known "issues". I chose to enable it because it guarantees that dirty
data is flushed to disk before shutdown. It looks like we may be
reclaiming a vnode that has already been freed. I haven't seen this
issue myself, so if you can identify more specifics on how to
reproduce it would greatly increase the likelihood of my being able to
fix it in the near future.


Thanks,
Kip



On Sat, Jun 20, 2009 at 12:11 AM, Thomas Backman<serenity_at_exscape.org> wrote:
> I just ran into this tonight. Not sure exactly what triggered it - the box
> stopped responding to pings at 02:07AM and it has a cron backup job using
> zfs send/recv at 02:00, so I'm guessing it's related, even though the backup
> probably should have finished before then... Hmm. Anyway.
>
> r194478.
>
> kernel trap 12 with interrupts disabled
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x288
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff805a4989
> stack pointer           = 0x28:0xffffff803e8b57e0
> frame pointer           = 0x28:0xffffff803e8b5840
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 57514 (zpool)
> panic: from debugger
> cpuid = 0
> Uptime: 10h22m13s
> Physical memory: 2027 MB
>
> (kgdb) bt
> #0  doadump () at pcpu.h:223
> #1  0xffffffff8059c409 in boot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:419
> #2  0xffffffff8059c85c in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:575
> #3  0xffffffff801f1377 in db_panic (addr=Variable "addr" is not available.
> ) at /usr/src/sys/ddb/db_command.c:478
> #4  0xffffffff801f1781 in db_command (last_cmdp=0xffffffff80c38620,
> cmd_table=Variable "cmd_table" is not available.
> ) at /usr/src/sys/ddb/db_command.c:445
> #5  0xffffffff801f19d0 in db_command_loop () at
> /usr/src/sys/ddb/db_command.c:498
> #6  0xffffffff801f3969 in db_trap (type=Variable "type" is not available.
> ) at /usr/src/sys/ddb/db_main.c:229
> #7  0xffffffff805ce465 in kdb_trap (type=12, code=0, tf=0xffffff803e8b5730)
> at /usr/src/sys/kern/subr_kdb.c:534
> #8  0xffffffff8088715d in trap_fatal (frame=0xffffff803e8b5730, eva=Variable
> "eva" is not available.
> ) at /usr/src/sys/amd64/amd64/trap.c:847
> #9  0xffffffff80887fb2 in trap (frame=0xffffff803e8b5730) at
> /usr/src/sys/amd64/amd64/trap.c:345
> #10 0xffffffff8086e007 in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:223
> #11 0xffffffff805a4989 in _sx_xlock_hard (sx=0xffffff0043557d50,
> tid=18446742975830720512, opts=Variable "opts" is not available.
> )
>    at /usr/src/sys/kern/kern_sx.c:575
> #12 0xffffffff805a52fe in _sx_xlock (sx=Variable "sx" is not available.
> ) at sx.h:155
> #13 0xffffffff80fe2995 in zfs_freebsd_reclaim () from /boot/kernel/zfs.ko
> #14 0xffffffff808cefca in VOP_RECLAIM_APV (vop=0xffffff0043557d38,
> a=0xffffff0043557d50) at vnode_if.c:1926
> #15 0xffffffff80626f6e in vgonel (vp=0xffffff00437a7938) at vnode_if.h:830
> #16 0xffffffff8062b528 in vflush (mp=0xffffff0060f2a000, rootrefs=0,
> flags=0, td=0xffffff0061528000)
>    at /usr/src/sys/kern/vfs_subr.c:2450
> #17 0xffffffff80fdd3a8 in zfs_umount () from /boot/kernel/zfs.ko
> #18 0xffffffff8062420a in dounmount (mp=0xffffff0060f2a000,
> flags=1626513408, td=Variable "td" is not available.
> )
>    at /usr/src/sys/kern/vfs_mount.c:1287
> #19 0xffffffff80624975 in unmount (td=0xffffff0061528000,
> uap=0xffffff803e8b5c00)
>    at /usr/src/sys/kern/vfs_mount.c:1172
> #20 0xffffffff8088783f in syscall (frame=0xffffff803e8b5c90) at
> /usr/src/sys/amd64/amd64/trap.c:984
> #21 0xffffffff8086e290 in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:364
> #22 0x000000080104e49c in ?? ()
> Previous frame inner to this frame (corrupt stack?)
>
> BTW, I got a (one) "force unmount is experimental" on the console. On
> regular shutdown I usually get one per filesystem, it seems (at least 10)
> and this pool should contain exactly as many filesystems as the root pool
> since it's a copy of it. On running the backup script manually post-crash,
> though, I didn't get any.
>
> Also worth noting is that I was running DTrace all night to test its
> stability. I'm pretty sure the script was
> dtrace -n 'syscall::open:entry { _at_a[copyinstr(arg0)] = count(); }'
>
> 0 swap was used and 277700 pages (~1084 MB or 50%) RAM was free, according
> to the core.txt.
>
> Regards,
> Thomas
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>



-- 
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.

    Edmund Burke
Received on Sat Jun 20 2009 - 06:23:03 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:50 UTC