Re: Yet another ZFS recv panic; old but rarely seen

From: Thomas Backman <serenity_at_exscape.org>
Date: Fri, 21 Aug 2009 11:47:35 +0200
On Aug 21, 2009, at 08:51, Thomas Backman wrote:

> Ugh. Bad news again: another zfs send/recv panic during an  
> incremental backup.
>
> Unread portion of the kernel message buffer:
> panic: dirtying dbuf obj=b213 lvl=1 blkid=2 but not tx_held
>
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> panic() at panic+0x182
> dmu_tx_dirty_buf() at dmu_tx_dirty_buf+0x28f
> dbuf_dirty() at dbuf_dirty+0x69
> dnode_free_range() at dnode_free_range+0x80d
> dnode_reallocate() at dnode_reallocate+0x131
> dmu_object_reclaim() at dmu_object_reclaim+0x99
> dmu_recv_stream() at dmu_recv_stream+0x1446
> zfs_ioc_recv() at zfs_ioc_recv+0x25a
> zfsdev_ioctl() at zfsdev_ioctl+0x8a
> devfs_ioctl_f() at devfs_ioctl_f+0x77
> kern_ioctl() at kern_ioctl+0xf6
> ioctl() at ioctl+0xfd
> syscall() at syscall+0x28f
> Xfast_syscall() at Xfast_syscall+0xe1
> --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x800fe5f7c, rsp = 0x7fffffff8fb8, rbp = 0x7fffffff9cf0 ---
> KDB: enter: panic
> panic: from debugger
> cpuid = 0
> Uptime: 4h52m26s
>
> Looks *eerily* similar to this panic from OpenSolaris: http://mail.opensolaris.org/pipermail/zfs-code/2008-September/000694.html
>
> The GDB backtrace isn't of much more use, I guess:
> #11 0xffffffff8036d02b in panic (fmt=Variable "fmt" is not available.
> )
>    at /usr/src/sys/kern/kern_shutdown.c:562
> #12 0xffffffff80b4765f in dmu_tx_dirty_buf () from /boot/kernel/zfs.ko
> #13 0xffffffff80b3a519 in dbuf_dirty () from /boot/kernel/zfs.ko
> #14 0xffffffff80b4b68d in dnode_free_range () from /boot/kernel/zfs.ko
> #15 0xffffffff80b4c461 in dnode_reallocate () from /boot/kernel/zfs.ko
> #16 0xffffffff80b42569 in dmu_object_reclaim () from /boot/kernel/zfs.ko
> #17 0xffffffff80b421b6 in dmu_recv_stream () from /boot/kernel/zfs.ko
> #18 0xffffffff80ba430a in zfs_ioc_recv () from /boot/kernel/zfs.ko
> #19 0xffffff002ac13d68 in ?? ()
> #20 0xffffff002aa6c320 in ?? ()
> #21 0xffffff002ae15000 in ?? ()
> #22 0xffffff0002891400 in ?? ()
> #23 0xffffff00028f2800 in ?? ()
> #24 0xffffff00744a1ab8 in ?? ()
> ...
> #34 0xffffff803e7fc860 in ?? ()
> #35 0xffffffff805b699f in uma_zalloc_arg (zone=0xffffff00183c6600,
>    udata=0xffffff00744a1000, flags=-128) at /usr/src/sys/vm/uma_core.c:1990
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
>
> Apparently, I've gotten this once before, at r195910 (plus patches,
> though I'm not sure which ones at the time), on July 30th. Same DDB
> backtrace, same broken GDB backtrace.
>
> Regards,
> Thomas

I found some more info mere minutes after posting this (figures;  
that's why I prefer media where you can edit your posts!), but had  
other things to do. So, here's some more:

OpenSolaris bug ID: 6754448 (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6754448)
Fixed in build 108: http://dlc.sun.com/osol/on/downloads/b108/on-changelog-b108.html
The changelogs are to be found on that page (just search for "6754448"),
with a history/diff link on each source file's page. Unfortunately
(unless FreeBSD suffers from both, that is), they apparently fixed two
bugs in the same batch, making it harder - at least for *me* - to see
which changes relate to *this* panic.
Still, I'm guessing this will help, unless the code is too much out of  
sync with OpenSolaris.
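
For reference, the backup that trips this is nothing exotic, just an
ordinary incremental send/recv pipe; roughly along these lines (the pool
and dataset names are made up for the example, not my actual setup):

    zfs snapshot tank/data@backup-new
    zfs send -i tank/data@backup-old tank/data@backup-new | \
        zfs recv -F backuppool/data

The receiving end of that pipe is what ends up in zfs_ioc_recv() and
dmu_recv_stream() in the trace above.
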
I'm also guessing Pawel already knows waaaaaaay more about their  
system than I do (... which is about nothing), so I'll probably shut  
up now... ;)

Regards,
Thomas
Received on Fri Aug 21 2009 - 07:47:46 UTC
