Re: ZFS panic with r255937

From: Keith White <kwhite_at_site.uottawa.ca>
Date: Thu, 3 Oct 2013 06:54:31 -0400 (EDT)
On Thu, 3 Oct 2013, Andriy Gapon wrote:

> on 02/10/2013 20:59 Keith White said the following:
>> On Wed, 2 Oct 2013, Andriy Gapon wrote:
>>
>>> on 30/09/2013 02:11 kwhite_at_site.uottawa.ca said the following:
>>>> Sorry, debugging this is *way* beyond me.  Any hints, patches to try?
>>>
>>> Please share the stack trace.
>>>
>>> --
>>> Andriy Gapon
>>
>> There's now a pr for this panic: kern/182570
>>
>> Here's the stack trace:
>>
>> root@freebsd10:/usr/src # kgdb /boot/kernel/kernel /var/crash/vmcore.last
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>> panic: solaris assert: dn->dn_maxblkid == 0 &&
>> (BP_IS_HOLE(&dn->dn_phys->dn_blkptr[0]) || dnode_block_freed(dn, 0)), file:
>> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c,
>> line: 598
>> cpuid = 1
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00992b3280
>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe00992b3330
>> vpanic() at vpanic+0x126/frame 0xfffffe00992b3370
>> panic() at panic+0x43/frame 0xfffffe00992b33d0
>> assfail() at assfail+0x22/frame 0xfffffe00992b33e0
>> dnode_reallocate() at dnode_reallocate+0x225/frame 0xfffffe00992b3430
>> dmu_object_reclaim() at dmu_object_reclaim+0x123/frame 0xfffffe00992b3480
>> dmu_recv_stream() at dmu_recv_stream+0xd79/frame 0xfffffe00992b36b0
>> zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfffffe00992b3920
>> zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfffffe00992b39c0
>> devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfffffe00992b3a20
>> kern_ioctl() at kern_ioctl+0x2ca/frame 0xfffffe00992b3a90
>> sys_ioctl() at sys_ioctl+0x11f/frame 0xfffffe00992b3ae0
>> amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe00992b3bf0
>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00992b3bf0
>
>
> Thank you very much.
> To me this looks very similar to a problem discovered and fixed in illumos some
> time ago.  Please check if the following change improves the situation for you.
>
> https://github.com/avg-I/freebsd/commit/a7e7dece215bc5d00077e9c7f4db34d9e5c30659
>
> Raw:
> https://github.com/avg-I/freebsd/commit/a7e7dece215bc5d00077e9c7f4db34d9e5c30659.patch
> ...

Yes, it does.  send/recv completes with no panic.  That patch fixes kern/182570 for me.

Thanks!

...keith

Once the patch is applied "svn diff" gives me:

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c	(revision 255986)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c	(working copy)
@@ -677,6 +677,16 @@
  	if (err != 0)
  		return (err);
  	err = dmu_free_long_range_impl(os, dn, offset, length);
+
+	/*
+	 * It is important to zero out the maxblkid when freeing the entire
+	 * file, so that (a) subsequent calls to dmu_free_long_range_impl()
+	 * will take the fast path, and (b) dnode_reallocate() can verify
+	 * that the entire file has been freed.
+	 */
+	if (offset == 0 && length == DMU_OBJECT_END)
+		dn->dn_maxblkid = 0;
+
  	dnode_rele(dn, FTAG);
  	return (err);
  }
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c	(revision 255986)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c	(working copy)
@@ -616,7 +616,7 @@
  	 */
  	if (dn->dn_datablkshift == 0) {
  		if (off != 0 || len < dn->dn_datablksz)
-			dmu_tx_count_write(txh, off, len);
+			dmu_tx_count_write(txh, 0, dn->dn_datablksz);
  	} else {
  		/* first block will be modified if it is not aligned */
  		if (!IS_P2ALIGNED(off, 1 << dn->dn_datablkshift))
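
To illustrate why the first hunk makes the assertion in dnode_reallocate() hold, here is a minimal user-space sketch. It is not the ZFS code: struct toy_dnode, TOY_OBJECT_END and every toy_* function are hypothetical stand-ins, and the "patched" flag exists only to compare behavior with and without the change. It assumes the panic came from dn_maxblkid keeping a stale non-zero value after the whole object had been freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for DMU_OBJECT_END ("free to the end of the object"). */
#define TOY_OBJECT_END UINT64_MAX

/* Hypothetical, heavily simplified stand-in for a dnode. */
struct toy_dnode {
	uint64_t maxblkid;       /* highest block id that may hold data */
	bool     block0_is_hole; /* true once block 0 has been freed */
};

/*
 * Toy model of freeing a byte range. Only the whole-object case is
 * modelled; partial frees leave the toy state untouched.
 */
static void
toy_free_long_range(struct toy_dnode *dn, uint64_t offset, uint64_t length,
    bool patched)
{
	if (offset == 0 && length == TOY_OBJECT_END) {
		dn->block0_is_hole = true;
		if (patched)
			dn->maxblkid = 0; /* the line added by the patch */
	}
}

/* Same shape as the condition in the failed assertion. */
static bool
toy_can_reallocate(const struct toy_dnode *dn)
{
	return (dn->maxblkid == 0 && dn->block0_is_hole);
}

/* Aborts, like the kernel assertion, if the object was not fully freed. */
static void
toy_reallocate(struct toy_dnode *dn)
{
	assert(toy_can_reallocate(dn));
	dn->block0_is_hole = false; /* object is in use again */
}
```

In this model, freeing the whole object with patched == false leaves maxblkid at its old value, so toy_can_reallocate() returns false and toy_reallocate() would abort. With patched == true the same call sequence passes the check.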
Received on Thu Oct 03 2013 - 08:54:28 UTC
