Re: ZFS panic with concurrent recv and read-heavy workload

From: Marius Strobl <marius@alchemy.franken.de>
Date: Wed, 8 Jun 2011 23:14:27 +0200

On Fri, Jun 03, 2011 at 03:03:56AM -0400, Nathaniel W Filardo wrote:
> I just got this on another machine; no heavy workload was needed, just booting
> and starting some jails.  Of interest, perhaps: both this machine and the one
> that triggered the panic below are SMP V240s with 1.5GHz CPUs (though I will
> confess that the machine in the original report may have had bad RAM).  I
> have run a UP 1.2GHz V240 for months and never seen this panic.
> 
> This time the kernel is
> > FreeBSD 9.0-CURRENT #9: Fri Jun  3 02:32:13 EDT 2011
> csup'd immediately before building.  The full panic this time is
> > panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @
> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4659
> >
> > cpuid = 1
> > KDB: stack backtrace:
> > panic() at panic+0x1c8
> > _sx_assert() at _sx_assert+0xc4
> > _sx_xunlock() at _sx_xunlock+0x98
> > l2arc_feed_thread() at l2arc_feed_thread+0xeac
> > fork_exit() at fork_exit+0x9c
> > fork_trampoline() at fork_trampoline+0x8
> >
> > SC Alert: SC Request to send Break to host.
> > KDB: enter: Line break on console
> > [ thread pid 27 tid 100121 ]
> > Stopped at      kdb_enter+0x80: ta              %xcc, 1
> > db> reset
> > ttiimmeeoouutt  sshhuuttttiinngg  ddoowwnn  CCPPUUss..
> 
> Half of the memory in this machine is new (well, it came with the machine) and
> half is from the aforementioned UP V240, which seemed to work fine (I was
> attempting an upgrade when this happened); none of it (or indeed any of the
> hardware, save the disk controller and disks) is common between this machine
> and the one in the report below.
> 
> Thoughts?  Any help would be greatly appreciated.
> Thanks.
> --nwf;
> 
> On Wed, Apr 06, 2011 at 04:00:43AM -0400, Nathaniel W Filardo wrote:
> >[...]
> > panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1869
> >
> > cpuid = 1
> > KDB: stack backtrace:
> > panic() at panic+0x1c8
> > _sx_assert() at _sx_assert+0xc4
> > _sx_xunlock() at _sx_xunlock+0x98
> > arc_evict() at arc_evict+0x614
> > arc_get_data_buf() at arc_get_data_buf+0x360
> > arc_buf_alloc() at arc_buf_alloc+0x94
> > dmu_buf_will_fill() at dmu_buf_will_fill+0xfc
> > dmu_write() at dmu_write+0xec
> > dmu_recv_stream() at dmu_recv_stream+0x8a8
> > zfs_ioc_recv() at zfs_ioc_recv+0x354
> > zfsdev_ioctl() at zfsdev_ioctl+0xe0
> > devfs_ioctl_f() at devfs_ioctl_f+0xe8
> > kern_ioctl() at kern_ioctl+0x294
> > ioctl() at ioctl+0x198
> > syscallenter() at syscallenter+0x270
> > syscall() at syscall+0x74
> > -- syscall (54, FreeBSD ELF64, ioctl) %o7=0x40c13e24 --
> > userland() at 0x40e72cc8
> > user trace: trap %o7=0x40c13e24
> > pc 0x40e72cc8, sp 0x7fdffff4641
> > pc 0x40c158f4, sp 0x7fdffff4721
> > pc 0x40c1e878, sp 0x7fdffff47f1
> > pc 0x40c1ce54, sp 0x7fdffff8b01
> > pc 0x40c1dbe0, sp 0x7fdffff9431
> > pc 0x40c1f718, sp 0x7fdffffd741
> > pc 0x10731c, sp 0x7fdffffd831
> > pc 0x10c90c, sp 0x7fdffffd8f1
> > pc 0x103ef0, sp 0x7fdffffe1d1
> > pc 0x4021aff4, sp 0x7fdffffe291
> > done
> >[...]

Apparently this is a locking issue in the ARC code; the ZFS people should
be able to help you.
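
For what it's worth, the reason _sx_xunlock() and _sx_assert() show up in
both traces is that (if I'm reading the compat shims right) FreeBSD's
OpenSolaris layer implements the kmutex_t used for the ARC's per-bucket
hash locks on top of sx(9) locks, so the mutex_exit() calls in arc.c land
in _sx_xunlock().  On an INVARIANTS kernel that unlock first asserts that
the current thread holds the lock exclusively and panics otherwise, which
is exactly what you are seeing: some thread drops a buf hash lock it does
not own, i.e. a double unlock or an unlock racing another CPU.  A minimal
sketch of the failing pattern (hypothetical code, not the actual arc.c
paths; "ht_lock" is just an illustrative name):

	#include <sys/param.h>
	#include <sys/lock.h>
	#include <sys/sx.h>

	static struct sx ht_lock;

	static void
	example(void)
	{
		sx_init(&ht_lock, "ht_lock");

		sx_xlock(&ht_lock);
		sx_xunlock(&ht_lock);	/* OK: held exclusively by us. */

		/*
		 * Under INVARIANTS, _sx_xunlock() calls
		 * _sx_assert(SA_XLOCKED) and panics with
		 * "Lock ht_lock not exclusively locked @ file:line",
		 * matching the backtraces above.
		 */
		sx_xunlock(&ht_lock);	/* double unlock -> panic */
	}

That the assertion fires from two independent ARC paths (arc_evict() and
l2arc_feed_thread()) on two different machines makes a plain software race
in the hash lock handling more likely than bad RAM.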

Marius