Re: ZFS : panic("sleeping thread")

From: Thomas Backman <serenity_at_exscape.org>
Date: Thu, 18 Jun 2009 13:49:46 +0200
On May 27, 2009, at 07:58 PM, Artem Belevich wrote:

> Hi,
>
> While recent ZFS improvements got rid of random hangs I used to see,
> there's still one problem that I keep running into -- panic in ZFS
> under heavy load. I can reproduce it by doing a build with -j16 in a
> jail running i386 binaries on -CURRENT/amd64 running on a box with
> quad-core CPU. It takes a while to reproduce, but it usually shows up
> within a couple of hours.
>
> Sleeping thread (tid 100606, pid 32147) owns a non-sleepable lock
> sched_switch() at sched_switch+0xed
> mi_switch() at mi_switch+0x16f
> sleepq_wait() at sleepq_wait+0x42
> _sx_xlock_hard() at _sx_xlock_hard+0x1f0
> _sx_xlock() at _sx_xlock+0x4e
> rrw_exit() at rrw_exit+0x1d
> zfs_freebsd_getattr() at zfs_freebsd_getattr+0x2be
> VOP_GETATTR_APV() at VOP_GETATTR_APV+0x44
> filt_vfsread() at filt_vfsread+0x51
> knote() at knote+0xc2
> VOP_WRITE_APV() at VOP_WRITE_APV+0x11f
> vn_write() at vn_write+0x279
> dofilewrite() at dofilewrite+0x85
> kern_writev() at kern_writev+0x60
> write() at write+0x54
> ia32_syscall() at ia32_syscall+0x236
> Xint0x80_syscall() at Xint0x80_syscall+0x85
> --- syscall (4, FreeBSD ELF32, write), rip = 0x78162153, rsp =
> 0xffff945c, rbp = 0xffff9478 ---
>
> It appears that locking within ZFS conflicts with vnode locking. The
> back-trace is always the same.
>
> For now, I've applied the following patch to disable the panic, but it
> would be good if someone familiar with VFS locking in FreeBSD could
> take a look.
> If you need any additional info, let me know.
>
> Thanks,
> --Artem
>
> diff -r 930d975c8103 src/sys/kern/subr_turnstile.c
> --- a/sys/kern/subr_turnstile.c	Fri Dec 05 16:12:43 2008 -0800
> +++ b/sys/kern/subr_turnstile.c	Fri Dec 12 14:31:16 2008 -0800
> @@ -219,7 +219,10 @@
> #ifdef DDB
> 			db_trace_thread(td, -1);
> #endif
> -			panic("sleeping thread");
> +                        /* Don't propagate priority to a sleeping thread. */
> +			thread_unlock(td);
> +			return;
> +			// panic("sleeping thread");
> 		}
>
> 		/*
Does anyone have any updates on this? I just got a "sleeping thread" panic
in ZFS after doing a zfs rollback. Unfortunately, running "panic" in the
debugger resulted in "dump device too small" (despite the dump device
being RAM-sized), so I don't have a saved backtrace. However, the
backtrace I saw in the debugger was *not* the same as yours. There was no
_sx_xlock in it, but that's pretty much all I know about it. :(

Regards,
Thomas
Received on Thu Jun 18 2009 - 09:50:08 UTC
