on 10/07/2013 19:50 Adrian Chadd said the following:
> On 9 July 2013 23:27, Andriy Gapon <avg_at_freebsd.org> wrote:
>> on 09/07/2013 16:03 Adrian Chadd said the following:
>>> Does anyone have any ideas as to what's going on?
>>
>> Please provide output of 'thread apply all bt' from kgdb, then perhaps someone
>> might be able to tell.
>
> Done - http://people.freebsd.org/~adrian/ath/20130710-vm0-zfs-hang.txt

vmcore.0 was useless for some reason - an interesting address was not accessible.
vmcore.1 seems to be very similar and is actually useful.

This problem looks like an interesting deadlock involving ZFS and VFS and vnode
shortage.

The most obvious things are that many threads could not allocate a new vnode and
are waiting in getnewvnode_reserve and also many threads are stuck waiting on
vnode locks held by the former threads.

In effect, they all wait for vnlru, which in turn is stuck in
zfs_freebsd_reclaim on z_teardown_lock.
That lock is held by a thread doing a rollback ioctl.
And that thread waits for zfs sync thread to actually perform the rollback.
The sync thread waits on zfs quiesce thread to declare the current transaction
group as quiesced.
The quiesce thread, obviously, waits for all operations running in the current
transaction group to complete.
Some of those operations are e.g. VOP_CREATE -> zfs_create. They already
started a zfs transaction (as a part of the current transaction group) and they
execute zfs_mknode which needs a new vnode. So these threads are waiting for a
new vnode and do not let the current transaction group become quiesced.
GOTO beginning.

Compressing the above description to the extreme, it boils down to: ZFS needs a
new vnode from vnlru and is waiting on it, while vnlru has to wait on ZFS.

-- 
Andriy Gapon

Received on Tue Jul 16 2013 - 17:33:40 UTC
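
The wait chain described in the message can be written down as a wait-for graph, which makes the cycle explicit. The following is only an illustrative sketch: the node labels are informal names chosen here for the threads mentioned in the analysis (they are not real kernel thread names), and the helper find_cycle is hypothetical, not part of any FreeBSD or ZFS tooling.

```python
# Wait-for edges taken from the analysis in the message: key waits for value.
# Labels are informal and chosen for illustration only.
WAITS_FOR = {
    "zfs_create": "vnlru",              # open tx, zfs_mknode needs a new vnode
    "vnlru": "rollback_ioctl",          # zfs_freebsd_reclaim blocks on z_teardown_lock
    "rollback_ioctl": "sync_thread",    # waits for the rollback to be performed
    "sync_thread": "quiesce_thread",    # waits for the txg to be declared quiesced
    "quiesce_thread": "zfs_create",     # waits for operations in the txg to finish
}


def find_cycle(waits_for, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    seen = {}  # node -> index in path
    node = start
    while node is not None and node not in seen:
        seen[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended without revisiting a node
    # Close the loop by repeating the node where the cycle begins.
    return path[seen[node]:] + [node]


cycle = find_cycle(WAITS_FOR, "zfs_create")
print(" -> ".join(cycle))
```

Each thread in this model waits on exactly one other party, so following the edges is enough to find the loop; a real lock graph with several waiters per lock would need a general graph search.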