On Sun, Jul 06, 2014 at 05:25:12PM +0000, Steve Wills wrote:
> On Sun, Jul 06, 2014 at 12:28:07PM -0400, Ryan Stone wrote:
> > On Sun, Jul 6, 2014 at 11:46 AM, Steve Wills <swills_at_freebsd.org> wrote:
> > > I should have noted this system is running in bhyve. Also I'm told this
> > > panic may be related to the fact that the system is running in bhyve.
> > >
> > > Looking at it a little more closely:
> > >
> > > (kgdb) list *__mtx_lock_sleep+0xb1
> > > 0xffffffff809638d1 is in __mtx_lock_sleep (/usr/src/sys/kern/kern_mutex.c:431).
> > > 426                      * owner stops running or the state of the lock changes.
> > > 427                      */
> > > 428                     v = m->mtx_lock;
> > > 429                     if (v != MTX_UNOWNED) {
> > > 430                             owner = (struct thread *)(v & ~MTX_FLAGMASK);
> > > 431                             if (TD_IS_RUNNING(owner)) {
> > > 432                                     if (LOCK_LOG_TEST(&m->lock_object, 0))
> > > 433                                             CTR3(KTR_LOCK,
> > > 434                                                 "%s: spinning on %p held by %p",
> > > 435                                                 __func__, m, owner);
> > > (kgdb)
> > >
> > > I'm told that MTX_CONTESTED was set on the unlocked mtx, that MTX_CONTESTED
> > > was spuriously left behind, and to ask how the lock prefix is handled in
> > > bhyve. Does any of that make sense to anyone?
> >
> > The mutex has both MTX_CONTESTED and MTX_UNOWNED set on it?  That is a
> > special sentinel value that is set on a mutex when it is destroyed
> > (see MTX_DESTROYED in sys/mutex.h).  If that is the case, it looks like
> > you've stumbled upon some kind of use-after-free in tmpfs.  I doubt
> > that bhyve is responsible (other than perhaps changing the timing,
> > making the panic more likely to happen).
>
> Given that the first thing seen was:
>
>   Freed UMA keg (TMPFS node) was not empty (16 items).  Lost 1 pages of memory.
>
> this sounds reasonable to me.
>
> What can I do to help find and eliminate the source of the error?

The most worrying fact there is that, judging from the backtrace, the
mutex which is creating the trouble cannot be anything other than
allnode_lock.

For this mutex to be destroyed, the unmount of the corresponding mount
point must run to completion.  In particular, it must get past the
vflush(9) call in tmpfs_unmount(), which reclaims all vnodes belonging
to the mount point being unmounted.  New vnodes cannot be instantiated
in the meantime, since insmntque(9) is blocked by the MNTK_UNMOUNT flag.

That said, the backtrace indicates that we have a live vnode which is
being reclaimed, and at the same time a mutex which is in the destroyed
(?) state.  My basic claim is that these two events cannot co-exist; at
the least, this code path has been heavily exercised, and most of its
issues were fixed over several years.

I cannot exclude the possibility of tmpfs/VFS screwing things up.  But
given the reasoning above, and the fact that this is the first
appearance of the MTX_DESTROYED problem in the tmpfs unmount code,
which has not changed for a long time, I would at least ask some
questions about bhyve.  I.e., I would rather look first at the locked
prefix emulation than at tmpfs.
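
To restate the bit-level detail behind Ryan's MTX_DESTROYED remark: the
destroyed cookie is composed of the ordinary low flag bits of the lock
word.  Approximately, from sys/mutex.h of that vintage (paraphrased,
not quoted verbatim):

    #define MTX_RECURSED    0x00000001      /* lock recursed */
    #define MTX_CONTESTED   0x00000002      /* lock contested */
    #define MTX_UNOWNED     0x00000004      /* cookie for a free mutex */
    #define MTX_FLAGMASK    (MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED)
    #define MTX_DESTROYED   (MTX_CONTESTED | MTX_UNOWNED)

This also explains where the listed code faults: with m->mtx_lock equal
to MTX_DESTROYED, the v != MTX_UNOWNED test at line 429 passes, the
masking at line 430 yields a NULL "owner", and TD_IS_RUNNING(owner) at
line 431 dereferences a near-NULL thread pointer.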
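
To make the ordering argument concrete, the shape of tmpfs_unmount() is
roughly the following.  This is a sketch from memory, not the verbatim
source; error handling and the node-list walk are elided:

    static int
    tmpfs_unmount(struct mount *mp, int mntflags)
    {
            struct tmpfs_mount *tmp = mp->mnt_data;
            int error, flags;

            flags = (mntflags & MNT_FORCE) ? FORCECLOSE : 0;

            /*
             * Reclaim every vnode belonging to this mount point.  No
             * new vnode can appear concurrently, because insmntque(9)
             * fails once MNTK_UNMOUNT is set on mp.
             */
            error = vflush(mp, 0, flags, curthread);
            if (error != 0)
                    return (error);

            /* ... free all tmpfs nodes still on the mount's list ... */

            /*
             * Only after all of the above may the mutex be destroyed.
             * A vnode reclaim running after this point would mean a
             * vnode somehow survived vflush().
             */
            mtx_destroy(&tmp->allnode_lock);
            free(tmp, M_TMPFSMNT);
            mp->mnt_data = NULL;
            return (0);
    }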
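
Finally, to illustrate why the locked prefix question is the one I
would ask first: the mutex fast paths rely on locked compare-and-swap
operations on the lock word, roughly of this shape (again a sketch, not
verbatim kernel source):

    /* Acquire fast path: atomically MTX_UNOWNED -> owning thread. */
    atomic_cmpset_acq_ptr(&m->mtx_lock, MTX_UNOWNED, (uintptr_t)curthread);

    /* Waiter marks the lock contested (cf. __mtx_lock_sleep()). */
    atomic_cmpset_ptr(&m->mtx_lock, v, v | MTX_CONTESTED);

    /* Uncontested release: owning thread -> MTX_UNOWNED. */
    atomic_cmpset_rel_ptr(&m->mtx_lock, (uintptr_t)curthread, MTX_UNOWNED);

All of these compile to cmpxchg with the lock prefix.  If the
hypervisor's emulation of such an instruction were ever non-atomic with
respect to the other virtual CPUs, concurrent read-modify-writes could
interleave and leave the lock word in a state the algorithm can never
legitimately produce, e.g. stale flag bits on an "unlocked" word.  A
word reading MTX_UNOWNED | MTX_CONTESTED is indistinguishable from
MTX_DESTROYED, which would explain the report without any
use-after-free in tmpfs.  This is a hypothesis, not something I have
demonstrated in bhyve.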