On 7/6/14 11:12 AM, Konstantin Belousov wrote:
> On Sun, Jul 06, 2014 at 05:25:12PM +0000, Steve Wills wrote:
>> On Sun, Jul 06, 2014 at 12:28:07PM -0400, Ryan Stone wrote:
>>> On Sun, Jul 6, 2014 at 11:46 AM, Steve Wills <swills_at_freebsd.org> wrote:
>>>> I should have noted this system is running in bhyve. Also I'm told this panic
>>>> may be related to the fact that the system is running in bhyve.
>>>>
>>>> Looking at it a little more closely:
>>>>
>>>> (kgdb) list *__mtx_lock_sleep+0xb1
>>>> 0xffffffff809638d1 is in __mtx_lock_sleep (/usr/src/sys/kern/kern_mutex.c:431).
>>>> 426                  * owner stops running or the state of the lock changes.
>>>> 427                  */
>>>> 428                 v = m->mtx_lock;
>>>> 429                 if (v != MTX_UNOWNED) {
>>>> 430                         owner = (struct thread *)(v & ~MTX_FLAGMASK);
>>>> 431                         if (TD_IS_RUNNING(owner)) {
>>>> 432                                 if (LOCK_LOG_TEST(&m->lock_object, 0))
>>>> 433                                         CTR3(KTR_LOCK,
>>>> 434                                             "%s: spinning on %p held by %p",
>>>> 435                                             __func__, m, owner);
>>>> (kgdb)
>>>>
>>>> I'm told that MTX_CONTESTED was set on the unlocked mtx, that MTX_CONTESTED
>>>> is spuriously left behind, and to ask how the lock prefix is handled in bhyve.
>>>> Does any of that make sense to anyone?
>>> The mutex has both MTX_CONTESTED and MTX_UNOWNED set on it?  That is a
>>> special sentinel value that is set on a mutex when it is destroyed
>>> (see MTX_DESTROYED in sys/mutex.h).  If that is the case, it looks like
>>> you've stumbled upon some kind of use-after-free in tmpfs.  I doubt
>>> that bhyve is responsible (other than perhaps changing the timing
>>> around, making the panic more likely to happen).
>>
>> Given that the first thing seen was:
>>
>> Freed UMA keg (TMPFS node) was not empty (16 items). Lost 1 pages of memory.
>>
>> this sounds reasonable to me.
>>
>> What can I do to help find and eliminate the source of the error?
> The most worrying fact there is that the mutex which is creating trouble
> cannot be anything but allnode_lock, from the backtrace.  For this
> mutex to be destroyed, the unmount of the corresponding mount point must
> run to completion.
>
> In particular, it must get past the vflush(9) call in tmpfs_unmount().
> This call reclaims all vnodes belonging to the unmounted mount point.
> New vnodes cannot be instantiated in the meantime, since insmntque(9) is
> blocked by the MNTK_UNMOUNT flag.
>
> That said, the backtrace indicates that we have a live vnode which is
> being reclaimed, and also a mutex which is in the destroyed (?) state.
> My basic claim is that the two events cannot co-exist; at the least,
> this code path has been heavily exercised, and most issues were fixed
> over several years.
>
> I cannot exclude the possibility of tmpfs/VFS screwing things up,
> but given the above reasoning, and the fact that this is the first
> appearance of the MTX_DESTROYED problem for the tmpfs unmounting code,
> which has not changed for a long time, I would at least ask some things
> about bhyve.  I.e., I would rather first look at the locked-prefix
> emulation than at tmpfs.

What about running the code with INVARIANTS + DEBUG_VFS_LOCKS and seeing
if anything shakes out?

-Alfred

--
Alfred Perlstein
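
For readers following along: the MTX_DESTROYED sentinel Ryan mentions is
just a combination of the lock-word flag bits. Below is a minimal userland
sketch of the values involved, assuming the flag definitions from
sys/sys/mutex.h around the FreeBSD 10 timeframe (check the exact source
tree in use; the main() driver and its printout are illustrative only,
not kernel code):

    /*
     * Sketch of the mtx_lock cookie values discussed in the thread.
     * The lock word holds either MTX_UNOWNED or the owning thread
     * pointer, with the low bits reserved for flags.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define MTX_RECURSED   0x00000001UL  /* lock recursed (MTX_DEF only) */
    #define MTX_CONTESTED  0x00000002UL  /* lock contested (MTX_DEF only) */
    #define MTX_UNOWNED    0x00000004UL  /* cookie for a free mutex */
    #define MTX_FLAGMASK   (MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED)

    /* A destroyed mutex is marked with "contested" and "unowned" both set. */
    #define MTX_DESTROYED  (MTX_CONTESTED | MTX_UNOWNED)

    int
    main(void)
    {
            uintptr_t v = MTX_DESTROYED;    /* the value seen in the panic */

            /*
             * Mirrors the kern_mutex.c snippet quoted above: any value
             * other than MTX_UNOWNED is treated as an owner pointer
             * after masking the flag bits, so a destroyed lock yields
             * owner == NULL, and TD_IS_RUNNING(owner) then dereferences
             * a near-NULL pointer -- hence the page fault.
             */
            if (v != MTX_UNOWNED) {
                    uintptr_t owner = v & ~MTX_FLAGMASK;
                    printf("owner pointer would be %#lx\n",
                        (unsigned long)owner);
            }
            return (0);
    }

Masking MTX_DESTROYED (0x6) with ~MTX_FLAGMASK yields a NULL owner
pointer, which is consistent with the fault at the TD_IS_RUNNING(owner)
line in the kgdb listing.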
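On the locked-prefix question: the uncontested mutex acquisition in
kern_mutex.c is a single atomic compare-and-swap (atomic_cmpset_acq_ptr(),
which compiles to LOCK CMPXCHG on amd64). Here is a rough userland
analogue using C11 atomics; mtx_obtain_lock_sketch() and the fake
thread-pointer value are hypothetical stand-ins for illustration, not the
kernel API:

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MTX_UNOWNED 0x00000004UL  /* cookie for a free mutex */

    /*
     * Stand-in for the kernel's uncontested lock fast path: swing the
     * lock word from MTX_UNOWNED to the owning thread pointer in one
     * atomic compare-and-swap.  If a hypervisor mis-emulated the LOCK
     * prefix, two vCPUs could both observe MTX_UNOWNED and both believe
     * they acquired the lock, corrupting the lock word's flag bits in
     * the way the thread above speculates.
     */
    static int
    mtx_obtain_lock_sketch(_Atomic uintptr_t *lockp, uintptr_t tid)
    {
            uintptr_t expected = MTX_UNOWNED;

            return (atomic_compare_exchange_strong(lockp, &expected, tid));
    }

    int
    main(void)
    {
            _Atomic uintptr_t lockword = MTX_UNOWNED;
            uintptr_t fake_tid = 0xdeadbee8;   /* aligned, pointer-like */

            if (mtx_obtain_lock_sketch(&lockword, fake_tid))
                    printf("acquired, lock word now %#lx\n",
                        (unsigned long)atomic_load(&lockword));
            return (0);
    }

A single lost or non-atomic cmpxchg at this spot would be enough to leave
stale CONTESTED bits behind, which is why the emulation of the LOCK
prefix is worth ruling out before digging into tmpfs itself.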