Hi Steve,

On Sun, Jul 6, 2014 at 8:46 AM, Steve Wills <swills_at_freebsd.org> wrote:
> I should have noted this system is running in bhyve. Also I'm told this panic
> may be related to the fact that the system is running in bhyve.
>
> Looking at it a little more closely:
>
> (kgdb) list *__mtx_lock_sleep+0xb1
> 0xffffffff809638d1 is in __mtx_lock_sleep (/usr/src/sys/kern/kern_mutex.c:431).
> 426                      * owner stops running or the state of the lock changes.
> 427                      */
> 428                     v = m->mtx_lock;
> 429                     if (v != MTX_UNOWNED) {
> 430                             owner = (struct thread *)(v & ~MTX_FLAGMASK);
> 431                             if (TD_IS_RUNNING(owner)) {
> 432                                     if (LOCK_LOG_TEST(&m->lock_object, 0))
> 433                                             CTR3(KTR_LOCK,
> 434                                                 "%s: spinning on %p held by %p",
> 435                                                 __func__, m, owner);
> (kgdb)
>
> I'm told that MTX_CONTESTED was set on the unlocked mtx, that MTX_CONTESTED
> was spuriously left behind, and to ask how the lock prefix is handled in
> bhyve. Does any of that make sense to anyone?
>

Regarding the lock prefix: since bhyve only supports hardware that has nested
paging, the hypervisor doesn't get in the way of instructions that access
memory. This includes instructions with lock prefixes, or any other prefixes
for that matter. If there is a VM exit due to a nested page fault, the
faulting instruction is restarted after the fault is resolved.

Having said that, there are more plausible explanations that might implicate
bhyve: incorrect translations in the nested page tables, stale translations
in the TLB, etc.

Do you have a core file for the panic? It would be very useful for debugging
this further.

> Thanks,
> Steve
>
> On Sun, Jul 06, 2014 at 01:53:37PM +0000, Steve Wills wrote:
>> Hi,
>>
>> Just experienced this tmpfs panic on r268160:
>>
>> Freed UMA keg (TMPFS node) was not empty (16 items). Lost 1 pages of memory.
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 12; apic id = 0c
>> fault virtual address   = 0x378
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0xffffffff809638d1
>> stack pointer           = 0x28:0xfffffe07243800a0
>> frame pointer           = 0x28:0xfffffe0724380120
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 65339 (pkg-static)
>> [ thread pid 65339 tid 101641 ]
>> Stopped at      __mtx_lock_sleep+0xb1:  movl    0x378(%rax),%ecx
>> db> bt
>> Tracing pid 65339 tid 101641 td 0xfffff80286b2e490
>> __mtx_lock_sleep() at __mtx_lock_sleep+0xb1/frame 0xfffffe0724380120
>> free_unr() at free_unr+0x9d/frame 0xfffffe0724380160
>> tmpfs_free_node() at tmpfs_free_node+0xf2/frame 0xfffffe07243801a0
>> tmpfs_reclaim() at tmpfs_reclaim+0xdc/frame 0xfffffe07243801d0
>> VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xa7/frame 0xfffffe0724380200
>> vgonel() at vgonel+0x24c/frame 0xfffffe0724380280
>> vrecycle() at vrecycle+0x84/frame 0xfffffe07243802c0
>> tmpfs_inactive() at tmpfs_inactive+0x18/frame 0xfffffe07243802d0
>> VOP_INACTIVE_APV() at VOP_INACTIVE_APV+0xa7/frame 0xfffffe0724380300
>> vinactive() at vinactive+0x181/frame 0xfffffe0724380360
>> vputx() at vputx+0x30d/frame 0xfffffe07243803d0
>> vn_close() at vn_close+0x13e/frame 0xfffffe0724380450
>> vn_closefile() at vn_closefile+0x48/frame 0xfffffe07243804d0
>> _fdrop() at _fdrop+0x29/frame 0xfffffe07243804f0
>> closef() at closef+0x2ae/frame 0xfffffe0724380580
>> fdescfree() at fdescfree+0x64c/frame 0xfffffe0724380630
>> exit1() at exit1+0x682/frame 0xfffffe07243806c0
>> sigexit() at sigexit+0x929/frame 0xfffffe0724380980
>> postsig() at postsig+0x3c4/frame 0xfffffe0724380a70
>> ast() at ast+0x487/frame 0xfffffe0724380ab0
>> doreti_ast() at doreti_ast+0x1f/frame 0x7fffffffc6e0
>> db>
>>
>> Any further debugging I can do?
>>
>> Thanks,
>> Steve

Received on Sun Jul 06 2014 - 18:49:06 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:50 UTC