Re: kernel panic caused by virtualbox(?)

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Thu, 11 Aug 2016 11:06:35 +0300
On Wed, Aug 10, 2016 at 04:47:15PM -0700, Don Lewis wrote:
> On 10 Aug, Jung-uk Kim wrote:
> > On 08/09/16 05:12 AM, Konstantin Belousov wrote:
> >> On Mon, Aug 08, 2016 at 04:44:20PM -0700, Don Lewis wrote:
> >>> On  8 Aug, Konstantin Belousov wrote:
> >>>> On Mon, Aug 08, 2016 at 10:22:44AM -0700, John Baldwin wrote:
> >>>>> On Thursday, August 04, 2016 05:10:29 PM Don Lewis wrote:
> >>>>>> Reposted to -current to get some more eyes on this ...
> >>>>>>
> >>>>>> I just got a kernel panic when I started up a CentOS 7 VM in virtualbox.
> >>>>>> The host is:
> >>>>>> 	FreeBSD 12.0-CURRENT #17 r302500 GENERIC amd64
> >>>>>> The virtualbox version is:
> >>>>>> 	virtualbox-ose-5.0.26
> >>>>>> 	virtualbox-ose-kmod-5.0.26_1
> >>>>>>
> >>>>>> The panic message is:
> >>>>>>
> >>>>>> panic: Unregistered use of FPU in kernel
> >>>>>> cpuid = 1
> >>>>>> KDB: stack backtrace:
> >>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe085a55d030
> >>>>>> vpanic() at vpanic+0x182/frame 0xfffffe085a55d0b0
> >>>>>> kassert_panic() at kassert_panic+0x126/frame 0xfffffe085a55d120
> >>>>>> trap() at trap+0x7ae/frame 0xfffffe085a55d330
> >>>>>> calltrap() at calltrap+0x8/frame 0xfffffe085a55d330
> >>>>>> --- trap 0x16, rip = 0xffffffff827dd3a9, rsp = 0xfffffe085a55d408, rbp = 0xfffffe085a55d430 ---
> >>>>>> g_pLogger() at 0xffffffff827dd3a9/frame 0xfffffe085a55d430
> >>>>>> g_pLogger() at 0xffffffff8274e5c7/frame 0x3
> >>>>>> KDB: enter: panic
> >>>>>>
> >>>>>> Since g_pLogger is a symbol in vboxdrv.ko, it looks like virtualbox is
> >>>>>> the trigger.
> >>>>>>
> >>>>>> There are no symbols for the virtualbox kmods, possibly because I
> >>>>>> installed them as an upgrade using packages (built with the same source
> >>>>>> tree version) instead of by using PORTS_MODULES in make.conf, so ports
> >>>>>> kgdb didn't have anything useful to say about what happened before the
> >>>>>> trap.
> >>>>>>
> >>>>>> This panic is very repeatable.  I just got another one when starting the
> >>>>>> same VM, but this time the two calls before the trap were
> >>>>>> null_bug_bypass().  Hmm, that symbol is in nullfs ...
> >>>>>>
> >>>>>> I don't see this with a Windows 7 VM.
> >>>>>>
> >>>>>> All of the virtualbox kmod files are compiled with -mno-mmx -mno-sse
> >>>>>> -msoft-float -mno-aes -mno-avx
> >>>> Your disassembly listed the fxrstor instruction as the one failing, or did I
> >>>> misremember? This is most likely some context switch code, either
> >>>> by the virtual machine or erroneously executed guest code. It is not a
> >>>> spontaneous use of the FPU, but more likely something different. Can you
> >>>> confirm?
> >>>>
> >>>> In either case, I do not remember any KBI changes around PCB layout or
> >>>> fpu_enter() KPI recently.
> >>>>
> >>>>>
> >>>>> I suspect head packages are quite likely built against a "wrong" KBI
> >>>>> and are too fragile to use for kmods vs compiling from ports. :-/  I would
> >>>>> try a built-from-ports kmod to see if the panics go away.
> >>>>
> >>>> FWIW, I will commit the following change shortly. Since third-party
> >>>> modules break the invariant, either due to bugs (ndis wrappers) or
> >>>> possibly due to KBI breakage, it is worth having the detection enabled
> >>>> for production kernels.
> >>>
> >>> Interesting ... I tried running virtualbox on recent 10.3-STABLE with a
> >>> GENERIC kernel and the guest seemed to operate properly.  Then I enabled
> >>> INVARIANTS and got the panic.  I suspect that is why nobody has stumbled
> >>> across this before.
> >>>
> >> This is yet another reason to promote KASSERT to the full panic.
> >> I expect that the vbox source lacks fpu_kern_enter() calls around the
> >> FPU state restoration.
> > 
> > Unfortunately, the code is in MI source as it is unnecessary for
> > supported OSes (read: FreeBSD is not supported) and it's not easy to
> > inject fpu_kern_enter()/fpu_kern_leave() calls there. :-(
> 
> It's a headache, but our ports can use patch files for that sort of
> thing ...

Note that it is most likely completely useless to wrap a single
FXRSTOR instruction in fpu_kern_enter()/fpu_kern_leave() braces.  The
purpose of the instruction is to load the FPU state ('legacy', as they
call it, no AVX+) into the machine context.  If you put fpu_kern_leave()
right after the instruction, the context that was just loaded is flushed.

There must be some larger scope where the braces do make sense.  And since
some other OSes do require similar precautions around in-kernel FPU
access, I suspect that there should be some common place to put our KPI
calls.
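
To illustrate the scoping argument, here is a minimal user-space sketch. It
is a toy model only, not the FreeBSD KPI and not vbox code: every sim_* name
is a hypothetical stand-in, the FPU state is reduced to a single int, and
"leave" is assumed to simply put back whatever was there at "enter".  The
narrow variant brackets only the state load, the wide variant brackets
everything that depends on the loaded state.

```c
/*
 * Toy user-space model; all sim_* names are hypothetical stand-ins and do
 * not correspond to real kernel or VirtualBox functions.
 */
#include <assert.h>
#include <stdbool.h>

struct sim_cpu {
	int  fpu_regs;      /* what the simulated FPU currently holds */
	int  saved_regs;    /* state stashed away by sim_fpu_kern_enter() */
	bool kernel_fpu_ok; /* true only between enter and leave */
};

/* Register kernel FPU use and stash the current state. */
static void
sim_fpu_kern_enter(struct sim_cpu *cpu)
{
	cpu->saved_regs = cpu->fpu_regs;
	cpu->kernel_fpu_ok = true;
}

/* Drop whatever was loaded since enter and put the stashed state back. */
static void
sim_fpu_kern_leave(struct sim_cpu *cpu)
{
	cpu->fpu_regs = cpu->saved_regs;
	cpu->kernel_fpu_ok = false;
}

/* Load guest FPU state; only legal while kernel FPU use is registered. */
static void
sim_fxrstor(struct sim_cpu *cpu, int guest_state)
{
	/* Stands in for the "Unregistered use of FPU in kernel" check. */
	assert(cpu->kernel_fpu_ok);
	cpu->fpu_regs = guest_state;
}

/* The "guest" runs and reports the FPU state it observes. */
static int
sim_run_guest(const struct sim_cpu *cpu)
{
	return (cpu->fpu_regs);
}

/* Braces around the single load: the guest runs after the state is gone. */
int
narrow_bracket(int host_state, int guest_state)
{
	struct sim_cpu cpu = { host_state, 0, false };

	sim_fpu_kern_enter(&cpu);
	sim_fxrstor(&cpu, guest_state);
	sim_fpu_kern_leave(&cpu);
	return (sim_run_guest(&cpu));
}

/* Braces around the larger scope: the guest runs with its state loaded. */
int
wide_bracket(int host_state, int guest_state)
{
	struct sim_cpu cpu = { host_state, 0, false };
	int seen;

	sim_fpu_kern_enter(&cpu);
	sim_fxrstor(&cpu, guest_state);
	seen = sim_run_guest(&cpu);
	sim_fpu_kern_leave(&cpu);
	return (seen);
}
```

In this model narrow_bracket(1, 2) hands back the host value, because the
guest state is discarded before the guest ever runs, while wide_bracket(1, 2)
hands back the guest value.  Where the corresponding larger scope actually
sits in the vbox sources is not something this sketch can tell.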
Received on Thu Aug 11 2016 - 06:06:42 UTC
