Re: kernel panic caused by virtualbox(?)

From: Don Lewis <truckman_at_FreeBSD.org> Date: Wed, 10 Aug 2016 16:47:15 -0700 (PDT)

On 10 Aug, Jung-uk Kim wrote:
> On 08/09/16 05:12 AM, Konstantin Belousov wrote:
>> On Mon, Aug 08, 2016 at 04:44:20PM -0700, Don Lewis wrote:
>>> On  8 Aug, Konstantin Belousov wrote:
>>>> On Mon, Aug 08, 2016 at 10:22:44AM -0700, John Baldwin wrote:
>>>>> On Thursday, August 04, 2016 05:10:29 PM Don Lewis wrote:
>>>>>> Reposted to -current to get some more eyes on this ...
>>>>>>
>>>>>> I just got a kernel panic when I started up a CentOS 7 VM in virtualbox.
>>>>>> The host is:
>>>>>> 	FreeBSD 12.0-CURRENT #17 r302500 GENERIC amd64
>>>>>> The virtualbox version is:
>>>>>> 	virtualbox-ose-5.0.26
>>>>>> 	virtualbox-ose-kmod-5.0.26_1
>>>>>>
>>>>>> The panic message is:
>>>>>>
>>>>>> panic: Unregistered use of FPU in kernel
>>>>>> cpuid = 1
>>>>>> KDB: stack backtrace:
>>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe085a55d030
>>>>>> vpanic() at vpanic+0x182/frame 0xfffffe085a55d0b0
>>>>>> kassert_panic() at kassert_panic+0x126/frame 0xfffffe085a55d120
>>>>>> trap() at trap+0x7ae/frame 0xfffffe085a55d330
>>>>>> calltrap() at calltrap+0x8/frame 0xfffffe085a55d330
>>>>>> --- trap 0x16, rip = 0xffffffff827dd3a9, rsp = 0xfffffe085a55d408, rbp = 0xfffffe085a55d430 ---
>>>>>> g_pLogger() at 0xffffffff827dd3a9/frame 0xfffffe085a55d430
>>>>>> g_pLogger() at 0xffffffff8274e5c7/frame 0x3
>>>>>> KDB: enter: panic
>>>>>>
>>>>>> Since g_pLogger is a symbol in vboxdrv.ko, it looks like virtualbox is
>>>>>> the trigger.
>>>>>>
>>>>>> There are no symbols for the virtualbox kmods, possibly because I
>>>>>> installed them as an upgrade using packages (built with the same source
>>>>>> tree version) instead of by using PORTS_MODULES in make.conf, so ports
>>>>>> kgdb didn't have anything useful to say about what happened before the
>>>>>> trap.
>>>>>>
>>>>>> This panic is very repeatable.  I just got another one when starting the
>>>>>> same VM, but this time the two calls before the trap were
>>>>>> null_bug_bypass().  Hmm, that symbol is in nullfs ...
>>>>>>
>>>>>> I don't see this with a Windows 7 VM.
>>>>>>
>>>>>> All of the virtualbox kmod files are compiled with -mno-mmx -mno-sse
>>>>>> -msoft-float -mno-aes -mno-avx
>>>> Did your disassembly show fxrstor as the failing instruction, or am I
>>>> misremembering? This is most likely some context switch code, run either
>>>> by the virtual machine or as erroneously executed guest code. It is not a
>>>> spontaneous use of the FPU, but more likely something different. Can you
>>>> confirm?
>>>>
>>>> In either case, I do not remember any recent KBI changes around the PCB
>>>> layout or the fpu_kern_enter() KPI.
>>>>
>>>>>
>>>>> I suspect head packages are quite likely built against a "wrong" KBI
>>>>> and are too fragile to use for kmods vs compiling from ports. :-/  I would
>>>>> try a built-from-ports kmod to see if the panics go away.
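
For reference, the PORTS_MODULES approach mentioned above is a one-line
setting in /etc/make.conf.  A minimal sketch, assuming the kmod lives at
the port origin emulators/virtualbox-ose-kmod and that the kernel is
built from the same source tree:

```make
# /etc/make.conf
# Rebuild the VirtualBox kernel modules from ports whenever the kernel
# is built and installed from source, so they match the running KBI.
PORTS_MODULES=emulators/virtualbox-ose-kmod
```

This only helps if the panic is a KBI mismatch; it changes nothing if the
vbox code itself uses the FPU without registering.
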
>>>>
>>>> FWIW, I will commit the following change shortly. Since third-party
>>>> modules break the invariant, either due to bugs (ndis wrappers) or
>>>> possibly due to KBI breakage, it is worth having the detection enabled
>>>> for production kernels.
>>>
>>> Interesting ... I tried running virtualbox on recent 10.3-STABLE with a
>>> GENERIC kernel and the guest seemed to operate properly.  Then I enabled
>>> INVARIANTS and got the panic.  I suspect that is why nobody has stumbled
>>> across this before.
>>>
>> This is yet another reason to promote the KASSERT to a full panic.
>> I expect that the vbox source lacks fpu_kern_enter() calls around the
>> FPU state restoration.
> 
> Unfortunately, the code is in MI source as it is unnecessary for
> supported OSes (read: FreeBSD is not supported) and it's not easy to
> inject fpu_kern_enter()/fpu_kern_leave() calls there. :-(

It's a headache, but our ports can use patch files for that sort of
thing ...
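
To make the idea concrete, here is a rough userspace model of the
invariant being discussed.  It is not vbox code and not kernel code: the
mock_* names are hypothetical stand-ins, the real fpu_kern_enter() and
fpu_kern_leave() take different arguments, and it assumes the faulting
instruction really is the FPU state restore that Konstantin suspects.
It only shows why bracketing the restore would satisfy the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only.  The mock_* names are hypothetical stand-ins for
 * the kernel's fpu_kern_enter()/fpu_kern_leave(); the real functions
 * take different arguments and live in the kernel.
 */
struct mock_fpu_ctx {
	bool active;
};

/* Models "the current thread has registered its FPU use". */
static struct mock_fpu_ctx *mock_registered_ctx = NULL;

static void
mock_fpu_kern_enter(struct mock_fpu_ctx *ctx)
{
	ctx->active = true;
	mock_registered_ctx = ctx;
}

static void
mock_fpu_kern_leave(struct mock_fpu_ctx *ctx)
{
	/* A leave must pair with the enter that registered this context. */
	assert(mock_registered_ctx == ctx);
	ctx->active = false;
	mock_registered_ctx = NULL;
}

/*
 * Stand-in for the FPU state restoration in the vbox code.  Returns
 * false in the case where an INVARIANTS kernel would panic with
 * "Unregistered use of FPU in kernel".
 */
static bool
mock_restore_guest_fpu(void)
{
	return (mock_registered_ctx != NULL);
}

/* What a ports patch would aim for: bracket the restore. */
static bool
mock_restore_guest_fpu_bracketed(void)
{
	struct mock_fpu_ctx ctx = { false };
	bool ok;

	mock_fpu_kern_enter(&ctx);
	ok = mock_restore_guest_fpu();
	mock_fpu_kern_leave(&ctx);
	return (ok);
}
```

In this model the bare mock_restore_guest_fpu() call fails the check and
the bracketed variant passes it.  A real patch file would have to find
the right place in the MI source to do the equivalent.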