Re: Fwd: Re: r365488 page faults on AMD Ryzen 9 3950X

From: Rainer Hurling <rhurlin_at_gwdg.de> Date: Mon, 21 Sep 2020 20:57:46 +0200

On 20.09.20 22:35, Rainer Hurling wrote:
> On 20.09.20 22:07, Konstantin Belousov wrote:
>> On Sun, Sep 20, 2020 at 10:55:26PM +0300, Konstantin Belousov wrote:
>>> On Sun, Sep 20, 2020 at 03:11:26PM +0200, Rainer Hurling wrote:
>>>> On 20.09.20 11:38, Konstantin Belousov wrote:
>>>>> On Sun, Sep 20, 2020 at 10:26:11AM +0200, Rainer Hurling wrote:
>>>>>> On 20.09.20 10:20, Hans Petter Selasky wrote:
>>>>>>> On 2020-09-20 10:05, Rainer Hurling wrote:
>>>>>>>> Hi monochrome,
>>>>>>>>
>>>>>>>> Back at the keyboard, I tried the newest CURRENT (r365920) on my box,
>>>>>>>> and the error occurs even with the newest sources.
>>>>>>>>
>>>>>>>> After looking around some more, I found some hints about the VirtualBox
>>>>>>>> kernel module having problems with r365488. Unfortunately, I am not able
>>>>>>>> to find the thread again :(
>>>>>>>>
>>>>>>>> What seems to help as a workaround is to disable the loading of
>>>>>>>> VirtualBox in /boot/loader.conf
>>>>>>>>
>>>>>>>> #vboxdrv_load="YES"
>>>>>>>>
>>>>>>>> and in /etc/rc.conf
>>>>>>>>
>>>>>>>> #vboxnet_enable="YES"
>>>>>>>> #vboxguest_enable="YES"
>>>>>>>>
>>>>>>>>
>>>>>>>> So perhaps this page fault is not restricted to AMD Ryzen?
>>>>>>>>
>>>>>>>
>>>>>>> Possibly you need to rebuild that kernel module. Maybe the FreeBSD
>>>>>>> version was not bumped correctly.
>>>>>>>
>>>>>>> --HPS
>>>>>>>
>>>>>>
>>>>>> Thanks for the hint. But I did rebuild all kernel modules before
>>>>>> rebooting, in my case vbox*.ko and nvidia*.ko.
>>>>>
>>>>> Provide a backtrace of the panic.
>>>>>
>>>>
>>>> Hi Konstantin,
>>>>
>>>> Thanks for your response.
>>>>
>>>> After trying several ways to produce a core dump or a working kdb prompt
>>>> without success, all I can offer is the following screen contents. I
>>>> built a GENERIC kernel with debugging enabled and enabled loading of
>>>> vboxdrv via /boot/loader.conf and /etc/rc.conf as described above:
>>>>
>>>>
>>>> [..snip..]
>>>> procfs registered
>>>> module_register_init: MOD_LOAD (tmpfs, 0xffffffff80caa060, 0xffffffff82520a70) error 17
>>>> Timecounters tick every 1.000 msec
>>>> lo0: bpf attached
>>>> vlan: initialized, using hash tables with chaining
>>>>
>>>>
>>>> Fatal trap 12: page fault while in kernel mode
>>>> cpuid = 31; apic id = 1f
>>>> fault virtual address   = 0x0
>>>> fault code              = supervisor read data, page not present
>>>> instruction pointer     = 0x20:0xffffffff80ea889b
>>>> stack pointer           = 0x20:0xffffffff826017e0
>>>> frame pointer           = 0x20:0xffffffff826017e0
>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>>> current process         = 0 (swapper)
>>>> trap number             = 12
>>>> panic: page fault
>>>> cpuid = 31
>>>> time = 1
>>>> KDB: stack backtrace:
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82601490
>>>> vpanic() at vpanic+0x182/frame 0xffffffff826014e0
>>>> panic() at panic+0x43/frame 0xffffffff82601540
>>>> trap_fatal() at trap_fatal+0x387/frame 0xffffffff826015a0
>>>> trap_pfault() at trap_pfault+0x97/frame 0xffffffff82601600
>>>> calltrap() at calltrap+0x8/frame 0xffffffff82601710
>>>> --- trap 0xc, rip = 0xffffffff80ea889b, rsp = 0xffffffff826017e0, rbp = 0xffffffff826017e0 ---
>>>> phys_pager_getpages() at phys_pager_getpages+0xb/frame 0xffffffff826017e0
>>>> vm_pager_get_pages() at vm_pager_get_pages+0x4f/frame 0xffffffff82601830
>>>> vm_fault() at vm_fault+0x5d6/frame 0xffffffff82601940
>>>> vm_map_wire_locked() at vm_map_wire_locked+0x3a6/frame 0xffffffff826019f0
>>>> vm_map_wire() at vm_map_wire+0x6b/frame 0xffffffff82601a20
>>>> rtR0MemObjFreeBSDAllocHelper() at rtR0MemObjFreeBSDAllocHelper+0xdc/frame 0xffffffff82601a70
>>>> rtR0MemObjNativeAllocCont() at rtR0MemObjNativeAllocCont+0x50/frame 0xffffffff82601ac0
>>>> supdrvGipCreate() at supdrvGipCreate+0x97/frame 0xffffffff82601b60
>>>> supdrvInitDevExt() at supdrvInitDevExt+0x19a/frame 0xffffffff82601bd0
>>>> VBoxDrvFreeBSDModuleEvent() at VBoxDrvFreeBSDModuleEvent+0x46/frame 0xffffffff82601bf0
>>>> module_register_init() at module_register_init+0xbd/frame 0xffffffff82601c20
>>>> mi_startup() at mi_startup+0xec/frame 0xffffffff82601c70
>>>> btext() at btext+0x2c
>>>> KDB: enter: panic
>>>> [ thread pid 0 tid 100000 ]
>>>> Stopped at      kdb_enter+0x37: movq    $0,0x10b5796(%rip)
>>>> db>
>>>>
>>>>
>>>> The system freezes at this point and no core dump is generated ;) This
>>>> does not happen without loading VBoxDrv.
>>>>
>>>> At least the screen dump shows VBoxDrvFreeBSDModuleEvent(). I hope
>>>> this is of some help.
>>>>
>>> Yes, it seems to be enough for me to see where the possible issue is.
>>> Try this patch; I have not even compiled it. You probably need to put
>>> it into ports/emulators/virtualbox-ose-kmod/files with a name ending
>>> in .patch.
>> This seems to be wrong: the name should _start_ with the prefix 'patch-'.
> 
> Many thanks for the patch!
> 
> Putting it into emulators/virtualbox-ose-kmod/files/ as
> patch-src_VBox_Runtime_r0drv_freebsd_memobj-r0drv-freebsd.c does not
> patch the sources, probably because memobj-r0drv-freebsd.c is already
> patched by the main port (virtualbox-ose).
> 
> Patching manually and then building and installing the kernel module seems to work fine.
> 
> Unfortunately, after rebooting the same page fault occurs :(
> 
> 
>>>
>>> --- src/VBox/Runtime/r0drv/freebsd/memobj-r0drv-freebsd.c.xxx	2020-09-20 19:40:07.471956776 +0000
>>> +++ src/VBox/Runtime/r0drv/freebsd/memobj-r0drv-freebsd.c	2020-09-20 19:46:03.606966773 +0000
>>> @@ -323,7 +323,8 @@
>>>      size_t      cPages = atop(pMemFreeBSD->Core.cb);
>>>      int         rc;
>>>  
>>> -    pMemFreeBSD->pObject = vm_object_allocate(OBJT_PHYS, cPages);
>>> +    pMemFreeBSD->pObject = vm_pager_allocate(OBJT_PHYS, NULL,
>>> +      pMemFreeBSD->Core.cb, VM_PROT_ALL, 0, curthread->td_ucred);
>>>  
>>>      /* No additional object reference for auto-deallocation upon unmapping. */
>>>  #if __FreeBSD_version >= 1000055
>>> @@ -457,7 +458,8 @@
>>>          return VERR_NO_MEMORY;
>>>      }
>>>  
>>> -    pMemFreeBSD->pObject = vm_object_allocate(OBJT_PHYS, atop(cb));
>>> +    pMemFreeBSD->pObject = vm_pager_allocate(OBJT_PHYS, NULL, cb, VM_PROT_ALL,
>>> +       0, curthread->td_ucred);
>>>  
>>>      if (PhysHighest != NIL_RTHCPHYS)
>>>          VmPhysAddrHigh = PhysHighest;
>>>
I tried a second time with the changes from your patch. This time, I
patched files/patch-src_VBox_Runtime_r0drv_freebsd_memobj-r0drv-freebsd.c
in emulators/virtualbox-ose (the main port).
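
For reference, the patch file name used above is derived from the patched source path. A minimal sketch of that mapping in C, assuming the convention applied by the ports framework's `make makepatch` (prefix `patch-`, every existing `_` doubled, every `/` turned into `_`); `ports_patch_name()` is a hypothetical helper for illustration, not part of the ports tree:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build a ports-style patch file name from a source
 * path relative to the port's work directory. Assumed convention (as used
 * by `make makepatch`): prefix "patch-", double every '_' already in the
 * path, and replace each '/' with '_'.
 * Returns 0 on success, -1 if `out` (of size `outlen`) is too small.
 */
static int ports_patch_name(const char *src, char *out, size_t outlen)
{
    const char *prefix = "patch-";
    size_t n = strlen(prefix);

    if (outlen <= n)
        return -1;
    memcpy(out, prefix, n);

    for (const char *p = src; *p != '\0'; p++) {
        size_t need = (*p == '_') ? 2 : 1;

        if (n + need >= outlen)   /* leave room for the terminating NUL */
            return -1;
        if (*p == '_') {
            out[n++] = '_';       /* existing underscores are doubled */
            out[n++] = '_';
        } else {
            out[n++] = (*p == '/') ? '_' : *p;  /* path separators become '_' */
        }
    }
    out[n] = '\0';
    return 0;
}
```

Applied to src/VBox/Runtime/r0drv/freebsd/memobj-r0drv-freebsd.c, which contains no underscores, this yields patch-src_VBox_Runtime_r0drv_freebsd_memobj-r0drv-freebsd.c.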

After rebuilding virtualbox-ose-kmod and virtualbox-ose and rebooting, I
got some more info (lines #6 to #13 at the beginning; lines #1 to #5 were
no longer visible on the frozen screen), followed by slightly different
messages:

[..snip..]
#6 0xffffffff8255f756 at rtR0MemObjFreeBSDAllocHelper+0x96
#7 0xffffffff8255f8e0 at rtR0MemObjNativeAllocCont+0x50
#8 0xffffffff8253c7b7 at supdrvGipCreate+0x97
#9 0xffffffff8253519a at supdrvInitDevExt+0x19a
#10 0xffffffff825450f6 at VBoxDrvFreeBSDModuleEvent+0x46
#11 0xffffffff80bb41fd at module_register_init+0xbd
#12 0xffffffff80b6985c at mi_startup+0xec
#13 0xffffffff8037002c at btext+0x2c

Fatal trap 12: page fault while in kernel mode
cpuid = 31; apic id = 1f
fault virtual address   = 0x25407efa
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80ec0b63
stack pointer           = 0x28:0xffffffff826018b0
frame pointer           = 0x28:0xffffffff82601940
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (swapper)
trap number             = 12
panic: page fault
cpuid = 31
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82601560
vpanic() at vpanic+0x182/frame 0xffffffff826015b0
panic() at panic+0x43/frame 0xffffffff82601610
trap_fatal() at trap_fatal+0x387/frame 0xffffffff82601670
trap_pfault() at trap_pfault+0x97/frame 0xffffffff826016d0
trap() at trap+0x2ab/frame 0xffffffff826017e0
calltrap() at calltrap+0x8/frame 0xffffffff826017e0
--- trap 0xc, rip = 0xffffffff80ec0b63, rsp = 0xffffffff826018b0, rbp = 0xffffffff82601940 ---
vm_map_insert() at vm_map_insert+0x2f3/frame 0xffffffff82601940
vm_map_find() at vm_map_find+0x4a4/frame 0xffffffff82601a00
rtR0MemObjFreeBSDAllocHelper() at rtR0MemObjFreeBSDAllocHelper+0x96/frame 0xffffffff82601a70
rtR0MemObjNativeAllocCont() at rtR0MemObjNativeAllocCont+0x50/frame 0xffffffff82601ac0
supdrvGipCreate() at supdrvGipCreate+0x97/frame 0xffffffff82601b60
supdrvInitDevExt() at supdrvInitDevExt+0x19a/frame 0xffffffff82601bd0
VBoxDrvFreeBSDModuleEvent() at VBoxDrvFreeBSDModuleEvent+0x46/frame 0xffffffff82601bf0
module_register_init() at module_register_init+0xbd/frame 0xffffffff82601c20
mi_startup() at mi_startup+0xec/frame 0xffffffff82601c70
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at      kdb_enter+0x37: movq    $0,0x10b5616(%rip)
db>

Perhaps this gives some more insight into the problem? I can't assess
it myself, sorry.
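
Both dumps report "fault code = supervisor read data, page not present". That line is a decoding of the x86 page-fault error code; in the first dump it comes together with fault virtual address 0x0, which is consistent with a NULL pointer dereference in phys_pager_getpages(). A minimal sketch of the decoding in C, assuming the architectural bit layout from the Intel SDM and a message format modeled on the dumps above; `describe_pf()` and the `PF_*` names are hypothetical, not FreeBSD's own code:

```c
#include <stdio.h>
#include <string.h>

/*
 * x86 page-fault error code bits (Intel SDM, Vol. 3A).
 * The macro names are hypothetical, chosen for this sketch.
 */
#define PF_P    0x01u   /* 0: page not present, 1: protection violation */
#define PF_W    0x02u   /* 1: write access, 0: read access */
#define PF_U    0x04u   /* 1: user mode, 0: supervisor (kernel) mode */
#define PF_RSV  0x08u   /* 1: reserved bit set in a paging-structure entry */
#define PF_I    0x10u   /* 1: fault caused by an instruction fetch */

/* Render an error code in the style of the panic's "fault code" line. */
static void describe_pf(unsigned code, char *buf, size_t len)
{
    snprintf(buf, len, "%s %s %s, %s",
             (code & PF_U) ? "user" : "supervisor",
             (code & PF_W) ? "write" : "read",
             (code & PF_I) ? "instruction" : "data",
             (code & PF_RSV) ? "reserved bits in PTE" :
             (code & PF_P) ? "protection violation" : "page not present");
}
```

In this sketch an error code of 0 (kernel mode, read, data access, page not present) produces exactly the text seen in both panics.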