Re: -CURRENT fatal trap caused by cxgbe module

From: Ryan Libby <rlibby_at_freebsd.org> Date: Mon, 2 Mar 2020 16:55:43 -0800

On Sun, Mar 1, 2020 at 8:07 PM Dustin Marquess <dmarquess_at_gmail.com> wrote:
>
> So I've been fighting with any current from the last month or so
> instantly crashing when I boot it.  I did notice that kernels in the
> various snapshot images were working, however, so I was trying to
> figure out why.  At first I thought it was because I had INVARIANTS
> and such disabled, but no, I finally figured it out.
>
> I've had in my /boot/loader.conf for a while now:
>
> if_cxgbe_load="YES"
>
> I guess since the stock installer kernels don't have cxgbe enabled by
> default.  I added "device cxgbe" to my kernels a while ago.  Normally
> the kernel would give some error about the module already being loaded
> or something and just continue.  As of last month or so, however,
> instead it just crashes:
>
> FreeBSD clang version 9.0.1 (git_at_github.com:llvm/llvm-project.git c1a0a213378a458fbea1a5c77b315c7dce08fd05) (based on LLVM 9.0.1)
> WARNING: WITNESS option enabled, expect reduced performance.
> kernel trap 12 with interrupts disabled
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x8
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff80622931
> stack pointer         = 0x28:0xffffffff8241c9a0
> frame pointer         = 0x28:0xffffffff8241c9e0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = resume, IOPL = 0
> current process = 0 ()
> trap number = 12
> panic: page fault
> cpuid = 0
> time = 1
>
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff8241c600
> vpanic() at vpanic+0x18a/frame 0xffffffff8241c660
> panic() at panic+0x43/frame 0xffffffff8241c6c0
> trap_fatal() at trap_fatal+0x386/frame 0xffffffff8241c720
> trap_pfault() at trap_pfault+0x99/frame 0xffffffff8241c7a0
> trap() at trap+0x4e9/frame 0xffffffff8241c8d0
> calltrap() at calltrap+0x8/frame 0xffffffff8241c8d0
> --- trap 0xc, rip = 0xffffffff80622931, rsp = 0xffffffff8241c9a0, rbp = 0xffffffff8241c9e0 ---
> malloc() at malloc+0x51/frame 0xffffffff8241c9e0
> sysctl_handle_string() at sysctl_handle_string+0x12d/frame 0xffffffff8241ca20
> sysctl_root_handler_locked() at sysctl_root_handler_locked+0xa2/frame 0xffffffff8241ca70
> sysctl_register_oid() at sysctl_register_oid+0x54c/frame 0xffffffff8241cd80
> sysctl_register_all() at sysctl_register_all+0x88/frame 0xffffffff8241cda0
> mi_startup() at mi_startup+0xf2/frame 0xffffffff8241cdf0
> btext() at btext+0x2c
> KDB: enter: panic
> [ thread pid 0 tid 0 ]
> Stopped at      kdb_enter+0x37: movq    $0,0xa5f4a6(%rip)
> db>
>
> If I take the if_cxgbe_load out, however, it boots fine.
>
> Thanks!
> -Dustin
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list

Do you perhaps also have something in your /boot/loader.conf that
sets a tunable?

It looks like there's just an ordering bug in kern_sysctl.c: we run
sysctl_register_all() from a SYSINIT at SI_SUB_KMEM, SI_ORDER_FIRST,
but MALLOC_DEFINE() runs malloc_init() at SI_SUB_KMEM, SI_ORDER_THIRD.
If sysctl_register_all() is going to malloc(), it needs to run after
malloc_init(), and it looks like populating a string tunable causes it
to malloc() (sysctl_handle_string() -> malloc() in the backtrace).

Ryan