Re: Fatal trap 18 on boot after OpenZFS import

From: Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp>
Date: Sun, 20 Sep 2020 22:06:52 +0900

I forgot to mention this here.

As I already mentioned on Bugzilla, this problem was fixed in r365894.

Thanks again, Ryan and Matthew!


On Sun, 6 Sep 2020 18:02:40 +0900
Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp> wrote:

> Filed PR.
> Bug 249147 - [ZFS][Panic]Fatal trap 18 on boot after OpenZFS import
> 
>  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249147
> 
> 
> On Fri, 4 Sep 2020 22:03:01 +0900
> Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp> wrote:
> 
> > Hi.
> > 
> > I'm hitting a boot failure with fatal trap 18. It seems to happen
> > just before init() starts, possibly during the root remount by the
> > kernel or the zpool import by the rc.d script.
> > The last revision I tried is r365316 (r364788 is the last one I tried
> > with a clean rebuild).
> > 
> > The last healthy revision is r364744, just before the actual switch
> > to OpenZFS. amd64 on ThinkPad P52 (Core i7-8750H) w/discrete nvidia GPU.
> > 
> > r364751 with the diffs of r364777 and r364788 applied (so that it
> > builds successfully without changes unrelated to OpenZFS) fails.
> > 
> > Any suggestions and fixes are appreciated.
> > 
> > 
> > The trap screen is something like the one below (text attached),
> > typed up from a relatively clear photo, so there could be some typos.
> > 
> > This is shown just after the usual kernel startup output.
> > boot1.efi (as EFI/bootx64.efi on ESP) starts /boot/loader.efi
> > properly, and loader.efi seems to boot the kernel properly.
> > 
> > As even the single user shell selection doesn't appear, loader.efi
> > is of r364744. But they work even if I go through this irregular
> > process:
> > 
> >   1)Update src tree
> >   2)Clean obj tree
> >   3)buildworld
> >   4)etcupdate -p
> >   5)buildkernel
> >   6)installkernel
> >   7)shutdown to single user WITHOUT reboot  <- Irregular!
> >   8)installworld
> >   9)etcupdate
> >  10)rebuild src/sys-dependent ports (kmods, nvidia-driver, ...)
> >  11)reboot
> > 
> > loader.efi appears to do its job, and the panic happens after kernel
> > startup ends. Needless to say, rolling back to the r364744 state from
> > stable/12 on nvd0 fixes the issue.
> > 
> > Regards.
> > 
> > =====
> > 
> > Fatal trap 18: integer divide fault while in kernel mode
> > cpuid = 2; apic id = 02
> > instruction pointer     = 0x20:0xffffffff82bfa320
> > stack pointer           = 0x28:0xfffffe00e20c6900
> > frame pointer           = 0x28:0xfffffe00e20c6960
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 27 (vdev_open)
> > trap number             = 18
> > panic: integer divide fault
> > cpuid = 2
> > time = 16
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e20c6610
> > vpanic() at vpanic+0x182/frame fffffe00e20c6660
> > panic() at panic+0x43/frame fffffe00e20c66c0
> > trap_fatal() at trap_fatal+0x387/frame fffffe00e20c6720
> > trap() at trap+0x8e/frame fffffe00e20c6830
> > calltrap() at calltrap+0x8/frame fffffe00e20c6830
> > --- trap 0x12, rip = 0xffffffff82bfa320, rsp = 0xfffffe00e20c6900, rbp = 0xfffffe00e20c6960 ---
> > zio_wait() at zio_wait+0x60/frame 0xfffffe00e20c6960
> > vdev_open() at vdev_open+0x74d/frame 0xfffffe00e20c69c0
> > vdev_open_child() at vdev_open_child+0x1e/frame 0xfffffe00e20c69e0
> > taskq_run() at taskq_run+0x1f/frame 0xfffffe00e20c6a00
> > taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe00e20c6a80
> > taskqueue_thread_loop() at taskqueue_thread_loop+0x118/frame 0xfffffe00e20c6ab0
> > fork_exit() at fork_exit+0x7d/frame 0xfffffe00e20c6af0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e20c6af0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> > [ thread pid 27 tid 100570 ]
> > Stopped at      kdb_enter+0x37: movq    $0,0x1091556(%rip)
> > db> 
> > 
> > =====
> > 
> > Additional info:
> >  *A clean build with CPUTYPE removed from the command line and
> >   make.conf (so it should be equivalent to nocona) didn't help.
> > 
> >  *A clean build with the WITH_KERNEL_RETPOLINE and WITH_RETPOLINE
> >   lines commented out in src.conf didn't help.
> > 
> >  *The combination of the above two didn't help either (at r364788).
> > 
> >  *There are two root pools on different physical drives:
> >   stable/12 on nvd0 (primary) and head on ada0 (secondary).
> > 
> >  *GENERIC-NODEBUG based (added options CAM_IOSCHED_DYNAMIC)
> >   kernel.
> > 
> > -- 
> > Tomoaki AOKI    <junchoon_at_dec.sakura.ne.jp>
> 
> 
> -- 
> Tomoaki AOKI    <junchoon_at_dec.sakura.ne.jp>
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
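
For anyone reading the quoted trace: since it was typed up from a photo and
wrapped by the mailer, a small script can help put it back into one frame
per entry. The following is only a minimal sketch in Python. The regular
expression and the helper names parse_frames and function_start are
illustrative assumptions, not anything from the kernel or ddb. It also
assumes that the rip shown on the trap screen belongs to the first frame
listed after the "--- trap 0x12" marker (zio_wait+0x60).

```python
import re

# A ddb frame looks like: "func() at func+0xOFF/frame 0xADDR".
# The "0x" before the frame address is treated as optional, because some
# lines in the hand-typed trace lack it.
FRAME_RE = re.compile(
    r"\b(?P<func>\w+)\(\) at (?P=func)\+0x(?P<off>[0-9a-f]+)"
    r"/frame (?:0x)?(?P<frame>[0-9a-f]+)"
)

def parse_frames(text):
    """Return a list of (function, offset, frame_address) tuples."""
    # Undo mail quoting and line wrapping: strip leading "> " markers,
    # then join all words with single spaces.
    flat = " ".join(
        word
        for line in text.splitlines()
        for word in line.lstrip("> ").split()
    )
    return [
        (m["func"], int(m["off"], 16), int(m["frame"], 16))
        for m in FRAME_RE.finditer(flat)
    ]

def function_start(rip, offset):
    """Address of the function's first instruction, given rip = start + offset."""
    return rip - offset

# Two wrapped frames copied from the quoted trace.
sample = """\
> > zio_wait() at zio_wait+0x60/frame
> > 0xfffffe00e20c6960 vdev_open() at vdev_open+0x74d/frame
> > 0xfffffe00e20c69c0
"""

frames = parse_frames(sample)
name, offset, _ = frames[0]
# rip value taken from the "instruction pointer" line of the trap screen.
print(name, hex(function_start(0xffffffff82bfa320, offset)))
```

Under those assumptions, the start address it computes for zio_wait could be
compared against the symbol table of the loaded module to check that the
typed-up offset is plausible.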


-- 
Tomoaki AOKI    <junchoon_at_dec.sakura.ne.jp>
Received on Sun Sep 20 2020 - 11:06:55 UTC
