Hi.

I'm encountering a boot failure with fatal trap 18, happening (maybe)
just before init() starts. Possibly it occurs on the root remount by
the kernel or on zpool import by the rc.d script.

The last revision tried is r365316 (r364788 is the last one tried with
a clean rebuild). The last healthy revision is r364744, just before the
actual switch to OpenZFS.

The machine is amd64, a ThinkPad P52 (Core i7-8750H) with a discrete
nvidia GPU.

r364751 with the diffs of r364777 and r364788 applied (needed to build
successfully without changes unrelated to OpenZFS) also fails.

Any suggestions and fixes are appreciated.

The trap screen is something like the text attached below. It was typed
up from a relatively clear photo, so there could be some typos. It is
shown just after the usual kernel startup output.

boot1.efi (as EFI/bootx64.efi on the ESP) starts /boot/loader.efi
properly, and loader.efi seems to boot the kernel properly.

As even the single user shell selection doesn't appear, loader.efi is
still the one from r364744. But that combination has worked so far,
even though I follow an irregular update process:

 1) Update src tree
 2) Clean obj tree
 3) buildworld
 4) etcupdate -p
 5) buildkernel
 6) installkernel
 7) shutdown to single user WITHOUT reboot <- Irregular!
 8) installworld
 9) etcupdate
10) rebuild src/sys-dependent ports (kmods, nvidia-driver, ...)
11) reboot

loader.efi looks to be doing its job, and the panic happens after
kernel startup ends.

Needless to say, rolling back to the r364744 state from stable/12 on
nvd0 fixes the issue.

Regards.
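The numbered update order above can be written out as a small shell sketch. It only prints the sequence and runs nothing. The source path /usr/src, the `make -C` invocations, and the use of `shutdown now` and `shutdown -r now` for steps 7 and 11 are assumptions for illustration; the report names the steps but not the exact commands, and it does not say how the src tree is updated or how the obj tree is cleaned.

```sh
#!/bin/sh
# Print-only sketch of the update order described above; nothing is executed.
# SRC is an assumed source tree location, not taken from the report.
SRC="${SRC:-/usr/src}"

print_update_steps() {
    n=0
    # Read one step per line from the here-document and number it.
    while IFS= read -r step; do
        n=$((n + 1))
        printf '%2d) %s\n' "$n" "$step"
    done <<EOF
update the source tree in $SRC
clean the obj tree
make -C $SRC buildworld
etcupdate -p
make -C $SRC buildkernel
make -C $SRC installkernel
shutdown now    # single user, no reboot (the irregular step)
make -C $SRC installworld
etcupdate
rebuild src/sys-dependent ports (kmods, nvidia-driver, ...)
shutdown -r now
EOF
}

print_update_steps
```

Because step 7 skips the reboot, installworld runs while the old kernel is still in memory, so the new kernel is first exercised at step 11.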
=====
Fatal trap 18: integer divide fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer     = 0x20:0xffffffff82bfa320
stack pointer           = 0x28:0xfffffe00e20c6900
frame pointer           = 0x28:0xfffffe00e20c6960
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 27 (vdev_open)
trap number             = 18
panic: integer divide fault
cpuid = 2
time = 16
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e20c6610
vpanic() at vpanic+0x182/frame fffffe00e20c6660
panic() at panic+0x43/frame fffffe00e20c66c0
trap_fatal() at trap_fatal+0x387/frame fffffe00e20c6720
trap() at trap+0x8e/frame fffffe00e20c6830
calltrap() at calltrap+0x8/frame fffffe00e20c6830
--- trap 0x12, rip = 0xffffffff82bfa320, rsp = 0xfffffe00e20c6900, rbp = 0xfffffe00e20c6960 ---
zio_wait() at zio_wait+0x60/frame 0xfffffe00e20c6960
vdev_open() at vdev_open+0x74d/frame 0xfffffe00e20c69c0
vdev_open_child() at vdev_open_child+0x1e/frame 0xfffffe00e20c69e0
taskq_run() at taskq_run+0x1f/frame 0xfffffe00e20c6a00
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe00e20c6a80
taskqueue_thread_loop() at taskqueue_thread_loop+0x118/frame 0xfffffe00e20c6ab0
fork_exit() at fork_exit+0x7d/frame 0xfffffe00e20c6af0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e20c6af0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 27 tid 100570 ]
Stopped at      kdb_enter+0x37: movq    $0,0x1091556(%rip)
db>
=====

Additional info:
*A clean build with CPUTYPE removed from both the command line and
 make.conf (so it should be equivalent to nocona) didn't help.
*A clean build with the WITH_KERNEL_RETPOLINE and WITH_RETPOLINE lines
 commented out in src.conf didn't help.
*The combination of the above two didn't help, either (at r364788).
*There are two root pools on different physical drives:
 stable/12 on nvd0 (primary) and head on ada0 (secondary).
*GENERIC-NODEBUG based (added options CAM_IOSCHED_DYNAMIC) kernel.

-- 
Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp>