Dtrace: hotkernel+buildworld -> crash

From: Artem Belevich <fbsdlist_at_src.cx> Date: Tue, 21 Oct 2008 17:58:57 -0700

Enabling WITNESS and INVARIANTS didn't produce anything new.
I've got a serial console hooked up, so here's the detailed crash info.

It looks like the CPUs need to be as busy as possible. The crash seems to
happen only when all four cores are busy. Under lighter load, stopping the
script often succeeds without a crash.

--Artem

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x20
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff80ad5173
stack pointer           = 0x10:0xffffffff22b7dc40
frame pointer           = 0x10:0xffffffff22b7dc50
code segment            = base 0x0, limit 0xfffff, type 0x1b
                       = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 37749 (cc1)
[thread pid 37749 tid 100194 ]
Stopped at      cyclic_disable_xcall+0x7:       movq    0x20(%rax),%rax
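
As a cross-check on the trap report above: the faulting instruction reads
from 0x20(%rax) and the reported fault address is 0x20, which suggests that
%rax held 0 (a NULL base pointer). A minimal Python sketch of that
arithmetic follows. The helper name effective_address is hypothetical and
only models a plain base+displacement operand; it says nothing about which
structure field sits at offset 0x20.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # addresses wrap at 64 bits on amd64


def effective_address(base: int, displacement: int) -> int:
    """Address touched by a `disp(%reg)` operand (hypothetical helper,
    no index register or segment base taken into account)."""
    return (base + displacement) & MASK64


FAULT_ADDRESS = 0x20  # "fault virtual address" from the trap report
DISPLACEMENT = 0x20   # from "movq 0x20(%rax),%rax"

# Solve for the base register value implied by the fault address.
implied_base = (FAULT_ADDRESS - DISPLACEMENT) & MASK64

print(hex(implied_base))                                   # 0x0
print(hex(effective_address(implied_base, DISPLACEMENT)))  # 0x20
```

An implied base of 0 points at a NULL pointer being dereferenced in
cyclic_disable_xcall rather than at a corrupted, random address.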

db> show regi
cs                 0x8
ss                   0
rax                  0
rcx                  0
rdx                0x1
rbx         0xffffffff8035dd06  smp_no_rendevous_barrier
rsp         0xffffffff22b7dc40
rbp         0xffffffff22b7dc50
rsi               0x23
rdi         0xffffffff229167d0
r8                 0x6
r9                   0
r10                0x1
r11           0xc8181c
r12         0xffffffff80ad516c  cyclic_disable_xcall
r13         0xffffffff229167d0
r14         0x801305da0
r15                  0
rip         0xffffffff80ad5173  cyclic_disable_xcall+0x7
rflags         0x10086
cyclic_disable_xcall+0x7:       movq    0x20(%rax),%rax
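
The rflags value in the dump above can be decoded to confirm the "interrupts
disabled" and "resume, IOPL = 0" parts of the trap message. The sketch below
is only an illustration: the bit positions are the architecturally defined
x86-64 RFLAGS bits, while the function name decode_rflags is made up for
this example.

```python
# Architecturally defined RFLAGS bits (bit 1 is reserved and always reads 1).
RFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
    9: "IF", 10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM",
    18: "AC", 19: "VIF", 20: "VIP", 21: "ID",
}


def decode_rflags(value: int):
    """Return (names of the set flag bits, IOPL) for an RFLAGS value."""
    flags = [name for bit, name in sorted(RFLAGS_BITS.items())
             if (value >> bit) & 1]
    iopl = (value >> 12) & 0x3  # IOPL occupies bits 12-13
    return flags, iopl


flags, iopl = decode_rflags(0x10086)  # value from "show registers"
print(flags, iopl)                    # ['PF', 'SF', 'RF'] 0
print("interrupts enabled:", "IF" in flags)
```

IF (bit 9) is clear, which matches "kernel trap 12 with interrupts
disabled"; RF (bit 16) is set, which matches "resume"; and IOPL is 0.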

db> trace
Tracing pid 37749 tid 100194 td 0xffffff00ce0d4370
cyclic_disable_xcall() at cyclic_disable_xcall+0x7
smp_rendezvous_action() at smp_rendezvous_action+0xb3
Xrendezvous() at Xrendezvous+0x64
--- interrupt, rip = 0x543faa, rsp = 0x7fffffffe090, rbp = 0x801305ae0 ---

Tracing command dtrace pid 25527 tid 100071 td 0xffffff00053a26e0
cpustop_handler() at cpustop_handler+0x47
ipi_nmi_handler() at ipi_nmi_handler+0x32
trap() at trap+0x26d
nmi_calltrap() at nmi_calltrap+0x8
--- trap 0x13, rip = 0xffffffff80538cb9, rsp = 0xfffffffe40016ff0, rbp = 0xffffffff229166a0 ---
smp_tlb_shootdown() at smp_tlb_shootdown+0x8a
pmap_invalidate_page() at pmap_invalidate_page+0x79
pmap_remove_pte() at pmap_remove_pte+0xd7
pmap_remove() at pmap_remove+0x2e7
vm_map_delete() at vm_map_delete+0xdc
vm_map_remove() at vm_map_remove+0x4a
uma_large_free() at uma_large_free+0x54
free() at free+0x6b
dtrace_buffer_free() at dtrace_buffer_free+0x1c
dtrace_state_destroy() at dtrace_state_destroy+0x3bb
dtrace_close() at dtrace_close+0x96
devfs_close() at devfs_close+0x16b
vn_close() at vn_close+0x74
vn_closefile() at vn_closefile+0xf1
devfs_close_f() at devfs_close_f+0x1e
_fdrop() at _fdrop+0x20
closef() at closef+0x4a
kern_close() at kern_close+0x13f
syscall() at syscall+0x255
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (6, FreeBSD ELF64, close), rip = 0x800df7a3c, rsp = 0x7fffffffe7b8, rbp = 0x622000 ---

On Tue, Oct 21, 2008 at 2:41 PM, Artem Belevich <fbsdlist_at_src.cx> wrote:
> Hi,
>
> I'm not sure if it's a known issue or not, but running the hotkernel
> script from DTraceToolkit-0.99 during "make buildworld -j8" easily
> crashes -current (cvsup'ed on Oct 20th) on amd64 (quad-core Q9450)
> when I press ^C to stop the script.
>
> Kernel is GENERIC with WITNESS/INVARIANTS disabled and some
> SCSI/wireless/NIC drivers removed.
>
> I was unable to dump a kernel core: the debugger always gets another
> trap and returns to the prompt. The box does not have serial ports, so
> I've typed in portions of the stack traces below from the screen.
>
> One common thing across all crashes I've seen so far is that the
> crashed process always dies in the same place, with the following
> backtrace. Apparently it attempts to dereference %rax, which is 0.
>
> cyclic_disable_xcall+0x7
> smp_rendezvous_action
> Xrendezvous
> ----interrupt
>
> The dtrace process itself always has the same stack trace:
>
> smp_tlb_shootdown
> pmap_invalidate_page
> pmap_remove_pte
> pmap_remove
> vm_map_delete
> vm_map_remove
> uma_large_free
> free
> dtrace_buffer_free
> dtrace_state_destroy
> dtrace_close
> ...
>
> I'll try to reproduce the issue with WITNESS/INVARIANTS turned on.
> Perhaps that would provide more hints about what's wrong. Meanwhile, if
> someone can suggest anything I can do to help troubleshoot this, that
> would be great, as I'm a bit out of my depth here.
>
> --
> --Artem
>

-- 
--Artem