Re: strange kernel crash

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Fri, 6 Nov 2015 09:58:01 -0800 (PST)

On  6 Nov, Konstantin Belousov wrote:
> On Fri, Nov 06, 2015 at 01:20:13PM +0200, Andriy Gapon wrote:
>> Unread portion of the kernel message buffer:
>> 
>> Fatal trap 1: privileged instruction fault while in kernel mode
>> cpuid = 0; apic id = 00
>> instruction pointer     = 0x20:0xffffffff80619a1e
>> stack pointer           = 0x28:0xfffffe04f57856f0
>> frame pointer           = 0x28:0xfffffe04f57857b0
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 2658 (firefox)
>> trap number             = 1
>> panic: privileged instruction fault
>> cpuid = 0
>> curthread: 0xfffff803270b6000
>> stack: 0xfffffe04f5782000 - 0xfffffe04f5786000
>> stack pointer: 0xfffffe04f5785320
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at 0xffffffff8041e86b = db_trace_self_wrapper+0x2b/frame
>> 0xfffffe04f5785250
>> kdb_backtrace() at 0xffffffff80669f39 = kdb_backtrace+0x39/frame 0xfffffe04f5785300
>> vpanic() at 0xffffffff8063531c = vpanic+0x14c/frame 0xfffffe04f5785340
>> panic() at 0xffffffff80635063 = panic+0x43/frame 0xfffffe04f57853a0
>> trap_fatal() at 0xffffffff8081fc0f = trap_fatal+0x33f/frame 0xfffffe04f5785400
>> trap() at 0xffffffff8081f872 = trap+0x7d2/frame 0xfffffe04f5785610
>> trap_check() at 0xffffffff8081ff2a = trap_check+0x2a/frame 0xfffffe04f5785630
>> calltrap() at 0xffffffff80807ea0 = calltrap+0x8/frame 0xfffffe04f5785630
>> --- trap 0x1, rip = 0xffffffff80619a1e, rsp = 0xfffffe04f5785700, rbp =
>> 0xfffffe04f57857b0 ---
>> __mtx_lock_flags() at 0xffffffff80619a1e = __mtx_lock_flags+0x2ee/frame
>> 0xfffffe04f57857b0
>> uma_dbg_getslab() at 0xffffffff807df15c = uma_dbg_getslab+0x3c/frame
>> 0xfffffe04f57857d0
>> uma_dbg_alloc() at 0xffffffff807df08d = uma_dbg_alloc+0x2d/frame 0xfffffe04f5785800
>> uma_zalloc_arg() at 0xffffffff807dacf1 = uma_zalloc_arg+0x4b1/frame
>> 0xfffffe04f5785890
>> uma_zalloc() at 0xffffffff8068b040 = uma_zalloc+0x10/frame 0xfffffe04f57858a0
>> selfdalloc() at 0xffffffff8068aa12 = selfdalloc+0x22/frame 0xfffffe04f57858c0
>> pollscan() at 0xffffffff8068a615 = pollscan+0x95/frame 0xfffffe04f5785910
>> kern_poll() at 0xffffffff8068a4b1 = kern_poll+0x1f1/frame 0xfffffe04f5785a70
>> sys_poll() at 0xffffffff8068a2b9 = sys_poll+0x79/frame 0xfffffe04f5785a90
>> syscallenter() at 0xffffffff80820560 = syscallenter+0x320/frame 0xfffffe04f5785b00
>> amd64_syscall() at 0xffffffff8082012f = amd64_syscall+0x1f/frame 0xfffffe04f5785bf0
>> Xfast_syscall() at 0xffffffff8080818b = Xfast_syscall+0xfb/frame 0xfffffe04f5785bf0
>> --- syscall (209, FreeBSD ELF64, sys_poll), rip = 0x80146342a, rsp =
>> 0x7fffffffd8e8, rbp = 0x7fffffffd920 ---
>> Uptime: 1d12h57m32s
>> 
>> 
>> Now the strange part:
>> 
>>    0xffffffff80619a18 <+744>:   jne    0xffffffff80619a61 <__mtx_lock_flags+817>
>>    0xffffffff80619a1a <+746>:   mov    %rbx,(%rsp)
>> => 0xffffffff80619a1e <+750>:   movq   $0x0,0x18(%rsp)
>>    0xffffffff80619a27 <+759>:   movq   $0x0,0x10(%rsp)
>>    0xffffffff80619a30 <+768>:   movq   $0x0,0x8(%rsp)
>> 
>> RSP value seems to be sane and consistent with the stack information above:
>> (kgdb) i reg
>> rax            0x4      4
>> rbx            0xfffff80126ea54f0       -8791145163536
>> rcx            0xffffffff8099a600       -2137414144
>> rdx            0xfffff803270b6000       -8782553063424
>> rsi            0x4      4
>> rdi            0xfffff80027f41318       -8795422715112
>> rbp            0xfffffe04f57857b0       0xfffffe04f57857b0
>> rsp            0xfffffe04f5785700       0xfffffe04f5785700
>> r8             0xffffffff809a7727       -2137360601
>> r9             0xfffff80126ea54f0       -8791145163536
>> r10            0x3e8    1000
>> r11            0xfffffe04f5785cc0       -2177725080384
>> r12            0x1      1
>> r13            0xfffff803270b6000       -8782553063424
>> r14            0xfffff80027f41318       -8795422715112
>> r15            0x0      0
>> rip            0xffffffff80619a1e       0xffffffff80619a1e <__mtx_lock_flags+750>
>> eflags         0x10246  [ PF ZF IF RF ]
>> cs             0x20     32
>> ss             0x28     40
>> ds             <unavailable>
>> es             <unavailable>
>> fs             <unavailable>
>> gs             <unavailable>
>> 
>> (kgdb) x/a $rsp
>> 0xfffffe04f5785700:     0xfffff80126ea54f0
>> (kgdb) x/a $rsp + 0x18
>> 0xfffffe04f5785718:     0x0
>> 
>> I have no idea what could have caused the #GP.  This is certainly not a stack
>> overflow.
> 
> This is a second report, please take a look at
> https://lists.freebsd.org/pipermail/freebsd-current/2015-October/057975.html
> I have no idea as well.

Whatever the problem is, it appears to be hard to trigger.  I haven't
had a recurrence.
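
For anyone re-checking the quoted report, the claim that RSP is sane and that this is not a stack overflow comes down to simple address arithmetic. The sketch below is only an illustration: every constant is copied from the panic message and kgdb output quoted above, and the helper names (in_stack, store_targets) are made up for this example and are not part of any FreeBSD tool.

```python
# Values copied from the quoted panic message and kgdb session.
STACK_LO = 0xfffffe04f5782000   # lower bound from the "stack:" line
STACK_HI = 0xfffffe04f5786000   # upper bound from the "stack:" line
RSP = 0xfffffe04f5785700        # rsp from "i reg" and the trap frame
RIP = 0xffffffff80619a1e        # faulting instruction pointer
SYM_OFFSET = 0x2ee              # __mtx_lock_flags+0x2ee in the backtrace


def in_stack(addr, lo=STACK_LO, hi=STACK_HI):
    """Return True if addr lies inside the thread's kernel stack."""
    return lo <= addr < hi


def store_targets(rsp):
    """Addresses written by the four stores around the faulting rip:
    mov %rbx,(%rsp) and the movq $0x0 stores to 0x18, 0x10, 0x8(%rsp)."""
    return [rsp + off for off in (0x0, 0x8, 0x10, 0x18)]


if __name__ == "__main__":
    print("stack size:      %d bytes" % (STACK_HI - STACK_LO))
    print("room below rsp:  %d bytes" % (RSP - STACK_LO))
    print("rsp in stack:    %s" % in_stack(RSP))
    print("stores in stack: %s" % all(in_stack(a) for a in store_targets(RSP)))
    # 0x2ee is 750 decimal, matching the <+750> shown by the disassembly.
    print("function start:  %#x" % (RIP - SYM_OFFSET))
```

With these numbers RSP sits well inside the 16 KB stack and all four store targets are in range, which agrees with the observation in the report that the fault is not explained by running off the stack. It does not say anything about what did cause the trap.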