Re: strange kernel crash

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Wed, 11 Nov 2015 10:01:36 +0200

On 10/11/2015 20:42, John Baldwin wrote:
> On Tuesday, November 10, 2015 10:48:08 AM Andriy Gapon wrote:
>> On 09/11/2015 22:16, John Baldwin wrote:
>>> On Friday, November 06, 2015 07:02:59 PM Hans Petter Selasky wrote:
>>>> On 11/06/15 12:20, Andriy Gapon wrote:
>>>>> Now the strange part:
>>>>>
>>>>>     0xffffffff80619a18 <+744>:   jne    0xffffffff80619a61 <__mtx_lock_flags+817>
>>>>>     0xffffffff80619a1a <+746>:   mov    %rbx,(%rsp)
>>>>> => 0xffffffff80619a1e <+750>:   movq   $0x0,0x18(%rsp)
>>>>>     0xffffffff80619a27 <+759>:   movq   $0x0,0x10(%rsp)
>>>>>     0xffffffff80619a30 <+768>:   movq   $0x0,0x8(%rsp)
>>>>
>>>> Were these instructions dumped from RAM or from the kernel ELF file?
>>>
>>> Probably not from RAM.  You can use 'info files' in gdb to see what is
>>> handling the address range in question (core vs executable).  x/i in ddb
>>> would have been the "real" truth.
>>
>> Yes, according to the output of files it looks like gdb would read that data
>> from the text section of the kernel file.
>>
>> How about libkvm?  Would kvm_read read data from the core file?
> 
> kvm_read should only access the vmcore, yes.
> 
>> I've written the following small program (cut down dmesg.c, actually):
>> https://people.freebsd.org/~avg/vmcore_read.c
>>
>> (kgdb) disassemble /r
>> => 0xffffffff80619a1e <+750>:   48 c7 44 24 18 00 00 00 00      movq   $0x0,0x18(%rsp)
>>
>> $ vmcore_read -N /boot/kernel.29/kernel -M /var/crash/vmcore.29 0xffffffff80619a1e 9
>> 48 c7 44 24 18 00 00 00 00
>>
>> Seems like the code is intact.
>>
>> P.S.
>> 1. To correct something I said earlier, the fault is #UD, not #GP.
>> 2. The only "suspicious" activity at the time of the crash was the execution of
>> a bhyve VM.
> 
> Was the crash in the guest or the host?  UD# seems even more bizarre.

It was the host.  This is bizarre indeed.  I can think of only two possibilities:
- a new CPU erratum
- corrupted data somehow getting into the instruction cache, while the correct
data is read back during the crash dump (i.e. flaky memory)
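
For reference, the check quoted above (bytes from the kernel file as shown by
"disassemble /r" versus bytes read from the vmcore) can be sketched as below.
This is only an illustration and not part of vmcore_read.c: the helper names
parse_hex, first_mismatch and decode_fields are made up here, and the decoder
handles only this one fixed layout (REX, opcode C7, ModRM, SIB, disp8, imm32).

```python
# Bytes shown by "disassemble /r" in kgdb (taken from the kernel file).
KERNEL_FILE_BYTES = "48 c7 44 24 18 00 00 00 00"
# Bytes printed by vmcore_read (taken from the vmcore via kvm_read).
VMCORE_BYTES = "48 c7 44 24 18 00 00 00 00"


def parse_hex(dump):
    """Turn a space-separated hex dump into a bytes object."""
    return bytes.fromhex(dump.replace(" ", ""))


def first_mismatch(a, b):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def decode_fields(insn):
    """Split the fields of this specific 9-byte encoding (hypothetical helper).

    Assumed layout: REX prefix, opcode, ModRM, SIB, 8-bit displacement,
    32-bit little-endian immediate. Not a general x86 decoder.
    """
    rex, opcode, modrm, sib, disp8 = insn[0], insn[1], insn[2], insn[3], insn[4]
    return {
        "rex_w": bool(rex & 0x08),       # 64-bit operand size
        "opcode": opcode,                # 0xc7 with reg=0 is MOV r/m, imm32
        "mod": modrm >> 6,               # 1 means an 8-bit displacement follows
        "reg": (modrm >> 3) & 7,
        "rm": modrm & 7,                 # 4 means a SIB byte follows
        "sib_index": (sib >> 3) & 7,     # 4 means no index register
        "sib_base": sib & 7,             # 4 is %rsp
        "disp8": disp8,
        "imm32": int.from_bytes(insn[5:9], "little"),
    }


file_bytes = parse_hex(KERNEL_FILE_BYTES)
core_bytes = parse_hex(VMCORE_BYTES)
mismatch = first_mismatch(file_bytes, core_bytes)
fields = decode_fields(core_bytes)
print("first mismatch:", mismatch)
print(fields)
```

The fields decode to a plain 64-bit store of an immediate to 0x18(%rsp), which
is why a #UD on these bytes is hard to explain if the CPU really fetched them.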

-- 
Andriy Gapon
Received on Wed Nov 11 2015 - 07:03:02 UTC
