Re: 11.0-CURRENT r290039 privileged instruction fault while in kernel mode

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Wed, 28 Oct 2015 14:12:33 -0700 (PDT)

On 28 Oct, Konstantin Belousov wrote:
> On Wed, Oct 28, 2015 at 10:16:12AM -0700, Don Lewis wrote:
>> On 28 Oct, Konstantin Belousov wrote:
>> > On Tue, Oct 27, 2015 at 04:09:28PM -0700, Don Lewis wrote:
>> >> I just got this crash while running poudriere on a freshly upgraded
>> >> 11.0-CURRENT machine.  The instruction pointer value looks pretty
>> >> strange.
>> >> 
>> >> 
>> >> FreeBSD zipper.catspoiler.org 11.0-CURRENT FreeBSD 11.0-CURRENT #30 r290039: Tue Oct 27 00:08:00 PDT 2015     dl_at_zipper.catspoiler.org:/usr/obj/usr/src/sys/GENERIC  amd64
>> >> 
>> >> panic: 
>> >> 
>> >> GNU gdb 6.1.1 [FreeBSD]
>> >> Copyright 2004 Free Software Foundation, Inc.
>> >> GDB is free software, covered by the GNU General Public License, and you are
>> >> welcome to change it and/or distribute copies of it under certain conditions.
>> >> Type "show copying" to see the conditions.
>> >> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> >> This GDB was configured as "amd64-marcel-freebsd"...
>> >> 
>> >> Unread portion of the kernel message buffer:
>> >> 
>> >> 
>> >> Fatal trap 1: privileged instruction fault while in kernel mode
>> >> cpuid = 4; apic id = 14
>> >> instruction pointer	= 0x20:0xffffffff8240fef5
>> > What is the instruction at the reported address ?
>> 
>> (kgdb) disassemble/r
>> Dump of assembler code for function cpu_lock:
>>    0xffffffff8240fef0 <+0>:	25 bb 40 82 ff	and    $0xff8240bb,%eax
>> => 0xffffffff8240fef5 <+5>:	ff	(bad)  
>>    0xffffffff8240fef6 <+6>:	ff	(bad)  
>>    0xffffffff8240fef7 <+7>:	ff 00	incl   (%rax)
>>    0xffffffff8240fef9 <+9>:	00 71 02	add    %dh,0x2(%rcx)
>>    0xffffffff8240fefc <+12>:	00 00	add    %al,(%rax)
>>    0xffffffff8240fefe <+14>:	00 00	add    %al,(%rax)
>>    0xffffffff8240ff00 <+16>:	00 00	add    %al,(%rax)
>>    0xffffffff8240ff02 <+18>:	00 00	add    %al,(%rax)
>>    0xffffffff8240ff04 <+20>:	00 00	add    %al,(%rax)
>>    0xffffffff8240ff06 <+22>:	00 00	add    %al,(%rax)
>>    0xffffffff8240ff08 <+24>:	01 00	add    %eax,(%rax)
>>    0xffffffff8240ff0a <+26>:	00 00	add    %al,(%rax)
>>    0xffffffff8240ff0c <+28>:	00 00	add    %al,(%rax)
>>    0xffffffff8240ff0e <+30>:	00 00	add    %al,(%rax)
>> End of assembler dump.
> 
> Oh, I see. cpu_lock is a mutex; the dump above demonstrates it cleanly.
> Most likely, something overwrote some pointer to a function with
> the address.
> 
> You probably have to bisect.
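
One way to check the diagnosis quoted above is to stop reading the bytes as instructions and read them as data. The sketch below reassembles the 32 bytes from the disassemble/r output and unpacks them as little-endian words. The field names and layout (a name pointer, two 32-bit words, a witness pointer, then one lock word) are an assumption about how a FreeBSD lock object is laid out on amd64, not something confirmed in this thread, and decode_lock is a hypothetical helper.

```python
import struct

# Raw bytes copied from the disassemble/r output, starting at 0xffffffff8240fef0.
RAW = bytes.fromhex(
    "25 bb 40 82 ff ff ff ff "  # +0
    "00 00 71 02 00 00 00 00 "  # +8
    "00 00 00 00 00 00 00 00 "  # +16
    "01 00 00 00 00 00 00 00"   # +24
)


def decode_lock(raw):
    """Unpack 32 bytes as an ASSUMED lock layout:
    8-byte name pointer, 4-byte flags, 4-byte data,
    8-byte witness pointer, 8-byte lock word (all little-endian)."""
    name, flags, data, witness, lock_word = struct.unpack("<QIIQQ", raw)
    return {
        "lo_name": name,
        "lo_flags": flags,
        "lo_data": data,
        "lo_witness": witness,
        "lock_word": lock_word,
    }


if __name__ == "__main__":
    for field, value in decode_lock(RAW).items():
        print(f"{field:<10} = {value:#x}")
```

Under that assumed layout the first word is a pointer into the same 0xffffffff8240.... region as cpu_lock itself, which is what a pointer to a name string would look like. The bytes that kgdb shows as "(bad)" at +5 are the all-ones high bytes of that pointer. This fits the reading that the CPU jumped to the address of a data object rather than to code.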

That could be difficult.  Whatever this is, it seems to be very hard to
trigger.  The machine was up and doing a lot of poudriere package
building for about a day before it crashed.  It's now got close to a day
of uptime again, mostly building packages, without another crash.  The
previous kernel was r289123.
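
To put a rough number on the bisection cost, the sketch below counts the worst-case number of kernels to test between the last known-good revision (r289123) and the crashing one (r290039). It assumes every revision number in the range is a candidate, which overstates the work if some of those commits do not touch the kernel sources. bisect_steps is a hypothetical helper, not an existing tool.

```python
def bisect_steps(good, bad):
    """Worst-case number of test builds for a binary search over the
    revisions after `good` up to and including `bad`."""
    candidates = bad - good          # the first bad revision lies in (good, bad]
    if candidates < 1:
        raise ValueError("bad must be a later revision than good")
    # ceil(log2(candidates)) computed with integers only
    return (candidates - 1).bit_length()


if __name__ == "__main__":
    good, bad = 289123, 290039
    print(f"{bad - good} candidate revisions, "
          f"up to {bisect_steps(good, bad)} kernels to test")
```

Each of those tests would need at least a day of package building, and because the crash is intermittent, a run that survives a day still does not prove that revision is good.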
Received on Wed Oct 28 2015 - 20:12:42 UTC
