Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sun, 3 Jun 2018 23:53:40 +0300

On Sun, Jun 03, 2018 at 09:50:20PM +0200, Michael Gmelin wrote:
> 
> 
> On Sun, 3 Jun 2018 18:04:23 +0300
> Konstantin Belousov <kostikbel_at_gmail.com> wrote:
> 
> > On Sun, Jun 03, 2018 at 04:55:00PM +0200, Michael Gmelin wrote:
> > > 
> > > 
> > > On Sun, 3 Jun 2018 16:21:10 +0300
> > > Konstantin Belousov <kostikbel_at_gmail.com> wrote:
> > >   
> > > > On Sun, Jun 03, 2018 at 02:48:40PM +0200, Michael Gmelin wrote:  
> > > > > Hi,
> > > > > 
> > > > > After upgrading CURRENT to r333992 (from something at least a
> > > > > year old, quite some changes in mp_machdep.c since), this
> > > > > machine crashes on boot:
> > > > > 
> > > > > > Copyright (c) 1992-2018 The FreeBSD Project.
> > > > > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992,
> > > > > > 1993, 1994 The Regents of the University of California. All
> > > > > > rights reserved.
> > > > > > FreeBSD is a registered trademark of The FreeBSD Foundation.
> > > > > > FreeBSD 12.0-CURRENT #1 r333992: Tue May 22 00:31:04 CEST 2018
> > > > > > root@flimsy:/usr/obj/usr/src/amd64.amd64/sys/flimsy amd64
> > > > > > FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565)
> > > > > > (based on LLVM 6.0.0)
> > > > > > WARNING: WITNESS option enabled, expect reduced performance.
> > > > > > VT(vga): resolution 640x480
> > > > > > CPU: Intel(R) Celeron(R) 2955U @ 1.40GHz (1396.80-MHz K8-class CPU)
> > > > > > Origin="GenuineIntel"  Id=0x40651  Family=0x6  Model=0x45  Stepping=1
> > > > > > Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,
> > > > > > CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> > > > > > Features2=0x4ddaebbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,
> > > > > > xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,RDRAND>
> > > > > > AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
> > > > > > AMD Features2=0x21<LAHF,ABM>
> > > > > > Structured Extended Features=0x2603<FSGSBASE,TSCADJ,ERMS,INVPCID,NFPUSG>
> > > > > > XSAVE Features=0x1<XSAVEOPT>
> > > > > > VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,UG,VPID
> > > > > > TSC: P-state invariant, performance statistics
> > > > > > real memory  = 4301258752 (4102 MB)
> > > > > > avail memory = 1907572736 (1819 MB)
> > > > > > Event timer "LAPIC" quality 600
> > > > > > ACPI APIC Table: <CORE   COREBOOT>
> > > > What does this mean?  Did you flash coreboot?
> > > 
> > > This machine comes with it by default (my model was delivered with 
> > > SeaBIOS 20131018_145217-build121-m2). So I didn't flash anything
> > > (didn't feel like bricking it).
> > >   
> > > >   
> > > > > kernel trap 12 with interrupts disabled
> > > > > 
> > > > > Fatal trap 12: page fault while in kernel mode 
> > > > > cpuid = 0; apic id = 00
> > > > > fault virtual address    = 0xfffff80001000000
> > > > > > fault code               = supervisor write data, protection violation
> > > > > > instruction pointer      = 0x20:0xffffffff8102955f
> > > > > > stack pointer            = 0x28:0xffffffff82a79be0
> > > > > > frame pointer            = 0x28:0xffffffff82a79c10
> > > > > > code segment             = base 0x0, limit 0xfffff, type 0x1b
> > > > > >                          = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > > > processor eflags         = resume, IOPL = 0
> > > > > current process          = 0 ()
> > > > > [ thread pid 0 tid 0 ]
> > > > > > Stopped at      native_start_all_aps+0x08f:      movq    %rax,(%rsi)
> > > > Look up the source line number for this address.
> > > >   
> > > 
> > > I guess that's sys/amd64/amd64/support.S line 854 (in rdmsr),
> > > called by native_start_all_aps. Any additional hints how I can
> > > track it down?  
> > Why did you decide that this is rdmsr_safe()? First,
> > native_start_all_aps() does not call rdmsr; second, the ddb
> > report clearly indicates that the fault occurred accessing DMAP in
> > native_start_all_aps().
> > 
> > Just look up the source line by the address
> > native_start_all_aps+0x08f.
> 
> Okay, according to kgdb this should be here:
> 
> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=333368&view=markup#l369
> 
> 364
> 365    /* Create the initial 1GB replicated page tables */
> 366    for (i = 0; i < 512; i++) {
> 367            /* Each slot of the level 4 pages points to the same level 3 page */
> 368            pt4[i] = (u_int64_t)(uintptr_t)(mptramp_pagetables + PAGE_SIZE);
> 369            pt4[i] |= PG_V | PG_RW | PG_U;
> 370
> 371            /* Each slot of the level 3 pages points to the same level 2 page */
> 372            pt3[i] = (u_int64_t)(uintptr_t)(mptramp_pagetables + (2 * PAGE_SIZE));
> 373            pt3[i] |= PG_V | PG_RW | PG_U;
> 374
> 375            /* The level 2 page slots are mapped with 2MB pages for 1GB. */
> 376            pt2[i] = i * (2 * 1024 * 1024);
> 377            pt2[i] |= PG_V | PG_RW | PG_PS | PG_U;
> 378    }
> 
> -m
You have a fault on write due to the read-only mapping of the portion of
the direct map that maps the kernel text.  This is consistent with
the faulting address.  It is not clear whether this is something new on
your machine, or whether the kernel text was silently corrupted before,
since the read-only protection is fairly recent.

It seems that mp_bootaddress() selected a bad place for the bootstrap
page tables.  Moreover, we do not include the kernel text in the
physmap[] array, so it is not clear how this happened.  This code was
also changed recently.

Can you add a print of the physmap[] array somewhere before the panic,
to see what the kernel's idea of the available memory is?  It should
already be printed if you have a serial console and set the
debug.late_console tunable to 0.

> 
> p.s. This machine uses quirks in biosmem.c, see
> 
> Type '?' for a list of commands, 'help' for more detailed help.
> OK biosmem
> bios_basemem: 0x9e400
> bios_extmem: 0x3ff00000
> memtop: 0x3c000000
> high_heap_base: 0x3c000000
> high_heap_size: 0x4000000
> bios_quirks: 0x01 BQ_DISTRUST_820_EXTMEM
> b_bios_probed: 0x0a B_BASEMEM_12 B_EXTMEM_E801
> 
> -- 
> Michael Gmelin