Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Mon, 4 Jun 2018 14:06:55 +0300

On Mon, Jun 04, 2018 at 12:46:32AM +0200, Michael Gmelin wrote:
> 
> 
> On Sun, 3 Jun 2018 23:53:40 +0300
> Konstantin Belousov <kostikbel_at_gmail.com> wrote:
> 
> > On Sun, Jun 03, 2018 at 09:50:20PM +0200, Michael Gmelin wrote:
> > > 
> > > 
> > > On Sun, 3 Jun 2018 18:04:23 +0300
> > > Konstantin Belousov <kostikbel_at_gmail.com> wrote:
> > >   
> > > > On Sun, Jun 03, 2018 at 04:55:00PM +0200, Michael Gmelin wrote:  
> > > > > 
> > > > > 
> > > > > On Sun, 3 Jun 2018 16:21:10 +0300
> > > > > Konstantin Belousov <kostikbel_at_gmail.com> wrote:
> > > > >     
> > > > > > On Sun, Jun 03, 2018 at 02:48:40PM +0200, Michael Gmelin
> > > > > > wrote:    
> > > > > > > Hi,
> > > > > > > 
> > > > > > > After upgrading CURRENT to r333992 (from something at least
> > > > > > > a year old, quite some changes in mp_machdep.c since), this
> > > > > > > machine crashes on boot:
> > > > > > > 
> > > > > > > Copyright (c) 1992-2018 The FreeBSD Project.
> > > > > > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> > > > > > >         The Regents of the University of California. All rights reserved.
> > > > > > > FreeBSD is a registered trademark of The FreeBSD Foundation.
> > > > > > > FreeBSD 12.0-CURRENT #1 r333992: Tue May 22 00:31:04 CEST 2018
> > > > > > >     root_at_flimsy:/usr/obj/usr/src/amd64.amd64/sys/flimsy amd64
> > > > > > > FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
> > > > > > > WARNING: WITNESS option enabled, expect reduced performance.
> > > > > > > VT(vga): resolution 640x480
> > > > > > > CPU: Intel(R) Celeron(R) 2955U _at_ 1.40GHz (1396.80-MHz K8-class CPU)
> > > > > > >   Origin="GenuineIntel"  Id=0x40651  Family=0x6  Model=0x45  Stepping=1
> > > > > > >   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,
> > > > > > >   CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> > > > > > >   Features2=0x4ddaebbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,
> > > > > > >   xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,RDRAND>
> > > > > > >   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
> > > > > > >   AMD Features2=0x21<LAHF,ABM>
> > > > > > >   Structured Extended Features=0x2603<FSGSBASE,TSCADJ,ERMS,INVPCID,NFPUSG>
> > > > > > >   XSAVE Features=0x1<XSAVEOPT>
> > > > > > >   VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,UG,VPID
> > > > > > >   TSC: P-state invariant, performance statistics
> > > > > > > real memory  = 4301258752 (4102 MB)
> > > > > > > avail memory = 1907572736 (1819 MB)
> > > > > > > Event timer "LAPIC" quality 600
> > > > > > > ACPI APIC Table: <CORE   COREBOOT>
> > > > > > What does this mean? Did you flash coreboot?
> > > > > 
> > > > > This machine comes with it by default (my model was delivered
> > > > > with SeaBIOS 20131018_145217-build121-m2). So I didn't flash
> > > > > anything (didn't feel like bricking it).
> > > > >     
> > > > > >     
> > > > > > > kernel trap 12 with interrupts disabled
> > > > > > > 
> > > > > > > Fatal trap 12: page fault while in kernel mode
> > > > > > > cpuid = 0; apic id = 00
> > > > > > > fault virtual address    = 0xfffff80001000000
> > > > > > > fault code               = supervisor write data, protection violation
> > > > > > > instruction pointer      = 0x20:0xffffffff8102955f
> > > > > > > stack pointer            = 0x28:0xffffffff82a79be0
> > > > > > > frame pointer            = 0x28:0xffffffff82a79c10
> > > > > > > code segment             = base 0x0, limit 0xfffff, type 0x1b
> > > > > > >                          = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > > > > processor eflags         = resume, IOPL = 0
> > > > > > > current process          = 0 ()
> > > > > > > [ thread pid 0 tid 0 ]
> > > > > > > Stopped at      native_start_all_aps+0x08f:      movq    %rax,(%rsi)
> > > > > > Look up the source line number for this address.
> > > > > >     
> > > > > 
> > > > > I guess that's sys/amd64/amd64/support.S line 854 (in rdmsr),
> > > > > called by native_start_all_aps. Any additional hints how I can
> > > > > track it down?    
> > > > Why did you decide that this is rdmsr_safe()? First,
> > > > native_start_all_aps() does not call rdmsr; second, the ddb
> > > > report clearly indicates that the fault occurred accessing DMAP in
> > > > native_start_all_aps().
> > > > 
> > > > Just look up the source line by the address
> > > > native_start_all_aps+0x08f.  
> > > 
> > > Okay, according to kgdb this should be here:
> > > 
> > > https://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=333368&view=markup#l369
> > > 
> > > 364
> > > 365    /* Create the initial 1GB replicated page tables */
> > > 366    for (i = 0; i < 512; i++) {
> > > 367            /* Each slot of the level 4 pages points to the same level 3 page */
> > > 368            pt4[i] = (u_int64_t)(uintptr_t)(mptramp_pagetables + PAGE_SIZE);
> > > 369            pt4[i] |= PG_V | PG_RW | PG_U;
> > > 370
> > > 371            /* Each slot of the level 3 pages points to the same level 2 page */
> > > 372            pt3[i] = (u_int64_t)(uintptr_t)(mptramp_pagetables + (2 * PAGE_SIZE));
> > > 373            pt3[i] |= PG_V | PG_RW | PG_U;
> > > 374
> > > 375            /* The level 2 page slots are mapped with 2MB pages for 1GB. */
> > > 376            pt2[i] = i * (2 * 1024 * 1024);
> > > 377            pt2[i] |= PG_V | PG_RW | PG_PS | PG_U;
> > > 378    }
> > > 
> > > -m  
> > You have a fault on write due to a read-only mapping of the portion of
> > the direct map which maps the kernel text.  It is consistent with
> > the faulting address.  It is not clear whether this is something new on
> > your machine, or whether the kernel text was silently corrupted before,
> > since read-only protection is somewhat recent.
> > 
> > It seems that mp_bootaddress() selected a bad place for the
> > bootstrap page tables. Moreover, we do not include the kernel text
> > in the physmem[] array, so it is not clear how that happened. This
> > code was also changed recently.
> > 
> > Can you add a print of the physmap[] array somewhere before the
> > panic, to see what the kernel's idea of the available memory is?  It
> > should already be done if you have a serial console and set the
> > debug.late_console tunable to 0.
> 
> This is a sad little machine without any kind of serial console.
> 
> Physmap looks like this after calling getmemsize():
> 
> [0]: 0x10000
> [1]: 0x30000
> [2]: 0x40000
> [3]: 0x9e000
> [4]: 0x100000
> [5]: 0xf00000
> [6]: 0x1003000
> [7]: 0x7bf7a000
> 
> Physical memory chunks logged in cpu_startup are:
> 
> 0x0000000000010000 - 0x000000000002ffff, 131072 bytes (32 pages)
> 0x0000000000040000 - 0x000000000009dfff, 385024 bytes (94 pages)
These two chunk reports are consistent with physmap[0-1] and physmap[2-3].
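
To make that check concrete, here is a minimal sketch in userland C, not
kernel code.  It assumes physmap[] holds (start, end) pairs with an
exclusive end address and a 4 KB base page; the names physmap_reported,
pair_bytes, pair_pages and print_pair are invented for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KB base pages, as on amd64 */

/* physmap[] values quoted in this thread, read as (start, end) pairs. */
const uint64_t physmap_reported[] = {
	0x10000, 0x30000,	/* physmap[0-1] */
	0x40000, 0x9e000,	/* physmap[2-3] */
	0x100000, 0xf00000,	/* physmap[4-5] */
	0x1003000, 0x7bf7a000,	/* physmap[6-7] */
};

/* Bytes covered by pair number `pair`; the end address is treated as exclusive. */
uint64_t
pair_bytes(const uint64_t *map, int pair)
{
	assert(map[2 * pair + 1] >= map[2 * pair]);
	return (map[2 * pair + 1] - map[2 * pair]);
}

/* Number of whole base pages covered by pair number `pair`. */
uint64_t
pair_pages(const uint64_t *map, int pair)
{
	return (pair_bytes(map, pair) / SKETCH_PAGE_SIZE);
}

/* Print one pair in the same layout as the chunk report quoted above. */
void
print_pair(const uint64_t *map, int pair)
{
	printf("0x%016" PRIx64 " - 0x%016" PRIx64 ", %" PRIu64
	    " bytes (%" PRIu64 " pages)\n",
	    map[2 * pair], map[2 * pair + 1] - 1,
	    pair_bytes(map, pair), pair_pages(map, pair));
}
```

Under those assumptions, pair 0 covers 131072 bytes (32 pages) and pair 1
covers 385024 bytes (94 pages), the same page counts as in the two chunks
quoted above.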

> 0x0000000000100000 - 0x00000000001fffff, 1048576 bytes (256 pages)
> 0x0000000002c00000 - 0x0000000075467fff, 1921417216 bytes (469096 pages)
> 0x0000000100000000 - 0x00000001005e7fff, 6193152 bytes (1512 pages)
But these three look completely unrelated to the rest of the physmap,
except perhaps physmap[4].  We allocate boot pages from the top
of the last physmap chunk, but I am certain that we do not consume
enough memory during boot to account for the gap between physmap[7]
and the last reported address.

Are you sure that there are no typos in the values above?
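
One way to cross-check the values mechanically is to compare each reported
chunk against the physmap pairs.  The following is only a sketch in
userland C: it assumes the physmap end addresses are exclusive and the
chunk end addresses are inclusive, and the names struct chunk,
physmap_vals, chunks, match_pair and match_start are invented for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A reported memory chunk; `last` is the inclusive final byte address. */
struct chunk {
	uint64_t start;
	uint64_t last;
};

/* physmap[] values quoted in this thread, read as (start, end) pairs. */
const uint64_t physmap_vals[] = {
	0x10000, 0x30000, 0x40000, 0x9e000,
	0x100000, 0xf00000, 0x1003000, 0x7bf7a000,
};
#define NPHYSMAP (sizeof(physmap_vals) / sizeof(physmap_vals[0]))

/* Chunks quoted from the cpu_startup report in this thread. */
const struct chunk chunks[] = {
	{ 0x10000, 0x2ffff },
	{ 0x40000, 0x9dfff },
	{ 0x100000, 0x1fffff },
	{ 0x2c00000, 0x75467fff },
	{ 0x100000000, 0x1005e7fff },
};

/* Index of the pair whose start and end both match the chunk, or -1. */
int
match_pair(const uint64_t *map, size_t n, struct chunk c)
{
	size_t i;

	assert(n % 2 == 0);
	for (i = 0; i + 1 < n; i += 2) {
		if (map[i] == c.start && map[i + 1] == c.last + 1)
			return ((int)(i / 2));
	}
	return (-1);
}

/* Index of the pair whose start alone matches the chunk start, or -1. */
int
match_start(const uint64_t *map, size_t n, struct chunk c)
{
	size_t i;

	for (i = 0; i + 1 < n; i += 2) {
		if (map[i] == c.start)
			return ((int)(i / 2));
	}
	return (-1);
}
```

With the values as posted, only the first two chunks match a pair on both
ends; the third matches physmap[4] at its start only, and the last two
match no pair boundary at all.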

> 
> -m
> 
> > 
> > > 
> > > p.s. This machine uses quirks in biosmem.c, see
> > > 
> > > Type '?' for a list of command, 'help' for more detailed
> > > help.
> > > OK biosmem
> > > bios_basemem: 0x9e400
> > > bios_extmem: 0x3ff00000
> > > memtop: 0x3c000000
> > > high_heap_base: 0x3c000000
> > > high_heap_size: 0x4000000
> > > bios_quirks: 0x01 BQ_DISTRUST_820_EXTMEM
> > > b_bios_probed: 0x0a B_BASEMEM_12 B_EXTMEM_E801
> > > 
> > > -- 
> > > Michael Gmelin
> > > 
> 
> 
> 
> -- 
> Michael Gmelin