Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sun, 19 Aug 2018 19:16:42 +0300
On Sun, Aug 19, 2018 at 04:59:51PM +0200, Michael Gmelin wrote:
> 
> 
> On Fri, 17 Aug 2018 10:02:08 +0100
> John Baldwin <jhb_at_FreeBSD.org> wrote:
> 
> > On 8/17/18 9:54 AM, Michael Gmelin wrote:
> > > 
> > >   
> > >> On 17. Aug 2018, at 08:17, John Baldwin <jhb_at_FreeBSD.org> wrote:
> > >>  
> > >>> On 8/16/18 1:58 PM, Michael Gmelin wrote:
> > >>>
> > >>>  
> > >>>> On 15. Aug 2018, at 15:55, Konstantin Belousov
> > >>>> <kostikbel_at_gmail.com> wrote:
> > >>>>> On Wed, Aug 15, 2018 at 03:52:37PM +0200, Michael Gmelin wrote:
> > >>>>>
> > >>>>>  
> > >>>>>>> On 15. Aug 2018, at 15:04, Konstantin Belousov
> > >>>>>>> <kostikbel_at_gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> On Wed, Aug 15, 2018 at 12:51:06AM +0200, Michael Gmelin wrote:
> > >>>>>>> Reviving this old thread, since I just updated to r337818 and
> > >>>>>>> a similar problem is happening again. Since the fix in r334799
> > >>>>>>> (review https://reviews.freebsd.org/D15675),
> > >>>>>>> (mp_)machdep.c have been touched, so maybe this is related
> > >>>>>>> (https://svnweb.freebsd.org/base?view=revision&revision=334799).
> > >>>>>>>
> > >>>>>>> Please see the screenshot of the panic below:
> > >>>>>>> https://gist.github.com/grembo/78d0f2a100dd4f16775b85a118769658
> > >>>>>>>
> > >>>>>>> This is me not digging any deeper, hoping that this is
> > >>>>>>> something obvious. Please let me know if you need more
> > >>>>>>> input.  
> > >>>>>>
> > >>>>>> I do not see how recent mp_machdep.c changes could affect this.
> > >>>>>> Can you try newest kernel but old loader ?  
> > >>>>>
> > >>>>> I will try (but that will take a while). Oh, also, it still
> > >>>>> boots in safe mode/with smp disabled.
> > >>>>
> > >>>> Right, this is because the access to that address through DMAP
> > >>>> is only needed when configuring AP startup resources.
> > >>>>
> > >>>> Also, I think it is safe to suggest that the bisect is needed.  
> > >>>
> > >>> Using an older loader didn't help, but I identified the problem:
> > >>>
> > >>> https://svnweb.freebsd.org/base?view=revision&revision=334952
> > >>>
> > >>> modified the code you introduced in
> > >>>
> > >>> https://svnweb.freebsd.org/base?view=revision&revision=334799
> > >>>
> > >>> By correcting units to pages it also broke booting the Chromebook
> > >>> as a side effect - so the previous fix just worked due to a bug
> > >>> it seems.
> > >>>
> > >>> Is there an easy way to output the content of physmap at that
> > >>> point (debug.late_console=0 doesn't work) - like an existing
> > >>> buffer I could use, or would this be more elaborate (I did
> > >>> something complicated last time but didn't save it, so any simple
> > >>> solution would be preferred).  
> > >>
> > >> How about reverting the commit for now so you get a working console
> > >> and print out the physmap array values along with Maxmem later in
> > >> the boot (or just use kgdb to examine them once the system is
> > >> running)? 
> > > 
> > > This is before the system has a working console (part of calling
> > > getmem...), disabling late console makes it hang, physmap changes
> > > afterwards, so running kgdb later doesn't help. Last time I kept a
> > > copy of physmap and logged it later to know the original content. I
> > > can do that again, I just thought maybe there is a simple mechanism
> > > I'm not aware of that would save me some time.
> > 
> > I thought we only modified phys_avail[], but saving a copy of
> > physmap[] and dumping it from kgdb is probably the simplest thing to
> > do.
> > 
> 
> Okay, so I had some time to investigate a bit more:
> 
> Before calling init_ops.mp_bootaddress in getmemsize (machdep.c),
> physmap looks like this:
> 
> physmap_idx: 8
> i mem atop
> 0 0x0 0x0
> 1 0x30000 0x30
> 2 0x40000 0x40
> 3 0x9e400 0x9e
> 4 0x100000 0x100
> 5 0xf00000 0xf00
> 6 0x1000000 0x1000
> 7 0x7bf7a000 0x7bf7a
> 8 0x100000000 0x100000
> 9 0x100600000 0x100600
> 10 0x0 0x0
> Maxmem: 0x100600000 0x100600
> 
> Without using atop (the "buggy" version that actually boots without
> crashing), the loop in mp_bootaddress looks like this:
> 
> i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
> 8 0x100000000 0x100600000 0x100600 0x100600 
> 6 0x1000000 0x7bf7a000 0x7bf7a 0x100600 
> 4 0x100000 0xf00000 0xf00 0x100600 
> 2 0x40000 0x9e400 0x9e 0x100600 
> 
> And physmap looks like this afterwards:
> 
> physmap_idx: 8
> i mem atop
> 0 0x0 0x0
> 1 0x30000 0x30
> 2 0x43000 0x43 <-- here
> 3 0x9e400 0x9e
> 4 0x100000 0x100
> 5 0xf00000 0xf00
> 6 0x1000000 0x1000
> 7 0x7bf7a000 0x7bf7a
> 8 0x100000000 0x100000
> 9 0x100600000 0x100600
> 10 0x0 0x0
> mptramp_pagetables is 0x40000
> 
> So a three page gap was made at 0x40000 (atop(idx 2) is now 0x43
> instead of 0x40)
> 
> In the current version (using atop), the loop in mp_bootaddress
> looks like this:
> 
> i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
> 8 0x100000000 0x100600000 0x100600 0x100600 
> 6 0x1000000 0x7bf7a000 0x7bf7a 0x100600 
> 
> And physmap looks like this afterwards:
> 
> physmap_idx: 8
> i mem atop
> 0 0x0 0x0
> 1 0x30000 0x30
> 2 0x40000 0x40
> 3 0x9e400 0x9e
> 4 0x100000 0x100
> 5 0xf00000 0xf00
> 6 0x1003000 0x1003 <-- here
> 7 0x7bf7a000 0x7bf7a
> 8 0x100000000 0x100000
> 9 0x100600000 0x100600
> 10 0x0 0x0
> mptramp_pagetables: 0x1000000
> 
> So a three page gap was made at 0x1000000 (atop(idx 6) is now
> 0x1003 instead of 0x1000)
> 
> When changing the code to require a page below 0x1000:
> 
>   if (physmap[i] >= GiB(4) || physmap[i + 1] -
>       round_page(physmap[i]) < PAGE_SIZE * 3 ||
>       atop(physmap[i + 1]) > Maxmem
>       || atop(physmap[i + 1]) > 0x1000) // <--- this
>       continue;
> 
> The system boots just fine. It uses page 0x100
> for the bootstrap code in this case:
> 
> i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
> 8 0x100000000 0x100600000 0x100600 0x100600 
> 6 0x1000000 0x7bf7a000 0x7bf7a 0x100600 
> 4 0x100000 0xf00000 0xf00 0x100600 
> 
> Physmap looks like this:
> physmap_idx: 8
> i mem atop
> 0 0x0 0x0
> 1 0x30000 0x30
> 2 0x40000 0x40
> 3 0x9e400 0x9e
> 4 0x103000 0x103 <-- here
> 5 0xf00000 0xf00
> 6 0x1000000 0x1000
> 7 0x7bf7a000 0x7bf7a
> 8 0x100000000 0x100000
> 9 0x100600000 0x100600
> 10 0x0 0x0
> mptramp_pagetables: 0x100000
> 
> So for some reason it's crashing when using pages 0x1000 - 0x1003 for
> the bootstrap code, while it boots okay when using 0x40 - 0x43 and
> 0x100 - 0x103.
> 
> Any ideas?
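
The three traces above can be replayed outside the kernel. What follows is only a minimal sketch, not the actual mp_bootaddress code: the macros are simplified stand-ins for the kernel ones, pick_pagetables(), enum check and NO_REGION are hypothetical names, and the CHECK_BYTES variant assumes the pre-r334952 test compared the byte address physmap[i + 1] directly against Maxmem (a page count). The physmap and Maxmem values are the ones logged above.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel macros (assumes 4 KiB pages). */
#define PAGE_SIZE     4096ULL
#define GiB(v)        ((uint64_t)(v) << 30)
#define atop(x)       ((uint64_t)(x) >> 12)   /* byte address -> page number */
#define round_page(x) (((uint64_t)(x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

#define NO_REGION     UINT64_MAX               /* hypothetical "not found" marker */

/* physmap[] and Maxmem as logged before mp_bootaddress ran. */
static const uint64_t physmap[] = {
    0x0ULL, 0x30000ULL, 0x40000ULL, 0x9e400ULL, 0x100000ULL, 0xf00000ULL,
    0x1000000ULL, 0x7bf7a000ULL, 0x100000000ULL, 0x100600000ULL,
};
static const int physmap_idx = 8;
static const uint64_t Maxmem = 0x100600;       /* in pages */

/* Hypothetical selector for the three variants discussed in the thread. */
enum check {
    CHECK_BYTES,     /* assumed old test: physmap[i + 1] > Maxmem */
    CHECK_ATOP,      /* current test: atop(physmap[i + 1]) > Maxmem */
    CHECK_ATOP_LOW,  /* current test plus the extra 0x1000 page limit */
};

/* Walk the regions from the top down and return the chosen base address. */
static uint64_t
pick_pagetables(enum check mode)
{
    for (int i = physmap_idx; i >= 0; i -= 2) {
        uint64_t start = round_page(physmap[i]);
        uint64_t end = physmap[i + 1];
        uint64_t limit = (mode == CHECK_BYTES) ? end : atop(end);

        /* Skip regions at or above 4 GiB, too small, or past Maxmem. */
        if (physmap[i] >= GiB(4) || end - start < PAGE_SIZE * 3 ||
            limit > Maxmem)
            continue;
        /* Extra restriction from the experiment above. */
        if (mode == CHECK_ATOP_LOW && atop(end) > 0x1000)
            continue;
        return start;
    }
    return NO_REGION;
}
```

Called with CHECK_BYTES, CHECK_ATOP and CHECK_ATOP_LOW, the sketch picks 0x40000, 0x1000000 and 0x100000 respectively, which matches the mptramp_pagetables values in the three traces.
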
I had in fact misread the page fault state decoding in your photo.
Curiously, it is a protection violation on write rather than a
non-present page access.

Compile ddb into your kernel, then on fault do
db> x/x dmaplimit
db> x/x dmaplimit+4
db> show pte <fault virtual address>

Also show me the verbose dmesg lines with CPU features identification.
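
To illustrate the distinction between a protection violation and a non-present page access, here is a minimal sketch of decoding the low bits of the x86 page-fault error code. The bit layout (bit 0 = page present, bit 1 = write, bit 4 = instruction fetch) is the architectural one; the PF_* names and fault_kind() are made up for this example and are not the kernel's identifiers.

```c
#include <stdint.h>

/* Hypothetical names for the architectural x86 page-fault error code bits. */
#define PF_PRESENT 0x01u  /* 0: page not present, 1: protection violation */
#define PF_WRITE   0x02u  /* 0: read access, 1: write access */
#define PF_INSTR   0x10u  /* 1: fault on instruction fetch */

/* Return a short description of the fault class for an error code. */
static const char *
fault_kind(uint32_t code)
{
    if ((code & PF_PRESENT) == 0)
        return "non-present page access";
    if (code & PF_INSTR)
        return "protection violation on instruction fetch";
    if (code & PF_WRITE)
        return "protection violation on write";
    return "protection violation on read";
}
```

With this layout, an error code of 0x3 (present and write bits set) decodes as a protection violation on write, while 0x2 (write bit only) is a write to a non-present page.
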

> 
> Best,
> Michael
> 
> p.s. This is what biosmem looks like
> 
> Type '?' for a list of command, 'help' for more detailed
> help.
> OK biosmem
> bios_basemem: 0x9e400
> bios_extmem: 0x3ff00000
> memtop: 0x3c000000
> high_heap_base: 0x3c000000
> high_heap_size: 0x4000000
> bios_quirks: 0x01 BQ_DISTRUST_820_EXTMEM
> b_bios_probed: 0x0a B_BASEMEM_12 B_EXTMEM_E801
> 
> -- 
> Michael Gmelin
Received on Sun Aug 19 2018 - 14:16:59 UTC
