Re: panic: non-current pmap 0xffffa00020eab8f0 on Rpi3

From: Mark Johnston <markj_at_freebsd.org>
Date: Sat, 24 Oct 2020 15:37:35 -0400

On Fri, Oct 23, 2020 at 06:32:25PM +0200, Michal Meloun wrote:
> 
> 
> On 19.10.2020 22:39, Mark Johnston wrote:
> > On Fri, Oct 16, 2020 at 11:53:56AM +0200, Michal Meloun wrote:
> >>
> >>
> >> On 06.10.2020 15:37, Mark Johnston wrote:
> >>> On Mon, Oct 05, 2020 at 07:10:29PM -0700, bob prohaska wrote:
> >>>> Still seeing non-current pmap panics on the Pi3, this time a B+ running
> >>>> 13.0-CURRENT (GENERIC-MMCCAM) #0 71e02448ffb-c271826(master)
> >>>> during a -j4 buildworld.  The backtrace reports
> >>>>
> >>>> panic: non-current pmap 0xffffa00020eab8f0
> >>>
> >>> Could you show the output of "show procvm" from the debugger?
> >>
> >> I see the same panic too; in my case it's very rare.  The typical
> >> scenario is a rebuild of the kf5 ports (~250, 2 days of full load).
> >> Any idea how to debug this?
> >> Michal
> > 
> > I suspect that there is some race involving the pmap switching in
> > vmspace_exit(), but I can't see it.  In the example below, presumably
> > process 22604 on CPU 0 is also exiting?  Could you show the backtrace?
> > It would also be useful to see the value of PCPU_GET(curpmap) at the
> > time of the panic.  I'm not sure if there's a way to get that from DDB,
> > but I suspect it should be equal to &vmspace0->vm_pmap.
> Mark,
> I think that I found the problem.
> PCPU_GET() is not (and is not supposed to be) an atomic operation;
> it expects the thread to be at least pinned.
> This is not true for pmap_remove_pages(), so I think that the KASSERT
> is racy and should be removed (or at least covered by a
> sched_pin()/sched_unpin() pair).
> What do you think?

I think you're right.  On amd64 curpmap is loaded using a single
instruction so the assertion happens to work properly.  On arm64 we
have:

   0xffff0000007ff138 <+32>:      mov     x8, x18
   0xffff0000007ff13c <+36>:      ldr     x8, [x8, #216]
   0xffff0000007ff140 <+40>:      mov     x26, x0
   0xffff0000007ff144 <+44>:      cmp     x8, x0

Though, it looks like arm64's PCPU_GET could be modified to combine the
first two instructions.
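
To make the window concrete, here is a small user-space C model of that
two-instruction sequence.  Everything in it (the struct layouts, the
two-CPU array, model_migrate() and friends) is a hypothetical stand-in
and not kernel code; it only sketches how a migration between fetching
the per-CPU base and loading curpmap through it can make the comparison
fail even though the thread's own pmap never changed.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct pmap { int id; };
struct pcpu { struct pmap *pc_curpmap; };

static struct pcpu cpus[2];	/* modelled per-CPU data for two CPUs */
static int thread_cpu;		/* CPU the modelled thread is running on */

/* Put the modelled thread on CPU 0 and some other thread on CPU 1. */
static void
model_reset(struct pmap *mine, struct pmap *other)
{
	thread_cpu = 0;
	cpus[0].pc_curpmap = mine;
	cpus[1].pc_curpmap = other;
}

/*
 * Model a migration: our thread moves to new_cpu and its pmap becomes
 * current there, while a thread using 'other' takes over the old CPU.
 */
static void
model_migrate(int new_cpu, struct pmap *mine, struct pmap *other)
{
	assert(new_cpu == 0 || new_cpu == 1);
	cpus[thread_cpu].pc_curpmap = other;
	thread_cpu = new_cpu;
	cpus[new_cpu].pc_curpmap = mine;
}

/*
 * The unpinned check, split into the two loads seen in the disassembly.
 * Returns 1 if the comparison passes, 0 if it would trip the assertion.
 */
static int
racy_check(struct pmap *mine, struct pmap *other, int migrate_between)
{
	/* "mov x8, x18": fetch the per-CPU base for the current CPU. */
	struct pcpu *pc = &cpus[thread_cpu];

	/* The window: nothing stops the scheduler from moving us here. */
	if (migrate_between)
		model_migrate(1 - thread_cpu, mine, other);

	/* "ldr x8, [x8, #216]": load curpmap through the now-stale base. */
	return (pc->pc_curpmap == mine);
}
```

With migrate_between set to 0 the check passes; with it set to 1 the
second load reads the old CPU's curpmap, which by then belongs to a
different thread, so the comparison fails spuriously.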

To fix it, we could perhaps change the KASSERT to verify that pmap ==
vmspace_pmap(curthread->td_proc->p_vmspace).  The various
implementations of pmap_remove_pages() have different flavours of the
same check and it would be nice to unify them.  Using sched_pin() would
also be fine I think.
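
As a rough illustration of the two options, the user-space model below
compares them.  It is only a sketch under assumptions: the structs are
cut-down hypothetical versions that borrow the kernel field names,
try_migrate() is an invented stand-in for the scheduler, and the
td_pinned increments merely imitate what sched_pin()/sched_unpin()
would do.  It is not the actual pmap_remove_pages() change.

```c
#include <assert.h>

/* Cut-down, hypothetical versions of the kernel structures. */
struct pmap { int id; };
struct vmspace { struct pmap vm_pmap; };
struct proc { struct vmspace *p_vmspace; };
struct thread { struct proc *td_proc; int td_pinned; int td_cpu; };
struct pcpu { struct pmap *pc_curpmap; };

static struct pcpu cpus[2];

/*
 * Invented scheduler stand-in: move td to the other CPU unless it is
 * pinned.  A thread using 'other' takes over the CPU that td leaves.
 * Returns 1 if the migration happened, 0 if it was refused.
 */
static int
try_migrate(struct thread *td, struct pmap *other)
{
	if (td->td_pinned > 0)
		return (0);
	cpus[td->td_cpu].pc_curpmap = other;
	td->td_cpu = 1 - td->td_cpu;
	cpus[td->td_cpu].pc_curpmap = &td->td_proc->p_vmspace->vm_pmap;
	return (1);
}

/* Option A: compare against the thread's own vmspace; no per-CPU read. */
static int
check_via_vmspace(struct thread *td, struct pmap *pmap, struct pmap *other)
{
	/* A migration at any point cannot affect this comparison. */
	(void)try_migrate(td, other);
	return (pmap == &td->td_proc->p_vmspace->vm_pmap);
}

/* Option B: keep the per-CPU read, but pin the thread around it. */
static int
check_pinned(struct thread *td, struct pmap *pmap, struct pmap *other)
{
	struct pcpu *pc;
	int migrated, ok;

	td->td_pinned++;			/* imitates sched_pin() */
	pc = &cpus[td->td_cpu];
	migrated = try_migrate(td, other);	/* refused while pinned */
	assert(migrated == 0);
	ok = (pc->pc_curpmap == pmap);
	td->td_pinned--;			/* imitates sched_unpin() */
	return (ok);
}
```

In this model both checks pass even when a migration is attempted in
the middle: option A never looks at per-CPU state, and option B keeps
the per-CPU base valid for the duration of the read.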

> > I think vmspace_exit() should issue a release fence with the cmpset and
> > an acquire fence when handling the refcnt == 1 case,
> Yep, true, fully agree.

Alan pointed out in the review that pmap_remove_pages() acquires the
pmap lock, which I missed, so I don't think the extra barriers are
necessary after all.
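
For reference, the barrier pattern that was being discussed (release
ordering on the decrementing compare-and-set, acquire fence once the
last reference is seen) looks roughly like this in portable C11
atomics.  This is a generic sketch with hypothetical names (struct obj,
obj_release()), not the vmspace_exit() code, and as noted above the
pmap lock already provides the ordering there.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_uint refcnt;
	int torn_down;
};

/*
 * Drop one reference.  Returns true if the caller dropped the last
 * reference and therefore performed the teardown.
 */
static bool
obj_release(struct obj *o)
{
	unsigned int old;

	old = atomic_load_explicit(&o->refcnt, memory_order_relaxed);
	do {
		assert(old > 0);
		/*
		 * Release on success: our earlier writes to the object are
		 * ordered before the decrement becomes visible.  On failure
		 * 'old' is refreshed and the loop retries.
		 */
	} while (!atomic_compare_exchange_weak_explicit(&o->refcnt, &old,
	    old - 1, memory_order_release, memory_order_relaxed));

	if (old != 1)
		return (false);

	/*
	 * Last reference: the acquire fence pairs with the release
	 * decrements of the other holders, so their writes are visible
	 * before the teardown touches the object.
	 */
	atomic_thread_fence(memory_order_acquire);
	o->torn_down = 1;
	return (true);
}
```

If the teardown path instead begins by taking a lock that every other
writer also holds while modifying the object, the lock's own
acquire/release semantics give the same guarantee and the explicit
fence is redundant.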
Received on Sat Oct 24 2020 - 17:37:42 UTC
