On 19.10.2020 22:39, Mark Johnston wrote:
> On Fri, Oct 16, 2020 at 11:53:56AM +0200, Michal Meloun wrote:
>>
>>
>> On 06.10.2020 15:37, Mark Johnston wrote:
>>> On Mon, Oct 05, 2020 at 07:10:29PM -0700, bob prohaska wrote:
>>>> Still seeing non-current pmap panics on the Pi3, this time a B+ running
>>>> 13.0-CURRENT (GENERIC-MMCCAM) #0 71e02448ffb-c271826(master)
>>>> during a -j4 buildworld. The backtrace reports
>>>>
>>>> panic: non-current pmap 0xffffa00020eab8f0
>>>
>>> Could you show the output of "show procvm" from the debugger?
>>
>> I see same panic too, in my case it's very rare - typical scenario is
>> rebuild of kf5 ports (~250, 2 days of full load). Any idea how to debug
>> this?
>> Michal
>
> I suspect that there is some race involving the pmap switching in
> vmspace_exit(), but I can't see it. In the example below, presumably
> process 22604 on CPU 0 is also exiting? Could you show the backtrace?
>
> It would also be useful to see the value of PCPU_GET(curpmap) at the
> time of the panic. I'm not sure if there's a way to get that from DDB,
> but I suspect it should be equal to &vmspace0->vm_pmap.

Mark,
I think I found the problem. PCPU_GET() is not (and is not supposed to
be) an atomic operation; it expects the thread to be at least pinned.
This is not true for pmap_remove_pages(), so I think the KASSERT is racy
and should be removed (or at least covered by a
sched_pin()/sched_unpin() pair).
What do you think?

>
> I think vmspace_exit() should issue a release fence with the cmpset and
> an acquire fence when handling the refcnt == 1 case,

Yep, true, fully agree.

Michal

> but I don't see why
> that would make a difference here. So, if you can test a debug patch,
> this one will yield a bit more debug info. If you can provide access to
> a vmcore and kernel debug symbols, that'd be even better.
>
> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
> index 284f00b3cc0d..3c53ae3b4c1e 100644
> --- a/sys/arm64/arm64/pmap.c
> +++ b/sys/arm64/arm64/pmap.c
> @@ -4838,7 +4838,8 @@ pmap_remove_pages(pmap_t pmap)
>  	int allfree, field, freed, idx, lvl;
>  	vm_paddr_t pa;
>  
> -	KASSERT(pmap == PCPU_GET(curpmap), ("non-current pmap %p", pmap));
> +	KASSERT(pmap == PCPU_GET(curpmap),
> +	    ("non-current pmap %p %p", pmap, PCPU_GET(curpmap)));
>  
>  	lock = NULL;
>  
> diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> index c20005ae64cf..0ad415e3b88c 100644
> --- a/sys/vm/vm_map.c
> +++ b/sys/vm/vm_map.c
> @@ -358,7 +358,10 @@ vmspace_exit(struct thread *td)
>  	p = td->td_proc;
>  	vm = p->p_vmspace;
>  	atomic_add_int(&vmspace0.vm_refcnt, 1);
> -	refcnt = vm->vm_refcnt;
> +	refcnt = atomic_load_int(&vm->vm_refcnt);
> +
> +	KASSERT(vmspace_pmap(vm) == PCPU_GET(curpmap),
> +	    ("non-current pmap %p %p", pmap, PCPU_GET(curpmap)));
>  	do {
>  		if (refcnt > 1 && p->p_vmspace != &vmspace0) {
>  			/* Switch now since other proc might free vmspace */
>

Received on Fri Oct 23 2020 - 14:32:29 UTC
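
For readers following the fence discussion above: the release-on-decrement, acquire-on-last-reference pattern can be sketched in userland C11. This is only an analogy, not the FreeBSD code. struct obj and obj_release() are hypothetical names, and C11 <stdatomic.h> stands in for the kernel's atomic(9) primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted object such as a vmspace. */
struct obj {
	atomic_uint refcnt;
	int payload;	/* state that holders may modify before dropping a ref */
};

/*
 * Drop one reference.  Returns true when the caller dropped the last
 * reference and may therefore tear the object down.
 */
static bool
obj_release(struct obj *o)
{
	unsigned int old;

	old = atomic_load_explicit(&o->refcnt, memory_order_relaxed);
	assert(old > 0);

	/*
	 * Release ordering on the successful decrement publishes this
	 * thread's earlier writes to whoever ends up dropping the last
	 * reference.  On failure, 'old' is reloaded with the current value.
	 */
	while (!atomic_compare_exchange_weak_explicit(&o->refcnt, &old,
	    old - 1, memory_order_release, memory_order_relaxed))
		;

	if (old != 1)
		return (false);

	/*
	 * Last reference: the acquire fence pairs with the release
	 * decrements of the other holders, so their writes are visible
	 * before teardown starts.
	 */
	atomic_thread_fence(memory_order_acquire);
	return (true);
}
```

With the count initialised to 2, the first obj_release() returns false and the second returns true. Only the second caller runs the acquire fence, and only it may free the object.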