Re: panic: non-current pmap 0xffffa00020eab8f0 on Rpi3

From: Michal Meloun <meloun.michal_at_gmail.com>
Date: Fri, 23 Oct 2020 18:32:25 +0200

On 19.10.2020 22:39, Mark Johnston wrote:
> On Fri, Oct 16, 2020 at 11:53:56AM +0200, Michal Meloun wrote:
>>
>>
>> On 06.10.2020 15:37, Mark Johnston wrote:
>>> On Mon, Oct 05, 2020 at 07:10:29PM -0700, bob prohaska wrote:
>>>> Still seeing non-current pmap panics on the Pi3, this time a B+ running
>>>> 13.0-CURRENT (GENERIC-MMCCAM) #0 71e02448ffb-c271826(master)
>>>> during a -j4 buildworld.  The backtrace reports
>>>>
>>>> panic: non-current pmap 0xffffa00020eab8f0
>>>
>>> Could you show the output of "show procvm" from the debugger?
>>
>> I see the same panic too; in my case it's very rare. The typical scenario is a
>> rebuild of the kf5 ports (~250, 2 days of full load).  Any idea how to debug
>> this?
>> Michal
> 
> I suspect that there is some race involving the pmap switching in
> vmspace_exit(), but I can't see it.  In the example below, presumably
> process 22604 on CPU 0 is also exiting?  Could you show the backtrace?
>
> It would also be useful to see the value of PCPU_GET(curpmap) at the
> time of the panic.  I'm not sure if there's a way to get that from DDB,
> but I suspect it should be equal to &vmspace0->vm_pmap.
Mark,
I think I have found the problem.
PCPU_GET() is not (and is not supposed to be) an atomic operation;
it expects the calling thread to be at least pinned.
That is not guaranteed in pmap_remove_pages(), so I think the KASSERT
is racy and should be removed (or at least wrapped in a
sched_pin()/sched_unpin() pair).
What do you think?
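
To make the suspected race concrete, here is a minimal userspace sketch. It is
not kernel code: it assumes a per-CPU read happens in two steps (fetch the
per-CPU base, then load the field) and that an unpinned thread can migrate
between them. Every name in it (struct pcpu as defined here, mock_sched_pin(),
mock_sched_unpin(), preempt(), read_unpinned(), read_pinned()) is a
hypothetical stand-in, not the real FreeBSD definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real FreeBSD definitions. */
struct pmap { int id; };
struct pcpu { struct pmap *pc_curpmap; };

static struct pmap my_pmap = { 1 };     /* pmap of the thread under test */
static struct pmap other_pmap = { 2 };  /* pmap of whatever runs next on the old CPU */

static struct pcpu cpus[2];  /* two simulated CPUs */
static int cur_cpu;          /* CPU the simulated thread is running on */
static int pin_count;        /* > 0 means migration is forbidden */

static void mock_sched_pin(void)   { pin_count++; }
static void mock_sched_unpin(void) { assert(pin_count > 0); pin_count--; }

/* Start each experiment with the thread on CPU 0 and its pmap active there. */
static void reset(void)
{
	cur_cpu = 0;
	pin_count = 0;
	cpus[0].pc_curpmap = &my_pmap;
	cpus[1].pc_curpmap = NULL;
}

/*
 * Simulated preemption point.  Unless pinned, the thread moves to the other
 * CPU: the old CPU starts running someone else, the new CPU activates my_pmap.
 */
static void preempt(void)
{
	if (pin_count > 0)
		return;
	cpus[cur_cpu].pc_curpmap = &other_pmap;
	cur_cpu ^= 1;
	cpus[cur_cpu].pc_curpmap = &my_pmap;
}

/* Models a non-atomic per-CPU read: fetch the base, then load the field. */
static struct pmap *mock_pcpu_get_curpmap(void)
{
	struct pcpu *pc = &cpus[cur_cpu];  /* step 1: per-CPU base */
	preempt();                         /* window between the two steps */
	return (pc->pc_curpmap);           /* step 2: load via a possibly stale base */
}

/* Read without pinning: the migration lands inside the window. */
struct pmap *read_unpinned(void)
{
	reset();
	return (mock_pcpu_get_curpmap());
}

/* Read while pinned: the simulated migration is suppressed. */
struct pmap *read_pinned(void)
{
	struct pmap *p;

	reset();
	mock_sched_pin();
	p = mock_pcpu_get_curpmap();
	mock_sched_unpin();
	return (p);
}
```

In this model read_unpinned() returns the other thread's pmap, which is the
situation in which an assertion of the form pmap == curpmap would fire even
though the thread's own pmap is active on the CPU it now runs on, while
read_pinned() returns the thread's own pmap.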

> 
> I think vmspace_exit() should issue a release fence with the cmpset and
> an acquire fence when handling the refcnt == 1 case,
Yep, true, fully agree.
Michal

> but I don't see why
> that would make a difference here.  So, if you can test a debug patch,
> this one will yield a bit more debug info.  If you can provide access to
> a vmcore and kernel debug symbols, that'd be even better.
> 
> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
> index 284f00b3cc0d..3c53ae3b4c1e 100644
> --- a/sys/arm64/arm64/pmap.c
> +++ b/sys/arm64/arm64/pmap.c
> @@ -4838,7 +4838,8 @@ pmap_remove_pages(pmap_t pmap)
>  	int allfree, field, freed, idx, lvl;
>  	vm_paddr_t pa;
>  
> -	KASSERT(pmap == PCPU_GET(curpmap), ("non-current pmap %p", pmap));
> +	KASSERT(pmap == PCPU_GET(curpmap),
> +	    ("non-current pmap %p %p", pmap, PCPU_GET(curpmap)));
>  
>  	lock = NULL;
>  
> diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> index c20005ae64cf..0ad415e3b88c 100644
> --- a/sys/vm/vm_map.c
> +++ b/sys/vm/vm_map.c
> @@ -358,7 +358,10 @@ vmspace_exit(struct thread *td)
>  	p = td->td_proc;
>  	vm = p->p_vmspace;
>  	atomic_add_int(&vmspace0.vm_refcnt, 1);
> -	refcnt = vm->vm_refcnt;
> +	refcnt = atomic_load_int(&vm->vm_refcnt);
> +
> +	KASSERT(vmspace_pmap(vm) == PCPU_GET(curpmap),
> +	    ("non-current pmap %p %p", vmspace_pmap(vm), PCPU_GET(curpmap)));
>  	do {
>  		if (refcnt > 1 && p->p_vmspace != &vmspace0) {
>  			/* Switch now since other proc might free vmspace */
> 
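
The fence suggestion for vmspace_exit() quoted above (release ordering on the
cmpset, an acquire fence before handling the last reference) follows a common
reference-count pattern. The sketch below shows that pattern with portable C11
atomics rather than FreeBSD's atomic(9) primitives; struct obj and
obj_release() are hypothetical names used only for illustration, and this is
not the vmspace_exit() code itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct obj {
	atomic_uint refcnt;
	int payload;
};

/*
 * Drop one reference.  Returns true if the caller dropped the last one and
 * may now tear the object down.
 */
static bool obj_release(struct obj *o)
{
	unsigned int old;

	old = atomic_load_explicit(&o->refcnt, memory_order_relaxed);

	/*
	 * Release ordering on success publishes this thread's earlier writes
	 * to the object.  On failure the CAS reloads 'old', so the loop simply
	 * retries with the fresh value.
	 */
	while (!atomic_compare_exchange_weak_explicit(&o->refcnt, &old,
	    old - 1, memory_order_release, memory_order_relaxed))
		;

	if (old != 1)
		return (false);

	/*
	 * Last reference: the acquire fence pairs with the release operations
	 * of the other, earlier droppers, so their writes are visible before
	 * teardown begins.
	 */
	atomic_thread_fence(memory_order_acquire);
	return (true);
}
```

With a count that starts at 2, the first obj_release() call returns false and
the second returns true, leaving the count at 0.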
Received on Fri Oct 23 2020 - 14:32:29 UTC
