Re: Fix for sys_munlock(2) with racct

From: Jeremie Le Hen <jlh_at_FreeBSD.org> Date: Sun, 21 Jul 2013 23:50:38 +0200 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:39 UTC

On Sat, Jul 20, 2013 at 08:33:35PM -0700, Alan Cox wrote:
> 
> On Jul 20, 2013, at 4:22 AM, Jeremie Le Hen wrote:
> 
> > Hi Edward, Alan,
> > 
> > I plan to commit the following patch:
> > http://people.freebsd.org/~jlh/racct_munlock.diff
> > 
> > This solves the following panic:
> > 
> > panic: racct_sub: freeing 301989888 of resource 5, which is more than allocated 73728 for pwsafe (pid 4177)
> > 
> > The problem is that the racct code in sys_munlock() trusts too much the
> > user's input.  vm_map_unwire_count() now returns how much memory has
> > really been unwired.
> > 
> > Any objection?
> 
> 
> Can you elaborate on what you mean by "sys_munlock() trusts too much
> the user's input."   munlock(2) is supposed to return ENOMEM if any
> addresses within the specified range are not backed by valid mappings.
> (Valid mappings with PROT_NONE access are something of a gray area
> here.)  However, it is not error for a to call munlock() on a range
> that isn't locked, or to call it a second, third, etc. time on the
> same range.  Is that what is provoking your panic?

I'm using sysutils/pwsafe and it seems to mlock(2) 18 pages (end - start
= 73728 bytes) but then munlock(2) 73728 pages :-). vm_map_unwire()
seems to do the right thing with those buggy boundaries, but then the
racct code assumes that those boundaries were correct as long as
vm_map_unwire() returned successfully.  This is what causes the
assertion failure.

> By the way, sys_mlock() uses a simpler approach to deal with a similar
> situation:
> 
>         error = vm_map_wire(map, start, end, 
>             VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES);
> #ifdef RACCT
>         if (error != KERN_SUCCESS) {
>                 PROC_LOCK(proc);
>                 racct_set(proc, RACCT_MEMLOCK,
>                     ptoa(pmap_wired_count(map->pmap)));
>                 PROC_UNLOCK(proc);
>         }
> #endif
> 
> However, the code in sys_mlock() for maintaining RACCT_MEMLOCK,
> including the above snippet, is race-y.  Two simultaneous callers to
> sys_mlock() will produce incorrect results.  (I haven't looked at
> sys_mlockall() or vm_map_growstack().)

Ok, I will commit that as the first bandaid then.

> Also, a wired mapping can be destroyed by calling munmap(2) without
> first calling munlock(2), in which case, RACCT_MEMLOCK will be
> incorrect.

So I think the right way to tackle this is to handle racct in the vm
layer rather than at the syscall layer.

vm_fault_{wire,unwire}() look like natural places for this, but racct
functions require the struct proc to be locked.  Any idea how to handle
this?

-- 
Jeremie Le Hen

Scientists say the world is made up of Protons, Neutrons and Electrons.
They forgot to mention Morons.