Re: fork_findpid() - Fatal trap 12: page fault while in kernel mode

From: Mateusz Guzik <mjguzik_at_gmail.com> Date: Fri, 18 Dec 2015 19:45:52 +0100 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:01 UTC

On Fri, Dec 18, 2015 at 09:44:10AM -0800, Don Lewis wrote:
> On 18 Dec, Mateusz Guzik wrote:
> > On Thu, Dec 17, 2015 at 02:33:46PM -0800, Don Lewis wrote:
> >> On 17 Dec, Mateusz Guzik wrote:
> >> > On Thu, Dec 17, 2015 at 11:48:08AM -0800, Don Lewis wrote:
> >> >> On 17 Dec, Konstantin Belousov wrote:
> >> >> > On Wed, Dec 16, 2015 at 11:08:02AM -0800, Don Lewis wrote:
> >> >> >> I used to have a patch the deferred linking the new process into
> >> >> >> proctree/allproc until it was fully formed.  The motivation was to get
> >> >> >> rid of all of the PRS_NEW stuff scattered around the source.
> >> >> >> Unfortunately the patch bit-rotted and I'm pretty sure that I lost it.
> >> >> > 
> >> >> > I had similar tought for a second as one of the possibilities to fix the
> >> >> > issue, but rejected it outright due to the way the pid allocator works.
> >> >> > The loop which faulted is the allocator, it depends on the new pid being
> >> >> > linked early to detect the duplicated alloc.
> >> >> > 
> >> >> > What you wrote could be done, but this restructuring requires the separate
> >> >> > pid allocator, and probably it must repeat all quirks and subtle behaviour
> >> >> > of the current algorithm.  But I do not object, PRS_NEW is a trouble
> >> >> > on its own.
> >> >> 
> >> >> I don't think it requires any changes to the allocater.  It should only
> >> >> be necessary to delay the call to fork_findpid() until we are ready to
> >> >> link the new proc into allproc.  Basically, drop the locks at the
> >> >> beginning of do_fork(), then grab them again somewhere near the end
> >> >> (probably where we are currently mark the process as PRS_NORMAL) and
> >> >> move the call to fork_findpid(), the p2->p_pid assignment, and the list
> >> >> manipulation code to a location after that.
> >> >> 
> >> >> It's probably not quite that simple though ...
> >> > 
> >> > That would mean you would need to be able to deconstruct the process
> >> > because you cannot guarantee there are any pids left, which may or may
> >> > not be easily doable.
> >> 
> >> It doesn't look like we handle that properly in the current code.  I
> >> think fork_findpid() will loop forever.  It shouldn't be possible if
> >> maxproc < pid_max / 3, or maybe pid_max / 2.  It might be a good idea to
> >> enforce this.
> >> 
> > 
> > Not sure I follow, can you rephrase/elaborate?
> 
> The first time through, fork_findpid() will start it's search with
> trypid=lastpid+1 and searches upwards from there.  If it reaches PID_MAX
> (I think that should be pid_max) without finding a free pid, it does a
> goto retry, which resets trypid back to the beginning and restarts the
> search.  IF there are no free pids, then trypid will goto retry and
> repeat this same loop forever.
> 
> There is no error return from fork_findpid() to indicate that the fork
> should fail if there are no free pids.
> 

Oh, you mean nprocs check used prior to calling do_fork is insufficient.

I can't test it right now, indeed sees like a real problem. Bitmap
switch fixes this automatically without further hackery.

> >> > The current method is going to bite us performance-wise anyway and an
> >> > allocater which does not require a walk over the tree is necessary in
> >> > the long run. Seems like a bitmap (or a bunch of bitmaps) is the way to
> >> > go here.
> >> 
> >> I think that separate bitmaps for process, process group, and session
> >> ids would be needed.  It would waste some space, but it's probably more
> >> efficent to use a byte array and store all the bits for the pid
> >> together.
> >> 
> > 
> > Well I had such separate bitmaps in mind with addition of a combined
> > "the id is in use bitmap". This would make the common case of finding a
> > new pid reasonably fast. Access to all bitmaps would be protected with
> > proctree lock, which matches current locking scheme anyway.
> 
> That's also a possibility.  Maintaining the bitmaps would be more
> complicated because any time one of the individual bitmaps is updated,
> the combined bitmap would also have to be recalculated.  It would be
> possible to use bit_ffc() to find the first free pid, but that would
> always find the lowest available free pid and would not emulate the
> current default behaviour of allocating pids mostly sequentually.
> 

What do you mean recalculated? Maybe I got this wrong, but it seems
totally trivial.

When allocating either a new process group or a new session, set the bit
in the combined map.

When allocating a new pid, check the combined map. Free entry implies an
unused pid, so use that. Set the bit in the combined map and pid map.

Finally, when freeing either of these identifiers, unset the bit in the
combined map only if no other map has the bit set.

-- 
Mateusz Guzik <mjguzik gmail.com>