Memory corruption in a master perl process after child exits - only under FreeBSD 10.0 amd64 (not in 10.1 or 9.*)

From: Mark Martinec <Mark.Martinec+freebsd_at_ijs.si>
Date: Mon, 26 Jan 2015 20:32:22 +0100
There has been a problem report open since July 2014 in the Perl bug
tracker, which seems to affect only FreeBSD 10.0 amd64 (regardless of
the version of Perl or whether clang or gcc is used as the compiler):

   https://rt.perl.org/Ticket/Display.html?id=122199

I wonder if someone intimately familiar with handling of virtual
memory, fork, swap, and process exit / wait(2) under FreeBSD
would be able to recognize what changed in these areas between
9.2 -> 10.0 and 10.0 -> 10.1, such that only 10.0 misbehaves
and 10.1 apparently fixed the problem again.

Below is my short summary of the issue (it is the last comment
in the referenced problem report). Further details are in that PR.

It's been a real mystery, difficult to reproduce, but definitely there.
It might be a Perl bug, but it looks ever more likely that it is
a FreeBSD issue.

   Mark



After upgrading to FreeBSD 10.1 (from 10.0) and running the same
application with the same version of Perl for two months now, with
the master process periodically retiring child processes and spawning
new ones, as previously under FreeBSD 9.x, I can now confirm that the
problem no longer occurs.

I can also confirm that the problem under 10.0 can be avoided by
not letting child processes exit voluntarily, so the master process
never sees a child termination in wait() and never needs to spawn (fork)
another child process.

A brief summary of the problem:

Setup: an application consisting of a master perl process spawning
worker child processes, which periodically voluntarily self-terminate,
to be replaced by a fresh child process forked from the master process.
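
For illustration only, the setup described above can be sketched as
follows. This is a minimal sketch in Python rather than Perl, and it is
not the application from the PR: the names worker, spawn and master, the
job counts, and the use of a blocking os.wait() instead of a SIGCHLD
handler are all assumptions made to keep the example short.

```python
import os
import time


def worker(max_jobs):
    """Child: do a bounded amount of placeholder work, then exit voluntarily."""
    for _ in range(max_jobs):
        time.sleep(0.01)  # stand-in for real work
    os._exit(0)  # normal, voluntary termination


def spawn(max_jobs):
    """Fork one worker and return its pid to the master."""
    pid = os.fork()
    if pid == 0:
        worker(max_jobs)  # never returns in the child
    return pid


def master(n_workers=2, max_jobs=3, total_spawns=6):
    """Keep workers alive, replacing each one that exits.

    Stops respawning after total_spawns forks (hypothetical limit, so the
    sketch terminates) and returns (pid, exit_code) for every reaped child.
    """
    children = {spawn(max_jobs) for _ in range(n_workers)}
    spawned = n_workers
    reaped = []
    while children:
        pid, status = os.wait()  # blocks until some child terminates
        children.discard(pid)
        reaped.append((pid, os.WEXITSTATUS(status)))
        if spawned < total_spawns:
            children.add(spawn(max_jobs))  # fork a fresh replacement
            spawned += 1
    return reaped
```

Each pass through the loop in master() corresponds to one child
termination seen in wait() followed by one fork of a replacement, which
is the sequence that matters for the problem described here.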

Environment:
- occurs only on FreeBSD 10.0 amd64, any recent version of perl,
   gcc or clang.
- does not occur on FreeBSD 9.x or 10.1, and not on i386;
   not reproducible on Linux

What seems to be happening:
- a child process, after doing some work (possibly touching swap),
   does a normal exit;
- the parent process gets a SIGCHLD signal, calls wait(), and
   for some obscure reason some of its memory gets corrupted;
- the parent process forks, creating a new worker child process,
   which inherits the corrupted sections of the parent's memory,
   later leading to the child's crash if it happens
   to use that part of the memory (opcodes or data structures)
   during its normal work. Every subsequently forked child process
   inherits the same memory corruption and crashes in the same way.
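
The last step, where every later child inherits the parent's damaged
state, can be illustrated with a small sketch. It does not reproduce the
bug; it only shows that a forked child sees whatever the parent's memory
held at fork time. The "corruption" is simulated by an ordinary
assignment, and the names child_view, demo and table are hypothetical.

```python
import os


def child_view(table):
    """Fork a child and return what it sees in table["opcode"]."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the inherited value back through the pipe.
        os.close(read_fd)
        os.write(write_fd, table["opcode"].encode())
        os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        seen = reader.read()
    os.waitpid(pid, 0)  # reap the child
    return seen


def demo():
    table = {"opcode": "good"}     # stand-in for interpreter state
    before = child_view(table)     # child forked from an intact parent
    table["opcode"] = "corrupted"  # simulated damage in the parent
    after = [child_view(table) for _ in range(3)]  # every later child
    return before, after
```

Here demo() returns the value seen by a child forked before the change
and the values seen by three children forked after it; the later
children all observe the changed state, which matches the pattern of
every newly forked worker crashing once the master's memory is damaged.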

So it seems the problem is somehow connected with how FreeBSD 10.0
on amd64 manages virtual memory (fork, exit, wait, possibly
involving swap). The problem is apparently fixed in 10.1, and
not present in 9.x. Does anybody have a sound explanation?
Received on Mon Jan 26 2015 - 18:32:35 UTC
