blubee blubeeme gurenchan at gmail.com wrote on Mon Aug 20 03:02:01 UTC 2018 :

> I am running current compiling LLVM60 and when it comes to linking
> basically all the processes on my computer gets killed; Chrome, Firefox and
> some of the LLVM threads as well
> . . .
> last pid: 20965;  load averages: 0.64, 5.79, 7.73  up 12+01:35:46  11:00:36
> 76 processes: 1 running, 75 sleeping
> CPU: 0.8% user, 0.5% nice, 1.0% system, 0.0% interrupt, 98.1% idle
> Mem: 10G Active, 3G Inact, 100M Laundry, 13G Wired, 6G Free
> ARC: 4G Total, 942M MFU, 1G MRU, 1M Anon, 43M Header, 2G Other
>      630M Compressed, 2G Uncompressed, 2.74:1 Ratio
> Swap: 2G Total, 1G Used, 739M Free, 63% Inuse
> . . .

The timing of that top output relative to the first (or any) OOM kill of a process is not clear. After? Just before? How long before? What things looked like leading up to the first kill is of interest.

Folks that deal with this are likely to want to know if you got console messages (or /var/log/messages content) such as:

pid 49735 (c++), uid 0, was killed: out of swap space

(Note: "out of swap space" can be a misnomer for having low free RAM for "too long" [vm.pageout_oom_seq based], even with swap unused or little used.)

And: were you also getting messages like:

swap_pager_getswapspace(4): failed

and/or:

swap_pager: out of swap space

(These indicate the "killed: out of swap space" is not necessarily a misnomer relative to swap space, even if low free RAM over time is what drives the process kills.)

How about messages like:

swap_pager: indefinite wait buffer: bufobj: 0, blkno: 28139, size: 65536

or any I/O error or retry reports?

Notes:

Mark Johnston published a patch used for some investigations of the OOM killing:

https://people.freebsd.org/~markj/patches/slow_swap.diff

But this is tied to the swap I/O latencies involved and whether they are driving some of the time frames. It just adds more reporting to the console (and /var/log/messages). It is not a fix, and it may not report much for your context.

vm.pageout_oom_seq controls "how long low free RAM is tolerated" (my phrasing), though the units are not directly time. In various arm contexts with small boards, going from the default of 12 to 120 allowed things to complete or get much farther. So:

sysctl vm.pageout_oom_seq=120

but 120 is not the limit: it is a C int parameter. I'll note that "low free RAM" is as FreeBSD classifies it, whatever the details are.

Most of the arm examples have been small-memory contexts, and many of them likely avoid ZFS and use UFS instead. ZFS, its ARC, and such add an additional complication to this type of issue. There are lots of reports around of the ARC growing too big. I do not know the status of -r336196 relative to ZFS/ARC memory management, or if more recent versions have improvements. (I do not use ZFS normally.) I've seen messages making suggestions for controlling the growth, but I'm no ZFS expert.

Just to give an idea of what is sufficient to build devel/llvm60: on a Pine64+ 2GB (so 2 GiBytes of RAM in an aarch64 context with 4 cores, 1 HW-thread per core) running -r337400, using UFS on a USB drive and a swap partition on that drive too, I have built devel/llvm60 2 times via poudriere-devel: just one builder allowed, but that builder allowed to use all 4 cores in parallel; about 14.5 hr each time. (Different USB media each time.) This did require the:

sysctl vm.pageout_oom_seq=120

Mark Johnston's slow_swap.diff patch code did not report any I/O latency problems in the swap subsystem.

I've also built lang/gcc8 2 times, about 12.5 hrs each time.

No ZFS, no ARC, no Chrome, no Firefox. Nothing else major going on beyond the devel/llvm60 build (or, later, the lang/gcc8 build) in each case.
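As an aside, and just as a sketch assuming the default syslogd setup (kernel messages ending up in /var/log/messages, and the search pattern here being only an illustration): after the fact you can look for evidence of the kills and of the swap_pager complaints with something like:

grep -E "was killed: out of swap|swap_pager" /var/log/messages

And if raising the threshold helps, a line in /etc/sysctl.conf makes the setting survive reboots:

# tolerate low free RAM for longer before OOM kills (default is 12)
vm.pageout_oom_seq=120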
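On the ZFS side, one common suggestion I've seen (treat it as an illustration only, since this is not my area, and the value below is just an example, not a recommendation tuned to your machine) is to cap the ARC via a loader tunable, i.e. a line in /boot/loader.conf such as:

vfs.zfs.arc_max="4G"

That at least keeps the ARC from competing for most of the RAM while the big llvm60 link steps are running, but whether such a cap is appropriate for your workload I cannot say.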
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)