[In part a resend from the right email account. In part adding a note
about another Mark Johnston patch for reporting information.]

On 2018-Aug-19, at 11:25 PM, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:

> blubee blubeeme gurenchan at gmail.com wrote on
> Mon Aug 20 03:02:01 UTC 2018 :
>
>> I am running current, compiling LLVM60, and when it comes to linking,
>> basically all the processes on my computer get killed: Chrome, Firefox,
>> and some of the LLVM threads as well.
>
>> . . .
>
>> last pid: 20965;  load averages: 0.64, 5.79, 7.73   up 12+01:35:46  11:00:36
>> 76 processes: 1 running, 75 sleeping
>> CPU: 0.8% user, 0.5% nice, 1.0% system, 0.0% interrupt, 98.1% idle
>> Mem: 10G Active, 3G Inact, 100M Laundry, 13G Wired, 6G Free
>> ARC: 4G Total, 942M MFU, 1G MRU, 1M Anon, 43M Header, 2G Other
>>      630M Compressed, 2G Uncompressed, 2.74:1 Ratio
>> Swap: 2G Total, 1G Used, 739M Free, 63% Inuse
>> . . .
>
> The timing of that top output relative to the first (or any) OOM kill
> of a process is not clear. After? Just before? How long before? What
> things were like leading up to the first kill is of interest.
>
> Folks that deal with this are likely to want to know if you got
> console messages (or /var/log/messages content) such as:
>
> pid 49735 (c++), uid 0, was killed: out of swap space
>
> (Note: "out of swap space" can be a misnomer for having low free RAM
> for "too long" [vm.pageout_oom_seq based], even with swap unused or
> little used.)
>
> And: were you also getting messages like:
>
> swap_pager_getswapspace(4): failed
>
> and/or:
>
> swap_pager: out of swap space
>
> (These indicate the "killed: out of swap space" is not necessarily a
> misnomer relative to swap space, even if low free RAM over time drives
> the process kills.)
>
> How about messages like:
>
> swap_pager: indefinite wait buffer: bufobj: 0, blkno: 28139, size: 65536
>
> or any I/O error reports or retry reports?
>
>
> Notes:
>
> Mark Johnston published a patch used for some investigations of the
> OOM killing:
>
> https://people.freebsd.org/~markj/patches/slow_swap.diff
>
> But this is tied to the swap I/O latencies involved and whether they
> are driving some time frames. It just adds more reporting to the
> console (and /var/log/messages). It is not a fix. It may not be likely
> to report much for your context.
>
> vm.pageout_oom_seq controls "how long is low free RAM tolerated" (my
> phrasing), though the units are not directly time. In various arm
> contexts with small boards, going from the default of 12 to 120
> allowed things to complete or get much farther. So:
>
> sysctl vm.pageout_oom_seq=120
>
> but 120 is not the limit: it is a C int parameter.
>
> I'll note that "low free RAM" is as FreeBSD classifies it, whatever
> the details are.
>
> Most of the arm examples have been small-memory contexts, and many of
> them likely avoid ZFS and use UFS instead. ZFS and its ARC add an
> additional, complicated context to this type of issue. There are lots
> of reports around of the ARC growing too big. I do not know the status
> of -r336196 relative to ZFS/ARC memory management, or whether more
> recent versions have improvements. (I do not use ZFS normally.) I've
> seen messages making suggestions for controlling the growth, but I'm
> no ZFS expert.
>
> Just to give an idea what is sufficient to build devel/llvm60:
>
> I will note that on a Pine64+ 2GB (so 2 GiBytes of RAM in an aarch64
> context with 4 cores, 1 HW-thread per core) running -r337400, and
> using UFS on a USB drive and a swap partition on that drive too, I
> have built devel/llvm60 2 times via poudriere-devel: just one builder
> allowed, but that builder allowed to use all 4 cores in parallel;
> about 14.5 hr each time. (Different USB media each time.) This did
> require the:
>
> sysctl vm.pageout_oom_seq=120
>
> Mark Johnston's slow_swap.diff patch code did not report any I/O
> latency problems in the swap subsystem.
>
> I've also built lang/gcc8 2 times, about 12.5 hrs each time.
>
> No ZFS, no ARC, no Chrome, no Firefox. Nothing else major going on
> beyond the devel/llvm60 build (or, later, the lang/gcc8 build) in
> each case.

Mark Johnston, in the investigation for the arm context, also had us
use the following patch:

diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index 264c98203c51..9c7ebcf451ec 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1670,6 +1670,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
 	 * start OOM.  Initiate the selection and signaling of the
 	 * victim.
 	 */
+	printf("v_free_count: %u, v_inactive_count: %u\n",
+	    vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
 	vm_pageout_oom(VM_OOM_MEM);

 	/*

This patch is not about the I/O latencies but about the free RAM and
inactive RAM at exactly the point of the OOM kill activity.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
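
A minimal sketch of how the suggestions above might be checked and
applied on a stock FreeBSD install (plain sh; the log path is the
default one, and 120 is just the example figure used above, not a
tuned recommendation):

  # look for the OOM/swap messages discussed above
  grep -E 'killed: out of swap space|swap_pager' /var/log/messages

  # inspect the current threshold (default 12) and raise it for this boot
  sysctl vm.pageout_oom_seq
  sysctl vm.pageout_oom_seq=120

  # persist the setting across reboots
  echo 'vm.pageout_oom_seq=120' >> /etc/sysctl.conf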
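
Likewise, a rough sketch of how a reporting patch like the one above is
typically applied and read back (assuming a /usr/src source tree and the
GENERIC kernel config; the patch file name is hypothetical):

  # apply the diff and rebuild/install the kernel
  cd /usr/src
  patch -p1 < /path/to/vm_pageout_oom_report.diff   # hypothetical file name
  make -j4 buildkernel KERNCONF=GENERIC
  make installkernel KERNCONF=GENERIC
  shutdown -r now

  # after the next OOM kill, the added printf output lands in the logs
  grep 'v_free_count' /var/log/messages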