[In part a resend from the right email account. In part adding a note
about another Mark Johnston patch for reporting information.]

On 2018-Aug-19, at 11:25 PM, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:

> blubee blubeeme gurenchan at gmail.com wrote on
> Mon Aug 20 03:02:01 UTC 2018 :
>
>> I am running current, compiling LLVM60, and when it comes to linking,
>> basically all the processes on my computer get killed: Chrome, Firefox,
>> and some of the LLVM threads as well.
>
>> . . .
>
>> last pid: 20965;  load averages: 0.64, 5.79, 7.73   up 12+01:35:46  11:00:36
>> 76 processes: 1 running, 75 sleeping
>> CPU: 0.8% user, 0.5% nice, 1.0% system, 0.0% interrupt, 98.1% idle
>> Mem: 10G Active, 3G Inact, 100M Laundry, 13G Wired, 6G Free
>> ARC: 4G Total, 942M MFU, 1G MRU, 1M Anon, 43M Header, 2G Other
>>      630M Compressed, 2G Uncompressed, 2.74:1 Ratio
>> Swap: 2G Total, 1G Used, 739M Free, 63% Inuse
>> . . .
>
> The timing of that top output relative to the first (or any) OOM kill
> of a process is not clear. After? Just before? How long before? What
> things were like leading up to the first kill is of interest.
>
> Folks that deal with this are likely to want to know if you got
> console messages (or /var/log/messages content) such as:
>
> pid 49735 (c++), uid 0, was killed: out of swap space
>
> (Note: "out of swap space" can be a misnomer for having low free RAM
> for "too long" [vm.pageout_oom_seq based], even with swap unused or
> little used.)
>
> And: were you also getting messages like:
>
> swap_pager_getswapspace(4): failed
>
> and/or:
>
> swap_pager: out of swap space
>
> (These indicate the "killed: out of swap space" is not necessarily a
> misnomer relative to swap space, even if low free RAM over time drives
> the process kills.)
>
> How about messages like:
>
> swap_pager: indefinite wait buffer: bufobj: 0, blkno: 28139, size: 65536
>
> or any I/O error reports or retry reports?
>
>
> Notes:
>
> Mark Johnston published a patch used for some investigations of the
> OOM killing:
>
> https://people.freebsd.org/~markj/patches/slow_swap.diff
>
> But this is tied to the swap I/O latencies involved and whether they
> are driving some time frames. It just adds more reporting to the
> console (and /var/log/messages). It is not a fix. It may not be likely
> to report much for your context.
>
> vm.pageout_oom_seq controls "how long is low free RAM tolerated" (my
> phrasing), though the units are not directly time. In various arm
> contexts with small boards, going from the default of 12 to 120
> allowed things to complete or get much farther. So:
>
> sysctl vm.pageout_oom_seq=120
>
> but 120 is not the limit: it is a C int parameter.
>
> I'll note that "low free RAM" is as FreeBSD classifies it, whatever
> the details are.
>
> Most of the arm examples have been small-memory contexts, and many of
> them likely avoid ZFS and use UFS instead. ZFS and its ARC add an
> additional, complicated context to this type of issue. There are lots
> of reports around of the ARC growing too big. I do not know the status
> of -r336196 relative to ZFS/ARC memory management, or whether more
> recent versions have improvements. (I do not use ZFS normally.) I've
> seen messages making suggestions for controlling the growth, but I'm
> no ZFS expert.
>
> Just to give an idea what is sufficient to build devel/llvm60:
>
> I will note that on a Pine64+ 2GB (so 2 GiBytes of RAM in an aarch64
> context with 4 cores, 1 HW-thread per core) running -r337400, and
> using UFS on a USB drive and a swap partition on that drive too, I
> have built devel/llvm60 2 times via poudriere-devel: just one builder
> allowed, but that builder allowed to use all 4 cores in parallel;
> about 14.5 hr each time. (Different USB media each time.) This did
> require the:
>
> sysctl vm.pageout_oom_seq=120
>
> Mark Johnston's slow_swap.diff patch code did not report any I/O
> latency problems in the swap subsystem.
>
> I've also built lang/gcc8 2 times, about 12.5 hrs each time.
>
> No ZFS, no ARC, no Chrome, no Firefox. Nothing else major going on
> beyond the devel/llvm60 build (or, later, the lang/gcc8 build) in
> each case.

Mark Johnston, in the investigation for the arm context, also had us
use the following patch:

diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
index 264c98203c51..9c7ebcf451ec 100644
--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1670,6 +1670,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
 	 * start OOM.  Initiate the selection and signaling of the
 	 * victim.
 	 */
+	printf("v_free_count: %u, v_inactive_count: %u\n",
+	    vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
 	vm_pageout_oom(VM_OOM_MEM);

 	/*

This patch is not about the I/O latencies but about the free RAM and
inactive RAM at exactly the point of the OOM kill activity.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
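
A minimal sketch of how the suggestions above might be checked and
applied on a stock FreeBSD install (plain sh; the log path is the
default one, and 120 is just the example figure used above, not a
tuned recommendation):

  # look for the OOM/swap messages discussed above
  grep -E 'killed: out of swap space|swap_pager' /var/log/messages

  # inspect the current threshold (default 12) and raise it for this boot
  sysctl vm.pageout_oom_seq
  sysctl vm.pageout_oom_seq=120

  # persist the setting across reboots
  echo 'vm.pageout_oom_seq=120' >> /etc/sysctl.conf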
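
Likewise, a rough sketch of how a reporting patch like the one above is
typically applied and read back (assuming a /usr/src source tree and the
GENERIC kernel config; the patch file name is hypothetical):

  # apply the diff and rebuild/install the kernel
  cd /usr/src
  patch -p1 < /path/to/vm_pageout_oom_report.diff   # hypothetical file name
  make -j4 buildkernel KERNCONF=GENERIC
  make installkernel KERNCONF=GENERIC
  shutdown -r now

  # after the next OOM kill, the added printf output lands in the logs
  grep 'v_free_count' /var/log/messages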