Re: After update to r357104 build of poudriere jail fails with 'out of swap space'

From: Rodney W. Grimes <freebsd-rwg_at_gndrsh.dnsmgr.net>
Date: Sat, 25 Jan 2020 12:50:59 -0800 (PST)
> Yasuhiro KIMURA yasu at utahime.org wrote on
> Sat Jan 25 14:45:13 UTC 2020 :
> 
> > I use VirtualBox to run 13-CURRENT. Host is 64bit Windows 10 1909 and
> > spec of VM is as following.
> > 
> > * 4 CPU
> > * 8GB memory
> > * 100GB disk
> >   - 92GB ZFS pool (zroot)
> >   - 8GB swap
> > 
> > Today I updated this VM to r357104. And after that I tried to update
> > poudriere jail with `poudriere jail -u -j jailname -b`. But it failed
> > at install stage. After the failure I found following message is
> > written to syslog.
> > 
> > Jan 25 19:18:25 rolling-vm-freebsd1 kernel: pid 7963 (strip), jid 0, uid 0, was killed: out of swap space
> 
> The detailed wording of this message is misleading.
> Do you also have any messages of the form:
> 
> . . . sentinel kernel: swap_pager_getswapspace(32): failed
> 
> If yes: you really were out of swap space.
> If no:  you were not out of swap space,
>         or at least it is highly unlikely that you were.
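One quick way to tell the two cases apart is to check the kernel log for that second message. A minimal sketch, using the log line quoted above (the path to your kernel log may differ from /var/log/messages):

```shell
#!/bin/sh
# Classify a kernel log line: a swap_pager_getswapspace failure means swap
# really ran out; a bare "was killed: out of swap space" does not imply it.
line='Jan 25 19:18:25 rolling-vm-freebsd1 kernel: pid 7963 (strip), jid 0, uid 0, was killed: out of swap space'

if printf '%s\n' "$line" | grep -q 'swap_pager_getswapspace'; then
    verdict='really out of swap space'
else
    verdict='not necessarily out of swap space'
fi
echo "$verdict"
```

Against a live system you would grep the whole log (e.g. `grep swap_pager_getswapspace /var/log/messages`) rather than a single line.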
> 
> FreeBSD kills processes for multiple potential reasons.
> For example:
> 
> a) Still low on free RAM after a number of tries to increase it above a threshold.
> b) Slow paging I/O.
> c) . . . (I do not know the full list) . . .
> 
> Unfortunately, FreeBSD is not explicit about which category
> of problem led to the kill.
> 
> You might learn more by watching how things are going
> via top or some such program or other way of monitoring.
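Besides top, one could periodically snapshot the free-page count while the build runs. A sketch assuming the FreeBSD OID vm.stats.vm.v_free_count (it degrades to a note on systems without that OID):

```shell
#!/bin/sh
# Snapshot free RAM pages via sysctl; prints "unavailable" where this
# FreeBSD-specific OID does not exist.
free_pages=$(sysctl -n vm.stats.vm.v_free_count 2>/dev/null || echo unavailable)
echo "free pages: $free_pages"
```

Run in a loop (e.g. under `while sleep 5; do ...; done`) alongside `swapinfo -h`, this shows whether free RAM actually stays pinned low before the kills start.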
> 
> 
> Below are some notes about specific tunables that might
> or might not be of help. (There may be more tunables
> that can help that I do not know about.)
> 
> For (a) there is a way to test whether it is the
> issue: increase the number of tries before the
> system gives up and starts killing things. That
> will either:
> 
> 1) let it get more done before kills start
> 2) let it complete before the count is reached
> 3) make no significant difference
> 
> (3) would imply that (b) or (c) are involved instead.
> 
> (1) might be handled by having it do even more tries.
> 
> To delay how long persistently low free RAM is
> tolerated, one can increase vm.pageout_oom_seq
> from its default of 12 to something larger. I have
> less experience managing slow paging, but I do
> have some notes about it below.
> 
> Examples follow that I use in contexts with
> sufficient RAM that I do not have to worry about
> out of swap/page space. These I've set in
> /etc/sysctl.conf . (Of course, I'm not trying to
> deliberately run out of RAM.)
> 
> #
> # Delay when persistent low free RAM leads to
> # Out Of Memory killing of processes:
> vm.pageout_oom_seq=120
> 
> I'll note that figures like 1024 or 1200 or
> even more are possible. This controls how
> many tries at regaining sufficient free RAM
> are made before that low level stops being
> tolerated. After that it starts Out Of Memory
> kills to get some free RAM.
> 
> No figure makes the delay unbounded, but a
> figure can be large enough that the bound
> lies beyond any reasonable time to wait.
> 
> 
> As for paging I/O (this is specific to 13,
> or was last I checked):
> 
> #
> # For plenty of swap/paging space (will not
> # run out), avoid pageout delays leading to
> # Out Of Memory killing of processes:
> vm.pfault_oom_attempts=-1
> 
> (Note: In my context "plenty" really means
> sufficient RAM that paging is rare. But
> others have reported on using the -1 in
> contexts where paging was heavy at times and
> OOM kills had been happening that were
> eliminated by the assignment.)
> 
> I've no experience with the below alternative
> to that -1 use:
> 
> #
> # For possibly insufficient swap/paging space
> # (might run out), increase the pageout delay
> # that leads to Out Of Memory killing of
> # processes:
> #vm.pfault_oom_attempts= ???
> #vm.pfault_oom_wait= ???
> # (The product of the two gives the total, but
> # there are other potential tradeoffs in the
> # individual factors, even for nearly the same
> # total.)
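As a worked example of that multiplication (the numbers are purely illustrative, not recommendations, and assume the wait tunable is in seconds): hypothetical values of 10 attempts and a 12-second wait give the same 120-second total as 12 attempts and a 10-second wait, yet behave differently per fault.

```shell
#!/bin/sh
# Illustrative only: total page-fault OOM delay ~= attempts * wait.
attempts=10    # hypothetical vm.pfault_oom_attempts value
wait_sec=12    # hypothetical vm.pfault_oom_wait value, taken as seconds
total=$((attempts * wait_sec))
echo "total wait ~ ${total}s"
```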
> 
> 
> I'm not claiming that these 3 vm.???_oom_???
> figures are always sufficient. Nor am I
> claiming that sufficient tunables are always
> available. Nor that it is easy to find those
> that do exist and might help for specific
> OOM kill issues.
> 
> I have seen reports of OOM kills for other
> reasons when both vm.pageout_oom_seq and
> vm.pfault_oom_attempts=-1 were in use.
> As I understand it, FreeBSD did not report
> what kind of condition led to the
> decision to do an OOM kill.
> 
> So the above notes may or may not help you.

All the advice by Mark above is very sound and solid; however, my
first step would be to cut back the memory pig that is ZFS with:
vfs.zfs.arc_max=4294967296
added to loader.conf
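For reference, 4294967296 bytes is exactly 4 GiB, i.e. half of the VM's 8 GB of RAM; the value can be derived rather than memorized:

```shell
#!/bin/sh
# Compute an ARC cap of half of 8 GiB of RAM: 4 * 1024^3 = 4294967296.
ram_bytes=$((8 * 1024 * 1024 * 1024))
arc_cap=$((ram_bytes / 2))
echo "vfs.zfs.arc_max=$arc_cap"
```

The current ARC size can then be compared against the cap with `sysctl kstat.zfs.misc.arcstats.size` on the running system.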

> 
> > To make sure I shutdown both VM and host, restarted them and tried
> > update of jail again. Then the problem was reproduced.
> 
> 
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
> 
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> 

-- 
Rod Grimes                                                 rgrimes_at_freebsd.org
Received on Sat Jan 25 2020 - 19:51:09 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:22 UTC