Re: maxswzone NOT used correctly and defaults incorrect?

From: Tijl Coosemans <tijl@FreeBSD.org>
Date: Sat, 24 Nov 2018 16:54:37 +0100
On Sat, 24 Nov 2018 01:04:29 -0800 John-Mark Gurney <jmg@funkthat.com> wrote:
> I have a BeagleBone Black.  I'm running a recent snapshot:
> FreeBSD generic 13.0-CURRENT FreeBSD 13.0-CURRENT r340239 GENERIC  arm
> 
> aka:
> FreeBSD-13.0-CURRENT-arm-armv7-BEAGLEBONE-20181107-r340239.img.xz
> 
> It has 512MB of memory on board.  I created a 4GB swap file.  According
> to loader(8), the default should be able to handle this:
>                    in bytes of KVA space.  If no value is provided, the system
>                    allocates enough memory to handle an amount of swap that
>                    corresponds to eight times the amount of physical memory
>                    present in the system.
> 
> avail memory = 505909248 (482 MB)
> 
> but I get this:
> warning: total configured swap (1048576 pages) exceeds maximum recommended amount (248160 pages).
> warning: increase kern.maxswzone or reduce amount of swap.
> 
> So this appears to be only 2x the amount of memory, NOT 8x like the
> documentation says.
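
For what it's worth, the numbers in that warning already work out to
roughly 2x: assuming 4 KiB pages on armv7, 248160 pages * 4 KiB is about
970 MB, which is about twice your 482 MB of usable RAM rather than the
8x that loader(8) promises.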
> 
> When running make in sbin/ggate/ggated, make consumes a large amount
> of memory.  Just before the OOM killer kicked in, top showed:
> Mem: 224M Active, 4096 Inact, 141M Laundry, 121M Wired, 57M Buf, 2688K Free
> Swap: 1939M Total, 249M Used, 1689M Free, 12% Inuse, 1196K Out
> 
>   PID    UID      THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
>  1029   1001        1  44    0   594M  3848K RUN      2:03  38.12% make
> 
> swapinfo -k showed:
> /dev/md99         4194304   254392  3939912     6%
> 
> sysctl:
> vm.swzone: 4466880
> vm.swap_maxpages: 496320
> kern.maxswzone: 0
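
If I'm reading those sysctls right, they are at least internally
consistent: vm.swap_maxpages / 2 = 496320 / 2 = 248160, exactly the
"maximum recommended amount" from the warning above, and vm.swzone / 72
bytes per swblk = 4466880 / 72 = 62040 entries, matching the swblk zone
limit in your vmstat -z output further down, so 496320 / 62040 = 8 pages
of swap are tracked per swblk.  496320 pages * 4 KiB is about 1939 MB,
which is presumably also why top shows "Swap: 1939M Total" even though
4 GB is configured.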
> 
> dmesg when OOM strikes:
> swap blk zone exhausted, increase kern.maxswzone
> pid 1029 (make), uid 1001, was killed: out of swap space
> pid 984 (bash), uid 1001, was killed: out of swap space
> pid 956 (bash), uid 1001, was killed: out of swap space
> pid 952 (sshd), uid 0, was killed: out of swap space
> pid 1043 (bash), uid 1001, was killed: out of swap space
> pid 626 (dhclient), uid 65, was killed: out of swap space
> pid 955 (sshd), uid 1001, was killed: out of swap space
> pid 1025 (bash), uid 1001, was killed: out of swap space
> swblk zone ok
> lock order reversal:
>  1st 0xd374d028 filedesc structure (filedesc structure) @ /usr/src/sys/kern/sys_generic.c:1451
>  2nd 0xd41a5bc4 devfs (devfs) @ /usr/src/sys/kern/vfs_vnops.c:1513
> stack backtrace:
> swap blk zone exhausted, increase kern.maxswzone
> pid 981 (tmux), uid 1001, was killed: out of swap space
> pid 983 (tmux), uid 1001, was killed: out of swap space
> pid 1031 (bash), uid 1001, was killed: out of swap space
> pid 580 (dhclient), uid 0, was killed: out of swap space
> swblk zone ok
> swap blk zone exhausted, increase kern.maxswzone
> pid 577 (dhclient), uid 0, was killed: out of swap space
> pid 627 (devd), uid 0, was killed: out of swap space
> swblk zone ok
> swap blk zone exhausted, increase kern.maxswzone
> pid 942 (getty), uid 0, was killed: out of swap space
> swblk zone ok
> swap blk zone exhausted, increase kern.maxswzone
> pid 1205 (init), uid 0, was killed: out of swap space
> swblk zone ok
> swap blk zone exhausted, increase kern.maxswzone
> pid 1206 (init), uid 0, was killed: out of swap space
> swblk zone ok
> swap blk zone exhausted, increase kern.maxswzone
> swblk zone ok
> swap blk zone exhausted, increase kern.maxswzone
> swblk zone ok
> 
> So, as you can see, despite having plenty of swap and swap usage being
> well below any of the maximums, the OOM killer kicked in and killed off
> a bunch of processes.
> 
> It also looks like the algorithm for calculating kern.maxswzone is not
> correct.
> 
> I just tried to run the system w/:
> kern.maxswzone: 21474836
> 
> and it again died w/ plenty of swap free:
> /dev/md99         4194304   238148  3956156     6%
> 
> This time I had vmstat -z | grep sw running, and saw:
> swpctrie:                48,  62084,     145,     270,     203,   0,   0
> swblk:                   72,  62040,   56357,      18,   56587,   0,   0
> 
> after the system died, I logged back in and saw:
> swpctrie:                48,  62084,      28,     387,     240,   0,   0
> swblk:                   72,  62040,     175,   61865,   62957,  16,   0
> 
> so it clearly ran out of swblk space VERY early, when only around
> 232MB of swap was in use...
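
That early exhaustion looks like fragmentation of the swap metadata
rather than of the swap space itself: 238148 KiB is roughly 59500 pages
spread over 56357 in-use swblk entries, so nearly every swblk is
tracking a single page even though each can hold 8 (a swblk covers a
range of consecutive page indices within one object).  The zone
therefore fills up long before the nominal 496320-page capacity is
reached.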
> 
> Hmm... it looks like swblk and swpctrie are not affected by the setting
> of kern.maxswzone...  I just set it to:
> kern.maxswzone: 85899344
> 
> and the limits for the zones did not increase at ALL:
> swpctrie:                48,  62084,       0,       0,       0,   0,   0
> swblk:                   72,  62040,       0,       0,       0,   0,   0
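
That matches my reading of the sizing logic in swap_pager_swap_init()
(sys/vm/swap_pager.c).  Paraphrasing from memory, so the details may be
slightly off, it goes roughly like this:

	/* sketch of the sizing logic, not the literal source */
	n = vm_cnt.v_page_count / 2;	/* default: half the physical page count */
	if (maxswzone != 0 && n > maxswzone / sizeof(struct swblk))
		n = maxswzone / sizeof(struct swblk);	/* the tunable can only cap n */
	swap_maxpages = n * SWAP_META_PAGES;
	swzone = n * sizeof(struct swblk);

So kern.maxswzone can only lower the swblk count below the
v_page_count / 2 default, never raise it: with 512 MB of RAM the default
is about 62040 entries, while 85899344 / 72 gives roughly 1.19M entries,
far above that, so your larger settings are silently ignored.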

Can you try this patch?  I've been running with it for a few months now
and no longer observe weird OOM kills.  IIUC, when the laundry thread is
in shortfall mode and nfreed equals zero, this change makes it stay in
shortfall mode instead of switching to background laundering.  I don't
know enough about VM internals to know whether this is the correct fix,
though.

Index: sys/vm/vm_pageout.c
===================================================================
--- sys/vm/vm_pageout.c	(revision 340673)
+++ sys/vm/vm_pageout.c	(working copy)
@@ -1040,7 +1040,7 @@ trybackground:
 		nclean = vmd->vmd_free_count +
 		    vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt;
 		ndirty = vmd->vmd_pagequeues[PQ_LAUNDRY].pq_cnt;
-		if (target == 0 && ndirty * isqrt(howmany(nfreed + 1,
+		if (target == 0 && ndirty * isqrt(howmany(nfreed,
 		    vmd->vmd_free_target - vmd->vmd_free_min)) >= nclean) {
 			target = vmd->vmd_background_launder_target;
 		}
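
In case it helps review, here is a throwaway userland illustration of
the arithmetic difference (made-up numbers, my own isqrt(), howmany()
copied from sys/param.h); it only shows what happens when nfreed is 0:

	#include <stdio.h>

	#define howmany(x, y)	(((x) + ((y) - 1)) / (y))

	/* integer square root, same idea as the isqrt() helper in vm_pageout.c */
	static unsigned int
	isqrt(unsigned int num)
	{
		unsigned int r = 0;

		while ((r + 1) * (r + 1) <= num)
			r++;
		return (r);
	}

	int
	main(void)
	{
		unsigned int nfreed = 0;	/* nothing freed since the last scan */
		unsigned int ndirty = 30000, nclean = 20000;
		unsigned int denom = 10000;	/* stands in for free_target - free_min */

		/* old: isqrt(howmany(0 + 1, denom)) == 1, so the test is just ndirty >= nclean */
		printf("old: %d\n", ndirty * isqrt(howmany(nfreed + 1, denom)) >= nclean);
		/* new: isqrt(howmany(0, denom)) == 0, so the product is 0 and nothing launders */
		printf("new: %d\n", ndirty * isqrt(howmany(nfreed, denom)) >= nclean);
		return (0);
	}

In other words, the old expression rounds the isqrt() factor up to at
least 1 even when no pages were freed, so a pass that freed nothing can
still set the background launder target whenever ndirty >= nclean; with
the patch the factor is 0 and the target stays unset.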
