Konstantin Belousov wrote this message on Sat, Nov 24, 2018 at 12:40 +0200:
> On Sat, Nov 24, 2018 at 01:04:29AM -0800, John-Mark Gurney wrote:
> > I have a BeagleBone Black. I'm running a recent snapshot:
> > FreeBSD generic 13.0-CURRENT FreeBSD 13.0-CURRENT r340239 GENERIC arm
> >
> > aka:
> > FreeBSD-13.0-CURRENT-arm-armv7-BEAGLEBONE-20181107-r340239.img.xz
> >
> > It has 512MB of memory on board. I created a 4GB swap file. According
> > to loader(8), the default should be capable of handling this:
> >     in bytes of KVA space.  If no value is provided, the system
> >     allocates enough memory to handle an amount of swap that
> >     corresponds to eight times the amount of physical memory
> >     present in the system.
> >
> > avail memory = 505909248 (482 MB)
> >
> > but I get this:
> > warning: total configured swap (1048576 pages) exceeds maximum recommended amount (248160 pages).
> > warning: increase kern.maxswzone or reduce amount of swap.
> >
> > So it appears the limit is only 2x the amount of memory, NOT 8x as the
> > documentation says.
> >
> > When running make in sbin/ggate/ggated, make consumes a large amount
> > of memory.
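[The 2x-vs-8x discrepancy can be sanity-checked with quick arithmetic on the figures in the warning and dmesg above. This is a sketch, assuming the 4 KiB page size of this armv7 board:]

```shell
# Values taken from the warning and dmesg quoted above.
page_size=4096                # armv7 page size (assumption)
recommended_pages=248160      # "maximum recommended amount" from the warning
phys_bytes=505909248          # "avail memory" from dmesg

recommended_bytes=$((recommended_pages * page_size))
echo "recommended swap maximum: $((recommended_bytes / 1048576)) MB"   # ~969 MB
echo "physical memory:          $((phys_bytes / 1048576)) MB"          # 482 MB
echo "ratio: roughly $((recommended_bytes / phys_bytes))x physical memory"
```

[969 MB against 482 MB of RAM is almost exactly 2x, matching the complaint rather than the documented 8x.]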
> > Just before the OOM killer kicked in, top showed:
> >
> > Mem: 224M Active, 4096 Inact, 141M Laundry, 121M Wired, 57M Buf, 2688K Free
> > Swap: 1939M Total, 249M Used, 1689M Free, 12% Inuse, 1196K Out
> >
> >  PID  UID  THR PRI NICE  SIZE   RES STATE   TIME   WCPU COMMAND
> > 1029 1001    1  44    0  594M 3848K RUN     2:03 38.12% make
> >
> > swapinfo -k showed:
> > /dev/md99     4194304   254392  3939912     6%
> >
> > sysctl:
> > vm.swzone: 4466880
> > vm.swap_maxpages: 496320
> > kern.maxswzone: 0
> >
> > dmesg when OOM strikes:
> > swap blk zone exhausted, increase kern.maxswzone
> > pid 1029 (make), uid 1001, was killed: out of swap space
> > pid 984 (bash), uid 1001, was killed: out of swap space
> > pid 956 (bash), uid 1001, was killed: out of swap space
> > pid 952 (sshd), uid 0, was killed: out of swap space
> > pid 1043 (bash), uid 1001, was killed: out of swap space
> > pid 626 (dhclient), uid 65, was killed: out of swap space
> > pid 955 (sshd), uid 1001, was killed: out of swap space
> > pid 1025 (bash), uid 1001, was killed: out of swap space
> > swblk zone ok
> > lock order reversal:
> >  1st 0xd374d028 filedesc structure (filedesc structure) @ /usr/src/sys/kern/sys_generic.c:1451
> >  2nd 0xd41a5bc4 devfs (devfs) @ /usr/src/sys/kern/vfs_vnops.c:1513
> > stack backtrace:
> > swap blk zone exhausted, increase kern.maxswzone
> > pid 981 (tmux), uid 1001, was killed: out of swap space
> > pid 983 (tmux), uid 1001, was killed: out of swap space
> > pid 1031 (bash), uid 1001, was killed: out of swap space
> > pid 580 (dhclient), uid 0, was killed: out of swap space
> > swblk zone ok
> > swap blk zone exhausted, increase kern.maxswzone
> > pid 577 (dhclient), uid 0, was killed: out of swap space
> > pid 627 (devd), uid 0, was killed: out of swap space
> > swblk zone ok
> > swap blk zone exhausted, increase kern.maxswzone
> > pid 942 (getty), uid 0, was killed: out of swap space
> > swblk zone ok
> > swap blk zone exhausted, increase kern.maxswzone
> > pid 1205 (init), uid 0, was killed: out of swap space
> > swblk zone ok
> > swap blk zone exhausted, increase kern.maxswzone
> > pid 1206 (init), uid 0, was killed: out of swap space
> > swblk zone ok
> > swap blk zone exhausted, increase kern.maxswzone
> > swblk zone ok
> > swap blk zone exhausted, increase kern.maxswzone
> > swblk zone ok
> >
> > So, as you can see, despite having plenty of swap, and swap usage being
> > well below any of the maximums, the OOM killer kicked in and killed off
> > a bunch of processes.

> OOM is guided by the pagedaemon's progress, not by the amount of swap left.
> If the system cannot meet the pagedaemon target by doing
> $(sysctl vm.pageout_oom_seq) back-to-back page daemon passes,
> it declares an OOM condition. E.g., if you have a very active process which
> keeps a lot of memory active by referencing the pages, and simultaneously
> a slow or stuck swap device, then you get into this state.
>
> Just by looking at the top stats, you have a single page in the inactive
> queue, which means that the pagedaemon desperately frees clean pages and
> moves dirty pages into the laundry. Also, you have a relatively large
> laundry queue, which supports the theory about slow swap.

Yes, swap is "slow" by modern standards, but not really that slow... I'm
swapping out at over 10MB/sec... For such a system, this is quite fast...

Though maybe I wasn't explicit, it's very clear that I'm running out of the
swap blk zone, per the very first message, the vmstat -z stats below, and
the resulting failures:
swap blk zone exhausted

> You may try to increase vm.pageout_oom_seq to move the OOM trigger further
> out after the system is overloaded with swapping.

> > It also looks like the algorithm for calculating kern.maxswzone is not
> > correct.
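[For reference, the sysctl figures quoted above are internally consistent, if one assumes that each swblk entry is 72 bytes (the size column vmstat -z reports later in the thread) and maps a fixed number of pages. A sketch of the arithmetic:]

```shell
# sysctl values quoted earlier in the thread; sizes are assumptions noted inline.
swzone_bytes=4466880      # vm.swzone
swap_maxpages=496320      # vm.swap_maxpages
swblk_size=72             # sizeof(struct swblk), per vmstat -z (assumption)

swblk_entries=$((swzone_bytes / swblk_size))
pages_per_swblk=$((swap_maxpages / swblk_entries))
echo "swblk zone limit: $swblk_entries entries"        # matches the vmstat -z limit: 62040
echo "pages per swblk:  $pages_per_swblk"              # apparently SWAP_META_PAGES = 8
echo "recommended max:  $((swap_maxpages / 2)) pages"  # the 248160 from the boot warning
```

[So vm.swzone, vm.swap_maxpages, and the 248160-page warning all derive from the same 62040-entry swblk zone; the question in the thread is why that zone is so small.]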
> >
> > I just tried to run the system w/:
> > kern.maxswzone: 21474836
> >
> > and it again died w/ plenty of swap free:
> > /dev/md99     4194304   238148  3956156     6%
> >
> > This time I had vmstat -z | grep sw running, and saw:
> > swpctrie:  48,  62084,    145,    270,    203,  0,  0
> > swblk:     72,  62040,  56357,     18,  56587,  0,  0
> >
> > after the system died, I logged back in and saw:
> > swpctrie:  48,  62084,     28,    387,    240,   0,  0
> > swblk:     72,  62040,    175,  61865,  62957,  16,  0
> >
> > so it clearly ran out of swblk space VERY early, when only consuming
> > around 232MB of swap...
> >
> > Hmm... it looks like swblk and swpctrie are not affected by the setting
> > of kern.maxswzone... I just set it to:
> > kern.maxswzone: 85899344
> >
> > and the limits for the zones did not increase at ALL:
> > swpctrie:  48,  62084,  0,  0,  0,  0,  0
> > swblk:     72,  62040,  0,  0,  0,  0,  0

> The swap metadata zones must have all of their KVA reserved in advance,
> because we cannot wait for address space or memory while we try to free
> some memory. At boot, the swap init code allocates KVA starting with the
> requested amount. If the allocation fails, it reduces the amount to two
> thirds and retries until the allocation succeeds. What you see in the
> limits is the actual amount of KVA that your platform is able to provide
> for the reserve, so increasing maxswzone only results in more iterations
> to allocate.

Except that I don't see the warning "Swap blk zone entries reduced from" in
the dmesg, which I'd expect to see if that code were triggered... I find it
hard to believe that it can't allocate more than 5MB of KVA at boot... per
the above, 72*62040 ~= 4.26MB...

It does look like the calculation is correct for swblk assuming maxswzone
is not set (0), as:
vm.stats.vm.v_page_count: 124041
and:
n = vm_cnt.v_page_count / 2;

I'll be adding a print for maxswzone to make sure it's getting set, though
it'll take me a while to get a kernel built...
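[If kern.maxswzone=85899344 were actually reaching the swap init code, the initial request would dwarf the observed 62040-entry limit, and the two-thirds retry loop described above would have to fire repeatedly, printing the "reduced from" warning each time. A rough simulation, assuming a 72-byte swblk entry and assuming the loop drops the request to two thirds on each failed KVA reservation:]

```shell
maxswzone=85899344        # the tunable value set above
swblk_size=72             # sizeof(struct swblk), per vmstat -z (assumption)
observed=62040            # zone limit actually reported by vmstat -z

n=$((maxswzone / swblk_size))
echo "entries initially requested: $n"      # ~1.19M, roughly 19x the observed limit
# Assumed retry rule: on each failed reservation, shrink the request to two thirds.
reductions=0
while [ "$n" -gt "$observed" ]; do
    n=$((n - (n + 2) / 3))
    reductions=$((reductions + 1))
done
echo "two-thirds reductions needed to fall to the observed limit: $reductions"
```

[Under those assumptions it takes about eight reductions, i.e., eight "Swap blk zone entries reduced from" lines in dmesg, so their complete absence supports the suspicion that the tunable never reaches the allocation code.]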
and kenv does show it set:

[freebsd@generic ~]$ sysctl kern.maxswzone
kern.maxswzone: 85899344
[freebsd@generic ~]$ kenv | grep kern.maxswzone
kern.maxswzone="85899344"

so how that code isn't being triggered is quite strange...

-- 
John-Mark Gurney				Voice: +1 415 225 5579
     "All that I will do, has been done, All that I have, has not."
Received on Sat Nov 24 2018 - 19:09:39 UTC