On 2020-Jan-27, at 19:53, bob prohaska <fbsd at www.zefox.net> wrote:

> On Mon, Jan 27, 2020 at 06:22:20PM -0800, Mark Millard wrote:
>>
>> So far as I know, in the past progress was only made when someone
>> already knowledgeable got involved in isolating what was happening
>> and how to control it.
>>
> Indeed. One can only hope said knowledgeables are reading....

Maybe I can suggest something that might kick-start the evidence
gathering a little: add 4 unconditional printf's to the kernel code,
one just before each of the vm_pageout_oom(. . .) calls, with each
message uniquely identifying which of the 4 call sites it is.

The details of what I found that suggested this follow.

I found:

#define VM_OOM_MEM      1
#define VM_OOM_MEM_PF   2
#define VM_OOM_SWAPZ    3

In vm_fault(. . .):

. . .
        if (vm_pfault_oom_attempts < 0 ||
            oom < vm_pfault_oom_attempts) {
                oom++;
                vm_waitpfault(dset, vm_pfault_oom_wait * hz);
                goto RetryFault_oom;
        }
        if (bootverbose)
                printf(
        "proc %d (%s) failed to alloc page on fault, starting OOM\n",
                    curproc->p_pid, curproc->p_comm);
        vm_pageout_oom(VM_OOM_MEM_PF);
. . .

(I'd not have guessed that bootverbose would control messages about
OOM activity.)

The call above looks to be blocked by the "-1" setting
(vm.pfault_oom_attempts=-1) that we have been using.

In vm_pageout_mightbe_oom(. . .):

. . .
        if (starting_page_shortage <= 0 || starting_page_shortage !=
            page_shortage)
                vmd->vmd_oom_seq = 0;
        else
                vmd->vmd_oom_seq++;
        if (vmd->vmd_oom_seq < vm_pageout_oom_seq) {
                if (vmd->vmd_oom) {
                        vmd->vmd_oom = FALSE;
                        atomic_subtract_int(&vm_pageout_oom_vote, 1);
                }
                return;
        }

        /*
         * Do not follow the call sequence until OOM condition is
         * cleared.
         */
        vmd->vmd_oom_seq = 0;

        if (vmd->vmd_oom)
                return;

        vmd->vmd_oom = TRUE;
        old_vote = atomic_fetchadd_int(&vm_pageout_oom_vote, 1);
        if (old_vote != vm_ndomains - 1)
                return;

        /*
         * The current pagedaemon thread is the last in the quorum to
         * start OOM.  Initiate the selection and signaling of the
         * victim.
         */
        vm_pageout_oom(VM_OOM_MEM);

        /*
         * After one round of OOM terror, recall our vote.  On the
         * next pass, current pagedaemon would vote again if the low
         * memory condition is still there, due to vmd_oom being
         * false.
         */
        vmd->vmd_oom = FALSE;
        atomic_subtract_int(&vm_pageout_oom_vote, 1);
. . .

The above is where the other setting we have been using
(vm.pageout_oom_seq) extends the number of tries before doing the OOM
kill. If the rate of attempts increased, less time would go by for the
same figure. So this case might still be happening, even with the
> 4000 figure used on the 5 GiByte amd64 system with the i386 jail
that was reported. There is no OOM-specific printf in this path as
things stand.

In swp_pager_meta_build(. . .):

. . .
                if (uma_zone_exhausted(swblk_zone)) {
                        if (atomic_cmpset_int(&swblk_zone_exhausted,
                            0, 1))
                                printf("swap blk zone exhausted, "
                                    "increase kern.maxswzone\n");
                        vm_pageout_oom(VM_OOM_SWAPZ);
                        pause("swzonxb", 10);
                } else
                        uma_zwait(swblk_zone);
. . .
                if (uma_zone_exhausted(swpctrie_zone)) {
                        if (atomic_cmpset_int(&swpctrie_zone_exhausted,
                            0, 1))
                                printf("swap pctrie zone exhausted, "
                                    "increase kern.maxswzone\n");
                        vm_pageout_oom(VM_OOM_SWAPZ);
                        pause("swzonxp", 10);
                } else
                        uma_zwait(swpctrie_zone);
. . .

The above is something we have not been controlling: uma zone
exhaustion for swblk_zone and swpctrie_zone. (Not that I'm familiar
with them or the rest of this material.) On a small-memory machine,
there may be nothing that can be done about that directly without
other, nasty tradeoffs. Of course, there might be reasons that one or
both of these zones exhaust faster than they used to. There are the
2 printf messages shown above, but they are conditional.
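To make the suggested instrumentation concrete, here is a sketch of
what an unconditional marker might look like at the VM_OOM_MEM_PF call
site in vm_fault(. . .). The message text is my own choosing, not
something already in the tree; the other 3 vm_pageout_oom call sites
would get analogous printf's naming their own contexts:

        /*
         * Unconditional marker (not gated by bootverbose) so the
         * console/log records which context is about to call
         * vm_pageout_oom().
         */
        printf("vm_fault: starting OOM (VM_OOM_MEM_PF) for proc %d (%s)\n",
            curproc->p_pid, curproc->p_comm);
        vm_pageout_oom(VM_OOM_MEM_PF);

The same pattern, with its own identifying text, would go just before
vm_pageout_oom(VM_OOM_MEM) in vm_pageout_mightbe_oom(. . .) and before
each of the two vm_pageout_oom(VM_OOM_SWAPZ) calls in
swp_pager_meta_build(. . .).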
Still, those two conditional messages give something else to look for
in console or log output.

One possibility is always having an unconditional printf just before
each of the 4 vm_pageout_oom calls, each identifying which of the 4
contexts is making the call. That would at least be a start at
figuring things out. (swp_pager_meta_build's code shows that the
argument to vm_pageout_oom is not, by itself, specific enough for such
identification: both of its call sites pass VM_OOM_SWAPZ.)

The vm_pageout_oom(. . .) routine has:

. . .
        if (bigproc != NULL) {
                if (vm_panic_on_oom != 0)
                        panic("out of swap space");
                PROC_LOCK(bigproc);
                killproc(bigproc, "out of swap space");
                sched_nice(bigproc, PRIO_MIN);
                _PRELE(bigproc);
                PROC_UNLOCK(bigproc);
        }
. . .

That is where the can-be-a-misnomer "out of swap space" message comes
from. It looks to be accurate for some conditions, but not for the
conditions we have historically hit in our contexts. It takes looking
at other messages to tell whether it is a misnomer: a different
message carries the actual out-of-swap-space information, and if that
message is not present then the report based on the code above is a
misnomer.

vm_pageout_oom could use its argument to be somewhat more specific in
the text it passes to killproc(. . .); a sketch of that idea is
appended after the reference material and signature below.

For reference:

# grep -r "VM_OOM_" /usr/src/sys/ | more
/usr/src/sys/vm/vm_fault.c:     vm_pageout_oom(VM_OOM_MEM_PF);
/usr/src/sys/vm/vm_pageout.c:           vm_pageout_oom(VM_OOM_MEM);
/usr/src/sys/vm/vm_pageout.c:   if (shortage == VM_OOM_MEM_PF &&
/usr/src/sys/vm/vm_pageout.c:   if (shortage == VM_OOM_MEM || shortage == VM_OOM_MEM_PF)
/usr/src/sys/vm/swap_pager.c:                   vm_pageout_oom(VM_OOM_SWAPZ);
/usr/src/sys/vm/swap_pager.c:                   vm_pageout_oom(VM_OOM_SWAPZ);
/usr/src/sys/vm/vm_pageout.h:#define    VM_OOM_MEM      1
/usr/src/sys/vm/vm_pageout.h:#define    VM_OOM_MEM_PF   2
/usr/src/sys/vm/vm_pageout.h:#define    VM_OOM_SWAPZ    3

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
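As for making the killproc(. . .) text reflect the triggering context,
here is one possible shape for that part of vm_pageout_oom(. . .). This
is a sketch only, not existing kernel code, and the reason strings are
my own wording; it just switches on the routine's existing shortage
argument instead of always passing "out of swap space":

        if (bigproc != NULL) {
                if (vm_panic_on_oom != 0)
                        panic("out of swap space");
                PROC_LOCK(bigproc);
                /* Name the OOM context in the kill message. */
                switch (shortage) {
                case VM_OOM_MEM:
                        killproc(bigproc,
                            "sustained free memory shortage");
                        break;
                case VM_OOM_MEM_PF:
                        killproc(bigproc,
                            "page allocation failures during page faults");
                        break;
                case VM_OOM_SWAPZ:
                        killproc(bigproc,
                            "swap metadata zone exhausted");
                        break;
                default:
                        killproc(bigproc, "out of swap space");
                        break;
                }
                sched_nice(bigproc, PRIO_MIN);
                _PRELE(bigproc);
                PROC_UNLOCK(bigproc);
        }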