On 2020-Jan-28, at 11:02, bob prohaska <fbsd at www.zefox.net> wrote:

> On Tue, Jan 28, 2020 at 09:42:17AM -0800, Mark Millard wrote:
>>
> The (partly)modified kernel compiled and booted without
> obvious trouble. It's trying to finish buildworld now.
>
>> If you are testing with vm.pfault_oom_attempts="-1" then
>> the vm_fault printf message should never happen anyway.
>>
> Would it not be interesting if the message appeared in that
> case?

Thanks for the question: looking at the new code found
a bug causing oom where it used to be avoided in head -r357025
and before.

After vm_waitpfault(dset, vm_pfault_oom_wait * hz) the -r357026 code
does a vm_pageout_oom(VM_OOM_MEM_PF) no matter what, even when
vm_pfault_oom_attempts < 0 || fs->oom < vm_pfault_oom_attempts :

New code in head -r357026 ( nothing to avoid the
vm_pageout_oom(VM_OOM_MEM_PF) for
vm_pfault_oom_attempts < 0 || fs->oom < vm_pfault_oom_attempts ):

	if (fs->m == NULL) {
		unlock_and_deallocate(fs);
		if (vm_pfault_oom_attempts < 0 ||
		    fs->oom < vm_pfault_oom_attempts) {
			fs->oom++;
			vm_waitpfault(dset, vm_pfault_oom_wait * hz);
		}
		if (bootverbose)
			printf(
	"proc %d (%s) failed to alloc page on fault, starting OOM\n",
			    curproc->p_pid, curproc->p_comm);
		vm_pageout_oom(VM_OOM_MEM_PF);
		return (KERN_RESOURCE_SHORTAGE);
	}

Old code in head -r357025 ( has the goto RetryFault_oom
after vm_waitpfault(. . .), thereby avoiding the
vm_pageout_oom(VM_OOM_MEM_PF) for
vm_pfault_oom_attempts < 0 || fs->oom < vm_pfault_oom_attempts ) :

	if (fs.m == NULL) {
		unlock_and_deallocate(&fs);
		if (vm_pfault_oom_attempts < 0 ||
		    oom < vm_pfault_oom_attempts) {
			oom++;
			vm_waitpfault(dset, vm_pfault_oom_wait * hz);
			goto RetryFault_oom;
		}
		if (bootverbose)
			printf(
	"proc %d (%s) failed to alloc page on fault, starting OOM\n",
			    curproc->p_pid, curproc->p_comm);
		vm_pageout_oom(VM_OOM_MEM_PF);
		goto RetryFault;
	}

I expect this is the source of the behavioral difference
folks have been seeing for OOM kills.

As for "gather evidence" messages . . .
>> You may be able to just look and manually delete or
>> comment out the bootverbose line in the more modern
>> source that currently looks like:
>>
>>         if (bootverbose)
>>                 printf(
>>     "proc %d (%s) failed to alloc page on fault, starting OOM\n",
>>                     curproc->p_pid, curproc->p_comm);
>>         vm_pageout_oom(VM_OOM_MEM_PF);
>>         return (KERN_RESOURCE_SHORTAGE);
>>
>
> I can find those lines in /usr/src/sys/vm/vm_fault.c, but
> unclear on the motivation to comment the lines out. Perhaps
> to eliminate the return(...) ? Anyway, is it sufficient
> to insert /* before and */ after?

The only line to delete or comment out in that code
block is:

        if (bootverbose)

Disabling that line makes the following printf always
happen, even when a verbose boot was not done.

Based on the above reported code change, having a
message before vm_pageout_oom(VM_OOM_MEM_PF) is
important to getting a report of the kill being via
that code.

>> and is now in vm_fault_allocate(. . .). (That file has
>> had a reorganization since where I'm synchronized.)
>>
>> Having the message indicate vm_fault_allocate is
>> optional but would look like:
>>
>> "vm_fault_allocate: proc %d (%s) failed to alloc page on fault, starting OOM\n",
>>
>> Doing the delete/comment-out would avoid waiting for me.
>>
>>
> I'll do it after the next stoppage.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

Received on Tue Jan 28 2020 - 18:28:23 UTC
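
For reference, the control-flow difference between the -r357025 and
-r357026 code quoted above can be modeled in user space. This is only
a rough sketch, not kernel code: struct outcome, old_path() and
new_path() are hypothetical names invented for illustration, the
"attempts" parameter stands in for vm_pfault_oom_attempts, and each
function models a single pass through the block entered when the page
allocation failed. It only records whether the wait and the
vm_pageout_oom(VM_OOM_MEM_PF) call would have happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of what one pass through the block did. */
struct outcome {
	bool waited;     /* vm_waitpfault(...) would have been called */
	bool oom_called; /* vm_pageout_oom(VM_OOM_MEM_PF) would have been called */
};

/* Models the -r357025 block: waiting leaves the block early. */
static struct outcome
old_path(int attempts, int *oom)
{
	struct outcome o = { false, false };

	assert(oom != NULL);
	if (attempts < 0 || *oom < attempts) {
		(*oom)++;
		o.waited = true;
		return (o);	/* models: goto RetryFault_oom */
	}
	o.oom_called = true;
	return (o);		/* models: goto RetryFault */
}

/* Models the -r357026 block: nothing leaves the block after the wait. */
static struct outcome
new_path(int attempts, int *oom)
{
	struct outcome o = { false, false };

	assert(oom != NULL);
	if (attempts < 0 || *oom < attempts) {
		(*oom)++;
		o.waited = true;
		/* falls through to the OOM call below */
	}
	o.oom_called = true;
	return (o);		/* models: return (KERN_RESOURCE_SHORTAGE) */
}
```

With attempts set to -1 and the counter starting at 0, old_path()
reports a wait and no OOM call, while new_path() reports both. Once
the counter has reached a non-negative attempts limit, the two models
agree: no wait, and the OOM call happens.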