Re: OOMA kill with vm.pfault_oom_attempts="-1" on RPi3 at r357147

From: Mark Millard <marklmi_at_yahoo.com>
Date: Tue, 28 Jan 2020 00:49:41 -0800
On 2020-Jan-27, at 21:29, Mark Millard <marklmi at yahoo.com> wrote:

> On 2020-Jan-27, at 19:53, bob prohaska <fbsd at www.zefox.net> wrote:
> 
>> On Mon, Jan 27, 2020 at 06:22:20PM -0800, Mark Millard wrote:
>>> 
>>> So far as I know, in the past progress was only made when someone
>>> already knowledgeable got involved in isolating what was happening
>>> and how to control it.
>>> 
>> Indeed. One can only hope said knowledgeables are reading....
> 
> Maybe I can suggest something that might kick-start
> evidence gathering a little bit: add 4 unconditional
> printf's to the kernel code, each just before one of
> the vm_pageout_oom(. . .) calls. Have each message
> uniquely identify which of the 4 calls it precedes.
> 
> . . .

Below is a stab at implementing the suggestion. A couple of
the printf's are basically what Mark Johnston supplied
long ago. (Other code from what he supplied back then did
not survive updates made to FreeBSD.) One of his printf's,
the one in vm_page.c, does not indicate vm_pageout_oom use.

(Sent this way, some whitespace might not be preserved.)

# svnlite diff /usr/src/sys/vm/ 
Index: /usr/src/sys/vm/swap_pager.c
===================================================================
--- /usr/src/sys/vm/swap_pager.c	(revision 356426)
+++ /usr/src/sys/vm/swap_pager.c	(working copy)
@@ -2021,6 +2021,7 @@
 				    0, 1))
 					printf("swap blk zone exhausted, "
 					    "increase kern.maxswzone\n");
+				printf("swp_pager_meta_build: swap blk uma zone exhausted\n");
 				vm_pageout_oom(VM_OOM_SWAPZ);
 				pause("swzonxb", 10);
 			} else
@@ -2051,6 +2052,7 @@
 				    0, 1))
 					printf("swap pctrie zone exhausted, "
 					    "increase kern.maxswzone\n");
+				printf("swp_pager_meta_build: swap pctrie uma zone exhausted\n");
 				vm_pageout_oom(VM_OOM_SWAPZ);
 				pause("swzonxp", 10);
 			} else
Index: /usr/src/sys/vm/vm_fault.c
===================================================================
--- /usr/src/sys/vm/vm_fault.c	(revision 356426)
+++ /usr/src/sys/vm/vm_fault.c	(working copy)
@@ -943,9 +943,9 @@
 					    vm_pfault_oom_wait * hz);
 					goto RetryFault_oom;
 				}
-				if (bootverbose)
+				// HAVE PRINTF BE UNCONDITIONAL FOR TESTING: if (bootverbose)
 					printf(
-	"proc %d (%s) failed to alloc page on fault, starting OOM\n",
+	"vm_fault: proc %d (%s) failed to alloc page on fault, starting OOM\n",
 					    curproc->p_pid, curproc->p_comm);
 				vm_pageout_oom(VM_OOM_MEM_PF);
 				goto RetryFault;
Index: /usr/src/sys/vm/vm_page.c
===================================================================
--- /usr/src/sys/vm/vm_page.c	(revision 356426)
+++ /usr/src/sys/vm/vm_page.c	(working copy)
@@ -3139,6 +3139,7 @@
 	 * race-free vm_wait_domain().
 	 */
 	if (curproc == pageproc) {
+		printf("thread %d waiting for memory\n", curthread->td_tid);
 		mtx_lock(&vm_domainset_lock);
 		vm_pageproc_waiters++;
 		msleep(&vm_pageproc_waiters, &vm_domainset_lock, PVM | PDROP,
Index: /usr/src/sys/vm/vm_pageout.c
===================================================================
--- /usr/src/sys/vm/vm_pageout.c	(revision 356426)
+++ /usr/src/sys/vm/vm_pageout.c	(working copy)
@@ -1741,6 +1741,8 @@
 	 * start OOM.  Initiate the selection and signaling of the
 	 * victim.
 	 */
+	printf("vm_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u\n",
+	    vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
 	vm_pageout_oom(VM_OOM_MEM);
 
 	/*
@@ -1933,10 +1935,24 @@
 	}
 	sx_sunlock(&allproc_lock);
 	if (bigproc != NULL) {
+		char *reason_text;
+		switch (shortage) {
+		case VM_OOM_MEM_PF:
+			reason_text= "fault's page allocation failed";
+			break;
+		case VM_OOM_MEM:
+			reason_text= "free RAM stayed below threshold";
+			break;
+		case VM_OOM_SWAPZ:
+			reason_text= "swblk or swpctrie zone exhausted";
+			break;
+		default:
+			reason_text= "Unknown Reason";
+		}
 		if (vm_panic_on_oom != 0)
-			panic("out of swap space");
+			panic("%s",reason_text);
 		PROC_LOCK(bigproc);
-		killproc(bigproc, "out of swap space");
+		killproc(bigproc, reason_text);
 		sched_nice(bigproc, PRIO_MIN);
 		_PRELE(bigproc);
 		PROC_UNLOCK(bigproc);



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Received on Tue Jan 28 2020 - 07:49:48 UTC
