Re: Serious I/O problems (bad performance and live-lock)

From: Ian Dowse <iedowse@maths.tcd.ie>
Date: Mon, 25 Apr 2005 10:38:50 +0100
In message <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA0VcX9IoJqUaXPS8MjT1P
dsKAAAAQAAAA5xh4prxQBkmZLv9A9nCvPwEAAAAA@telia.com>, Daniel Eriksson writes:
>
>Here are some further observations and speculations.
>
>On a newly booted system, this is what happens:
>
>1. Start a "dd if=/dev/zero of=/usr/test bs=128k".
>2. While looking at 'top', "Inact" grows and "Free" shrinks.
>3. Once "Free" has bottomed out, "Inact" stops growing (naturally).
>4. 'dd' continues to put a load on the VM system, eventually forcing most
>processes to be swapped out (illustrated by the "RES" column showing a very
>low number for all but a few processes). This takes 30-60 seconds after
>"Free" has bottomed out on my machine.
>5. At this point the machine is mostly useless because it can take several
>minutes to run a simple 'ls'.

This may not be directly related, but the disk scheduling algorithm
in bioq_disksort() has always behaved poorly for large sequential
writes. It keeps choosing the next request in the sequential
pattern, because the single-direction elevator sort always prefers
offsets after the current position over smaller offsets. The
intention is that this results in frequent sweeps across the whole
disk, but with a large sequential write the elevator can get stuck
at one part of the disk for long periods of time, starving requests
at lower offsets.

Below is a patch I was experimenting with some time ago that offers
a bit more control over this behaviour. I seem to remember finding
that a smaller value for kern.bioq_maxbeforeswitch, such as 5, might
be a better default. The existing bioq_disksort() behaviour
corresponds to a very large value of this sysctl.

Ian

Index: subr_disk.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sys/kern/subr_disk.c,v
retrieving revision 1.83
diff -u -r1.83 subr_disk.c
--- subr_disk.c	6 Jan 2005 23:35:39 -0000	1.83
+++ subr_disk.c	17 Feb 2005 20:53:18 -0000
@@ -14,11 +14,18 @@
 
 #include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/kernel.h>
+#include <sys/sysctl.h>
 #include <sys/bio.h>
 #include <sys/conf.h>
 #include <sys/disk.h>
 #include <geom/geom_disk.h>
 
+int bioq_maxbeforeswitch = 20;
+SYSCTL_INT(_kern, OID_AUTO, bioq_maxbeforeswitch, CTLFLAG_RW,
+    &bioq_maxbeforeswitch, 0,
+    "Maximum number of operations to place before the switch point");
+
 /*-
  * Disk error is the preface to plaintive error messages
  * about failing disk transfers.  It prints messages of the form
@@ -71,6 +78,7 @@
 	head->last_offset = 0;
 	head->insert_point = NULL;
 	head->switch_point = NULL;
+	head->beforeswitchcnt = 0;
 }
 
 void
@@ -85,8 +93,10 @@
 	} else if (bp == TAILQ_FIRST(&head->queue))
 		head->last_offset = bp->bio_offset;
 	TAILQ_REMOVE(&head->queue, bp, bio_queue);
-	if (TAILQ_FIRST(&head->queue) == head->switch_point)
+	if (TAILQ_FIRST(&head->queue) == head->switch_point) {
+		head->beforeswitchcnt = 0;
 		head->switch_point = NULL;
+	}
 }
 
 void
@@ -179,7 +189,8 @@
 		 * "locked" portion of the list, then we must add ourselves
 		 * to the second request list.
 		 */
-		if (bp->bio_offset < bioq->last_offset) {
+		if (bp->bio_offset < bioq->last_offset ||
+		    bioq->beforeswitchcnt > bioq_maxbeforeswitch) {
 
 			bq = bioq->switch_point;
 			/*
@@ -202,6 +213,7 @@
 				return;
 			}
 		} else {
+			bioq->beforeswitchcnt++;
 			if (bioq->switch_point != NULL)
 				be = TAILQ_PREV(bioq->switch_point,
 						bio_queue, bio_queue);
Received on Mon Apr 25 2005 - 07:38:52 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:32 UTC