In message <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAA0VcX9IoJqUaXPS8MjT1PdsKAAAAQAAAA5xh4prxQBkmZLv9A9nCvPwEAAAAA@telia.com>, Daniel Eriksson writes:
>
>Here are some further observations and speculations.
>
>On a newly booted system, this is what happens:
>
>1. Start a "dd if=/dev/zero of=/usr/test bs=128k".
>2. While looking at 'top', "Inact" grows and "Free" shrinks.
>3. Once "Free" has bottomed out, "Inact" stops growing (naturally).
>4. 'dd' continues to put a load on the VM system, eventually forcing most
>processes to be swapped out (illustrated by the "RES" column showing a very
>low number for all but a few processes). This takes 30-60 seconds after
>"Free" has bottomed out on my machine.
>5. At this point the machine is mostly useless because it can take several
>minutes to run a simple 'ls'.

This may not be directly related, but the disk scheduling algorithm in
bioq_disksort() has always behaved poorly for large sequential writes. It
keeps deciding to process the next request in the sequential pattern,
because the single-direction elevator sort always prefers offsets after the
current position over smaller offsets. The intention is that this results
in frequent sweeps across the whole disk, but with large sequential writes
it can get stuck at one part of the disk for long periods of time.

Below is a patch I was experimenting with some time ago that offers a bit
more control over this behaviour. I seem to remember finding that a smaller
value for kern.bioq_maxbeforeswitch, such as 5, might be a better default.
The existing bioq_disksort() behaviour corresponds to a very large value of
this sysctl.
Ian

Index: subr_disk.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sys/kern/subr_disk.c,v
retrieving revision 1.83
diff -u -r1.83 subr_disk.c
--- subr_disk.c	6 Jan 2005 23:35:39 -0000	1.83
+++ subr_disk.c	17 Feb 2005 20:53:18 -0000
@@ -14,11 +14,18 @@
 #include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/kernel.h>
+#include <sys/sysctl.h>
 #include <sys/bio.h>
 #include <sys/conf.h>
 #include <sys/disk.h>
 #include <geom/geom_disk.h>
 
+int bioq_maxbeforeswitch = 20;
+SYSCTL_INT(_kern, OID_AUTO, bioq_maxbeforeswitch, CTLFLAG_RW,
+    &bioq_maxbeforeswitch, 0,
+    "Maximum number of operations to place before the switch point");
+
 /*-
  * Disk error is the preface to plaintive error messages
  * about failing disk transfers.  It prints messages of the form
@@ -71,6 +78,7 @@
 	head->last_offset = 0;
 	head->insert_point = NULL;
 	head->switch_point = NULL;
+	head->beforeswitchcnt = 0;
 }
 
 void
@@ -85,8 +93,10 @@
 	} else if (bp == TAILQ_FIRST(&head->queue))
 		head->last_offset = bp->bio_offset;
 	TAILQ_REMOVE(&head->queue, bp, bio_queue);
-	if (TAILQ_FIRST(&head->queue) == head->switch_point)
+	if (TAILQ_FIRST(&head->queue) == head->switch_point) {
+		head->beforeswitchcnt = 0;
 		head->switch_point = NULL;
+	}
 }
 
 void
@@ -179,7 +189,8 @@
 	 * "locked" portion of the list, then we must add ourselves
 	 * to the second request list.
 	 */
-	if (bp->bio_offset < bioq->last_offset) {
+	if (bp->bio_offset < bioq->last_offset ||
+	    bioq->beforeswitchcnt > bioq_maxbeforeswitch) {
 		bq = bioq->switch_point;
 
 		/*
@@ -202,6 +213,7 @@
 			return;
 		}
 	} else {
+		bioq->beforeswitchcnt++;
 		if (bioq->switch_point != NULL)
 			be = TAILQ_PREV(bioq->switch_point, bio_queue,
 			    bio_queue);

Received on Mon Apr 25 2005 - 07:38:52 UTC