On 24/11/2005, at 12:02 PM, Kris Kennaway wrote:

> On Thu, Nov 24, 2005 at 11:54:05AM +1100, Sam Lawrance wrote:
>>
>> On 24/11/2005, at 7:18 AM, Kris Kennaway wrote:
>>
>>> I have noticed that when multiple identical processes (e.g. gtar, or
>>> dd) are run on 4BSD, when there are N CPUs on a machine there will be
>>> N processes that run with a higher CPU share than all the others. As
>>> a result, these N processes finish first, then another N, and so on.
>>>
>>> This is true under both 4.11 and 6.0 (so in that sense it's not so
>>> surprising), but the effect is much more pronounced on 6.0 (which may
>>> be possible to fix).
>>>
>>> Here are the exit times for 6 identical gtar processes (and same 4.11
>>> gtar binary) started together on a 2-CPU machine:
>>>
>>> 6.0:
>>>
>>> 1132776233
>>> 1132776235
>>> 1132776264
>>> 1132776265
>>> 1132776279
>>> 1132776279
>>>      238.86 real        10.87 user       166.00 sys
>>>
>>> You can see they finish in pairs, and there's a spread of 46 seconds
>>> from first to last.
>>>
>>> On 4.11:
>>>
>>> 1132775426
>>> 1132775429
>>> 1132775431
>>> 1132775432
>>> 1132775448
>>> 1132775449
>>>      275.56 real         0.43 user       336.26 sys
>>>
>>> They also finish in pairs, but the spread is half, at 23 seconds.
>>>
>>> This seems to be correlated to the rate at which the processes perform
>>> I/O. On a quad amd64 machine running 6.0 when I run multiple dd
>>> processes at different offsets in a md device:
>>>
>>> 268435456 bytes transferred in 1.734285 secs (154781618 bytes/sec)
>>> 268435456 bytes transferred in 1.737857 secs (154463501 bytes/sec)
>>> 268435456 bytes transferred in 1.751760 secs (153237575 bytes/sec)
>>> 268435456 bytes transferred in 3.263460 secs (82254865 bytes/sec)
>>> 268435456 bytes transferred in 3.295294 secs (81460244 bytes/sec)
>>> 268435456 bytes transferred in 3.349770 secs (80135487 bytes/sec)
>>> 268435456 bytes transferred in 4.716637 secs (56912467 bytes/sec)
>>> 268435456 bytes transferred in 4.850927 secs (55336941 bytes/sec)
>>> 268435456 bytes transferred in 4.953528 secs (54190760 bytes/sec)
>>>
>>> They finish in groups of 3 here since the 4th CPU is being used to
>>> drive the md worker thread (which takes up most of the CPU). In this
>>> case the first 3 dd processes get essentially 100% of the CPU, and the
>>> rest get close to 0% until those 3 processes finish.
>>>
>>> Perhaps this can be tweaked.
>>>
>>
>> I tried this on a dual Xeon, with 12 processes like
>>
>> mdconfig -a -t swap -s 320m
>> dd if=/dev/md0 of=1 bs=1m skip=0 count=40 &
>> dd if=/dev/md0 of=2 bs=1m skip=40 count=40 &
>
> You're reading from the md, not writing to it. Sorry if that wasn't
> clear.
>
> My test is:
>
> #!/bin/sh
>
> mdconfig -d -u 0
> mdconfig -a -t swap -s 16g
> for i in `jot $1 1`; do
>   dd if=/dev/zero of=/dev/md0 seek=$(($i*16384)) count=16384 bs=16k > /dev/null &
> done
> wait

Ah :-) Now I see it, very pronounced.
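
For anyone wanting to reproduce the gtar comparison above, here is a minimal sketch of how the per-process exit times could be collected. The archive source (/usr/src), writing the archive to /dev/null, and the job count of 6 are assumptions for illustration, not details taken from the original test:

#!/bin/sh
# Sketch only: start N identical gtar jobs at the same time and print each
# job's finish time in seconds since the epoch, like the timestamps above.
# /usr/src and /dev/null are assumed stand-ins for the real workload.
N=6
for i in `jot $N 1`; do
        ( gtar -cf /dev/null /usr/src; date +%s ) &
done
wait

Running the whole script under time(1) would additionally give aggregate real/user/sys figures of the kind quoted above.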