Re: Uneven load on drives in ZFS RAIDZ1

From: Peter Maloney <peter.maloney_at_brockmann-consult.de>
Date: Mon, 19 Dec 2011 16:42:20 +0100
On 12/19/2011 03:22 PM, Stefan Esser wrote:
> Hi ZFS users,
>
> for quite some time I have observed an uneven distribution of load
> between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of
> a longer log of 10 second averages logged with gstat:
>
> dT: 10.001s  w: 10.000s  filter: ^a?da?.$
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0    130    106   4134    4.5     23   1033    5.2   48.8| ada0
>     0    131    111   3784    4.2     19   1007    4.0   47.6| ada1
>     0     90     66   2219    4.5     24   1031    5.1   31.7| ada2
>     1     81     58   2007    4.6     22   1023    2.3   28.1| ada3
>
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     1    132    104   4036    4.2     27   1129    5.3   45.2| ada0
>     0    129    103   3679    4.5     26   1115    6.8   47.6| ada1
>     1     91     61   2133    4.6     30   1129    1.9   29.6| ada2
>     0     81     56   1985    4.8     24   1102    6.0   29.4| ada3
>
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     1    148    108   4084    5.3     39   2511    7.2   55.5| ada0
>     1    141    104   3693    5.1     36   2505   10.4   54.4| ada1
>     1    102     62   2112    5.6     39   2508    5.5   35.4| ada2
>     0     99     60   2064    6.0     39   2483    3.7   36.1| ada3
>
> ...
> So: Can anybody reproduce this distribution of requests?
I don't have a raidz1 machine, and no time to build you a special raidz1
pool out of spare disks, but on my raidz2 I only ever see unevenness
when a disk is bad, or between different vdevs. But you only have one vdev.

First, check that your disks are identical (are they? We can only assume
so, since you didn't say).
Show us output from:
smartctl -i /dev/ada0
smartctl -i /dev/ada1
smartctl -i /dev/ada2
smartctl -i /dev/ada3

Since your numbers show ms/r to be pretty even, I guess your disks are
not broken. But ms/w differs slightly, so it seems the first two disks
are slower at writing (someone once said refurbished disks are like
this, even when otherwise identical), or the hard disk controller ports
they use are slower. For example, maybe your motherboard has 6 ports,
and you plugged disks 1, 2 and 3 into ports 1, 2 and 3 and disk 4 into
port 5. Disks 3 and 4 would then have their own channel, while disks 1
and 2 share one.

So if the disks are identical, I would guess your hard disk controller
is to blame. To test this, first back everything up. Then *fix your
setup by using labels*, i.e. use gpt/somelabel0 or gptid/....... rather
than ada0p2. Check the output of "ls /dev/gpt*" to see which labels you
already have. Then try swapping disks around to see whether the load
pattern moves with the disks or stays with the ports. Make sure to back
up first...

Swapping disks (or even removing one, depending on the controller etc.,
when it fails) without labels can be bad.
e.g.
You have ada1 ada2 ada3 ada4.
Someone spills coffee on ada2; it fries, cannot be detected anymore,
and you reboot.
Now you have ada1 ada2 ada3.
Things are usually still fine then (even though ada3 is now ada2 and
ada4 is now ada3, because there is some zfs superblock stuff to keep
track of things), but if you also had an ada5 that was not part of the
pool, or was a spare or a log or something other than another disk in
the same vdev as ada1, etc., bad things can happen when it becomes ada4.
Unfortunately, I don't know exactly what people do to trigger those
"bad things". When it happened to me, it just said my pool was faulted
or degraded or something, and set a disk or two to UNAVAIL or FAULTED.
I don't remember it resilvering them automatically, but from what I have
read about these problems, it seems some disks were resilvered
afterwards.


The last thing I can think of is to make sure your partitions are
aligned, and identical. Show us output from:
gpart show
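"Aligned" here means each partition starts on a multiple of the drive's
physical sector size; on 4k-sector drives a partition starting at LBA 34
(the old GPT default) straddles physical sectors. A quick sanity check
in plain sh, with made-up start sectors; substitute the ones from your
own "gpart show" output:

```shell
# is_aligned START: succeeds if a partition starting at that 512-byte
# LBA is aligned to a 4096-byte physical sector boundary.
is_aligned() {
    [ $(( $1 * 512 % 4096 )) -eq 0 ]
}

# Examples: LBA 34 (old GPT default) is misaligned, LBA 2048 (1 MiB) is fine.
is_aligned 34   || echo "start 34: NOT aligned"
is_aligned 2048 && echo "start 2048: aligned"
```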



> Any idea, why this is happening and whether something should be changed
> in ZFS to better distribute the load (leading to higher file system
> performance)?
>
> Best regards, STefan
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"


-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney_at_brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------
Received on Mon Dec 19 2011 - 14:42:23 UTC
