Re: Uneven load on drives in ZFS RAIDZ1

From: Stefan Esser <se_at_freebsd.org> Date: Mon, 19 Dec 2011 21:54:10 +0100

On 19.12.2011 18:05, Garrett Cooper wrote:
> On Mon, Dec 19, 2011 at 6:22 AM, Stefan Esser <se_at_freebsd.org> wrote:
>> Hi ZFS users,
>>
>> for quite some time I have observed an uneven distribution of load
>> between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of
>> a longer log of 10 second averages logged with gstat:
>>
>> dT: 10.001s  w: 10.000s  filter: ^a?da?.$
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>    0    130    106   4134    4.5     23   1033    5.2   48.8| ada0
>>    0    131    111   3784    4.2     19   1007    4.0   47.6| ada1
>>    0     90     66   2219    4.5     24   1031    5.1   31.7| ada2
>>    1     81     58   2007    4.6     22   1023    2.3   28.1| ada3
>>
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>    1    132    104   4036    4.2     27   1129    5.3   45.2| ada0
>>    0    129    103   3679    4.5     26   1115    6.8   47.6| ada1
>>    1     91     61   2133    4.6     30   1129    1.9   29.6| ada2
>>    0     81     56   1985    4.8     24   1102    6.0   29.4| ada3
>>
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>    1    148    108   4084    5.3     39   2511    7.2   55.5| ada0
>>    1    141    104   3693    5.1     36   2505   10.4   54.4| ada1
>>    1    102     62   2112    5.6     39   2508    5.5   35.4| ada2
>>    0     99     60   2064    6.0     39   2483    3.7   36.1| ada3
> 
> This suggests (note that I said suggests) that there might be a slight
> difference in the data path speeds or physical media as someone else
> suggested; look at zpool iostat -v <interval> though before making a
> firm statement as to whether or not a drive is truly not performing to
> your assumed spec. gstat and zpool iostat -v suggest performance
> though -- they aren't the end-all-be-all for determining drive
> performance.

I doubt there is a difference in the data path speeds, since all drives
are connected to the SATA II ports of an Intel H67 chip.

The drives seem to perform equally well, but read requests are split
roughly 30% / 30% / 20% / 20% across ada0 .. ada3. Neither queue length
nor command latency indicates a problem or a difference between the
drives. It seems that a different number of commands is scheduled for 2
of the 4 drives than for the other 2, and that scheduling should be
part of the ZFS code. I'm quite convinced that neither the drives nor
the other hardware play a role, but I'll follow the suggestion to swap
drives between controller ports and observe whether the increased read
load moves with the drives (indicating that something on disk causes
the anomaly) or stays with the SATA ports (indicating that
lower-numbered ports see higher load).
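
For reference, the read split can be recomputed from the gstat excerpt
quoted above. The following is only a small sketch: the r/s values are
copied from the three 10 second samples, and the helper name
read_shares is made up for illustration.

```python
# r/s values copied from the three gstat samples quoted above.
reads_per_sec = {
    "ada0": [106, 104, 108],
    "ada1": [111, 103, 104],
    "ada2": [66, 61, 62],
    "ada3": [58, 56, 60],
}


def read_shares(samples):
    """Return each drive's fraction of all read requests in the samples."""
    totals = {name: sum(values) for name, values in samples.items()}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


shares = read_shares(reads_per_sec)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

For these three samples this gives about 31.8% each for ada0 and ada1,
18.9% for ada2 and 17.4% for ada3, which is where the rounded
30 / 30 / 20 / 20 figures come from.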

> If the latency numbers were high enough, I would suggest dd'ing out to
> the individual drives (i.e. remove the drive from the RAIDZ) to see if
> there's a noticeable discrepancy, as this can indicate a bad cable,
> backplane, or drive; from there I would start doing the physical swap
> routine and see if the issue moves with the drive or stays static with
> the controller channel and/or chassis slot.

I do not expect a hardware problem, since command latencies are very
similar over all drives, despite the higher read load on some of them.
The more heavily loaded drives are busier by just the factor that the
higher command rate alone would explain.
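
A rough way to check this against the first gstat sample quoted above
is to divide the busy time by the number of operations. This is a
sketch under the assumption that, with queue lengths of 0 or 1, %busy
is a fair approximation of the time spent servicing requests; the
function name busy_ms_per_op is made up for illustration.

```python
# (ops/s, %busy) copied from the first gstat sample quoted above.
sample = {
    "ada0": (130, 48.8),
    "ada1": (131, 47.6),
    "ada2": (90, 31.7),
    "ada3": (81, 28.1),
}


def busy_ms_per_op(ops_per_sec, busy_percent):
    """Busy milliseconds accumulated per second, divided by ops per second."""
    return busy_percent / 100 * 1000 / ops_per_sec


per_op = {name: busy_ms_per_op(ops, busy) for name, (ops, busy) in sample.items()}
for name, ms in per_op.items():
    print(f"{name}: {ms:.2f} ms busy per op")

# Compare how far apart the drives are per operation vs. in operation count.
spread = max(per_op.values()) / min(per_op.values())
ops_values = [ops for ops, _ in sample.values()]
ops_spread = max(ops_values) / min(ops_values)
print(f"busy-per-op spread: {spread:.2f}x, ops/s spread: {ops_spread:.2f}x")
```

If the busy time per operation differs by only a few percent between
drives while ops/s differs by more than half, the extra %busy on ada0
and ada1 is accounted for by the request count rather than by slower
drives.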

But it seems that others do not observe the asymmetric distribution of
requests, which makes me wonder whether I happen to have metadata
arranged in such a way that it is always read from ada0 or ada1, but not
(or rarely) from ada2 or ada3. That could explain it, including the fact
that raidz1 pools over other numbers of drives (e.g. 3 or 6) apparently
show a much more symmetric distribution of read requests.

Regards, Stefan