Hi ZFS users,

for quite some time I have observed an uneven distribution of load between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of a longer log of 10 second averages logged with gstat:

dT: 10.001s  w: 10.000s  filter: ^a?da?.$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    130    106   4134    4.5     23   1033    5.2   48.8| ada0
    0    131    111   3784    4.2     19   1007    4.0   47.6| ada1
    0     90     66   2219    4.5     24   1031    5.1   31.7| ada2
    1     81     58   2007    4.6     22   1023    2.3   28.1| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    132    104   4036    4.2     27   1129    5.3   45.2| ada0
    0    129    103   3679    4.5     26   1115    6.8   47.6| ada1
    1     91     61   2133    4.6     30   1129    1.9   29.6| ada2
    0     81     56   1985    4.8     24   1102    6.0   29.4| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    148    108   4084    5.3     39   2511    7.2   55.5| ada0
    1    141    104   3693    5.1     36   2505   10.4   54.4| ada1
    1    102     62   2112    5.6     39   2508    5.5   35.4| ada2
    0     99     60   2064    6.0     39   2483    3.7   36.1| ada3

This goes on for minutes without a change of roles. (I had assumed that other 10 minute samples might show relatively higher load on another subset of the drives, but it is always the first two, which receive some 50% more read requests than the other two.) The test consisted of minidlna rebuilding its content database for a media collection held on that pool. The unbalanced distribution of requests does not depend on the particular application, and it does not change when the most heavily loaded drives approach 100% busy.

This is a -CURRENT built from yesterday's sources, but the problem has existed for quite some time (and should definitely be reproducible on -STABLE, too). The pool consists of a 4-drive raidz1 on an ICH10 (H67) without cache or log devices and without much ZFS tuning (only max.
ARC size, which should not be relevant at all in this context):

zpool status -v
  pool: raid1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        raid1       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0

errors: No known data errors

Cached configuration:
        version: 28
        name: 'raid1'
        state: 0
        txg: 153899
        pool_guid: 10507751750437208608
        hostid: 3558706393
        hostname: 'se.local'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 10507751750437208608
            children[0]:
                type: 'raidz'
                id: 0
                guid: 7821125965293497372
                nparity: 1
                metaslab_array: 30
                metaslab_shift: 36
                ashift: 12
                asize: 7301425528832
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 7487684108701568404
                    path: '/dev/ada0p2'
                    phys_path: '/dev/ada0p2'
                    whole_disk: 1
                    create_txg: 4
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 12000329414109214882
                    path: '/dev/ada1p2'
                    phys_path: '/dev/ada1p2'
                    whole_disk: 1
                    create_txg: 4
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 2926246868795008014
                    path: '/dev/ada2p2'
                    phys_path: '/dev/ada2p2'
                    whole_disk: 1
                    create_txg: 4
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 5226543136138409733
                    path: '/dev/ada3p2'
                    phys_path: '/dev/ada3p2'
                    whole_disk: 1
                    create_txg: 4

I'd be interested to know whether this behavior can be reproduced on other systems with raidz1 pools consisting of 4 or more drives. All it takes is generating some disk load and running the command:

gstat -I 10000000 -f '^a?da?.$'

to obtain 10 second averages. I have not even tried to look at the scheduling of requests in ZFS, but I'm surprised to see higher than average load on just 2 of the 4 drives, since RAID parity should be spread evenly over all drives, and for each file system block a different subset of 3 out of 4 drives should be able to deliver the data without reconstructing it from parity (which would lead to an even distribution of load).
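As a quick sanity check (not part of the original measurement, just a small Python sketch over the r/s numbers quoted from the three gstat samples above), one can tally each drive's share of all reads and compare it against the even 25% split that rotating parity would suggest, and against a hypothetical 1/3, 1/3, 1/6, 1/6 split:

```python
# Sketch: tally the r/s column from the three 10-second gstat samples
# quoted above and compute each drive's share of total reads.
# The per-drive numbers are copied from the excerpt; everything else
# here is illustrative, not from the original report.
reads = {
    "ada0": [106, 104, 108],
    "ada1": [111, 103, 104],
    "ada2": [66, 61, 62],
    "ada3": [58, 56, 60],
}

totals = {drive: sum(samples) for drive, samples in reads.items()}
grand = sum(totals.values())
shares = {drive: total / grand for drive, total in totals.items()}

for drive in sorted(shares):
    print(f"{drive}: {shares[drive]:.3f}")
# ada0 and ada1 each come out near 1/3 (0.333) of all reads, while
# ada2 and ada3 each come out near 1/6 (0.167) -- far from the even
# 0.25 per drive that rotating parity would predict.
```

On these samples the first two drives together take roughly 64% of all reads, which matches the 1/3, 1/3, 1/6, 1/6 pattern rather closely.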
I've got two theories about what might cause the observed behavior:

1) There is some metadata that is kept only on the first two drives. Data is spread evenly, but metadata accesses lead to additional reads.

2) The read requests are distributed in such a way that 1/3 goes to ada0, another 1/3 to ada1, while the remaining 1/3 is split evenly between ada2 and ada3.

So: Can anybody reproduce this distribution of requests? Any idea why this is happening, and whether something should be changed in ZFS to better distribute the load (leading to higher file system performance)?

Best regards, STefan

Received on Mon Dec 19 2011 - 13:22:11 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:22 UTC