2011/12/19 Stefan Esser <se_at_freebsd.org>:
> Hi ZFS users,
>
> for quite some time I have observed an uneven distribution of load
> between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of
> a longer log of 10 second averages logged with gstat:
>
> dT: 10.001s  w: 10.000s  filter: ^a?da?.$
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0    130    106   4134    4.5     23   1033    5.2   48.8| ada0
>     0    131    111   3784    4.2     19   1007    4.0   47.6| ada1
>     0     90     66   2219    4.5     24   1031    5.1   31.7| ada2
>     1     81     58   2007    4.6     22   1023    2.3   28.1| ada3
>
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     1    132    104   4036    4.2     27   1129    5.3   45.2| ada0
>     0    129    103   3679    4.5     26   1115    6.8   47.6| ada1
>     1     91     61   2133    4.6     30   1129    1.9   29.6| ada2
>     0     81     56   1985    4.8     24   1102    6.0   29.4| ada3
>
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     1    148    108   4084    5.3     39   2511    7.2   55.5| ada0
>     1    141    104   3693    5.1     36   2505   10.4   54.4| ada1
>     1    102     62   2112    5.6     39   2508    5.5   35.4| ada2
>     0     99     60   2064    6.0     39   2483    3.7   36.1| ada3
>
> This goes on for minutes, without a change of roles. (I had assumed that
> other 10 minute samples might show relatively higher load on another
> subset of the drives, but it is always the first two, which receive some
> 50% more read requests than the other two.)
>
> The test consisted of minidlna rebuilding its content database for a
> media collection held on that pool. The unbalanced distribution of
> requests does not depend on the particular application, and the
> distribution of requests does not change when the drives with the
> highest load approach 100% busy.
>
> This is a -CURRENT built from yesterday's sources, but the problem has
> existed for quite some time (and should definitely be reproducible on
> -STABLE, too).
>
> The pool consists of a 4 drive raidz1 on an ICH10 (H67) without cache or
> log devices and without much ZFS tuning (only the max. ARC size, which
> should not be relevant in this context):
>
> zpool status -v
>   pool: raid1
>  state: ONLINE
>   scan: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raid1       ONLINE       0     0     0
>           raidz1-0  ONLINE       0     0     0
>             ada0p2  ONLINE       0     0     0
>             ada1p2  ONLINE       0     0     0
>             ada2p2  ONLINE       0     0     0
>             ada3p2  ONLINE       0     0     0
>
> errors: No known data errors
>
> Cached configuration:
>         version: 28
>         name: 'raid1'
>         state: 0
>         txg: 153899
>         pool_guid: 10507751750437208608
>         hostid: 3558706393
>         hostname: 'se.local'
>         vdev_children: 1
>         vdev_tree:
>             type: 'root'
>             id: 0
>             guid: 10507751750437208608
>             children[0]:
>                 type: 'raidz'
>                 id: 0
>                 guid: 7821125965293497372
>                 nparity: 1
>                 metaslab_array: 30
>                 metaslab_shift: 36
>                 ashift: 12
>                 asize: 7301425528832
>                 is_log: 0
>                 create_txg: 4
>                 children[0]:
>                     type: 'disk'
>                     id: 0
>                     guid: 7487684108701568404
>                     path: '/dev/ada0p2'
>                     phys_path: '/dev/ada0p2'
>                     whole_disk: 1
>                     create_txg: 4
>                 children[1]:
>                     type: 'disk'
>                     id: 1
>                     guid: 12000329414109214882
>                     path: '/dev/ada1p2'
>                     phys_path: '/dev/ada1p2'
>                     whole_disk: 1
>                     create_txg: 4
>                 children[2]:
>                     type: 'disk'
>                     id: 2
>                     guid: 2926246868795008014
>                     path: '/dev/ada2p2'
>                     phys_path: '/dev/ada2p2'
>                     whole_disk: 1
>                     create_txg: 4
>                 children[3]:
>                     type: 'disk'
>                     id: 3
>                     guid: 5226543136138409733
>                     path: '/dev/ada3p2'
>                     phys_path: '/dev/ada3p2'
>                     whole_disk: 1
>                     create_txg: 4
>
> I'd be interested to know whether this behavior can be reproduced on
> other systems with raidz1 pools consisting of 4 or more drives. All it
> takes is generating some disk load and running the command:
>
> gstat -I 10000000 -f '^a?da?.$'
>
> to obtain 10 second averages.
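
For anyone who wants to capture comparable numbers to post here, a
minimal sketch of a non-interactive logging command (assumptions: the
pool members are ada0 through ada3 as above, and /var/tmp/raidz-load.log
is just an example path; iostat is used instead of gstat because it
writes plain text to stdout on any FreeBSD version):

  # Log extended per-device statistics every 10 seconds for the four
  # raidz1 members; stop with Ctrl-C and compare against the gstat
  # excerpts above.
  iostat -x -w 10 ada0 ada1 ada2 ada3 | tee /var/tmp/raidz-load.log

The -x columns (r/s, w/s, kr/s, kw/s, %b) map closely onto the gstat
fields quoted above, so the same read imbalance should be visible there
if it is present.
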
>
> I have not even tried to look at the scheduling of requests in ZFS, but
> I'm surprised to see higher than average load on just 2 of the 4 drives,
> since RAID parity should be evenly spread over all drives and, for each
> file system block, a different subset of 3 out of 4 drives should be
> able to deliver the data without the need to reconstruct it from parity
> (which would lead to an even distribution of load).
>
> I've got two theories about what might cause the observed behavior:
>
> 1) There is some metadata that is only kept on the first two drives.
> Data is evenly spread, but metadata accesses lead to additional reads.
>
> 2) The read requests are distributed in such a way that 1/3 goes to
> ada0, another 1/3 to ada1, while the remaining 1/3 is evenly distributed
> between ada2 and ada3.
>
>
> So: Can anybody reproduce this distribution of requests?

Hello,

Stupid question, but are your drives all exactly the same? I noticed
"ashift: 12", so I think you should have at least one 4k-sector drive;
are you sure they're not mixed with 512B-per-sector drives? (A quick way
to check what the drives report is sketched below.)

> Any idea why this is happening and whether something should be changed
> in ZFS to better distribute the load (leading to higher file system
> performance)?
>
> Best regards, STefan
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"

--
Olivier Smedts                                                  _
                                         ASCII ribbon campaign ( )
 e-mail: olivier_at_gid0.org      - against HTML email & vCards   X
 www: http://www.gid0.org       - against proprietary attachments / \

 "There are only 10 kinds of people in the world: those who understand
  binary, and those who don't."
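
Regarding the sector-size question above, a minimal sketch of how to
check what the drives themselves report (assumptions: the pool members
are ada0 through ada3; diskinfo only reports a stripesize on newer
FreeBSD versions, and whether camcontrol identify prints a "sector size"
line likewise depends on the version):

  # Bourne-shell loop: print the logical sector size (and the physical
  # sector size / stripesize, where reported) for each pool member.
  for d in ada0 ada1 ada2 ada3; do
          echo "== ${d} =="
          diskinfo -v /dev/${d} | grep -E 'sectorsize|stripesize'
          camcontrol identify ${d} | grep -i 'sector size'
  done

A 4096-byte stripesize (or "physical 4096" from camcontrol) would point
at 4k-sector drives; if only some of the four report it, the pool would
indeed mix 512B and 4k drives.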