Hi.

Recently WD released the first series of ATA disks with an increased (4K) physical sector size. This makes writes that do not match the 4K blocks inefficient there. So I propose to get back to the question of optimal FS block alignment. This topic is also important for most striped RAID levels, such as RAID0/3/5/..., and for flash drives with simple controllers (such as MMC/SD cards).

As I don't have one of those WD disks yet, I have made a series of tests with a RAID0 built by geom_stripe to check the general idea. I've tested the most illustrative case: a 2-disk RAID0 with a 16K stripe, a 16K FS block and many 16K random I/Os (reads in this test, to avoid FS locking). I have seen the same load pattern, but with writes, on my busy disk-bound MySQL servers, so it is quite realistic.

Test one, default partitioning:

%gstripe label -s 16384 data /dev/ada1 /dev/ada2
%fdisk -I /dev/stripe/data
%disklabel -w /dev/stripe/datas1
%disklabel /dev/stripe/datas1
# /dev/stripe/datas1:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a: 1250274611         16    unused        0     0
  c: 1250274627          0    unused        0     0  # "raw" part, don't edit
%diskinfo -v /dev/stripe/datas1a
/dev/stripe/datas1a
        512             # sectorsize
        640140600832    # mediasize in bytes (596G)
        1250274611      # mediasize in sectors
        16384           # stripesize
        7680            # stripeoffset
        77825           # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.

As you can see, fdisk aligned the slice to the "track length" of 63 sectors and disklabel added an offset of 16 more sectors. As a result, the file system starts at quite an odd place within the RAID stripe, 7680 bytes into a 16384-byte stripe (see the arithmetic in the P.S. below).

I've created a UFS file system, pre-written a 4GB file and run the tests (raidtest was patched to generate only 16K requests):

%raidtest test -d /mnt/qqq -n 1
Requests per second: 112
%raidtest test -d /mnt/qqq -n 64
Requests per second: 314

Before each test the FS was unmounted to flush caches.

Test two, FS manually aligned with disklabel:

%disklabel /dev/stripe/datas1
# /dev/stripe/datas1:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a: 1250274578         33    unused        0     0
  c: 1250274627          0    unused        0     0  # "raw" part, don't edit
%diskinfo -v /dev/stripe/datas1a
/dev/stripe/datas1a
        512             # sectorsize
        640140583936    # mediasize in bytes (596G)
        1250274578      # mediasize in sectors
        16384           # stripesize
        0               # stripeoffset
        77825           # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.

The file system is now aligned with the stripe.

%raidtest test -d /mnt/qqq -n 1
Requests per second: 133
%raidtest test -d /mnt/qqq -n 64
Requests per second: 594

The difference is quite significant. An unaligned RAID0 access involves two disks in its handling, while an aligned one leaves the other disk free for another request, doubling performance under concurrent load.

As we now have a mechanism for reporting the stripe size and offset of any partition to user level, it should be easy to make the disk partitioning and file system creation tools use it automatically (a rough sketch of such a helper is in the P.P.S. below). Stripe size/offset reporting is now supported by the ada and mmcsd disk drivers and most GEOM modules. It would be nice to fetch that info from hardware RAIDs too, where possible.

--
Alexander Motin
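
P.S. To make the "odd place" concrete: the 7680-byte stripeoffset reported above is simply the file system's starting byte offset modulo the stripe size, which is easy to recheck from the shell (nothing below is a new measurement, just arithmetic on the numbers from test one):

%sh -c 'echo $(( (63 + 16) * 512 % 16384 ))'
7680

63 sectors from fdisk plus 16 from disklabel, times the 512-byte sector size, lands 7680 bytes past a stripe boundary, so every 16K FS block crosses a stripe boundary and touches both disks.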
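
P.P.S. A minimal sketch of the kind of helper I mean, written on top of diskinfo -v (the script name and the exact parsing are mine and only illustrative; a real tool would take the values from the kernel directly). It prints a disklabel offset for the "a" partition that lands on a stripe boundary while keeping the traditional 16 sectors reserved at the start of the slice:

#!/bin/sh
# stripe_align.sh (illustrative sketch): print an aligned disklabel offset
# for the 'a' partition of the given slice, using the stripesize and
# stripeoffset fields shown by diskinfo -v.
dev=$1
ss=`diskinfo -v $dev | awk '/# sectorsize/ { print $1 }'`
stripe=`diskinfo -v $dev | awk '/# stripesize/ { print $1 }'`
soff=`diskinfo -v $dev | awk '/# stripeoffset/ { print $1 }'`
min=16                          # keep room for the label/boot area
if [ "$stripe" -eq 0 ]; then    # no stripe info reported, keep the default
        echo $min
        exit 0
fi
# sectors from the start of the slice to the next stripe boundary
off=$(( (($stripe - $soff % $stripe) % $stripe) / $ss ))
# bump by whole stripes until the offset clears the reserved area
while [ $off -lt $min ]; do
        off=$(( $off + $stripe / $ss ))
done
echo $off

Assuming the slice device reports its stripe offset the same way the partition above does (63 sectors * 512 bytes = 32256, i.e. 15872 bytes into a 16384-byte stripe), running it against /dev/stripe/datas1 should print 33, the offset used in test two.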