On Sat, Dec 10, 2011 at 05:42:01PM -0800, Maksim Yevmenkin wrote:
> Hello,
>
> i have a question for fs wizards.
>
> suppose i can persuade modern spinning disk to do "large" reads (say
> 512K to 1M) at a time. also, suppose file system on such modern
> spinning drive is used to store large files (tens to hundreds of
> megabytes). is there any way i can tweak the file system parameters
> (block size, layout, etc) to help it to get as close to "disk's
> sequential read rate" as possible. I understand that i will not be
> able to get 100MB/sec single client sequential read rate, but, can i
> get it into sustained 40-50MB/sec rate? also, can i reduce performance
> impact caused by "small reads" such as directory access etc.

If you wanted to get responses from experts only, sorry in advance.

The fs (AKA UFS) uses clustering provided by the block cache. The clustering code, mainly located in kern/vfs_cluster.c, coalesces a sequence of reads or writes that target consecutive blocks into a single physical read or write of at most MAXPHYS bytes. The current definition of MAXPHYS is 128KB.

Clustering also allows the filesystem to improve the layout of files: it calls VOP_REALLOCBLKS() to redo the allocation so that a sequence of blocks being written becomes sequential on disk if it is not already.

Even if a file is not laid out ideally, or the I/O pattern is random, most scheduled writes are asynchronous, and for reads the system tries to schedule read-ahead for a limited number of blocks. This allows the lower layers, i.e. GEOM and the disk drivers, to optimize the I/O queue and coalesce requests that are consecutive on disk but not in the queue.

BTW, some time ago I was interested in the effect of fragmentation on UFS, because of a semi-abandoned patch which could make fragmentation worse. I wrote a tool that calculated the percentage of non-consecutive spots in the whole filesystem.
Apparently, even under a heavy load consisting of writing many files under a megabyte in size, UFS managed to keep the share of such spots under 2-3% on a sufficiently free volume.
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:21 UTC