Re: calling all fs experts

From: Kostik Belousov <kostikbel_at_gmail.com>
Date: Sun, 11 Dec 2011 18:26:40 +0200

On Sat, Dec 10, 2011 at 05:42:01PM -0800, Maksim Yevmenkin wrote:
> Hello,
> 
> i have a question for fs wizards.
> 
> suppose i can persuade modern spinning disk to do "large" reads (say
> 512K to 1M) at a time. also, suppose file system on such modern
> spinning drive is used to store large files (tens to hundreds of
> megabytes). is there any way i can tweak the file system parameters
> (block size, layout, etc) to help it to get as close to "disk's
> sequential read rate" as possible. I understand that i will not be
> able to get 100MB/sec single client sequential read rate, but, can i
> get it into sustained 40-50MB/sec rate? also, can i reduce performance
> impact caused by "small reads" such as directory access etc.

If you wanted to get responses from experts only, sorry in advance.

The fs (AKA UFS) uses clustering provided by the block cache. The clustering
code, mainly located in kern/vfs_cluster.c, coalesces a sequence of
reads or writes that target consecutive blocks into a single
physical read or write of at most MAXPHYS bytes. The current definition
of MAXPHYS is 128KB.
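
To illustrate the coalescing idea, here is a minimal userland sketch. It is
not the code from kern/vfs_cluster.c. The 16KB block size and the names
coalesce_blocks and struct cluster are assumptions made up for the example;
only the 128KB cap comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPHYS_BYTES  (128 * 1024)  /* cap quoted in the message */
#define FS_BLOCK_BYTES (16 * 1024)   /* assumed block size, illustration only */

/* One physical transfer: first block number and length in blocks. */
struct cluster {
    long start;
    long nblocks;
};

/*
 * Coalesce a sequence of requested block numbers into clusters.
 * A block joins the current cluster only if it directly follows the
 * cluster's last block and the cluster is still below the size cap.
 * Returns the number of clusters written to out (at most n entries).
 */
size_t
coalesce_blocks(const long *blocks, size_t n, struct cluster *out)
{
    const long max_blocks = MAXPHYS_BYTES / FS_BLOCK_BYTES;
    size_t nclusters = 0;

    assert(blocks != NULL || n == 0);
    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (nclusters > 0) {
            struct cluster *cur = &out[nclusters - 1];

            if (blocks[i] == cur->start + cur->nblocks &&
                cur->nblocks < max_blocks) {
                cur->nblocks++;
                continue;
            }
        }
        /* Not adjacent, or the cap was reached: start a new cluster. */
        out[nclusters].start = blocks[i];
        out[nclusters].nblocks = 1;
        nclusters++;
    }
    return nclusters;
}
```

With these assumed sizes a cluster holds at most 8 blocks, so ten
consecutive blocks become one 8-block transfer plus one 2-block transfer.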

Clustering also allows the filesystem to improve the layout of files by
calling VOP_REALLOCBLKS() to redo the allocation, so that the blocks being
written become sequential on disk if they were not already.

Even if a file is not laid out ideally, or the i/o pattern is random, most
writes are scheduled asynchronously, and for reads the system tries to
schedule read-aheads for some limited number of blocks. This allows the
lower layers, i.e. geom and the disk drivers, to optimize the i/o queue
and coalesce requests that are consecutive on disk but not in the queue.
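
A rough sketch of what "consecutive on disk, but not in the queue" means
for a lower layer. This is a simplification, not what geom or any driver
actually does. struct io_req and sort_and_merge are hypothetical names,
and no cap on the merged size is applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A pending request: byte offset on disk and length in bytes. */
struct io_req {
    long long offset;
    long long length;
};

/* qsort comparator: order requests by ascending disk offset. */
static int
cmp_offset(const void *a, const void *b)
{
    const struct io_req *x = a;
    const struct io_req *y = b;

    return (x->offset > y->offset) - (x->offset < y->offset);
}

/*
 * Sort the queue by disk offset, then merge every request that starts
 * exactly where the previous one ends. Works in place and returns the
 * number of requests left in q.
 */
size_t
sort_and_merge(struct io_req *q, size_t n)
{
    size_t last = 0;

    assert(q != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(q, n, sizeof(q[0]), cmp_offset);
    for (size_t i = 1; i < n; i++) {
        if (q[last].offset + q[last].length == q[i].offset)
            q[last].length += q[i].length;  /* adjacent: extend */
        else
            q[++last] = q[i];               /* gap: keep separate */
    }
    return last + 1;
}
```

The more requests are outstanding at once (asynchronous writes,
read-aheads), the more chances a pass like this has to find neighbours.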

BTW, some time ago I was interested in fragmentation on UFS, because of a
semi-abandoned patch that could have made fragmentation worse. I wrote a
tool that calculated the percentage of non-consecutive spots in the whole
filesystem. Apparently, even under a heavy load consisting of writing a
lot of files under the megabytes in size, UFS managed to keep the number
of such spots under 2-3% on a volume with sufficient free space.
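
The metric can be sketched as below. This is not the tool mentioned above,
only one plausible reading of "percentage of non-consecutive spots",
computed for a single file. The names count_breaks and break_percentage
are made up, and the sketch assumes physical addresses are given in units
where the next block is simply the previous address plus one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the transitions where logical block i of a file is not stored
 * directly after logical block i-1. phys[] holds the physical block
 * number of each logical block, in file order.
 */
size_t
count_breaks(const long *phys, size_t n)
{
    size_t breaks = 0;

    assert(phys != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (phys[i] != phys[i - 1] + 1)
            breaks++;
    }
    return breaks;
}

/*
 * Percentage of non-consecutive transitions among all n-1 transitions.
 * Files with fewer than two blocks have no transitions and yield 0.
 */
double
break_percentage(const long *phys, size_t n)
{
    if (n < 2)
        return 0.0;
    return 100.0 * (double)count_breaks(phys, n) / (double)(n - 1);
}
```

A filesystem-wide figure would sum the breaks and the transitions over all
files before dividing.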