Re: [RFC] vfs.read_min proposal

From: Konstantin Belousov <kostikbel_at_gmail.com> Date: Thu, 28 Mar 2013 09:52:09 +0200 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:36 UTC

On Wed, Mar 27, 2013 at 01:43:32PM -0700, Maksim Yevmenkin wrote:
> Hello,
> 
> i would like to get some reviews, opinions and/or comments on the patch below.
> 
> a little bit background, as far as i understand, cluster_read() can
> initiate two disk i/o's: one for exact amount of data being requested
> (rounded up to a filesystem block size) and another for a configurable
> read ahead. read ahead data are always extra and do not super set data
> being requested. also, read ahead can be controlled via f_seqcount (on
> per descriptor basis) and/or vfs.read_max (global knob).
> 
> in some cases and/or on some work loads it can be beneficial to bundle
> original data and read ahead data in one i/o request. in other words,
> read more than caller has requested, but only perform one larger i/o,
> i.e. super set data being requested and read ahead.

The totread argument to the cluster_read() is supplied by the filesystem
to indicate how many data in the current request is specified. Always
overriding this information means two things:
- you fill the buffer and page cache with potentially unused data.
  For some situations, like partial reads, it would be really bad.
- you increase the latency by forcing the reader to wait for the whole
  cluster which was not asked for.

So it looks as very single- and special-purpose hack. Besides, the
global knob is obscure and probably would not have any use except your
special situation. Would a file flag be acceptable for you ?

What is the difference in the numbers you see, and what numbers ?
Is it targeted for read(2) optimizations, or are you also concerned
with the read-ahead done at the fault time ?

> 
> ===
> 
> Index: trunk/cache/src/sys/kern/vfs_cluster.c
> ===================================================================
> diff -u -N -r515 -r1888
> --- trunk/cache/src/sys/kern/vfs_cluster.c	(.../vfs_cluster.c)	(revision 515)
> +++ trunk/cache/src/sys/kern/vfs_cluster.c	(.../vfs_cluster.c)	(revision 1888)
> _at__at_ -75,6 +75,10 _at__at_
>  SYSCTL_INT(_vfs, OID_AUTO, read_max, CTLFLAG_RW, &read_max, 0,
>      "Cluster read-ahead max block count");
> 
> +static int read_min = 1;
> +SYSCTL_INT(_vfs, OID_AUTO, read_min, CTLFLAG_RW, &read_min, 0,
> +    "Cluster read min block count");
> +
>  /* Page expended to mark partially backed buffers */
>  extern vm_page_t	bogus_page;
> 
> _at__at_ -169,13 +173,21 _at__at_
>  	} else {
>  		off_t firstread = bp->b_offset;
>  		int nblks;
> +		long minread;
> 
>  		KASSERT(bp->b_offset != NOOFFSET,
>  		    ("cluster_read: no buffer offset"));
> 
>  		ncontig = 0;
> 
>  		/*
> +		 * Adjust totread if needed
> +		 */
> +		minread = read_min * size;
> +		if (minread > totread)
> +			totread = minread;
> +
> +		/*
>  		 * Compute the total number of blocks that we should read
>  		 * synchronously.
>  		 */
> 
> ===
> 
> thanks,
> max
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"