Re: Official request: Please make GNU grep the default

From: Alan Cox <alan.l.cox_at_gmail.com> Date: Tue, 17 Aug 2010 12:32:52 -0500

On Tue, Aug 17, 2010 at 10:45 AM, Kostik Belousov <kostikbel_at_gmail.com> wrote:

> [Cc: list sanitized]
>
> On Tue, Aug 17, 2010 at 05:28:08PM +0200, Dimitry Andric wrote:
> > On 2010-08-16 10:55, Dag-Erling Smørgrav wrote:
> > > Dimitry Andric <dimitry_at_andric.com> writes:
> > >> - Uses plain file descriptors instead of struct FILE, since the
> > >>   buffering is done manually anyway, and it makes it easier to support
> > >>   gzip and bzip2.
> > > It might be worth a shot adding mmap(2) support as well, i.e. when
> > > processing an uncompressed regular file, try to mmap(2) it first, and
> > > if that fails, fall back to the plain buffered read(2) method.
> >
> > I added a simple mmap to grep, and time-trialed it, but the mmap version
> > was somewhat slower than the regular version.  I understood from Kostik
> > Belousov that readahead does not work properly with mmap, and it should
> > not be used for "one-time" reads.
> This is not exactly what I said. I argue that the read-ahead implemented
> by vm_fault() is much less efficient than buffer clustering. Also,
> the cost of setting up a user mapping for a one-time read is non-trivial.
> The conclusion is right: it is better to use read(2) for a one-time read.
>

The mapping (and unmapping) costs should be relatively small if the contents
of the file can be prefaulted using 2/4MB pages.  In such cases, we still
touch every struct vm_page in the 2/4MB region, but we only create and
destroy one PTE and PV entry, and perform a single INVLPG.
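
For reference, the "try mmap(2) first, fall back to read(2)" approach
discussed upthread can be sketched roughly as below.  This is only an
illustration, not the code from the grep patch: the helper name
count_newlines() and the 64KB buffer size are made up for the example,
and it makes no attempt to request or depend on 2/4MB mappings.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from grep): count the newlines in a file.
 * For a non-empty regular file, try mmap(2) first; if the mapping cannot
 * be set up, fall back to plain buffered read(2).  Returns -1 on error.
 */
static long
count_newlines(const char *path)
{
	struct stat sb;
	char buf[65536];		/* arbitrary buffer size (assumption) */
	const char *p, *end;
	void *map;
	ssize_t n;
	long count = 0;
	int fd;

	if ((fd = open(path, O_RDONLY)) == -1)
		return (-1);

	if (fstat(fd, &sb) == 0 && S_ISREG(sb.st_mode) && sb.st_size > 0) {
		map = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE,
		    fd, 0);
		if (map != MAP_FAILED) {
			/* Scan the whole file through the mapping. */
			p = map;
			end = p + sb.st_size;
			while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
				count++;
				p++;
			}
			munmap(map, (size_t)sb.st_size);
			close(fd);
			return (count);
		}
		/* mmap(2) failed: fall through to the read(2) path. */
	}

	/* Fallback: plain buffered read(2). */
	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		p = buf;
		end = buf + n;
		while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
			count++;
			p++;
		}
	}
	close(fd);
	return (n == -1 ? -1 : count);
}
```

Both paths do the same scan, so timing one against the other on a large
uncompressed file is one way to repeat the comparison described above.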

Alan