Re: Official request: Please make GNU grep the default

From: Dimitry Andric <dimitry_at_andric.com> Date: Tue, 17 Aug 2010 17:28:08 +0200

On 2010-08-16 10:55, Dag-Erling Smørgrav wrote:
> Dimitry Andric <dimitry_at_andric.com> writes:
>> - Uses plain file descriptors instead of struct FILE, since the
>>   buffering is done manually anyway, and it makes it easier to support
>>   gzip and bzip2.
> It might be worth a shot adding mmap(2) support as well, i.e. when
> processing an uncompressed regular file, try to mmap(2) it first, and if
> that fails, fall back to the plain buffered read(2) method.

I added a simple mmap to grep and time-trialed it, but the mmap version
was somewhat slower than the regular version.  I understood from Kostik
Belousov that readahead does not work properly with mmap, and that mmap
should not be used for "one-time" reads.
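
For reference, the "try mmap(2) first, fall back to read(2)" idea could
look roughly like the sketch below.  This is not the code from the
patch: scan_file() and count_lines() are made-up names, and counting
newlines merely stands in for the real matching work.  It assumes a
POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define BUFSIZE 32768	/* same 32k as the buffered reader */

/* Hypothetical stand-in for the matcher: count newlines in a buffer. */
static size_t
count_lines(const char *buf, size_t len)
{
	const char *p = buf, *end = buf + len;
	size_t n = 0;

	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		if (nl == NULL)
			break;
		n++;
		p = nl + 1;
	}
	return (n);
}

/*
 * Hypothetical helper: scan a file, optionally trying mmap(2) first.
 * Returns the number of newlines, or -1 on error.
 */
static long
scan_file(const char *path, int try_mmap)
{
	struct stat st;
	char buf[BUFSIZE];
	ssize_t n;
	long total = 0;
	int fd;

	assert(path != NULL);
	if ((fd = open(path, O_RDONLY)) < 0)
		return (-1);

	/* Only non-empty regular files are candidates for mmap. */
	if (try_mmap && fstat(fd, &st) == 0 && S_ISREG(st.st_mode) &&
	    st.st_size > 0) {
		size_t len = (size_t)st.st_size;
		void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

		if (map != MAP_FAILED) {
			total = (long)count_lines(map, len);
			munmap(map, len);
			close(fd);
			return (total);
		}
		/* mmap failed: fall through to the buffered path. */
	}

	/* Plain buffered read(2) loop. */
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += (long)count_lines(buf, (size_t)n);
	close(fd);
	return (n < 0 ? -1 : total);
}
```

Calling scan_file(path, 1) and scan_file(path, 0) on the same file
should give the same count, which makes it easy to time the two paths
against each other.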

I also experimented with different buffer sizes on the same big test
file, and this gives the following results (times in s):

buffer size     test1   test2   test3   average
===========     ===     ===     ===     ===
        512     467     484     465     472
      1,024     391     415     392     399
      2,048     361     356     365     361
      4,096     353     353     356     354
      8,192     348     345     357     350
     16,384     341     373     350     354
     32,768     339     348     346     344
     65,536     336     359     371     355
    262,144     334     352     350     345
  1,048,576     334     350     351     345
  2,097,152     339     342     369     350
373,293,056     544     547     559     550

So the 32k buffer size that I borrowed from GNU grep seems to be
reasonable enough.  Larger buffers give no measurable speedup, so there
is no profit in wasting huge amounts of memory on them.
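
A comparison like the one in the table can be approximated with a small
harness such as the sketch below.  It is not the program used for the
numbers above: timed_read() is a made-up name, and it only times the
raw read(2) loop, without any matching.  It assumes a POSIX system with
clock_gettime(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a whole file through a buffer of the given
 * size and store the elapsed wall-clock time (in seconds) in *elapsed.
 * Returns the number of bytes read, or -1 on error.
 */
static long long
timed_read(const char *path, size_t bufsize, double *elapsed)
{
	struct timespec t0, t1;
	long long total = 0;
	ssize_t n;
	char *buf;
	int fd;

	assert(bufsize > 0);
	if ((buf = malloc(bufsize)) == NULL)
		return (-1);
	if ((fd = open(path, O_RDONLY)) < 0) {
		free(buf);
		return (-1);
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while ((n = read(fd, buf, bufsize)) > 0)
		total += n;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	close(fd);
	free(buf);
	*elapsed = (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	return (n < 0 ? -1 : total);
}
```

Calling timed_read() in a loop over the buffer sizes of interest, a few
times each, gives one row per size.  Whether the file is already in the
buffer cache will affect the timings, so the runs should be set up the
same way for every size.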