Re: Official request: Please make GNU grep the default

From: Dimitry Andric <dimitry_at_andric.com> Date: Sun, 15 Aug 2010 21:49:44 +0200 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:06 UTC

On 2010-08-15 21:07, Dag-Erling Smørgrav wrote:
> I built a profiling version of BSD grep and ran it with a regexp that
> matches only the very last line in (my copy of) INDEX-9.  The results
> are pretty clear:

[I also did precisely that, and I just read your mail when I was
composing the following... :) ]

I had a look at Doug Barton's time trial script for GNU and BSD grep,
and decided to profile both greps a bit.  This gave some interesting
results.

I started with GNU grep, doing approximately the same search as in
Doug's trial, e.g. grep -m1 "|/usr/ports/x11-wm/xfce4-wm|", but I made
the testfile 373491 lines instead of the original 21971, with the
only matching line as the last one.

The GNU grep top called functions are (please ignore the .mcount entry,
which is a gprof side effect):

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 99.1       0.59     0.59    11497     0.05     0.05  read [5]
  0.6       0.59     0.00    11497     0.00     0.00  kwsexec [8]
  0.1       0.59     0.00        0  100.00%           .mcount (130)
  0.1       0.59     0.00        1     0.62   594.77  grepfile [3]
  0.1       0.60     0.00    11496     0.00     0.00  memmove [9]
  0.0       0.60     0.00        4     0.03     0.03  memchr [10]
  0.0       0.60     0.00    12541     0.00     0.00  memset [16]
  0.0       0.60     0.00    11497     0.00     0.00  EGexecute [7]
  0.0       0.60     0.00    11497     0.00     0.05  fillbuf [4]
  0.0       0.60     0.00    11497     0.00     0.00  grepbuf [6]

The same exercise with BSD grep indeed runs *much* slower, and its top
called functions are:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 67.3       8.60     8.60        0  100.00%           .mcount (105)
 22.2      11.44     2.84    22778     0.12     0.12  __sys_read [9]
  3.6      11.91     0.46 373553797     0.00     0.00  grep_feof [11]
  3.5      12.35     0.45 373180306     0.00     0.00  fgetc [6]
  2.1      12.63     0.27 373180306     0.00     0.00  grep_fgetc [5]
  1.2      12.78     0.15   373491     0.00     0.01  grep_fgetln [4]
  0.0      12.78     0.01      464     0.01     0.01  memset [12]
  0.0      12.79     0.00        1     2.83  4184.69  procfile [3]
  0.0      12.79     0.00        5     0.42     0.69  free [14]
  0.0      12.79     0.00    22778     0.00     0.00  __sread [607]

The testfile has 373180306 bytes, so you can see there is a tremendous
overhead in calling grep_fgetc, grep_feof and fgetc, which is done
separately for EACH byte in the file!

So my first quick fix attempt was to replace the home-grown grep_fgetln
with fgetln(3), which is in libc.  This does not support gzip and bzip2
files, but just to prove the point, it is enough.  It gave the following
profiling result:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 80.0       0.68     0.68     5695     0.12     0.12  read [5]
 15.5       0.81     0.13   379178     0.00     0.00  memchr [8]
  3.2       0.83     0.03        0  100.00%           .mcount (102)
  0.6       0.84     0.00      451     0.01     0.01  memset [9]
  0.4       0.84     0.00        5     0.59     0.85  free [10]
  0.1       0.84     0.00        1     1.23   814.47  procfile [3]
  0.1       0.84     0.00   373491     0.00     0.00  fgetln [4]
  0.0       0.84     0.00   373491     0.00     0.00  grep_fgetln [29]
  0.0       0.84     0.00    11379     0.00     0.00  memcpy [30]
  0.0       0.84     0.00       50     0.00     0.00  arena_run_alloc [36]

You can see there is much less overhead, and it is the regular read()
function that gets the most calls, though not yet as much as with GNU
grep.  However, this already gave MUCH better runtimes with Doug's trial
script:

  GNU grep
  Elapsed time: 57 seconds

  BSD grep (original)
  Elapsed time: 820 seconds  (~14.4x slower than GNU grep)

  BSD grep (quickfixed)
  Elapsed time: 115 seconds  (~2.0x slower than GNU grep)

It proves that getting rid of the fgetc's is certainly worthwhile, and I
have attached a more complete patch that:

- Replaces the horrendously inefficient grep_fgetln() with mostly the
  same implementation as the libc fgetln() function.
- Uses plain file descriptors instead of struct FILE, since the
  buffering is done manually anyway, and it makes it easier to support
  gzip and bzip2.
- Let the bzip2 reader just read the file as plain data, when the
  initial magic number doesn't match, mimicking the behaviour of GNU
  grep.

There is probably more room for optimization, but let's see if this
survives a bunch of tests first. :)