Re: Official request: Please make GNU grep the default

From: Dimitry Andric <dimitry_at_andric.com>
Date: Wed, 18 Aug 2010 23:54:41 +0200

On 2010-08-18 23:12, Dimitry Andric wrote:
>> And one trial is not statistically valid - especially given the small
>> differences.  How about multiple trials with ministat?
> 
> The results were averages of three trials.

Actually, since I kept using Doug's original grep-time-trial.sh, each of
the three 'trials' consisted of running grep 100 times, and the listed
time was the total elapsed time for those 100 runs.  So I assume that
reasonably averages out the differences between individual runs.
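
A minimal sketch of that kind of batch timing, written in Python rather
than as the original shell script (grep-time-trial.sh itself is not
reproduced here).  The function names, the trial and run counts as
defaults, and the use of mean and sample standard deviation are all
assumptions made for illustration:

```python
import statistics
import subprocess
import time


def time_batch(cmd, runs=100):
    """Return total wall-clock seconds for `runs` sequential runs of cmd."""
    start = time.perf_counter()
    for _ in range(runs):
        # Discard output so terminal I/O does not dominate the timing.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def run_trials(cmd, trials=3, runs=100):
    """Repeat the batch `trials` times; return one total per trial."""
    return [time_batch(cmd, runs) for _ in range(trials)]


def summarize(totals):
    """Return (mean, sample standard deviation) of the per-trial totals."""
    mean = statistics.mean(totals)
    spread = statistics.stdev(totals) if len(totals) > 1 else 0.0
    return mean, spread
```

Calling run_trials(["grep", "pattern", "testfile"]) (a placeholder
command line) gives three totals.  Writing those totals one per line to
a file per grep variant would also produce the kind of input that
ministat reads, if a proper significance test is wanted.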

Also, I'm not sure if the actual disk/fs reading performance will differ
much between GNU grep and any other grep, since they will all basically
read through the whole test file sequentially.  For instance, when I
profiled GNU grep with gprof, the top time results were:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 99.1       0.59     0.59    11497     0.05     0.05  read [5]
  0.6       0.59     0.00    11497     0.00     0.00  kwsexec [8]
  0.1       0.59     0.00        0  100.00%           .mcount (130)
  0.1       0.59     0.00        1     0.62   594.77  grepfile [3]
  0.1       0.60     0.00    11496     0.00     0.00  memmove [9]
  0.0       0.60     0.00        4     0.03     0.03  memchr [10]
  0.0       0.60     0.00    12541     0.00     0.00  memset [16]
  0.0       0.60     0.00    11497     0.00     0.00  EGexecute [7]
  0.0       0.60     0.00    11497     0.00     0.05  fillbuf [4]
  0.0       0.60     0.00    11497     0.00     0.00  grepbuf [6]

In other words, nearly all of the time (99.1%) is spent in the read
system call.  If mmap'ing can help improve that, it would be nice, but I
suspect the gains would be marginal.
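
To make the read-versus-mmap comparison concrete, here is a small Python
sketch of the two access patterns.  It only counts newlines, it is not
how GNU grep or BSD grep is implemented, and the 32 KiB buffer size and
function names are arbitrary assumptions.  Either way the whole file is
walked sequentially from start to end:

```python
import mmap
import os


def count_lines_read(path, bufsize=32768):
    """Count newlines by pulling the file through fixed-size read() calls."""
    count = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, bufsize)
            if not chunk:          # empty bytes object means end of file
                break
            count += chunk.count(b"\n")
    finally:
        os.close(fd)
    return count


def count_lines_mmap(path):
    """Count newlines by mapping the file and scanning the mapping."""
    if os.path.getsize(path) == 0:
        return 0                   # an empty file cannot be mapped
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = 0
            pos = m.find(b"\n")
            while pos != -1:
                count += 1
                pos = m.find(b"\n", pos + 1)
            return count
```

Both functions return the same count for the same file; timing them on a
large test file is one way to check whether the mapping buys anything on
a given system.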

The actual performance difference is much more likely to come from how
efficiently grep splits the input into lines and searches them for
regexps.  BSD grep still has quite a bit of room for improvement in that
department.