Re: why GNU grep is fast

From: Dag-Erling Smørgrav <des_at_des.no> Date: Sun, 22 Aug 2010 13:54:35 +0200 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:06 UTC

Mike Haertel <mike_at_ducky.net> writes:
> GNU grep uses the well-known Boyer-Moore algorithm, which looks
> first for the final letter of the target string, and uses a lookup
> table to tell it how far ahead it can skip in the input whenever
> it finds a non-matching character.

Boyer-Moore is for fixed search strings.  I don't see how that
optimization can work with a regexp search unless the regexp is so
simple that you break it down into a small number of cases with known
length and final character.

> GNU grep uses raw Unix input system calls and avoids copying data
> after reading it.

Yes, that was the first thing we looked at (and fixed) in BSD grep.

> Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES.  Looking
> for newlines would slow grep down by a factor of several times,
> because to find the newlines it would have to look at every byte!

Amen.  The current bottleneck in BSD grep is the memchr() that looks for
'\n' in the input buffer.

DES
-- 
Dag-Erling Smørgrav - des_at_des.no