Re: why GNU grep is fast

From: Sean C. Farley <scf_at_FreeBSD.org> Date: Sun, 22 Aug 2010 11:30:01 -0500 (CDT) · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:06 UTC

On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:

> Mike Haertel <mike_at_ducky.net> writes:
>> GNU grep uses the well-known Boyer-Moore algorithm, which looks first 
>> for the final letter of the target string, and uses a lookup table to 
>> tell it how far ahead it can skip in the input whenever it finds a 
>> non-matching character.
>
> Boyer-Moore is for fixed search strings.  I don't see how that 
> optimization can work with a regexp search unless the regexp is so 
> simple that you break it down into a small number of cases with known 
> length and final character.

When I was working on making FreeGrep faster (years ago), I wrote down a 
few notes about possible algorithms, especially those that could be 
useful for fgrep functionality.  I am just passing these onto the list.

Some algorithms:
1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
3. GNU fgrep:  Commentz-Walter
4. GLIMPSE:  http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore variant)

Also, this may be a useful book:
http://www.dcc.uchile.cl/~gnavarro/FPMbook/

Sean
-- 
scf_at_FreeBSD.org