"Sean C. Farley" <scf_at_FreeBSD.org> writes: > Some algorithms: > 1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm Aho-Corasick is not really a search algorithm, but an algorithm for constructing a table-driven finite state machine that will match either of the search strings you fed it. I believe it is less efficient than Boyer-Moore for small numbers of search terms, since it scans the entire input. I don't see the point in using it in grep, because grep already has an algorithm for constructing finite state machines: regcomp(3). > 2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm It doesn't seem to compare favorably to the far older Aho-Corasick. It uses slightly less memory, but memory is usually not an issue with grep. > 4. GLIMPSE: http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore > variant) Glimpse is a POS... and not really comparable, because grep is designed to search for a single search string in multiple texts, while glimpse is designed to search a large amount of text over and over with different search strings. I believe it uses suffix tables to construct its index, and Boyer-Moore only to locate specific matches, since the index lists only files, not exact positions. For anything other than fixed strings, it reverts to agrep, but I assume (I haven't looked at the code) that if the regexp has one or more fixed components, it uses those to narrow the search space before running agrep. DES -- Dag-Erling Smørgrav - des_at_des.noReceived on Sun Aug 22 2010 - 17:53:40 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:06 UTC