"Sean C. Farley" <scf_at_FreeBSD.org> writes: > Dag-Erling Smørgrav <des_at_des.no> writes: > > Aho-Corasick is not really a search algorithm, but an algorithm for > > constructing a table-driven finite state machine that will match > > either of the search strings you fed it. I believe it is less > > efficient than Boyer-Moore for small numbers of search terms, since > > it scans the entire input. I don't see the point in using it in > > grep, because grep already has an algorithm for constructing finite > > state machines: regcomp(3). > especially those that could be useful for fgrep functionality Not entirely sure what you mean by that, but in most cases, I think Boyer-Moore is a better fit for fgrep. For large numbers of search terms, I might use Aho-Corasick... if it weren't for the fact that, as mentioned, grep already has a regexp engine, which is a superset of Aho-Corasick. It might be a tad slower at building the FSA, because it's more general, and the FSA might occupy a tad more memory, but the complexity is *exactly* the same, and adding Aho-Corasick just for fgrep is, for lack of a better word, pedantry. For all you know, the increased code size could very well offset whatever performance advantage Aho-Corasick might offer. > Glimpse may be a POS; I have not used it personally. I only noted its > algorithm for possible use within fgrep. Apples and oranges. The problem glimpse tries to solve (fixed corpus, variable search terms) is precisely the opposite of the one fgrep tries to solve (fixed search terms, variable corpus). Glimpse does include grep-like functionality, but in the form of agrep, which is designed for approximate matching. DES -- Dag-Erling Smørgrav - des_at_des.noReceived on Mon Aug 23 2010 - 05:56:33 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:06 UTC