Re: What to learn from the BSD grep case [Was: why GNU grep is fast]

From: C. P. Ghost <cpghost_at_cordula.ws>
Date: Tue, 24 Aug 2010 03:16:09 +0200

On Mon, Aug 23, 2010 at 5:04 PM, Gabor Kovesdan <gabor_at_freebsd.org> wrote:
> 4. We really need a good regex library. From the comments, it seems there's
> no such library in the open source world. GNU libregex isn't efficient because GNU
> grep uses those workarounds that Mike kindly pointed out. Oniguruma was
> extremely slow when I checked it. PCRE supports Perl-style syntax with a
> POSIX-like API but not POSIX regex. Google RE2 is the same with additional
> egrep syntax but doesn't have support for standard POSIX regexes. Plan 9
> regex only supports egrep syntax. It seems that TRE is the best choice. It
> is BSD-licensed, supports wchar and POSIX(ish) regexes and it is quite fast.

I know it's C++ and not exactly what you need, but have you evaluated
Boost::Regex? Isn't there some code that could be retrofitted into a C library?

http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/index.html

> I don't know the theoretical background of regex engines but I'm wondering
> if it's possible to provide an alternative API with byte-counted buffers
> and use the heuristic speedup with fixed string matching. As Mike pointed
> out the POSIX API is quite limiting because it works on NUL-terminated
> strings and not on byte-counted buffers, so we couldn't just do it with a
> POSIX-conformant library but it would be nice if we could implement it in
> such a library with an alternative interface.
>
> Gabor
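
To make the byte-counted idea above a bit more concrete, here is a rough
sketch in C. It is not taken from any existing library: find_bytes() and
count_matching_lines() are hypothetical names, the naive search stands in
for a real fixed-string matcher such as Boyer-Moore, and the caller is
assumed to know a fixed string that every match must contain. Only lines
that contain that string are handed to the regex engine, and because plain
POSIX regexec() wants a NUL-terminated string, each candidate line has to
be copied first.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: locate needle (nlen bytes) in hay (hlen bytes).
 * Naive stand-in for a real fixed-string search such as Boyer-Moore. */
static const char *
find_bytes(const char *hay, size_t hlen, const char *needle, size_t nlen)
{
	if (nlen == 0)
		return hay;
	for (size_t i = 0; i + nlen <= hlen; i++)
		if (hay[i] == needle[0] && memcmp(hay + i, needle, nlen) == 0)
			return hay + i;
	return NULL;
}

/* Hypothetical byte-counted interface: count the lines in buf[0..len)
 * that match re. `must` is a fixed string (without newlines) that every
 * match is assumed to contain; it serves as a cheap prefilter. */
static size_t
count_matching_lines(const regex_t *re, const char *must, size_t mustlen,
    const char *buf, size_t len)
{
	const char *end = buf + len;
	const char *pos = buf;	/* always points at the start of a line */
	size_t count = 0;

	assert(re != NULL && must != NULL && buf != NULL);
	while (pos < end) {
		/* Skip ahead to the next occurrence of the fixed string. */
		const char *hit = find_bytes(pos, (size_t)(end - pos),
		    must, mustlen);
		if (hit == NULL)
			break;

		/* Widen the hit to the line that contains it. */
		const char *bol = hit;
		while (bol > pos && bol[-1] != '\n')
			bol--;
		const char *eol = memchr(hit, '\n', (size_t)(end - hit));
		if (eol == NULL)
			eol = end;

		/* POSIX regexec() needs a NUL-terminated string, so copy.
		 * A line with an embedded NUL byte is cut short here. */
		size_t linelen = (size_t)(eol - bol);
		char *line = malloc(linelen + 1);
		if (line == NULL)
			break;
		memcpy(line, bol, linelen);
		line[linelen] = '\0';
		if (regexec(re, line, 0, NULL, 0) == 0)
			count++;
		free(line);

		pos = (eol < end) ? eol + 1 : end;
	}
	return count;
}
```

A caller would compile the pattern once with regcomp() and then pass the
buffer and its length. The per-line copy is exactly the overhead a native
byte-counted interface would avoid; some implementations offer the
non-POSIX REG_STARTEND flag to regexec() for that purpose.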

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/