>> Later on, he summarizes some of the existing implementations,
>> including comments about the Plan 9 implementation and his own RE2,
>> both of which efficiently handle international text (which seems to
>> be a major concern of Gabor's).
>
> I believe Gabor is considering TRE for a good replacement regex library.

Yes. Oniguruma is slow; Google RE2 only supports Perl and fgrep syntax
but not standard regex; and the Plan 9 implementation, iirc, only
supports fgrep syntax and Unicode but not wchar_t in general.

>
>> The key comment in Mike's GNU grep notes is the one about not
>> breaking into lines. That's simply double-scanning the input;
>> instead, run the matcher over blocks of text and, when it finds a
>> match, work backwards from the match to find the appropriate line
>> beginning. This is efficient because most lines don't match.
>
> I do like the idea.

So do I.

>
> BTW, the fastgrep portion of bsdgrep is my fault/contribution to do a
> faster search bypassing the regex library. :) It certainly was not
> written with any encodings in mind; it was purely ASCII. As I have
> not kept up with it, I do not know if anyone improved it or not.
>

It has been made wchar-compliant.

Gabor

Received on Mon Aug 23 2010 - 08:23:09 UTC
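
For reference, the block-scanning idea quoted above (run the matcher over the whole buffer, then work backwards from a hit to the start of its line) can be sketched roughly as follows. This is only an illustration in Python, not code from GNU grep or bsdgrep. The function name `grep_blocks` is made up for the example, and it assumes the pattern never matches a newline and does not rely on `^` or `$` anchors.

```python
import re


def grep_blocks(data: bytes, pattern: bytes):
    """Yield each matching line once, scanning the buffer as one block.

    Hypothetical helper for illustration. Assumes the pattern cannot
    match a newline and uses no ^ or $ anchors.
    """
    regex = re.compile(pattern)
    pos = 0
    while pos < len(data):
        # Search the remaining buffer directly; no per-line splitting.
        m = regex.search(data, pos)
        if m is None:
            break
        # Work backwards from the match to the beginning of its line.
        start = data.rfind(b"\n", 0, m.start()) + 1
        # Work forwards from the match to the end of the same line.
        end = data.find(b"\n", m.start())
        if end == -1:
            end = len(data)
        yield data[start:end]
        # Resume after this line so it is reported only once.
        pos = end + 1


if __name__ == "__main__":
    text = b"alpha\nbeta gamma\ndelta\ngamma ray\n"
    for line in grep_blocks(text, rb"gamma"):
        print(line.decode())
```

Line boundaries are only looked up for lines that actually contain a match, which is where the saving comes from when most lines don't match.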