Re: Official request: Please make GNU grep the default

From: Gabor Kovesdan <gabor_at_FreeBSD.org> Date: Fri, 13 Aug 2010 15:22:43 +0200

On 2010.08.13. 13:09, Matthias Andree wrote:
> Gabor Kovesdan wrote on 2010-08-13:
>
>> On 2010.08.13. 10:43, Doug Barton wrote:
>>> My reason is simple, performance. While doing some portmaster work
>>> recently I was regression testing some changes I made to the --index*
>>> options and noticed that things were dramatically slower than the last
>>> time I tested those features. Thinking that I had made a programming
>>> mistake, I dug into my code, and while the regexps that I was using could
>>> be tuned for slightly better performance, the problem was not in my code.
>>> I then installed textproc/gnugrep to compare, and the differences were
>>> very dramatic using a highly pessimized test case (finding a match on
>>> the last line of INDEX). The script I used to test is at
>>> http://people.freebsd.org/~dougb/grep-time-trial.sh.txt and a typical
>>> result was:
>>>
>>> GNU grep
>>> Elapsed time: 2 seconds
>>>
>>> BSD grep
>>> Elapsed time: 47 seconds
>>>
>> OK, I'll take care of this soon and make GNU grep the default again,
>> with a knob to build BSD grep. I agree with you that we cannot allow
>> such a big performance drawback, but my measurements only showed
>> significant differences for very big searches, and I didn't imagine
>> that it could add up to such a big difference. I'm sorry for the bad
>> decision I made in making it the default.
>
> Without knowing any of the details (I am not using 9-CURRENT), Gabor, 
> I suggest that you check the documentation around Google's RE2 library 
> (which is in C++); there are quite a few bits of information relating 
> to (including worst-case) performance of regexp matchers, both 
> directly in the RE2 documentation and indirectly through links
> and references. It might be worth a read, together with profiling Doug's
> test case if he can tell you how to reproduce it.
>
Thanks, Matthias. I haven't looked deeply at this, but IIRC it uses
Perl syntax. We need an efficient, wchar-aware, POSIX(ish) regex library
with a good license, and at the moment only TRE meets these criteria.
Besides, we need GNU-style regex support, which will have to be added to
TRE before we can replace our libc regex.
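Since any candidate would have to sit behind a POSIX-style regcomp/regexec interface, a small harness written against that interface can time the pessimized case Doug describes (the only match on the last line) for whichever implementation is linked in. This is a minimal sketch, not Doug's grep-time-trial.sh script. The function name first_matching_line and the 4096-byte per-line cap are hypothetical choices made for illustration. It also assumes the library under test is reached through the standard <regex.h> calls.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: return the 1-based number of the first line in buf
 * that matches the extended regex pattern, 0 if no line matches, or -1 if
 * the pattern fails to compile. If elapsed_sec is non-NULL, it receives the
 * CPU time spent scanning. */
static long first_matching_line(const char *pattern, const char *buf,
                                double *elapsed_sec)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    clock_t start = clock();
    long lineno = 0, found = 0;
    const char *p = buf;
    char line[4096];                      /* assumed maximum line length */

    while (*p != '\0') {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);
        if (len >= sizeof line)
            len = sizeof line - 1;        /* truncate overlong lines */
        memcpy(line, p, len);
        line[len] = '\0';
        lineno++;
        if (regexec(&re, line, 0, NULL, 0) == 0) {
            found = lineno;               /* stop at the first match, like grep -m 1 */
            break;
        }
        p = nl ? nl + 1 : p + strlen(p);  /* advance to the next line or the end */
    }

    if (elapsed_sec != NULL)
        *elapsed_sec = (double)(clock() - start) / CLOCKS_PER_SEC;
    regfree(&re);
    return found;
}
```

To approximate the worst case, feed it a buffer the size of INDEX whose only matching line is the last one, then compare the elapsed time when the same file is linked against different regex libraries.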

Gabor