Re: grep extremely slow for LC_CTYPE=C? [SOLVED]

From: Stefan Esser <se_at_freebsd.org>
Date: Thu, 3 May 2018 19:54:56 +0200
On 03.05.18 at 17:28, Kyle Evans wrote:
> On Thu, May 3, 2018 at 10:19 AM, Stefan Esser <se_at_freebsd.org> wrote:
>> On 03.05.18 at 16:41, Kyle Evans wrote:
>>> Hmm... what does `grep -V` look like, just to confirm?
>>
>> Ah, yes, good point ...
>>
>> $ which grep
>> /usr/bin/grep
>>
>> $ grep -V
>> grep (GNU grep) 2.5.1-FreeBSD
>>
>> Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions. There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>> So, it seems I have to complain somewhere else about this behavior ...
> 
> Eh, no worries there. Newer GNU grep sucks less, and we're going to
> replace it Real Soon Now (TM).

Thank you very much - your reply was really helpful!

I just tested with GNU grep 2.27 (the current port version) and it does not
show the extreme slowness of the old version in FreeBSD, but is still more
than 10 times slower than BSD grep on my test data.

>> But I have (for a long time) in my /etc/src.conf:
>>
>> WITH_BSDGREP=            yes
>> WITH_BSD_GREP_FASTMATCH= yes
>> WITHOUT_GNU_GREP_COMPAT= yes
>>
>> And before seeing the grep -V output, I was convinced that I had been using
>> BSD grep (i.e. that it replaced GNU grep with above options) by default ...
>>
>> But now I see that I need to invoke bsdgrep under that name. It is very fast,
>> but does not give the expected (correct?) result, which is the single line
>> that is not suppressed by the pattern match ...
> 
> This is actually because you've typo'd WITH_BSD_GREP. =) WITH_BSD_GREP
> will replace /usr/bin/grep with bsdgrep and put GNU grep at
> /usr/bin/gnugrep.

Yes, that was what I had expected, and I had correctly spelled WITH_BSD_PATCH,
but never bothered to check that I got the "grep" I wanted ...
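
A quick check of which implementation is installed could look like the
sketch below. The helper name grep_flavor is made up for illustration. The
"GNU grep" banner is the one shown earlier in this thread; the "BSD grep"
banner text is an assumption about what bsdgrep prints for -V.

```shell
#!/bin/sh
# Classify the first line of `grep -V` output as "gnu", "bsd" or "unknown".
# grep_flavor is a hypothetical helper, not part of FreeBSD.
grep_flavor() {
    case "$1" in
        *"GNU grep"*) echo gnu ;;      # e.g. "grep (GNU grep) 2.5.1-FreeBSD"
        *"BSD grep"*) echo bsd ;;      # assumed banner text for bsdgrep
        *)            echo unknown ;;
    esac
}
```

To inspect whatever "grep" resolves to in PATH, one would call it as
grep_flavor "$(grep -V 2>&1 | head -n 1)" after a rebuild.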

> I also recommend using WITHOUT_BSD_GREP_FASTMATCH / not using
> WITH_BSD_GREP_FASTMATCH. See below response.

It is so much faster than GNU grep on this use-case anyway ;-)

$ sh grep-test.sh
All/mpfr-3.1.7.tgz
        0.14 real         0.13 user         0.00 sys
All/mpfr-3.1.7.tgz
        0.13 real         0.13 user         0.00 sys

This is a factor of 30 to 40 better than with our GNU grep for the UTF-8 case
(where GNU grep at least finishes in finite time), and orders of magnitude
better for LANG=C ;-)
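
For anyone who wants to repeat the comparison: the following is only a rough
sketch of such a test, not the grep-test.sh used above. The function name
run_both_locales, its arguments, and the en_US.UTF-8 locale name are
assumptions. It runs the same inverted pattern-file match (-v -f) once per
locale, so that each invocation can be timed separately.

```shell
#!/bin/sh
# Hypothetical comparison helper; NOT the original grep-test.sh.
# usage: run_both_locales GREP_BINARY PATTERN_FILE INPUT_FILE
run_both_locales() {
    grep_bin=$1
    patterns=$2
    input=$3
    for loc in C en_US.UTF-8; do
        echo "== LC_ALL=$loc"
        # -v prints only the lines NOT matched by any pattern in the file.
        LC_ALL=$loc "$grep_bin" -v -f "$patterns" "$input"
    done
}
```

Prefixing the grep invocation with /usr/bin/time gives per-locale timings in
the style of the output shown above.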

And yes, FASTMATCH was responsible for the erroneous result in my previous
tests with BSD grep. Now that I have rebuilt it without that option, it works
perfectly for me :)

> BSD_GREP_FASTMATCH is best left off (default on HEAD)- it was disabled
> because the version of tre ("fastmatch") that bsdgrep uses is buggy
> and I don't want to invest the time to fix it. The performance of the
> version we use isn't any better than our libc regex(3), so I made the
> decision to switch it to that and focus efforts on optimizing our
> general regex implementation instead.

A decision I can well understand and sympathize with.

How about removing the BSD_GREP_FASTMATCH option, then?

> I have plans to replace our libc regex(3) with Onigmo [1], which is at
> least twice as fast as what we have and comes with all kinds of other
> extensions- GNU extensions will be exposed via libregex, and I also
> plan to install Onigmo on its own so that others can use that with its
> own interface. The difference between it and libregex will be that
> libregex exposes a regex(3) interface for using extensions with an
> option to go REG_POSIX.
> 
> [1] https://github.com/k-takata/Onigmo

Great plan! But for now BSD grep seems well up to the task, and my only
problem now is that I need to support stable releases that use (and will
stay with) the old GNU grep, so I'll need to keep the work-around (or
perhaps depend on the port version?).

Thanks again!

Best regards, STefan
Received on Thu May 03 2018 - 15:55:06 UTC
