Re: grep extremely slow for LC_CTYPE=C? [SOLVED]

From: Kyle Evans <kevans_at_freebsd.org>
Date: Thu, 3 May 2018 13:11:05 -0500
On Thu, May 3, 2018 at 12:54 PM, Stefan Esser <se_at_freebsd.org> wrote:
> Am 03.05.18 um 17:28 schrieb Kyle Evans:
>> On Thu, May 3, 2018 at 10:19 AM, Stefan Esser <se_at_freebsd.org> wrote:
>>> Am 03.05.18 um 16:41 schrieb Kyle Evans:
>>>> Hmm... what does `grep -V` look like, just to confirm?
>>>
>>> Ah, yes, good point ...
>>>
>>> $ which grep
>>> /usr/bin/grep
>>>
>>> $ grep -V
>>> grep (GNU grep) 2.5.1-FreeBSD
>>>
>>> Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
>>> This is free software; see the source for copying conditions. There is NO
>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>>
>>> So, it seems I have to complain somewhere else about this behavior ...
>>
>> Eh, no worries there. Newer GNU grep sucks less, and we're going to
>> replace it Real Soon Now (TM).
>
> Thank you very much - your reply was really helpful!
>
> I just tested with GNU grep 2.27 (the current port version) and it does not
> show the extreme slowness of the old version in FreeBSD, but is still more
> than 10 times slower than BSD grep on my test data.
>

This is good. =) We tend to be slower in most areas, so any win is a good one.
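
For anyone wanting to reproduce this kind of comparison, here is a minimal sketch. It is not the grep-test.sh mentioned in this thread (that script was never posted); the function name count_matches, the file name packages.txt, and the pattern are assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): run one search under a given
# locale and print the number of matching lines.
# Usage: count_matches LOCALE PATTERN FILE [GREP_BINARY]
count_matches() {
    locale_name=$1
    pattern=$2
    file=$3
    grep_bin=${4:-grep}
    # The LC_ALL assignment applies to this one command only and
    # overrides both LANG and LC_CTYPE for it.
    LC_ALL=$locale_name "$grep_bin" -c -e "$pattern" "$file"
}

# time(1) cannot run a shell function, so for wall-clock numbers call
# the binary directly (paths and file name are placeholders):
#   /usr/bin/time env LC_ALL=C grep -c -e 'mpfr' packages.txt
#   /usr/bin/time env LC_ALL=en_US.UTF-8 grep -c -e 'mpfr' packages.txt
#   /usr/bin/time env LC_ALL=C bsdgrep -c -e 'mpfr' packages.txt
```

Checking that the match counts agree across locales and binaries before comparing times helps catch wrong results such as those attributed to FASTMATCH later in this thread.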

>>> But I have (for a long time) in my /etc/src.conf:
>>>
>>> WITH_BSDGREP=            yes
>>> WITH_BSD_GREP_FASTMATCH= yes
>>> WITHOUT_GNU_GREP_COMPAT= yes
>>>
>>> And before seeing the grep -V output, I was convinced that I had been using
>>> BSD grep (i.e. that it replaced GNU grep with above options) by default ...
>>>
>>> But now I see that I need to invoke bsdgrep under that name. It is very fast,
>>> but does not give the expected (correct?) result, which is the single line
>>> that is not suppressed by the pattern match ...
>>
>> This is actually because you've typo'd WITH_BSD_GREP as WITH_BSDGREP. =)
>> WITH_BSD_GREP will replace /usr/bin/grep with bsdgrep and put GNU grep
>> at /usr/bin/gnugrep.
>
> Yes, that was what I had expected, and I had correctly spelled WITH_BSD_PATCH,
> but never bothered to check that I got the "grep" I wanted ...
>
>> I also recommend using WITHOUT_BSD_GREP_FASTMATCH / not using
>> WITH_BSD_GREP_FASTMATCH. See below response.
>
> It is so much faster than GNU grep on this use-case anyway ;-)
>
> $ sh grep-test.sh
> All/mpfr-3.1.7.tgz
>         0.14 real         0.13 user         0.00 sys
> All/mpfr-3.1.7.tgz
>         0.13 real         0.13 user         0.00 sys
>
> This is a factor of 30 to 40 better than with our GNU grep (for the UTF-8 case,
> where it finishes in finite time, orders of magnitude faster for LANG=C ;-) ).
>
> And yes, FASTMATCH was responsible for the erroneous result in my previous
> tests with BSD grep. Now that I have rebuilt it without that option, it works
> perfectly for me :)

Also good to hear!
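
Since a misspelled knob in src.conf is silently ignored, a quick check can save a rebuild. The sketch below is a hypothetical helper, not something from the tree; the function name check_grep_knob is made up, and it assumes plain NAME= assignments at the start of a line.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): report whether a src.conf-style
# file sets the correctly spelled WITH_BSD_GREP knob.
# Usage: check_grep_knob /etc/src.conf
check_grep_knob() {
    conf=$1
    # Match the knob name at the start of a line, followed by optional
    # whitespace and '='. WITH_BSD_GREP_FASTMATCH does not match because
    # '_' follows GREP there.
    if grep -Eq '^WITH_BSD_GREP[[:space:]]*=' "$conf"; then
        echo "ok"
    elif grep -Eq '^WITH_BSDGREP[[:space:]]*=' "$conf"; then
        echo "typo"
    else
        echo "unset"
    fi
}
```

After a rebuild, running `which grep` and `grep -V` as earlier in this thread confirms which implementation actually ended up at /usr/bin/grep.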

>> BSD_GREP_FASTMATCH is best left off (default on HEAD)- it was disabled
>> because the version of tre ("fastmatch") that bsdgrep uses is buggy
>> and I don't want to invest the time to fix it. The performance of the
>> version we use isn't any better than our libc regex(3), so I made the
>> decision to switch it to that and focus efforts on optimizing our
>> general regex implementation instead.
>
> A decision I can well understand and sympathize with.
>
> How about removing the BSD_GREP_FASTMATCH option, then?

Right, I've been meaning to find time to rip it all out. I'll see if I
can harvest some spare time from the weekend to make it happen.

>> I have plans to replace our libc regex(3) with Onigmo [1], which is at
>> least twice as fast as what we have and comes with all kinds of other
>> extensions- GNU extensions will be exposed via libregex, and I also
>> plan to install Onigmo on its own so that others can use that with its
>> own interface. The difference between it and libregex will be that
>> libregex exposes a regex(3) interface for using extensions with an
>> option to go REG_POSIX.
>>
>> [1] https://github.com/k-takata/Onigmo
>
> Great plan! But for now BSD grep seems well up to the task, and my only
> problem now is that I need to support stable releases that use (and will
> stay with) the old GNU grep, so I'll need to keep the work-around (or
> perhaps depend on the port version?).

I do recommend pulling in textproc/gnugrep if you can. GNU grep in
base has bugs that are likely going to stay unless someone (that isn't
me =)) wants to take up the task of maintaining an older version of
GNU grep that's going to be disappearing from head. Newer versions
have a lot more sensible behavior than what we have in base.

> Thanks again!
>
> Best regards, STefan
Received on Thu May 03 2018 - 16:11:28 UTC