Re: grep extremely slow for LC_CTYPE=C?

From: Stefan Esser <se_at_freebsd.org>
Date: Thu, 3 May 2018 17:19:34 +0200
On 03.05.18 at 16:41, Kyle Evans wrote:

Hi Kyle,

thank you for the fast reply. You were right to request grep -V output,
but see below ... ;-)

> On Thu, May 3, 2018 at 9:08 AM, Stefan Esser <se_at_freebsd.org> wrote:
>> The first "grep" needs 3.5 seconds to finish on my system, but the second
>> one (with LC_CTYPE=C or no locale set at all) runs for minutes (I did not
>> bother to check whether it finishes at all).
>>
>> Is this a bug in grep?
>>
>> Maybe there is something odd in the data file (loading the pattern is not
>> slower with LC_CTYPE=C, it takes 0.8 seconds on my system), but this is a
>> problem that was observed with "real" data, not a specifically constructed
>> worst case.
>>
>> Any ideas what's causing this behavior?
>>
>> I'm currently setting the UTF-8 locale as in the first invocation above
>> to make grep run in reasonable time, but I'd expect it to be faster in
>> the C locale ...
>>
>> Regards, STefan
> 
> Hmm... what does `grep -V` look like, just to confirm?

Ah, yes, good point ...

$ which grep
/usr/bin/grep

$ grep -V
grep (GNU grep) 2.5.1-FreeBSD

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So, it seems I have to complain somewhere else about this behavior ...

But I have had the following in my /etc/src.conf for a long time:

WITH_BSDGREP=            yes
WITH_BSD_GREP_FASTMATCH= yes
WITHOUT_GNU_GREP_COMPAT= yes

And before seeing the grep -V output, I was convinced that I had been using
BSD grep by default (i.e. that it replaced GNU grep with the above options) ...

But now I see that I need to invoke bsdgrep under that name. It is very fast,
but does not give the expected (correct?) result, which is the single line
that is not suppressed by the pattern match ...
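
A quick way to check which implementation a command name resolves to is to
look at the first line of its -V output. The following is only a sketch: the
helper name grep_flavor is made up for illustration, and it assumes that BSD
grep identifies itself with the string "BSD grep" in the same way that GNU
grep prints "GNU grep" above.

```shell
#!/bin/sh
# Hypothetical helper: classify a grep by its version banner.
# Assumption: BSD grep's banner contains "BSD grep",
# GNU grep's banner contains "GNU grep" (as in the output above).
grep_flavor() {
    case "$1" in
        *"GNU grep"*) echo "gnu" ;;
        *"BSD grep"*) echo "bsd" ;;
        *)            echo "unknown" ;;
    esac
}

# Classify whatever "grep" resolves to in the current PATH.
grep_flavor "$(grep -V 2>&1 | head -n 1)"
```

Passing the banner in as an argument keeps the check independent of which
binary is installed, so the same function can be pointed at "bsdgrep -V" too.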

> These are the results on my local system:
> 
> root_at_viper:/tmp/grep# ./grep-test.sh
> All/mpfr-3.1.7.tgz
>         0.10 real         0.10 user         0.00 sys
> All/mpfr-3.1.7.tgz
>         0.09 real         0.08 user         0.00 sys
> 
> But I don't immediately recall if I have local modifications in
> regex(3)/bsdgrep that might have affected this. =(

Yes, that's the correct result and extremely fast!

But on my system (with only "bsdgrep" substituted for "grep") I get

$ sh bsdgrep-test.sh | wc
        0.15 real         0.14 user         0.00 sys
        0.15 real         0.15 user         0.00 sys
    3362    3362   94700

I.e. only about 1/3 of the lines are suppressed by the pattern, while all
but 1 line should be ...

Or is one of the build options that I used unsafe?

Best regards, STefan
Received on Thu May 03 2018 - 13:19:49 UTC
