On 2010-08-15 21:07, Dag-Erling Smørgrav wrote: > I built a profiling version of BSD grep and ran it with a regexp that > matches only the very last line in (my copy of) INDEX-9. The results > are pretty clear: [I also did precisely that, and I just read your mail when I was composing the following... :) ] I had a look at Doug Barton's time trial script for GNU and BSD grep, and decided to profile both greps a bit. This gave some interesting results. I started with GNU grep, doing approximately the same search as in Doug's trial, e.g. grep -m1 "|/usr/ports/x11-wm/xfce4-wm|", but I made the testfile 373491 lines instead of the original 21971, with the only matching line as the last one. The GNU grep top called functions are (please ignore the .mcount entry, which is a gprof side effect): % cumulative self self total time seconds seconds calls ms/call ms/call name 99.1 0.59 0.59 11497 0.05 0.05 read [5] 0.6 0.59 0.00 11497 0.00 0.00 kwsexec [8] 0.1 0.59 0.00 0 100.00% .mcount (130) 0.1 0.59 0.00 1 0.62 594.77 grepfile [3] 0.1 0.60 0.00 11496 0.00 0.00 memmove [9] 0.0 0.60 0.00 4 0.03 0.03 memchr [10] 0.0 0.60 0.00 12541 0.00 0.00 memset [16] 0.0 0.60 0.00 11497 0.00 0.00 EGexecute [7] 0.0 0.60 0.00 11497 0.00 0.05 fillbuf [4] 0.0 0.60 0.00 11497 0.00 0.00 grepbuf [6] The same exercise with BSD grep indeed runs *much* slower, and its top called functions are: % cumulative self self total time seconds seconds calls ms/call ms/call name 67.3 8.60 8.60 0 100.00% .mcount (105) 22.2 11.44 2.84 22778 0.12 0.12 __sys_read [9] 3.6 11.91 0.46 373553797 0.00 0.00 grep_feof [11] 3.5 12.35 0.45 373180306 0.00 0.00 fgetc [6] 2.1 12.63 0.27 373180306 0.00 0.00 grep_fgetc [5] 1.2 12.78 0.15 373491 0.00 0.01 grep_fgetln [4] 0.0 12.78 0.01 464 0.01 0.01 memset [12] 0.0 12.79 0.00 1 2.83 4184.69 procfile [3] 0.0 12.79 0.00 5 0.42 0.69 free [14] 0.0 12.79 0.00 22778 0.00 0.00 __sread [607] The testfile has 373180306 bytes, so you can see there is a tremendous overhead in calling grep_fgetc, grep_feof and fgetc, which is done separately for EACH byte in the file! So my first quick fix attempt was to replace the home-grown grep_fgetln with fgetln(3), which is in libc. This does not support gzip and bzip2 files, but just to prove the point, it is enough. It gave the following profiling result: % cumulative self self total time seconds seconds calls ms/call ms/call name 80.0 0.68 0.68 5695 0.12 0.12 read [5] 15.5 0.81 0.13 379178 0.00 0.00 memchr [8] 3.2 0.83 0.03 0 100.00% .mcount (102) 0.6 0.84 0.00 451 0.01 0.01 memset [9] 0.4 0.84 0.00 5 0.59 0.85 free [10] 0.1 0.84 0.00 1 1.23 814.47 procfile [3] 0.1 0.84 0.00 373491 0.00 0.00 fgetln [4] 0.0 0.84 0.00 373491 0.00 0.00 grep_fgetln [29] 0.0 0.84 0.00 11379 0.00 0.00 memcpy [30] 0.0 0.84 0.00 50 0.00 0.00 arena_run_alloc [36] You can see there is much less overhead, and it is the regular read() function that gets the most calls, though not yet as much as with GNU grep. However, this already gave MUCH better runtimes with Doug's trial script: GNU grep Elapsed time: 57 seconds BSD grep (original) Elapsed time: 820 seconds (~14.4x slower than GNU grep) BSD grep (quickfixed) Elapsed time: 115 seconds (~2.0x slower than GNU grep) It proves that getting rid of the fgetc's is certainly worthwhile, and I have attached a more complete patch that: - Replaces the horrendously inefficient grep_fgetln() with mostly the same implementation as the libc fgetln() function. - Uses plain file descriptors instead of struct FILE, since the buffering is done manually anyway, and it makes it easier to support gzip and bzip2. - Let the bzip2 reader just read the file as plain data, when the initial magic number doesn't match, mimicking the behaviour of GNU grep. There is probably more room for optimization, but let's see if this survives a bunch of tests first. :)
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:06 UTC