On 23/08/2010 13:45, Bob Bishop wrote: > On 23 Aug 2010, at 12:51, chrisk-freebsd_at_list.mightyreason.com wrote: >> [...]The hardest part of POSIX regular expressions is in picking out the >> correct captured subexpressions. This makes programs like "sed" >> vulnerable to the bad regex.h engine that comes with the operating system. >> >> Luckily grep does not need the captured subexpressions, and thus does >> not need the complexity that comes from the ideas behind TRE. [etc] > I think grep does potentially need the captured subexpressions, for eg: > > \([abc]\)99\1 > > matching eg b99b That would be "basic POSIX" instead of "extended POSIX" regular expressions. Making basic regular expressions correct is something I have never attempted. Correctness is what I would like to emphasize. GNU grep defines the regex.h header for POSIX regular expressions but does not try to deliver the correct POSIX answer. Getting the wrong answer quickly is not a virtue as debugging prematurely optimized code is too hard. http://www2.research.att.com/~gsf/testregex/re-interpretation.html has the best explanation of basic and extended POSIX regular expressions. And last I checked the AT&T code was nearly correct but exponentially slow. I hope that if "grep" in *BSD and thus OSX gets worked on that it gets more correct, not less. I hope that someday the regex.h engine in *BSD and thus OSX gets fixed, but I am not holding my breath for that. All I have written is a correct (and efficient) "extended POSIX" matching engine in Haskell and OCaml. Cheers, Chris KuklewiczReceived on Mon Aug 23 2010 - 11:42:59 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:06 UTC