On Mon, Sep 2, 2013 at 7:45 PM, Andriy Gapon <avg_at_freebsd.org> wrote:
> on 02/09/2013 17:54 Andriy Gapon said the following:
>>
>> re_format(7) says:
>> There are two special cases‡ of bracket expressions: the bracket expres‐
>> sions ‘[[:<:]]’ and ‘[[:>:]]’ match the null string at the beginning and
>> end of a word respectively. A word is defined as a sequence of word
>> characters which is neither preceded nor followed by word characters. A
>> word character is an alnum character (as defined by ctype(3)) or an
>> underscore. This is an extension, compatible with but not specified by
>> IEEE Std 1003.2 (“POSIX.2”), and should be used with caution in software
>> intended to be portable to other systems.
>>
>> However I observe the following:
>> $ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
>> xx
>> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
>> cd1 xx
>>
>> In my opinion '[[:<:]]' should not affect how the pattern is matched in this case.
>
> It seems that the code works like this:
> - first it matches "cd0 " and "removes" it
> - then it passes "cd1 xx" for matching with a flag that tells that this is not
>   a real start of the string
> - thus the matching code
>   o knows that this is not a real line start, so it can't match [[:<:]]
>     just for that reason
>   o it does _not_ know what was the character before the start of the given
>     substring, so it can not know if it could match [[:<:]]
>
> So matching fails.
> Not sure if this is an internal problem of regex(3) or a problem of how sed(1)
> uses regex(3).
>
> --
> Andriy Gapon

In my opinion this is a bug. The [[:<:]] operator is said to match the empty
string at the beginning of a word with no mention that the word has to be at
the beginning of the whole string that is matched.

OS X version of sed(1) works differently:

$ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
xx
$ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
xx

-Kimmo

Received on Mon Sep 02 2013 - 15:52:20 UTC
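
For illustration only, the difference between the two results can be modelled in a few lines of Python. This is a rough sketch of the behaviour Andriy describes, not the actual sed(1) or regex(3) code. The function names are made up for this example, Python's \b stands in for [[:<:]], and the "refuse a match at offset 0" rule is an assumption that mimics a matcher which is told "this is not the real start of the line" and cannot see the preceding character.

```python
import re

# \b is Python's nearest analogue of [[:<:]]. Because the next character must
# be the word character "c", \b can only succeed here at the start of a word.
PATTERN = re.compile(r"\bcd[0-9][^ ]* *")


def strip_by_substring(line):
    """Hypothetical model of the failing case: rematch on the unconsumed tail."""
    out = []
    rest = line
    not_bol = False  # becomes True once something has been consumed
    while rest:
        # With not_bol set, the character before rest[0] is unknown, so this
        # model refuses any word-start match at offset 0 of the tail.
        m = PATTERN.search(rest, 1 if not_bol else 0)
        if m is None:
            break
        out.append(rest[:m.start()])
        rest = rest[m.end():]
        not_bol = True
    out.append(rest)
    return "".join(out)


def strip_with_context(line):
    """Hypothetical model of the working case: keep the whole line, move an offset."""
    out = []
    pos = 0
    while pos < len(line):
        # search(line, pos) does not slice the string, so \b can still inspect
        # the character just before pos.
        m = PATTERN.search(line, pos)
        if m is None:
            break
        out.append(line[pos:m.start()])
        pos = m.end()
    out.append(line[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(strip_by_substring("cd0 cd1 xx"))
    print(strip_with_context("cd0 cd1 xx"))
```

In this model the first function leaves "cd1 xx" and the second leaves "xx", which matches the two outputs quoted above. Whether the real fix belongs in regex(3) or in how sed(1) calls it is left open here, as in the thread.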