On 02/09/2013 16:09, Damian Weber wrote:
>
> On Mon, 2 Sep 2013, Andriy Gapon wrote:
>
>> re_format(7) says:
>> There are two special cases‡ of bracket expressions: the bracket
>> expressions '[[:<:]]' and '[[:>:]]' match the null string at the
>> beginning and end of a word respectively. A word is defined as a
>> sequence of word characters which is neither preceded nor followed by
>> word characters. A word character is an alnum character (as defined by
>> ctype(3)) or an underscore. This is an extension, compatible with but
>> not specified by IEEE Std 1003.2 ("POSIX.2"), and should be used with
>> caution in software intended to be portable to other systems.
>>
>> However I observe the following:
>> $ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
>> xx
>> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
>> cd1 xx
>>
>> In my opinion '[[:<:]]' should not affect how the pattern is matched in this case.
>>
>> Any thoughts, suggestions?
>
> there are two simpler expressions, whose difference I don't understand either
> (tested on 8.4-PRERELEASE)
>
> $ echo "cd0 cd1 xx" | sed 's/cd[0-9] //g'
> xx
> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9] //g'
> cd1 xx

Well, I agree with your analysis, and I think it's certainly a bug.

Do you think that the BUGS line in regex(3) should perhaps be extended to
"never works properly"?:

"""
Word-boundary matching does not work properly in multibyte locales.
"""

[[:<:]] can be replaced by \b in a pcre, which works perfectly fine (of course)

echo "this word word should be deleted" | perl -pe 's,\bword ,,g'
this should be deleted

Chris

Received on Tue Oct 01 2013 - 18:02:21 UTC