on 02/09/2013 17:54 Andriy Gapon said the following:
>
> re_format(7) says:
>      There are two special cases‡ of bracket expressions: the bracket
>      expressions ‘[[:<:]]’ and ‘[[:>:]]’ match the null string at the
>      beginning and end of a word respectively.  A word is defined as a
>      sequence of word characters which is neither preceded nor followed
>      by word characters.  A word character is an alnum character (as
>      defined by ctype(3)) or an underscore.  This is an extension,
>      compatible with but not specified by IEEE Std 1003.2 (“POSIX.2”),
>      and should be used with caution in software intended to be portable
>      to other systems.
>
> However I observe the following:
> $ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
> xx
> $ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
> cd1 xx
>
> In my opinion '[[:<:]]' should not affect how the pattern is matched in
> this case.

It seems that the code works like this:
- first it matches "cd0 " and "removes" it
- then it passes "cd1 xx" for matching with a flag that tells that this is
  not a real start of the string
- thus the matching code
  o knows that this is not a real line start, so it can't match [[:<:]]
    just for that reason
  o it does _not_ know what was the character before the start of the
    given substring, so it can not know if it could match [[:<:]]

So matching fails.
Not sure if this is an internal problem of regex(3) or a problem of how
sed(1) uses regex(3).

--
Andriy Gapon

Received on Mon Sep 02 2013 - 14:46:27 UTC
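
For illustration only, here is a small Python model of the behaviour guessed at in the message. It is not the regex(3) or sed(1) code: the names `word_start_ok`, `find`, `substitute_all` and the `keep_context` switch are all made up for this sketch, and the word-start check is hand-written instead of using `[[:<:]]`. It only shows that a substitution loop which restarts on the remaining substring, with a "not beginning of line" flag but without the preceding character, would produce the output seen above.

```python
import re

# The part of the pattern after [[:<:]]; the word-start test is modelled by hand.
BODY = re.compile(r"cd[0-9][^ ]* *")


def is_word_char(ch):
    # Word character: alnum or underscore.
    return ch.isalnum() or ch == "_"


def word_start_ok(text, i, not_bol, prev_char):
    """Model of [[:<:]] at index i of the buffer handed to the matcher."""
    if i >= len(text) or not is_word_char(text[i]):
        return False
    if i > 0:
        return not is_word_char(text[i - 1])
    # i == 0: first character of the buffer.
    if not not_bol:
        return True   # real start of line
    if prev_char is None:
        return False  # assumption: context is lost, so the matcher gives up
    return not is_word_char(prev_char)


def find(text, not_bol, prev_char):
    # Leftmost match of "word start + BODY" in text, or None.
    for i in range(len(text)):
        if word_start_ok(text, i, not_bol, prev_char):
            m = BODY.match(text, i)
            if m:
                return m.start(), m.end()
    return None


def substitute_all(line, keep_context):
    """Delete every match, like s/...//g, restarting on the remainder."""
    out = []
    pos = 0
    while pos < len(line):
        rest = line[pos:]
        prev = line[pos - 1] if (keep_context and pos > 0) else None
        span = find(rest, not_bol=pos > 0, prev_char=prev)
        if span is None:
            break
        start, end = span
        out.append(rest[:start])
        pos += end
    out.append(line[pos:])
    return "".join(out)


print(substitute_all("cd0 cd1 xx", keep_context=False))  # the observed result
print(substitute_all("cd0 cd1 xx", keep_context=True))   # the expected result
```

In this model the only difference between the two calls is whether the character before the restart point is made available to the matcher, which is the distinction the two "o" items in the message draw.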