Either I do something wrong or there is a regexp bug in sed !!

From: Zoltan Frombach <tssajo_at_hotmail.com>
Date: Sun, 14 Nov 2004 15:39:00 -0800
I'm trying to use sed under FreeBSD 5.3-RELEASE in a new 'netqmail' port I 
am currently working on. I want to replace a bunch of digits (in plain 
English: a decimal number) in a text file at the beginning of a line. Here 
is how the original file looks before I do anything (this file is part of 
the netqmail-1.05 package, but it is unimportant):

--- file conf-split begins
23

This is the queue subdirectory split.
--- file conf-split ends

Okay, so I try to replace 23 (or whatever number is there!) at the beginning 
of the first line of this file with, let's say, 199 using sed. I would expect 
this to work:

sed -e "s/^[0-9]+/199/" conf-split > conf-split.new

But it doesn't change anything in conf-split.new!! My regexp ^[0-9]+ doesn't 
match anything! After spending about an hour investigating this, I realized 
that the + after my bracket expression (I'm talking about this part here: 
[0-9]+) does not match! If I drop the + and use * instead, I can 
make my regexp match. So this works - but IMHO it's ugly:

sed -e "s/^[0-9][0-9]*/199/" conf-split > conf-split.new

It gives this output, which is what I always wanted:

--- file conf-split.new begins
199

This is the queue subdirectory split.
--- file conf-split.new ends
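
For anyone who wants to reproduce this, here is a minimal sketch that runs both 
commands against a copy of the file. It assumes a POSIX-style sed and mktemp; 
the scratch directory and the variable names are only illustrative and are not 
part of netqmail:

```shell
# Recreate the conf-split file shown above in a scratch directory.
tmpdir=$(mktemp -d)
printf '23\n\nThis is the queue subdirectory split.\n' > "$tmpdir/conf-split"

# First attempt: the pattern with + (reported above as not matching).
plus_result=$(sed -e 's/^[0-9]+/199/' "$tmpdir/conf-split" | head -n 1)

# Second attempt: the pattern with [0-9][0-9]* (reported above as working).
star_result=$(sed -e 's/^[0-9][0-9]*/199/' "$tmpdir/conf-split" | head -n 1)

# Print the first line produced by each command for comparison.
echo "first line with +:           $plus_result"
echo "first line with [0-9][0-9]*: $star_result"

rm -rf "$tmpdir"
```

Only the first line of each result is compared, since that is the only line the 
substitution is meant to touch.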

According to the sed man page, the regexp syntax that is used by sed is 
documented in the re_format man page. And according to the re_format man 
page: "A piece is an atom possibly followed by a single `*', `+', `?', or 
bound.  An atom followed by `*' matches a sequence of 0 or more matches of 
the atom.  An atom followed by `+' matches a sequence of 1 or more matches 
of the atom. ..."
And the definition of an "atom" is (quoted from the same man page): "An atom 
is a regular expression enclosed in `()' (matching a match for the regular 
expression), an empty set of `()' (matching the null string), a bracket 
expression (see below) ..."

So either my bracket expression ( [0-9] ) in my first sed command was not 
recognized as an atom, or if it was recognized as an atom then the + that 
followed it was not interpreted properly... Can anyone please tell me why?
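
One possible explanation, offered as an assumption and not a confirmed 
diagnosis: the passages quoted above describe what re_format calls "modern" 
(extended) regular expressions, while sed by default uses "obsolete" (basic) 
ones, where `+' is an ordinary character and its effect is written as the 
bound `\{1,\}'. The sketch below checks both halves of that idea; the sample 
input `7+' is made up for the test:

```shell
# If + is an ordinary character, ^[0-9]+ should match one digit followed by a literal plus sign.
literal=$(printf '7+\n' | sed -e 's/^[0-9]+/199/')

# In a basic regular expression, "one or more" is written as the bound \{1,\}.
bound=$(printf '23\n' | sed -e 's/^[0-9]\{1,\}/199/')

echo "literal plus test: $literal"
echo "bound test:        $bound"
```

If both lines print 199, then sed is treating + as a plain character, which 
would make this documented behaviour and not a bug. Some sed implementations 
also have an option to switch to extended regular expressions; check the local 
sed man page before relying on that.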

I believe this is a bug in sed or in the regexp library which sed uses. If 
it is a regexp library issue, then there is a chance that it affects other 
programs that use it, as well! At least it can break all programs that use 
sed regexps, especially ports...

My uname -a is:
FreeBSD www.xxxxxxxx.com 5.3-RELEASE FreeBSD 5.3-RELEASE #0: Fri Nov 12 
01:07:41 PST 2004     xxx_at_www.xxxxxxxx.com:/usr/obj/usr/src/sys/XXXXXXXX 
i386

Zoltan 
Received on Sun Nov 14 2004 - 22:41:05 UTC
