regular expression behaviour - Unix
-
regular expression behaviour
Hello all,
By executing the following statement at the command prompt (Linux):
$ echo "         01" | sed -n -e '/^ *...$/p'
the string "         01" (excluding the double quotes) is printed to the
standard output. Can somebody explain this behaviour? Of course,
theoretically the regular expression matches the string, but according
to the documentation it shouldn't. The documentation says that the *
operator is greedy, so the regular expression should match all the
spaces at the beginning of the string and then (because of the 3 dots)
it should try to match 3 more characters. To justify my doubts I refer
to:
1) Section 3.1.1 of the sed FAQ (http://sed.sourceforge.net/sedfaq.html),
which says:
A technical description of BREs from IEEE POSIX 1003.1-2001 and the
Single UNIX Specification Version 3 is available online at:
http://www.opengroup.org/onlinepubs/...html#tag_09_03
2) Section 9.1 of the Open Group document IEEE POSIX 1003.1-2001, which
says:
Consistent with the whole match being the longest of the leftmost
matches, each subpattern, from left to right, shall match the longest
possible string.
So I understand that in my regular expression the subpattern " *" will
match the longest possible string, which is all the spaces at the
beginning of the string, and only then will the "..." be considered,
which cannot match because only the two characters "01" remain.
regards
cristian zoicas
-
Re: regular expression behaviour
On Wednesday 17 September 2008 12:02, rst wrote:
> Hello all,
>
> By executing the following statement at the command prompt (linux os)
>
> $ echo "         01" | sed -n -e '/^ *...$/p'
>
> the string "         01" (excluding the double quotes) is printed
> to the standard output. Can somebody explain the behaviour? Of
> course, theoretically the regular expression matches the string, but
> according to the documentation it shouldn't. The documentation says that
> the * operator is greedy so the regular expression should match all the
> space at the beginning of the string and then (because of the 3 dots) it
> should try to match 3 more characters.
I think you should read something about greediness and backtracking.
Here's a good start:
http://www.regular-expressions.info
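
One way to see what the reply means is to ask sed which part of the line each subpattern ended up with. The following is only a sketch, assuming a POSIX-style sed (GNU or BSD); the input string with four leading spaces and the variable names are made up for illustration. The tr call just turns spaces into underscores so they can be counted.

```shell
# Illustrative input: four spaces followed by "01".
input='    01'

# Capture what ' *' and '...' each matched, then make the spaces visible.
shown=$(printf '%s\n' "$input" | sed 's/^\( *\)\(...\)$/[\1][\2]/' | tr ' ' '_')
printf '%s\n' "$shown"    # prints [___][_01]

# If the last three characters must all be digits, ' *' has nothing
# useful to give back and the line is not printed at all.
none=$(printf '%s\n' "$input" | sed -n '/^ *[0-9][0-9][0-9]$/p')
printf 'none=[%s]\n' "$none"    # prints none=[]
```

In the first command the " *" ends up with three of the four spaces and the "..." takes the remaining space plus "01": the star matches the longest run that still lets the whole expression match, which is the "consistent with the whole match" part of the POSIX sentence quoted in the question.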
-
Re: regular expression behaviour
On Sep 17, 3:02 am, rst wrote:
> Hello all,
>
> By executing the following statement at the command prompt (linux os)
>
> $ echo "         01" | sed -n -e '/^ *...$/p'
>
> the string "         01" (excluding the double quotes) is printed
> to the standard output. Can somebody explain the behaviour? Of
> course, theoretically the regular expression matches the string, but
> according to the documentation it shouldn't. The documentation says that
> the * operator is greedy so the regular expression should match all the
> space at the beginning of the string and then (because of the 3 dots) it
> should try to match 3 more characters. In order to justify my doubts I
> make reference to:
Your whole understanding of matching is slightly off. There are no
partial matches. The greedy operator is matching all it can match, you
are asking for it to match more than it can match -- it can't do that.
If the expression is ".x", it matches any character followed by an
"x". If you test "fa" against it, the "f" does *NOT* match the dot,
because the dot is followed by an "x" and the "f" is not.
Your mistake is in thinking that there are partial matches. There are
not.
A greedy operator will match all it can match, given the constraints
placed on it. The ".*" in ".*x" will match as many characters followed
by an 'x' as it can.
DS
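
The two examples in this reply can be tried directly. This is a minimal sketch, again assuming a POSIX-style sed; the test strings "fa" and "axbxc" and the variable names are chosen here for illustration.

```shell
# ".x" needs some character followed by "x"; "fa" has no such pair,
# so the pattern does not match and nothing is printed.
no_match=$(printf '%s\n' 'fa' | sed -n '/.x/p')
printf 'no_match=[%s]\n' "$no_match"    # prints no_match=[]

# In ".*x" the ".*" takes as much as it can while an "x" still follows:
# here it takes "axb", leaving the second "x" for the literal x.
greedy=$(printf '%s\n' 'axbxc' | sed 's/\(.*\)x/[\1]x/')
printf '%s\n' "$greedy"    # prints [axb]xc
```

The ".*" does not swallow the trailing "xc" even though "." could match those characters, because then no "x" would be left for the rest of the expression.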