regular expression behaviour - Unix

Thread: regular expression behaviour

  1. regular expression behaviour

    Hello all,

    By executing the following statement at the command prompt (linux os)

    $ echo " 01" | sed -n -e '/^ *...$/p'

    the string " 01" (excluding the double quotes) is printed to the
    standard output. Can somebody explain this behaviour? Of course,
    theoretically the regular expression matches the string, but according
    to the documentation it shouldn't. The documentation says that the *
    operator is greedy, so the regular expression should match all the
    spaces at the beginning of the string and then (because of the 3 dots)
    it should try to match 3 more characters. To justify my doubts I
    refer to:

    1) Section 3.1.1 of the sed FAQ (http://sed.sourceforge.net/sedfaq.html),
    which says:

    A technical description of BREs from IEEE POSIX 1003.1-2001 and the
    Single UNIX Specification Version 3 is available online at:

    http://www.opengroup.org/onlinepubs/...html#tag_09_03

    2) Section 9.1 of the Open Group document IEEE POSIX 1003.1-2001,
    which says:

    Consistent with the whole match being the longest of the leftmost
    matches, each subpattern, from left to right, shall match the longest
    possible string.

    So I understand that in my regular expression the subpattern " *"
    will match the longest possible string, which is all the spaces at the
    beginning of the string, and then the "..." will be tried against what
    remains, which it cannot match.

    regards
    cristian zoicas

  2. Re: regular expression behaviour

    On Wednesday 17 September 2008 12:02, rst wrote:

    > Hello all,
    >
    > By executing the following statement at the command prompt (linux os)
    >
    > $ echo " 01" | sed -n -e '/^ *...$/p'
    >
    > the string " 01" (excluding the double quotes) is printed
    > to the standard output. Can somebody explain the behaviour? Of
    > course, theoretically the regular expression matches the string, but
    > according to the documentation it shouldn't. The documentation says that
    > the * operator is greedy so the regular expression should match all the
    > space at the beginning of the string and then (because of the 3 dots) it
    > should try to match 3 more characters.


    I think you should read something about greediness and backtracking.
    Here's a good start:

    http://www.regular-expressions.info
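
    As a rough sketch of what greediness plus backtracking means for the
    pattern in the question, the sed command below splits a line into the
    part taken by " *" and the part taken by "...". The function name
    show_split and the bracketed output format are made up for this
    illustration; only sed itself is assumed. Whether a given sed literally
    backtracks internally or not, the result should be the same.

```shell
# show_split: hypothetical helper (not part of sed).
# Prints [part matched by " *"][part matched by "..."] for a line that
# matches ^ *...$, and prints nothing when the line does not match.
show_split() {
    printf '%s\n' "$1" | sed -n -e 's/^\( *\)\(...\)$/[\1][\2]/p'
}

show_split '   01'    # prints [  ][ 01]  -> " *" gave one space back to "..."
show_split ' 01'      # prints [][ 01]    -> " *" matched the empty string
show_split '  0123'   # prints nothing    -> no split makes the whole pattern fit
```

    The " *" part ends up as long as it can be while the rest of the
    pattern still matches, not as long as it could be in isolation.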


  3. Re: regular expression behaviour

    On Sep 17, 3:02 am, rst wrote:
    > Hello all,
    >
    > By executing the following statement at the command prompt (linux os)
    >
    > $ echo "         01" | sed -n -e '/^ *...$/p'
    >
    > the string "         01" (excluding the double quotes) is printed
    > to the standard output. Can somebody explain the behaviour? Of
    > course, theoretically the regular expression matches the string, but
    > according to the documentation it shouldn't. The documentation says that
    > the * operator is greedy so the regular expression should match all the
    > space at the beginning of the string and then (because of the 3 dots) it
    > should try to match 3 more characters. In order to justify my doubts I
    > make reference to:


    Your whole understanding of matching is slightly off. There are no
    partial matches. The greedy operator is matching all it can match, you
    are asking for it to match more than it can match -- it can't do that.

    If the expression is ".x", it matches any character followed by an
    "x". If you test "fa" against it, the "f" does *NOT* match the dot,
    because the dot is followed by an "x" and the "f" is not.
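
    The ".x" case can be checked directly with sed. This is only a small
    sketch: try_match is a hypothetical helper name, and it assumes the
    pattern contains no "/" and the input line is not empty.

```shell
# try_match: hypothetical helper (not part of sed).
# $1 = basic regular expression, $2 = non-empty input line.
# Prints "match" if sed selects the line, "no match" otherwise.
try_match() {
    if [ -n "$(printf '%s\n' "$2" | sed -n -e "/$1/p")" ]; then
        echo "match"
    else
        echo "no match"
    fi
}

try_match '.x' 'fa'     # prints: no match  (nothing in "fa" is followed by "x")
try_match '.x' 'fx'     # prints: match     (dot takes "f", then "x")
try_match '.x' 'fax'    # prints: match     (dot takes "a", not "f")
```

    In the last call the match starts at the "a": the pattern either
    matches as a whole somewhere in the line or not at all.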

    Your mistake is in thinking that there are partial matches. There are
    not.

    A greedy operator will match all it can match, given the constraints
    placed on it. The ".*" in ".*x" will match as many characters as it
    can while still leaving an 'x' to follow.
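
    To see how much the ".*" in ".*x" actually takes, one can wrap it in a
    group and print it. Again only a sketch; greedy_part is a hypothetical
    helper name and the brackets are just there to make the captured text
    visible.

```shell
# greedy_part: hypothetical helper (not part of sed).
# Replaces the text matched by ".*x" with the bracketed part that ".*"
# took; the rest of the line is left as it was. Prints nothing when the
# line contains no "x".
greedy_part() {
    printf '%s\n' "$1" | sed -n -e 's/\(.*\)x/[\1]/p'
}

greedy_part 'axbxc'   # prints [axb]c  -> ".*" runs up to the last "x"
greedy_part 'abc'     # prints nothing -> no "x", so ".*x" cannot match
```

    In the first call the ".*" takes "axb" rather than the whole line,
    because taking more would leave no "x" for the rest of the pattern.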

    DS
