The following are the rules that a non-POSIX regular expression engine (such as the ones in Perl, Java, etc.) adheres to while attempting to match a pattern against a string.


Notation: each example lists the given regex (pattern), the string it is tested against (string), and the actual match within the string, enclosed in ''.






1. The match that begins earliest/leftmost wins.


The intention is to match the 'cat' at the end, but the 'cat' in 'catalogue' wins the match because it appears leftmost in the string.


pattern :cat


string :This catalogue has the names of different species of cat.


Matched: This 'cat'alogue has the names of different species of cat.
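The example above can be checked with a minimal sketch in Python. This assumes Python's `re` module, which is also a backtracking engine, behaves like the Perl and Java engines for this pattern; the variable names are only illustrative.

```python
import re

text = "This catalogue has the names of different species of cat."

# re.search scans from the left and stops at the first position where 'cat' matches.
m = re.search(r"cat", text)
start, end = m.span()

# Rebuild the string with the matched part enclosed in '' (the notation used here).
marked = text[:start] + "'" + m.group() + "'" + text[end:]
print(marked)
```

The match sits inside 'catalogue'; the standalone 'cat' at the end of the sentence is never reached.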





1a. The leftmost match in the string wins, irrespective of the order in which the patterns appear in the alternation.


Though it is last in the alternation, 'catalogue' gets the match because, among the alternatives, it appears leftmost in the string.


pattern :species|names|catalogue


string :This catalogue has the names of different species of cat.


Matched: This 'catalogue' has the names of different species of cat.
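A small Python sketch of the same test, again assuming Python's `re` follows the same rule as Perl and Java here. The `positions` dictionary is just a helper added for illustration; it records where each alternative first occurs in the string.

```python
import re

text = "This catalogue has the names of different species of cat."

m = re.search(r"species|names|catalogue", text)
print(m.group())  # catalogue

# Where each alternative first occurs; the smallest offset belongs to the winner.
positions = {word: text.find(word) for word in ("species", "names", "catalogue")}
print(positions)
```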






1b. If more than one alternative can match at the same position, then the order of the alternatives in the alternation decides the match.


All three alternatives have a possible match at the same position, but 'over' succeeds because it appears first in the alternation.


pattern :over|o|overnight


string :Actually, I'm an overnight success. But it took twenty years.


Matched: Actually, I'm an 'over'night success. But it took twenty years.
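To see that only the order inside the alternation matters here, the sketch below (Python, assumed to behave like Perl and Java for these patterns) tries the same three alternatives in three different orders. The `results` dictionary is only an illustrative helper.

```python
import re

text = "Actually, I'm an overnight success. But it took twenty years."

results = {}
for pattern in (r"over|o|overnight", r"overnight|over|o", r"o|over|overnight"):
    # All alternatives can match at the same position, so the first listed one wins.
    results[pattern] = re.search(pattern, text).group()
    print(pattern, "->", results[pattern])
```

Each run matches at the same starting position; only the alternative listed first is reported.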






2. The standard quantifiers (*, +, ? and {m,n}) are greedy.


A greedy quantifier (*, +, ?) always tries to match as much as possible before it falls back to the minimum number of characters needed for the match to succeed (0 for * and ?; 1 for +).


The intention is to match "Joy is prayer", but .* runs past all the intermediate double quotes, grabbing the rest of the string, and gives back only enough for the final " in the pattern to match the last double quote.


pattern :".*"


string :"Joy is prayer"."Joy is strength"."Joy is Love".


Matched: '"Joy is prayer"."Joy is strength"."Joy is Love"'.
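The greedy behaviour above can be reproduced with a short Python sketch (assuming Python's `re` is greedy in the same way as Perl and Java, which is the documented default).

```python
import re

text = '"Joy is prayer"."Joy is strength"."Joy is Love".'

# Greedy .* grabs everything, then backs off just far enough for the closing " to match.
m = re.search(r'".*"', text)
print(m.group())
```

The match covers everything from the first double quote to the last one; only the trailing period is left out.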






2a. Lazy quantifiers favor the minimum match.


A lazy quantifier (*?, +?, ??) always tries to settle for the minimum number of characters needed for the match to succeed before it tries to match more.


With the lazy quantifier, the match ends at the first closing double quote (") that appears.


pattern :".*?"


string :"Joy is prayer"."Joy is strength"."Joy is Love".


Matched: '"Joy is prayer"'."Joy is strength"."Joy is Love".


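A minimal Python sketch of the lazy version, under the same assumption that Python's `re` treats `*?` the way Perl and Java do. The `re.findall` call is an extra illustration, not part of the original example; it shows that the lazy pattern picks up each quoted sentence separately.

```python
import re

text = '"Joy is prayer"."Joy is strength"."Joy is Love".'

# Lazy .*? stops at the first closing double quote it can reach.
first = re.search(r'".*?"', text).group()
print(first)

# Repeating the search over the rest of the string yields one match per quoted sentence.
all_quotes = re.findall(r'".*?"', text)
print(all_quotes)
```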
2b. The only time a greedy quantifier gives up what it has matched earlier and settles for less is when matching too much causes some later part of the regex to fail.


The \w* initially matches the whole word 'regular_expressions'. The following 's' then has no character left to match and fails, which triggers \w* to backtrack and match one character less. The final 's' then matches the 's' just released by \w*, and the whole match succeeds.


Note: Though the pattern would work the same way without parentheses, I've used them to show the individual matches in $1, $2, etc.


pattern :(\w*)(s)


string :regular_expressions


Matched: 'regular_expressions'


$1 = regular_expression


$2 = s






Similarly, the initial match of 'x' by 'x*' is given up later in favor of the last 'x' in the pattern.


pattern :(x*)(x)


string :ox


Matched: o'x'


$1 =


$2 = x
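Both backtracking examples above can be checked with the Python sketch below. Python exposes the captures through `groups()` (or `group(1)`, `group(2)`) rather than $1 and $2; the rest is assumed to behave as in Perl and Java.

```python
import re

# \w* first takes the whole word, then gives back one character so that 's' can match.
m1 = re.search(r"(\w*)(s)", "regular_expressions")
print(m1.groups())

# x* gives back the only 'x' it matched, leaving the first group empty.
m2 = re.search(r"(x*)(x)", "ox")
print(m2.groups(), m2.span())
```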






2c. When more than one greedy quantifier appears in a pattern, the first one gets the preference.


Though the .* initially matches the whole string, the [0-9]+ is able to grab back just one digit, '5', from the .*, and the [0-9]+ settles for it since that satisfies its minimum match requirement. Note that '+' is also a greedy quantifier, but here it cannot grab more than its minimum requirement, because an earlier greedy quantifier is competing for the same characters.


pattern :(.*)([0-9]+)


string :Bangalore-560025


Matched: 'Bangalore-560025'


$1 = Bangalore-56002


$2 = 5
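A Python sketch of the example above, assuming the same semantics as Perl and Java. The second search is an addition for contrast only: making the first quantifier lazy lets [0-9]+ take all the digits.

```python
import re

text = "Bangalore-560025"

# The first greedy quantifier keeps as much as it can; [0-9]+ gets only its minimum.
greedy = re.search(r"(.*)([0-9]+)", text)
print(greedy.groups())

# For contrast (not in the original example): a lazy first group frees all the digits.
lazy = re.search(r"(.*?)([0-9]+)", text)
print(lazy.groups())
```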






3. Overall match takes precedence.


The ability to report a successful match takes precedence. As shown in the previous examples, if it is necessary for a successful match, the quantifiers (greedy or lazy) work in harmony with the rest of the pattern.
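The earlier examples show greedy quantifiers giving characters back. The sketch below shows the other direction: a lazy quantifier taking more than its minimum when the rest of the pattern demands it. The pattern `".*?"\.$` is my own illustration, not one of the examples above, and Python's `re` is assumed to behave like Perl and Java here.

```python
import re

text = '"Joy is prayer"."Joy is strength"."Joy is Love".'

# On its own, the lazy quantifier stops at the first closing quote.
lazy_alone = re.search(r'".*?"', text).group()
print(lazy_alone)

# With \.$ appended, the match must end at the end of the string, so .*? keeps
# extending until the overall match can succeed.
lazy_anchored = re.search(r'".*?"\.$', text).group()
print(lazy_anchored)
```

In the second search the lazy quantifier ends up spanning all three sentences, because that is the only way the whole pattern can match from the leftmost starting position.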









