A few days back, there was a knowledge session on regular expressions within my team. After discussing the usual topics like greedy and lazy quantifiers, backreferences, etc., we started analyzing the match results for a few expressions. I am familiar with regex and have used it in most of my throw-away scripts, and I thought I knew regex until I was baffled by some simple questions from the team. Below I list a few of those simplest-of-the-simple patterns, what they match, and why (which is what actually led me to learn the rules of the game).


Before we even start looking at them, did I mention that the regex engine starts its search just before the first character of the string? If not, let me tell you now: it needs to start before the first character because, if the pattern contains anchors or other zero-width assertions (^, \b, etc.), it has to check them there too. For the same reason, the search continues past the last character of the string (to match $, \b, etc.).
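To see those "before the first character" and "after the last character" positions, here is a small sketch using Python's re module (Python is my choice for illustration; the session itself was not tied to any one language). It asks where ^, $ and \b match in the string used below.

```python
import re

text = "foxxx"

# Zero-width assertions match at positions between characters,
# not on the characters themselves.
start = re.search(r"^", text)   # the position before 'f'
end = re.search(r"$", text)     # the position after the last 'x'
boundaries = [m.start() for m in re.finditer(r"\b", text)]

print(start.span())   # (0, 0)
print(end.span())     # (5, 5)
print(boundaries)     # [0, 5]
```

A five-character string therefore has six positions the engine can try, 0 through 5.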


(i) x*


pattern: x*


string: foxxx


Matched: the empty string at the very start (the position just before 'f')


Explanation: As mentioned in the rule 2 here, greediness always tries to match more, so read the pattern 'x*' as 'match as many occurrences of x as possible, or nothing'. The engine searches the string position by position, from left to right. Since it cannot find any 'x' at the starting position (the character there is 'f'), it falls back on its other choice, 'match nothing', and that succeeds. The engine reports this first successful match, so the result is an empty match before 'f' and not the 'xxx' further along.
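A quick check of this case, sketched in Python's re module (my choice of language, not something from the session). The x+ line is added only for contrast: once at least one 'x' is required, the empty match is no longer an option and the engine has to move on to the run of x's.

```python
import re

text = "foxxx"

# x* may match nothing, so it succeeds immediately at position 0.
first = re.search(r"x*", text)
print(repr(first.group()), first.span())   # '' (0, 0)

# x+ needs at least one 'x', so positions 0 and 1 fail
# and the match is found at position 2.
run = re.search(r"x+", text)
print(repr(run.group()), run.span())       # 'xxx' (2, 5)
```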


(ii) .*


pattern: .*


string: foxxx


Matched: foxxx (the entire string)


Explanation: '.' matches any character other than '\n'. Although the pattern '.*' can be read as 'match as many characters other than '\n' as possible, or nothing', the rule of greediness gives preference to matching more characters, so the whole string is consumed.
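The same case in Python's re module, as a minimal sketch. The second string, with an embedded newline, is a made-up input of mine to show where '.' stops; re.DOTALL is the Python flag that lets '.' match '\n' as well.

```python
import re

# Greedy .* takes every character it can.
whole = re.search(r".*", "foxxx").group()
print(repr(whole))      # 'foxxx'

# '.' does not match '\n', so the match ends before the newline.
one_line = re.search(r".*", "fox\nxx").group()
print(repr(one_line))   # 'fox'

# With re.DOTALL, '.' also matches '\n'.
dotall = re.search(r".*", "fox\nxx", re.DOTALL).group()
print(repr(dotall))     # 'fox\nxx'
```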


(iii) x*


pattern: x*


string: xxxfoxxx


Matched: xxx (the three leading characters; the remaining 'foxxx' is not part of the match)


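Explanation: The same greediness favors the 'match more' option. This time the engine does find an 'x' at the starting position, so it consumes all three leading 'x' characters and stops at 'f'. The 'xxx' at the end of the string is not part of this first match.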
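To confirm that greediness is what makes the difference here, this sketch (again Python's re module, my choice) runs the greedy x* next to its lazy counterpart x*? on the same string.

```python
import re

text = "xxxfoxxx"

# Greedy: take as many 'x' as possible from the starting position.
greedy = re.search(r"x*", text)
print(repr(greedy.group()), greedy.span())   # 'xxx' (0, 3)

# Lazy: take as few as possible, which is none at all.
lazy = re.search(r"x*?", text)
print(repr(lazy.group()), lazy.span())       # '' (0, 0)
```

Both matches begin at position 0. Only the amount consumed differs.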











