Thursday, September 5, 2013

Regular Expression Matching

Implement regular expression matching with support for '.' and '*'.
'.' Matches any single character. 
'*' Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).
The function prototype should be:
 bool isMatch(const char *s, const char *p)
Some examples:
isMatch("aa","a")  false
isMatch("aa","aa")  true
isMatch("aaa","aa")  false
isMatch("aa", "a*")  true
isMatch("aa", ".*")  true
isMatch("ab", ".*")  true
isMatch("aab", "c*a*b")  true

A good explanation from the official leetcode site can be found here

It is somewhat easy to think of an algorithm for most of the cases except for patterns which have a .* .
For eg :
s = "abcbcd" ; p ="a.*c.*cd"
Here, ".*" in p means repeat '.' 0 or more times. 
Since '.' can match any character, it is not clear how many times '.' should be repeated. Should the 'c' in p matches the first or second 'c' in s? Unfortunately, there is no way to tell without using some kind of exhaustive search.
Here is the algorithm presented by leetcode. It's short but do devote ample time going through it, lest you overlook some test cases. 

Code:
    bool isMatch(const char *s, const char *p) {  
            assert(s && p);
    if (*p == '\0') return *s == '\0';

    // next char is not '*': must match current character
    if (*(p+1) != '*') {
        assert(*p != '*');
        return ((*p == *s) || (*p == '.' && *s != '\0')) && isMatch(s+1, p+1);
    }
    // next char is '*'
    while ((*p == *s) || (*p == '.' && *s != '\0')) {
        if (isMatch(s, p+2)) return true;
        s++;
    }
    return isMatch(s, p+2);
    }


No comments:

Post a Comment