DEV Community

Hommies
Hommies

Posted on

LeetCode Solution: 10. Regular Expression Matching

Mastering Regular Expressions: A Deep Dive into LeetCode 10 with Dynamic Programming

Hey there, fellow coders! 👋 Are you ready to tackle one of LeetCode's classic challenges that might seem daunting at first glance but becomes quite elegant with the right approach? Today, we're diving into Problem 10: Regular Expression Matching.

This problem is a fantastic way to sharpen your dynamic programming (DP) skills and gain a deeper understanding of how regular expressions actually work under the hood. Don't worry if regex feels like magic – we're going to demystify it together!


The Puzzle: Regular Expression Matching

We're given two strings: an input string s and a pattern p. Our goal is to determine if the pattern p completely matches the input string s.

The twist? The pattern p isn't just plain text. It can contain two special characters:

  • '.': This dot matches any single character. Think of it as a wildcard for one character.
  • '*': This asterisk matches zero or more of the preceding element. This is where things get interesting!

Important Note: The match must cover the entire input string s, not just a part of it.

Let's look at a few examples to solidify our understanding:

Example 1: Basic Mismatch

  • s = "aa", p = "a"
  • Output: false
  • Why? The pattern "a" only matches the first 'a' of "aa". It doesn't cover the entire string.

Example 2: The Power of *

  • s = "aa", p = "a*"
  • Output: true
  • Why? * means "zero or more of the preceding element," which is 'a'. By repeating 'a' once, the pattern becomes "aa", matching the input. If s was "a", it would also match (zero repetitions). If s was "aaa", it would match (two repetitions).

Example 3: .* - The Ultimate Wildcard Combo

  • s = "ab", p = ".*"
  • Output: true
  • Why? . matches any single character. * means "zero or more" of whatever precedes it. So, .* means "zero or more of any character." It can match "ab", "a", "", "xyz", anything!

The "Aha!" Moment: Intuition

When you see problems involving matching parts of strings, especially with wildcards or options like "zero or more," your mind should start thinking about Dynamic Programming (DP). Why DP? Because the decision at any point (does s[i] match p[j]) often depends on the results of smaller subproblems.

Imagine trying to match s and p.

  • If p[j] is a regular character or . you just check if s[i] matches p[j] and then move on to matching the rest of s and p.
  • But what if p[j] is *? This is the tricky part. * gives us options:
    1. It could match zero occurrences of the character before it.
    2. It could match one or more occurrences of the character before it.

These multiple choices, and the fact that we can build up a solution from smaller matches, strongly hints at DP!


Our Strategy: Dynamic Programming Grid

We'll use a 2D boolean array, let's call it dp, where dp[i][j] will be True if the first i characters of s (i.e., s[0...i-1]) match the first j characters of p (i.e., p[0...j-1]). Otherwise, it will be False.

Let m be the length of s and n be the length of p. Our dp table will be (m+1) x (n+1).

Step 1: Initialize the DP Table

  • dp[0][0] = True: An empty string (s with 0 characters) always matches an empty pattern (p with 0 characters). This is our base case.

  • First Row (dp[0][j] for j > 0): This represents matching an empty string s with a non-empty pattern p.
    An empty string can only be matched by a pattern that allows zero characters. This typically happens with *.
    If p[j-1] is '*', then it can match zero of the preceding element (p[j-2]). In this scenario, dp[0][j] would be True if dp[0][j-2] was True. All other dp[0][j] would be False.

Step 2: Fill the DP Table

We'll iterate through s (from i = 1 to m) and p (from j = 1 to n) to fill the table. For each dp[i][j], we consider two main scenarios for the current character p[j-1]:

Scenario 1: p[j-1] is NOT '*' (i.e., it's a regular character or '.')

  • If p[j-1] matches s[i-1] (meaning p[j-1] == s[i-1] or p[j-1] == '.'), then s[0...i-1] matches p[0...j-1] only if s[0...i-2] matched p[0...j-2].
  • So, dp[i][j] = dp[i-1][j-1].
  • Otherwise (if p[j-1] doesn't match s[i-1]), dp[i][j] = False.

Scenario 2: p[j-1] IS '*'

This is the trickiest part. The * implies "zero or more" of the preceding character (p[j-2]). Let's call this preceding character prev_char_in_pattern = p[j-2].

We have two possibilities for how * can be interpreted:

  1. * matches zero occurrences of prev_char_in_pattern:

    • In this case, prev_char_in_pattern and the * effectively disappear from the pattern.
    • So, we need to check if s[0...i-1] matches p[0...j-3]. This state is represented by dp[i][j-2].
    • Thus, dp[i][j] = dp[i][j-2].
  2. * matches one or more occurrences of prev_char_in_pattern:

    • For * to match one or more times, prev_char_in_pattern must match the current character s[i-1]. (i.e., prev_char_in_pattern == s[i-1] or prev_char_in_pattern == '.').
    • If they match, s[i-1] is consumed by one instance of prev_char_in_pattern. We then need s[0...i-2] to match p[0...j-1] (the prev_char_in_pattern followed by * is still active, potentially matching more characters).
    • This state is represented by dp[i-1][j].
    • So, if prev_char_in_pattern matches s[i-1], then dp[i][j] can also be dp[i-1][j].
  • Combining these: dp[i][j] will be True if either of these conditions holds. dp[i][j] = dp[i][j-2] (zero occurrences) If prev_char_in_pattern matches s[i-1] (i.e., p[j-2] == s[i-1] or p[j-2] == '.'): dp[i][j] = dp[i][j] or dp[i-1][j] (one or more occurrences)

Step 3: The Result

After filling the entire dp table, the final answer will be in dp[m][n].

Let's walk through an example: s = "aab", p = "c*a*b"

dp[i][j] "" c * a * b
"" T F T F T F
a F F F T T F
aa F F F F T F
aab F F F F F T
  • dp[0][0] is T.
  • dp[0][2] (for c*): p[1] is *, so dp[0][2] = dp[0][0] which is T. (empty string matches c* by c repeating 0 times).
  • dp[0][4] (for c*a*): p[3] is *, so dp[0][4] = dp[0][2] which is T. (empty string matches c*a* by a repeating 0 times).
  • Now, consider dp[1][1] (s="a", p="c"): p[0] ('c') != s[0] ('a'). So F.
  • Consider dp[1][4] (s="a", p="c*a*"):
    • p[3] is *. Preceding char p[2] is 'a'.
    • dp[1][2] (zero 'a's for a*) is F.
    • Does p[2] ('a') match s[0] ('a')? Yes!
    • dp[0][4] (one or more 'a's) is T.
    • So dp[1][4] becomes F OR T = T.
  • Finally, dp[3][6] will be T.

The Code

class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        m, n = len(s), len(p)

        # dp[i][j] will be True if s[0...i-1] matches p[0...j-1]
        # Dimensions: (m+1) x (n+1)
        dp = [[False] * (n + 1) for _ in range(m + 1)]

        # Base case: Empty string matches empty pattern
        dp[0][0] = True

        # Initialize the first row (s is empty)
        # An empty string 's' can only match patterns like "a*", "a*b*", etc.
        # This means the pattern must end with '*' and its preceding element can be skipped.
        for j in range(1, n + 1):
            if p[j - 1] == '*':
                # If p[j-1] is '*', it means zero occurrences of p[j-2].
                # So, dp[0][j] depends on whether p[0...j-3] matched an empty string.
                dp[0][j] = dp[0][j - 2]

        # Fill the DP table for all other cells
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                # Scenario 1: Current pattern character is a regular character or '.'
                # (p[j-1] is not '*')
                if p[j - 1] == s[i - 1] or p[j - 1] == '.':
                    # If current characters match, depend on the match of preceding substrings
                    dp[i][j] = dp[i - 1][j - 1]

                # Scenario 2: Current pattern character is '*'
                elif p[j - 1] == '*':
                    # Case A: '*' matches zero occurrences of the preceding element (p[j-2])
                    # We effectively ignore p[j-2] and p[j-1] (the '*')
                    # So, s[0...i-1] must match p[0...j-3]
                    dp[i][j] = dp[i][j - 2]

                    # Case B: '*' matches one or more occurrences of the preceding element (p[j-2])
                    # This is only possible if the preceding character (p[j-2]) matches s[i-1]
                    # (p[j-2] == s[i-1] or p[j-2] == '.')
                    if p[j - 2] == s[i - 1] or p[j - 2] == '.':
                        # If they match, '*' consumes s[i-1], and we check if s[0...i-2] matches p[0...j-1]
                        # (where p[j-1] is still '*', allowing it to match further characters)
                        dp[i][j] = dp[i][j] or dp[i - 1][j]

        return dp[m][n]

Enter fullscreen mode Exit fullscreen mode

Time and Space Complexity Analysis

  • Time Complexity: O(m * n)

    • We are filling an m x n DP table. Each cell calculation takes constant time (a few comparisons and assignments). Therefore, the total time complexity is proportional to the product of the lengths of the string s and pattern p.
  • Space Complexity: O(m * n)

    • We use a 2D dp array of size (m+1) x (n+1) to store our intermediate results. This contributes O(m * n) to the space complexity.

Given the constraints 1 <= s.length <= 20 and 1 <= p.length <= 20, an O(m*n) solution is perfectly efficient for this problem. 20 * 20 = 400 operations per test case is lightning fast!


Key Takeaways

  1. Dynamic Programming for String Matching: Problems involving matching substrings, especially with wildcards or varying length matches, are prime candidates for DP.
  2. Defining DP State: Clearly defining dp[i][j] as the match for prefixes s[0...i-1] and p[0...j-1] is crucial.
  3. Handling Special Characters (. and *):
    • . is straightforward: it matches any single character.
    • * requires careful consideration: it has two main interpretations ("zero occurrences" OR "one or more occurrences") that lead to combining results from different DP states.
      • Zero occurrences: The * and its preceding character are skipped (dp[i][j-2]).
      • One or more occurrences: The * matches the current character s[i-1] (if the preceding char allows it), and the * pattern remains active (dp[i-1][j]).
  4. Base Cases: Correctly initializing dp[0][0] and the first row (dp[0][j]) for an empty string s is vital for the DP to propagate correctly.

This problem is a fantastic way to stretch your DP muscles and see how complex string operations can be broken down into manageable subproblems. Keep practicing, and you'll master these patterns in no time!


Published by p1Hzd8mRM8 on 2026-05-21 16:52:13

Top comments (0)