Mastering Regular Expressions: A Deep Dive into LeetCode 10 with Dynamic Programming
Hey there, fellow coders! 👋 Are you ready to tackle one of LeetCode's classic challenges that might seem daunting at first glance but becomes quite elegant with the right approach? Today, we're diving into Problem 10: Regular Expression Matching.
This problem is a fantastic way to sharpen your dynamic programming (DP) skills and gain a deeper understanding of how regular expressions actually work under the hood. Don't worry if regex feels like magic – we're going to demystify it together!
The Puzzle: Regular Expression Matching
We're given two strings: an input string s and a pattern p. Our goal is to determine if the pattern p completely matches the input string s.
The twist? The pattern p isn't just plain text. It can contain two special characters:
-
'.': This dot matches any single character. Think of it as a wildcard for one character. -
'*': This asterisk matches zero or more of the preceding element. This is where things get interesting!
Important Note: The match must cover the entire input string s, not just a part of it.
Let's look at a few examples to solidify our understanding:
Example 1: Basic Mismatch
-
s = "aa",p = "a" - Output:
false - Why? The pattern "a" only matches the first 'a' of "aa". It doesn't cover the entire string.
Example 2: The Power of *
-
s = "aa",p = "a*" - Output:
true - Why?
*means "zero or more of the preceding element," which is 'a'. By repeating 'a' once, the pattern becomes "aa", matching the input. Ifswas "a", it would also match (zero repetitions). Ifswas "aaa", it would match (two repetitions).
Example 3: .* - The Ultimate Wildcard Combo
-
s = "ab",p = ".*" - Output:
true - Why?
.matches any single character.*means "zero or more" of whatever precedes it. So,.*means "zero or more of any character." It can match "ab", "a", "", "xyz", anything!
The "Aha!" Moment: Intuition
When you see problems involving matching parts of strings, especially with wildcards or options like "zero or more," your mind should start thinking about Dynamic Programming (DP). Why DP? Because the decision at any point (does s[i] match p[j]) often depends on the results of smaller subproblems.
Imagine trying to match s and p.
- If
p[j]is a regular character or.you just check ifs[i]matchesp[j]and then move on to matching the rest ofsandp. - But what if
p[j]is*? This is the tricky part.*gives us options:- It could match zero occurrences of the character before it.
- It could match one or more occurrences of the character before it.
These multiple choices, and the fact that we can build up a solution from smaller matches, strongly hints at DP!
Our Strategy: Dynamic Programming Grid
We'll use a 2D boolean array, let's call it dp, where dp[i][j] will be True if the first i characters of s (i.e., s[0...i-1]) match the first j characters of p (i.e., p[0...j-1]). Otherwise, it will be False.
Let m be the length of s and n be the length of p. Our dp table will be (m+1) x (n+1).
Step 1: Initialize the DP Table
dp[0][0] = True: An empty string (swith 0 characters) always matches an empty pattern (pwith 0 characters). This is our base case.First Row (
dp[0][j]forj > 0): This represents matching an empty stringswith a non-empty patternp.
An empty string can only be matched by a pattern that allows zero characters. This typically happens with*.
Ifp[j-1]is'*', then it can match zero of the preceding element (p[j-2]). In this scenario,dp[0][j]would beTrueifdp[0][j-2]wasTrue. All otherdp[0][j]would beFalse.
Step 2: Fill the DP Table
We'll iterate through s (from i = 1 to m) and p (from j = 1 to n) to fill the table. For each dp[i][j], we consider two main scenarios for the current character p[j-1]:
Scenario 1: p[j-1] is NOT '*' (i.e., it's a regular character or '.')
- If
p[j-1]matchess[i-1](meaningp[j-1] == s[i-1]orp[j-1] == '.'), thens[0...i-1]matchesp[0...j-1]only ifs[0...i-2]matchedp[0...j-2]. - So,
dp[i][j] = dp[i-1][j-1]. - Otherwise (if
p[j-1]doesn't matchs[i-1]),dp[i][j] = False.
Scenario 2: p[j-1] IS '*'
This is the trickiest part. The * implies "zero or more" of the preceding character (p[j-2]). Let's call this preceding character prev_char_in_pattern = p[j-2].
We have two possibilities for how * can be interpreted:
-
*matches zero occurrences ofprev_char_in_pattern:- In this case,
prev_char_in_patternand the*effectively disappear from the pattern. - So, we need to check if
s[0...i-1]matchesp[0...j-3]. This state is represented bydp[i][j-2]. - Thus,
dp[i][j] = dp[i][j-2].
- In this case,
-
*matches one or more occurrences ofprev_char_in_pattern:- For
*to match one or more times,prev_char_in_patternmust match the current characters[i-1]. (i.e.,prev_char_in_pattern == s[i-1]orprev_char_in_pattern == '.'). - If they match,
s[i-1]is consumed by one instance ofprev_char_in_pattern. We then needs[0...i-2]to matchp[0...j-1](theprev_char_in_patternfollowed by*is still active, potentially matching more characters). - This state is represented by
dp[i-1][j]. - So, if
prev_char_in_patternmatchess[i-1], thendp[i][j]can also bedp[i-1][j].
- For
- Combining these:
dp[i][j]will beTrueif either of these conditions holds.dp[i][j] = dp[i][j-2](zero occurrences) Ifprev_char_in_patternmatchess[i-1](i.e.,p[j-2] == s[i-1]orp[j-2] == '.'):dp[i][j] = dp[i][j] or dp[i-1][j](one or more occurrences)
Step 3: The Result
After filling the entire dp table, the final answer will be in dp[m][n].
Let's walk through an example: s = "aab", p = "c*a*b"
| dp[i][j] | "" | c | * | a | * | b |
|---|---|---|---|---|---|---|
| "" | T | F | T | F | T | F |
| a | F | F | F | T | T | F |
| aa | F | F | F | F | T | F |
| aab | F | F | F | F | F | T |
-
dp[0][0]isT. -
dp[0][2](forc*):p[1]is*, sodp[0][2] = dp[0][0]which isT. (empty string matchesc*bycrepeating 0 times). -
dp[0][4](forc*a*):p[3]is*, sodp[0][4] = dp[0][2]which isT. (empty string matchesc*a*byarepeating 0 times). - Now, consider
dp[1][1](s="a",p="c"):p[0]('c') !=s[0]('a'). SoF. - Consider
dp[1][4](s="a",p="c*a*"):-
p[3]is*. Preceding charp[2]is 'a'. -
dp[1][2](zero 'a's fora*) isF. - Does
p[2]('a') matchs[0]('a')? Yes! -
dp[0][4](one or more 'a's) isT. - So
dp[1][4]becomesF OR T = T.
-
- Finally,
dp[3][6]will beT.
The Code
class Solution(object):
def isMatch(self, s, p):
"""
:type s: str
:type p: str
:rtype: bool
"""
m, n = len(s), len(p)
# dp[i][j] will be True if s[0...i-1] matches p[0...j-1]
# Dimensions: (m+1) x (n+1)
dp = [[False] * (n + 1) for _ in range(m + 1)]
# Base case: Empty string matches empty pattern
dp[0][0] = True
# Initialize the first row (s is empty)
# An empty string 's' can only match patterns like "a*", "a*b*", etc.
# This means the pattern must end with '*' and its preceding element can be skipped.
for j in range(1, n + 1):
if p[j - 1] == '*':
# If p[j-1] is '*', it means zero occurrences of p[j-2].
# So, dp[0][j] depends on whether p[0...j-3] matched an empty string.
dp[0][j] = dp[0][j - 2]
# Fill the DP table for all other cells
for i in range(1, m + 1):
for j in range(1, n + 1):
# Scenario 1: Current pattern character is a regular character or '.'
# (p[j-1] is not '*')
if p[j - 1] == s[i - 1] or p[j - 1] == '.':
# If current characters match, depend on the match of preceding substrings
dp[i][j] = dp[i - 1][j - 1]
# Scenario 2: Current pattern character is '*'
elif p[j - 1] == '*':
# Case A: '*' matches zero occurrences of the preceding element (p[j-2])
# We effectively ignore p[j-2] and p[j-1] (the '*')
# So, s[0...i-1] must match p[0...j-3]
dp[i][j] = dp[i][j - 2]
# Case B: '*' matches one or more occurrences of the preceding element (p[j-2])
# This is only possible if the preceding character (p[j-2]) matches s[i-1]
# (p[j-2] == s[i-1] or p[j-2] == '.')
if p[j - 2] == s[i - 1] or p[j - 2] == '.':
# If they match, '*' consumes s[i-1], and we check if s[0...i-2] matches p[0...j-1]
# (where p[j-1] is still '*', allowing it to match further characters)
dp[i][j] = dp[i][j] or dp[i - 1][j]
return dp[m][n]
Time and Space Complexity Analysis
-
Time Complexity:
O(m * n)- We are filling an
m x nDP table. Each cell calculation takes constant time (a few comparisons and assignments). Therefore, the total time complexity is proportional to the product of the lengths of the stringsand patternp.
- We are filling an
-
Space Complexity:
O(m * n)- We use a 2D
dparray of size(m+1) x (n+1)to store our intermediate results. This contributesO(m * n)to the space complexity.
- We use a 2D
Given the constraints 1 <= s.length <= 20 and 1 <= p.length <= 20, an O(m*n) solution is perfectly efficient for this problem. 20 * 20 = 400 operations per test case is lightning fast!
Key Takeaways
- Dynamic Programming for String Matching: Problems involving matching substrings, especially with wildcards or varying length matches, are prime candidates for DP.
- Defining DP State: Clearly defining
dp[i][j]as the match for prefixess[0...i-1]andp[0...j-1]is crucial. - Handling Special Characters (
.and*):-
.is straightforward: it matches any single character. -
*requires careful consideration: it has two main interpretations ("zero occurrences" OR "one or more occurrences") that lead to combining results from different DP states.- Zero occurrences: The
*and its preceding character are skipped (dp[i][j-2]). - One or more occurrences: The
*matches the current characters[i-1](if the preceding char allows it), and the*pattern remains active (dp[i-1][j]).
- Zero occurrences: The
-
- Base Cases: Correctly initializing
dp[0][0]and the first row (dp[0][j]) for an empty stringsis vital for the DP to propagate correctly.
This problem is a fantastic way to stretch your DP muscles and see how complex string operations can be broken down into manageable subproblems. Keep practicing, and you'll master these patterns in no time!
Published by p1Hzd8mRM8 on 2026-05-21 16:52:13
Top comments (0)