Searching for a pattern inside a string is one of the oldest and most common problems in Computer Science.
Whether you're building:
- Google Search
- Code Editors
- Plagiarism Detection Software
- DNA Sequence Matching
- Malware Signature Detection
you need an efficient way to search for strings.
A brute force solution compares characters repeatedly, leading to O(N × M) complexity.
But what if we could avoid comparing the same characters again and again?
That's exactly what the Knuth-Morris-Pratt (KMP) Algorithm does.
Instead of moving the pattern one character at a time and restarting comparisons, KMP intelligently remembers previous matches using an array called the LPS (Longest Prefix Suffix) array.
The result?
Linear Time String Matching — O(N + M)
Why Do We Need KMP?
Suppose we have
Text
ABABDABACDABABCABAB
Pattern
ABABCABAB
A brute force algorithm compares characters one by one.
Whenever a mismatch occurs,
it starts comparing from the beginning of the pattern again.
This causes lots of repeated work.
Example
ABABC
ABABD
^
Mismatch
Again start from beginning.
Many of the previous comparisons are repeated.
KMP eliminates this redundancy.
The Core Idea
Instead of asking
"Where should I restart?"
KMP asks
"What do I already know?"
It uses information from previous matches to skip unnecessary comparisons.
The key data structure is the LPS Array.
What is the LPS Array?
LPS stands for
Longest Proper Prefix which is also a Proper Suffix
Sounds complicated?
Let's simplify.
Prefix
A prefix starts from the beginning of the string.
Example
ABCAB
Prefixes
A
AB
ABC
ABCA
ABCAB
Suffix
A suffix ends at the last character.
Example
ABCAB
Suffixes
B
AB
CAB
BCAB
ABCAB
Proper Prefix
A proper prefix is every prefix except the complete string.
ABCAB
Proper Prefixes
A
AB
ABC
ABCA
Proper Suffix
Similarly,
Proper suffix excludes the complete string.
B
AB
CAB
BCAB
Longest Prefix = Suffix
Consider
ABABCABAB
Prefixes
A
AB
ABA
ABAB
Suffixes
B
AB
BAB
ABAB
Longest common value
ABAB
Length
4
So
LPS = 4
Building the LPS Array
Example
Pattern
AAACAAAA
| Index | Character | LPS |
|---|---|---|
| 0 | A | 0 |
| 1 | A | 1 |
| 2 | A | 2 |
| 3 | C | 0 |
| 4 | A | 1 |
| 5 | A | 2 |
| 6 | A | 3 |
| 7 | A | 3 |
The LPS array becomes
0 1 2 0 1 2 3 3
How KMP Works
KMP has two phases.
Phase 1: Build the LPS Array
We preprocess the pattern once.
Time
O(M)
Phase 2: Pattern Matching
Now compare
Text
↓
Pattern
When characters match
Move both pointers.
When mismatch occurs
Instead of moving pattern back to index 0,
jump directly using
j = lps[j-1]
This is the magic of KMP.
Dry Run
Text
ABABDABACDABABCABAB
Pattern
ABABCABAB
Initially
i = 0
j = 0
Characters match
A == A
Move both
Eventually
Mismatch occurs
Instead of
j = 0
KMP does
j = lps[j-1]
This skips unnecessary comparisons.
Eventually
Pattern found at
Index = 10
Why is KMP Faster?
Imagine matching
AAAAAAAAAB
inside
AAAAAAAAAAAAAAAAAB
Brute Force
Repeatedly compares
AAAAAAAA
again and again.
KMP
Already knows
AAAAAAA
matched previously.
So it skips directly.
No repeated comparisons.
Java Implementation
Building LPS Array
public static int[] buildLPS(String pattern) {
int n = pattern.length();
int[] lps = new int[n];
int len = 0;
int i = 1;
while (i < n) {
if (pattern.charAt(i) == pattern.charAt(len)) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1];
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}
KMP Search
public static void search(String text, String pattern) {
int[] lps = buildLPS(pattern);
int i = 0;
int j = 0;
while (i < text.length()) {
if (text.charAt(i) == pattern.charAt(j)) {
i++;
j++;
}
if (j == pattern.length()) {
System.out.println("Pattern found at index " + (i - j));
j = lps[j - 1];
} else if (i < text.length()
&& text.charAt(i) != pattern.charAt(j)) {
if (j != 0)
j = lps[j - 1];
else
i++;
}
}
}
Complexity Analysis
| Operation | Complexity |
|---|---|
| Build LPS | O(M) |
| Pattern Search | O(N) |
| Total | O(N + M) |
| Space | O(M) |
Where
- N = Length of Text
- M = Length of Pattern
Real-World Applications
🔍 Search Engines
Fast keyword searching.
Example
Google Search.
Plagiarism Detection
Finding repeated text across documents.
DNA Sequence Matching
Matching genome sequences efficiently.
IDE & Code Editors
Searching variables, methods and classes.
Malware Detection
Matching virus signatures inside executable files.
Advantages
Linear Time Complexity
No repeated comparisons
Efficient for long texts
Excellent interview problem
Limitations
Slightly harder to understand than brute force
Extra preprocessing needed
Only supports exact pattern matching
Brute Force vs KMP
| Feature | Brute Force | KMP |
|---|---|---|
| Time Complexity | O(N × M) | O(N + M) |
| Repeated Comparisons | Yes | No |
| Extra Space | O(1) | O(M) |
| Uses Preprocessing | No | Yes |
| Suitable for Large Inputs | ❌ | ✅ |
KMP vs Z Algorithm
| KMP | Z Algorithm |
|---|---|
| Uses LPS Array | Uses Z Array |
| Pattern preprocessing | Prefix matching |
| More common in interviews | Simpler implementation |
| O(N + M) | O(N + M) |
Common Interview Questions
Why do we need the LPS array?
The LPS array stores information about previous matches, allowing KMP to skip unnecessary comparisons after a mismatch.
Why is KMP O(N + M)?
Each character in both the text and pattern is processed at most once. The LPS array prevents rechecking characters that have already been matched.
Can KMP search multiple patterns?
No. KMP is designed for a single pattern. For multiple patterns, algorithms like Aho-Corasick are preferred.
Why doesn't KMP move the text pointer backward?
Because the LPS array provides the next best matching position in the pattern, allowing the algorithm to continue without revisiting text characters.
Interview One-Liner
KMP performs string matching in linear time by preprocessing the pattern into an LPS array, allowing it to skip unnecessary comparisons after mismatches instead of restarting from the beginning.
TL;DR
- KMP is an efficient string matching algorithm with O(N + M) time complexity.
- It uses an LPS (Longest Prefix Suffix) array to remember previous matches.
- Instead of restarting after a mismatch, KMP jumps to the next possible matching position using the LPS array.
- The algorithm consists of two phases: LPS construction and pattern matching.
- KMP is widely used in search engines, plagiarism detection, DNA analysis, IDEs, and malware detection.
- It avoids repeated comparisons, making it significantly faster than the brute-force approach for large inputs.
💬 Final Thoughts
The KMP Algorithm is one of the most elegant examples of how preprocessing can dramatically improve performance. While the LPS array may seem confusing at first, once you understand the intuition behind "reusing previous matches", the algorithm becomes much easier to visualize and implement.
If you're preparing for coding interviews, KMP is a must-know algorithm because it not only tests your understanding of string manipulation but also demonstrates your ability to optimize brute-force solutions into linear-time algorithms.
Have you ever implemented KMP in an interview or real-world project? Share your experience in the comments!
Top comments (0)