Jaspreet singh

Posted on Jul 1

KMP Algorithm (Knuth-Morris-Pratt): The Smart Way to Perform String Matching in O(N)

#algorithms #computerscience #performance #tutorial

Searching for a pattern inside a string is one of the oldest and most common problems in Computer Science.

Whether you're building:

Google Search
Code Editors
Plagiarism Detection Software
DNA Sequence Matching
Malware Signature Detection

you need an efficient way to search for strings.

A brute force solution compares characters repeatedly, leading to O(N × M) complexity.

But what if we could avoid comparing the same characters again and again?

That's exactly what the Knuth-Morris-Pratt (KMP) Algorithm does.

Instead of moving the pattern one character at a time and restarting comparisons, KMP intelligently remembers previous matches using an array called the LPS (Longest Prefix Suffix) array.

The result?

Linear Time String Matching — O(N + M)

Why Do We Need KMP?

Suppose we have

Text

ABABDABACDABABCABAB

Pattern

ABABCABAB

A brute force algorithm compares characters one by one.

Whenever a mismatch occurs,

it starts comparing from the beginning of the pattern again.

This causes lots of repeated work.

Example

ABABC

ABABD
    ^
Mismatch

Again start from beginning.

Many of the previous comparisons are repeated.

KMP eliminates this redundancy.

The Core Idea

Instead of asking

"Where should I restart?"

KMP asks

"What do I already know?"

It uses information from previous matches to skip unnecessary comparisons.

The key data structure is the LPS Array.

What is the LPS Array?

LPS stands for

Longest Proper Prefix which is also a Proper Suffix

Sounds complicated?

Let's simplify.

Prefix

A prefix starts from the beginning of the string.

Example

ABCAB

Prefixes

A

AB

ABC

ABCA

ABCAB

Suffix

A suffix ends at the last character.

Example

ABCAB

Suffixes

B

AB

CAB

BCAB

ABCAB

Proper Prefix

A proper prefix is every prefix except the complete string.

ABCAB

Proper Prefixes

A

AB

ABC

ABCA

Proper Suffix

Similarly,

Proper suffix excludes the complete string.

B

AB

CAB

BCAB

Longest Prefix = Suffix

Consider

ABABCABAB

Prefixes

A

AB

ABA

ABAB

Suffixes

B

AB

BAB

ABAB

Longest common value

ABAB

Length

LPS = 4

Building the LPS Array

Example

Pattern

AAACAAAA

Index	Character	LPS
0	A	0
1	A	1
2	A	2
3	C	0
4	A	1
5	A	2
6	A	3
7	A	3

The LPS array becomes

0 1 2 0 1 2 3 3

How KMP Works

KMP has two phases.

Phase 1: Build the LPS Array

We preprocess the pattern once.

Time

O(M)

Phase 2: Pattern Matching

Now compare

Text

↓

Pattern

When characters match

Move both pointers.

When mismatch occurs

Instead of moving pattern back to index 0,

jump directly using

j = lps[j-1]

This is the magic of KMP.

Dry Run

Text

ABABDABACDABABCABAB

Pattern

ABABCABAB

Initially

i = 0

j = 0

Characters match

A == A

Move both

Eventually

Mismatch occurs

Instead of

j = 0

KMP does

j = lps[j-1]

This skips unnecessary comparisons.

Eventually

Pattern found at

Index = 10

Why is KMP Faster?

Imagine matching

AAAAAAAAAB

inside

AAAAAAAAAAAAAAAAAB

Brute Force

Repeatedly compares

AAAAAAAA

again and again.

KMP

Already knows

AAAAAAA

matched previously.

So it skips directly.

No repeated comparisons.

Java Implementation

Building LPS Array

public static int[] buildLPS(String pattern) {

    int n = pattern.length();
    int[] lps = new int[n];

    int len = 0;
    int i = 1;

    while (i < n) {

        if (pattern.charAt(i) == pattern.charAt(len)) {

            len++;
            lps[i] = len;
            i++;

        } else {

            if (len != 0) {
                len = lps[len - 1];
            } else {
                lps[i] = 0;
                i++;
            }
        }
    }

    return lps;
}

KMP Search

public static void search(String text, String pattern) {

    int[] lps = buildLPS(pattern);

    int i = 0;
    int j = 0;

    while (i < text.length()) {

        if (text.charAt(i) == pattern.charAt(j)) {
            i++;
            j++;
        }

        if (j == pattern.length()) {

            System.out.println("Pattern found at index " + (i - j));
            j = lps[j - 1];

        } else if (i < text.length()
                && text.charAt(i) != pattern.charAt(j)) {

            if (j != 0)
                j = lps[j - 1];
            else
                i++;
        }
    }
}

Complexity Analysis

Operation	Complexity
Build LPS	O(M)
Pattern Search	O(N)
Total	O(N + M)
Space	O(M)

Where

N = Length of Text
M = Length of Pattern

Real-World Applications

🔍 Search Engines

Fast keyword searching.

Example

Google Search.

Plagiarism Detection

Finding repeated text across documents.

DNA Sequence Matching

Matching genome sequences efficiently.

IDE & Code Editors

Searching variables, methods and classes.

Malware Detection

Matching virus signatures inside executable files.

Advantages

Linear Time Complexity

No repeated comparisons

Efficient for long texts

Excellent interview problem

Limitations

Slightly harder to understand than brute force

Extra preprocessing needed

Only supports exact pattern matching

Brute Force vs KMP

Feature	Brute Force	KMP
Time Complexity	O(N × M)	O(N + M)
Repeated Comparisons	Yes	No
Extra Space	O(1)	O(M)
Uses Preprocessing	No	Yes
Suitable for Large Inputs	❌	✅

KMP vs Z Algorithm

KMP	Z Algorithm
Uses LPS Array	Uses Z Array
Pattern preprocessing	Prefix matching
More common in interviews	Simpler implementation
O(N + M)	O(N + M)

Common Interview Questions

Why do we need the LPS array?

The LPS array stores information about previous matches, allowing KMP to skip unnecessary comparisons after a mismatch.

Why is KMP O(N + M)?

Each character in both the text and pattern is processed at most once. The LPS array prevents rechecking characters that have already been matched.

Can KMP search multiple patterns?

No. KMP is designed for a single pattern. For multiple patterns, algorithms like Aho-Corasick are preferred.

Why doesn't KMP move the text pointer backward?

Because the LPS array provides the next best matching position in the pattern, allowing the algorithm to continue without revisiting text characters.

Interview One-Liner

KMP performs string matching in linear time by preprocessing the pattern into an LPS array, allowing it to skip unnecessary comparisons after mismatches instead of restarting from the beginning.

TL;DR

KMP is an efficient string matching algorithm with O(N + M) time complexity.
It uses an LPS (Longest Prefix Suffix) array to remember previous matches.
Instead of restarting after a mismatch, KMP jumps to the next possible matching position using the LPS array.
The algorithm consists of two phases: LPS construction and pattern matching.
KMP is widely used in search engines, plagiarism detection, DNA analysis, IDEs, and malware detection.
It avoids repeated comparisons, making it significantly faster than the brute-force approach for large inputs.

💬 Final Thoughts

The KMP Algorithm is one of the most elegant examples of how preprocessing can dramatically improve performance. While the LPS array may seem confusing at first, once you understand the intuition behind "reusing previous matches", the algorithm becomes much easier to visualize and implement.

If you're preparing for coding interviews, KMP is a must-know algorithm because it not only tests your understanding of string manipulation but also demonstrates your ability to optimize brute-force solutions into linear-time algorithms.

Have you ever implemented KMP in an interview or real-world project? Share your experience in the comments!

Top comments (1)

algorhymer • Jul 3

What I'm about to write is not a joke. Not cynicism. Not a snark.

I think the problem with CS101 is not that there aren't enough sources available.
I - still - believe that, in this day and age of passive content consumption, the presentation matters.
Why?
People need stories. People rarely remember algos, formulas.
Do you remember Neo?

The raw information works, but has a negative side-effect:
it hides the article author's personality and imagination.
Maybe... maybe instead of raw, dry... you could lace it with a story, a setting perhaps.
To illustrate this:
Would you be interested in reduction strategies?
Or would you rather be interested in the adventures of Señor Puss in Boots, master of fencing, thief of all dairy products?

Maybe consider a 80-20 split?
80% Monad tutorial
20% Se7en's What's in the box? scene