DEV Community

Jaspreet singh
Jaspreet singh

Posted on

KMP Algorithm (Knuth-Morris-Pratt): The Smart Way to Perform String Matching in O(N)

Searching for a pattern inside a string is one of the oldest and most common problems in Computer Science.

Whether you're building:

  • Google Search
  • Code Editors
  • Plagiarism Detection Software
  • DNA Sequence Matching
  • Malware Signature Detection

you need an efficient way to search for strings.

A brute force solution compares characters repeatedly, leading to O(N × M) complexity.

But what if we could avoid comparing the same characters again and again?

That's exactly what the Knuth-Morris-Pratt (KMP) Algorithm does.

Instead of moving the pattern one character at a time and restarting comparisons, KMP intelligently remembers previous matches using an array called the LPS (Longest Prefix Suffix) array.

The result?

Linear Time String Matching — O(N + M)


Why Do We Need KMP?

Suppose we have

Text

ABABDABACDABABCABAB

Pattern

ABABCABAB
Enter fullscreen mode Exit fullscreen mode

A brute force algorithm compares characters one by one.

Whenever a mismatch occurs,

it starts comparing from the beginning of the pattern again.

This causes lots of repeated work.

Example

ABABC

ABABD
    ^
Mismatch

Again start from beginning.
Enter fullscreen mode Exit fullscreen mode

Many of the previous comparisons are repeated.

KMP eliminates this redundancy.


The Core Idea

Instead of asking

"Where should I restart?"

KMP asks

"What do I already know?"

It uses information from previous matches to skip unnecessary comparisons.

The key data structure is the LPS Array.


What is the LPS Array?

LPS stands for

Longest Proper Prefix which is also a Proper Suffix

Sounds complicated?

Let's simplify.


Prefix

A prefix starts from the beginning of the string.

Example

ABCAB
Enter fullscreen mode Exit fullscreen mode

Prefixes

A

AB

ABC

ABCA

ABCAB
Enter fullscreen mode Exit fullscreen mode

Suffix

A suffix ends at the last character.

Example

ABCAB
Enter fullscreen mode Exit fullscreen mode

Suffixes

B

AB

CAB

BCAB

ABCAB
Enter fullscreen mode Exit fullscreen mode

Proper Prefix

A proper prefix is every prefix except the complete string.

ABCAB

Proper Prefixes

A

AB

ABC

ABCA
Enter fullscreen mode Exit fullscreen mode

Proper Suffix

Similarly,

Proper suffix excludes the complete string.

B

AB

CAB

BCAB
Enter fullscreen mode Exit fullscreen mode

Longest Prefix = Suffix

Consider

ABABCABAB
Enter fullscreen mode Exit fullscreen mode

Prefixes

A

AB

ABA

ABAB
Enter fullscreen mode Exit fullscreen mode

Suffixes

B

AB

BAB

ABAB
Enter fullscreen mode Exit fullscreen mode

Longest common value

ABAB
Enter fullscreen mode Exit fullscreen mode

Length

4
Enter fullscreen mode Exit fullscreen mode

So

LPS = 4
Enter fullscreen mode Exit fullscreen mode

Building the LPS Array

Example

Pattern

AAACAAAA
Enter fullscreen mode Exit fullscreen mode
Index Character LPS
0 A 0
1 A 1
2 A 2
3 C 0
4 A 1
5 A 2
6 A 3
7 A 3

The LPS array becomes

0 1 2 0 1 2 3 3
Enter fullscreen mode Exit fullscreen mode


How KMP Works

KMP has two phases.


Phase 1: Build the LPS Array

We preprocess the pattern once.

Time

O(M)
Enter fullscreen mode Exit fullscreen mode

Phase 2: Pattern Matching

Now compare

Text

↓

Pattern
Enter fullscreen mode Exit fullscreen mode

When characters match

Move both pointers.
Enter fullscreen mode Exit fullscreen mode

When mismatch occurs

Instead of moving pattern back to index 0,

jump directly using

j = lps[j-1]
Enter fullscreen mode Exit fullscreen mode

This is the magic of KMP.


Dry Run

Text

ABABDABACDABABCABAB
Enter fullscreen mode Exit fullscreen mode

Pattern

ABABCABAB
Enter fullscreen mode Exit fullscreen mode

Initially

i = 0

j = 0
Enter fullscreen mode Exit fullscreen mode

Characters match

A == A

Move both
Enter fullscreen mode Exit fullscreen mode

Eventually

Mismatch occurs

Instead of

j = 0
Enter fullscreen mode Exit fullscreen mode

KMP does

j = lps[j-1]
Enter fullscreen mode Exit fullscreen mode

This skips unnecessary comparisons.

Eventually

Pattern found at

Index = 10
Enter fullscreen mode Exit fullscreen mode


Why is KMP Faster?

Imagine matching

AAAAAAAAAB
Enter fullscreen mode Exit fullscreen mode

inside

AAAAAAAAAAAAAAAAAB
Enter fullscreen mode Exit fullscreen mode

Brute Force

Repeatedly compares

AAAAAAAA
Enter fullscreen mode Exit fullscreen mode

again and again.

KMP

Already knows

AAAAAAA
Enter fullscreen mode Exit fullscreen mode

matched previously.

So it skips directly.

No repeated comparisons.


Java Implementation

Building LPS Array

public static int[] buildLPS(String pattern) {

    int n = pattern.length();
    int[] lps = new int[n];

    int len = 0;
    int i = 1;

    while (i < n) {

        if (pattern.charAt(i) == pattern.charAt(len)) {

            len++;
            lps[i] = len;
            i++;

        } else {

            if (len != 0) {
                len = lps[len - 1];
            } else {
                lps[i] = 0;
                i++;
            }
        }
    }

    return lps;
}
Enter fullscreen mode Exit fullscreen mode

KMP Search

public static void search(String text, String pattern) {

    int[] lps = buildLPS(pattern);

    int i = 0;
    int j = 0;

    while (i < text.length()) {

        if (text.charAt(i) == pattern.charAt(j)) {
            i++;
            j++;
        }

        if (j == pattern.length()) {

            System.out.println("Pattern found at index " + (i - j));
            j = lps[j - 1];

        } else if (i < text.length()
                && text.charAt(i) != pattern.charAt(j)) {

            if (j != 0)
                j = lps[j - 1];
            else
                i++;
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

Complexity Analysis

Operation Complexity
Build LPS O(M)
Pattern Search O(N)
Total O(N + M)
Space O(M)

Where

  • N = Length of Text
  • M = Length of Pattern

Real-World Applications

🔍 Search Engines

Fast keyword searching.

Example

Google Search.


Plagiarism Detection

Finding repeated text across documents.


DNA Sequence Matching

Matching genome sequences efficiently.


IDE & Code Editors

Searching variables, methods and classes.


Malware Detection

Matching virus signatures inside executable files.


Advantages

Linear Time Complexity

No repeated comparisons

Efficient for long texts

Excellent interview problem


Limitations

Slightly harder to understand than brute force

Extra preprocessing needed

Only supports exact pattern matching


Brute Force vs KMP

Feature Brute Force KMP
Time Complexity O(N × M) O(N + M)
Repeated Comparisons Yes No
Extra Space O(1) O(M)
Uses Preprocessing No Yes
Suitable for Large Inputs

KMP vs Z Algorithm

KMP Z Algorithm
Uses LPS Array Uses Z Array
Pattern preprocessing Prefix matching
More common in interviews Simpler implementation
O(N + M) O(N + M)

Common Interview Questions

Why do we need the LPS array?

The LPS array stores information about previous matches, allowing KMP to skip unnecessary comparisons after a mismatch.


Why is KMP O(N + M)?

Each character in both the text and pattern is processed at most once. The LPS array prevents rechecking characters that have already been matched.


Can KMP search multiple patterns?

No. KMP is designed for a single pattern. For multiple patterns, algorithms like Aho-Corasick are preferred.


Why doesn't KMP move the text pointer backward?

Because the LPS array provides the next best matching position in the pattern, allowing the algorithm to continue without revisiting text characters.


Interview One-Liner

KMP performs string matching in linear time by preprocessing the pattern into an LPS array, allowing it to skip unnecessary comparisons after mismatches instead of restarting from the beginning.


TL;DR

  • KMP is an efficient string matching algorithm with O(N + M) time complexity.
  • It uses an LPS (Longest Prefix Suffix) array to remember previous matches.
  • Instead of restarting after a mismatch, KMP jumps to the next possible matching position using the LPS array.
  • The algorithm consists of two phases: LPS construction and pattern matching.
  • KMP is widely used in search engines, plagiarism detection, DNA analysis, IDEs, and malware detection.
  • It avoids repeated comparisons, making it significantly faster than the brute-force approach for large inputs.

💬 Final Thoughts

The KMP Algorithm is one of the most elegant examples of how preprocessing can dramatically improve performance. While the LPS array may seem confusing at first, once you understand the intuition behind "reusing previous matches", the algorithm becomes much easier to visualize and implement.

If you're preparing for coding interviews, KMP is a must-know algorithm because it not only tests your understanding of string manipulation but also demonstrates your ability to optimize brute-force solutions into linear-time algorithms.

Have you ever implemented KMP in an interview or real-world project? Share your experience in the comments!

Top comments (0)