DEV Community

Khushi
Khushi

Posted on

๐Ÿ” Rabin-Karp Algorithm

Ever tried finding a word in a huge chunk of text and thought,
There has to be a faster way than checking every single character!๐Ÿ˜ฉ
Enter the Rabin-Karp Algorithm โ€” a smart and efficient way to search for patterns in strings using a clever trick called rolling hash ๐Ÿช„
Letโ€™s break it down step by step โ€” no jargon, no confusion, just clarity and fun. ๐Ÿง โœจ
๐Ÿค” The Problem
Youโ€™re given:
Text โ†’ "geeksforgeeks"
Pattern โ†’ "geek"
And your task is to find all positions where the pattern appears in the text.
โœ… Expected Output:
[1, 9] // Using 1-based indexing (because humans count from 1 ๐Ÿ˜…)

๐Ÿง  The Big Idea (Why Rabin-Karp Is Smart?)
Instead of comparing every letter of the pattern with every possible substring of the text, what if we could:
1.Convert the pattern into a number (hash)
2.Convert parts of the text into numbers too
3.Compare those numbers instead of characters
โ†’ Much faster than checking each letter!
If the hashes match, we double-check the actual characters โ€” just to be sure (because different strings can sometimes have the same hash โ€” that's called a collision).

๐Ÿ”ข What's a Hash Anyway?
Think of a hash as a stringโ€™s fingerprint:
Same strings โ†’ same hash
Different strings โ†’ usually different hashes
Rabin-Karp uses a rolling hash, which is just a clever way of updating the hash as we move through the text, without recalculating everything from scratch.

๐Ÿ”„Rolling Hash โ€” The Star of the Show๐ŸŒŸ
Letโ€™s say weโ€™ve already calculated the hash for "geek" โ€” now we want the hash for the next window: "eeks".
Instead of this:
Recalculate hash("eeks") from scratch ๐Ÿ˜ฉ

We do this:
t_new = (d * (t_old - old_char * h) + new_char) % q
Where:
-t_old is the previous windowโ€™s hash
-old_char is the character sliding out of the window
-new_char is the character sliding into the window
-d is the number of characters in the alphabet (e.g., 256 for extended -ASCII)
-h = d^(M-1) % q is used to remove the effect of the oldest character
-q is a large prime number used to reduce hash collisions

We move the window and get a new hash in constant time! ๐Ÿ”ฅ

๐Ÿ’ฅWhy Rabin-Karp is Awesome
โœ… Fast search using hashing
โœ… Great for multiple pattern searches
โœ… Average time complexity: O(N + M)
โœ… Used in:
1.Plagiarism detection ๐Ÿง‘โ€๐ŸŽ“
2.Big text search engines
3.DNA sequence matching ๐Ÿงฌ
4.Intrusion detection systems ๐Ÿ”

๐Ÿงพ Dry Run: Letโ€™s See It in Action
๐Ÿ”ธ Input:
Text: "geeksforgeeks"
Pattern: "geek"

Step 1: Initialize Variables
M = 4 // length of pattern
N = 13 // length of text
d = 256 // characters in the alphabet
q = 101 // prime number for hashing

Step 2: Compute Hash Multiplier
h = pow(d, M-1) % q = (256^3) % 101 = 88

Step 3: Calculate Initial Hashes
p = hash("geek") % q = 27
t = hash("geek") % q = 27

Step 4: Slide the Window and Compare
Iteration 1: "geek" โ†’ Match found โœ… at index 1
Iteration 2: "eeks" โ†’ Hash โ‰  27 โ†’ skip โŒ
Iteration 3: "eksf" โ†’ Hash โ‰  27 โ†’ skip โŒ
...
Iteration 9: "geek" โ†’ Match found โœ… at index 9

๐Ÿง‘โ€๐Ÿ’ปFinal Result:
[1, 9]

๐Ÿ‘จโ€๐Ÿ’ป C++ Code: Rabin-Karp Algorithm

#include <iostream>
#include <vector>
using namespace std;

// Your Rabin-Karp search function
vector<int> search(string pattern, string text) {
    vector<int> result;
    int d = 256;   // Total characters (ASCII)
    int q = 101;   // Prime number to avoid collision
    int M = pattern.length();
    int N = text.length();
    int p = 0;     // hash for pattern
    int t = 0;     // hash for text window
    int h = 1;

    // Step 1: Calculate hash multiplier h = pow(d, M-1) % q
    for (int i = 0; i < M - 1; i++)
        h = (h * d) % q;

    // Step 2: Initial hash values
    for (int i = 0; i < M; i++) {
        p = (d * p + pattern[i]) % q;
        t = (d * t + text[i]) % q;
    }

    // Step 3: Slide pattern over text
    for (int i = 0; i <= N - M; i++) {
        // If hash values match, check characters one by one
        if (p == t) {
            bool match = true;
            for (int j = 0; j < M; j++) {
                if (text[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match)
                result.push_back(i + 1); // 1-based indexing
        }

        // Step 4: Update hash using rolling hash
        if (i < N - M) {
            t = (d * (t - text[i] * h) + text[i + M]) % q;
            if (t < 0)
                t = t + q; // Make sure hash stays positive
        }
    }

    return result;
}

// Main Function
int main() {
    string text = "ababcabcabababd";
    string pattern = "ababd";

    vector<int> positions = search(pattern, text);

    if (!positions.empty()) {
        cout << "Pattern found at position(s): ";
        for (int pos : positions)
            cout << pos << " ";
        cout << endl;
    } else {
        cout << "Pattern not found in the text." << endl;
    }

    return 0;
}


Enter fullscreen mode Exit fullscreen mode

๐Ÿง  Key Takeaways
Rabin-Karp is a hash-based string searching algorithm.
It uses a rolling hash to speed up window comparisons.
Perfect for situations where you need to find multiple patterns efficiently.

Top comments (1)

Collapse
 
dimpal_rajput profile image
Dimpal Rajput

Nice explanation!