DEV Community

Cover image for Rolling Hash Function
Hagar
Hagar

Posted on

Rolling Hash Function

Hash function

It's a function that can map any kind of data of arbitrary size to fixed-size values, these values are called Hash Values

In our case we will talk about Rolling Hash Function that is used in Rabin-Karp Algorithm

Rolling Hash Function:

Recap:

In Rabin Karp Algorithm, we used to compare the hash values of the pattern and the substring to find the match, by using Rollling Hash Function, we can prevent the rehashing of the whole substring by only using the old hash value.

Img1

  • For simplicity, I used incremental values starting from "1" to encode the english alphabet, we can use ASCII or any other encoding.

Img2

  • In our next iterations, we will use the old hash value, subtract the letter that is no longer inside our window and add the newly added one.

Img3

Tips to have a good rolling hash function:

1. Prevent Frequent Collisions:

We can have two equal hash values for two different strings, like having same letters in different order for example.

To prevent collisions we can multiply each character with a constant raised to power of its position.

Img1

  • For simplicity, I chose 10 as our constant raised to power of (n - 1) where n is the position of character from the right. Img2
  • Same as the previous algorithm except that we are sure that order of characters will definitely affect our hash values. Img3

To prevent hash value collisions, choose a constant that is
greater than the largest value in the alphabet.

And here comes tip #2

2. Prevent Integer overflows:

We will add to our formula (% m), where m is a larger prime number, since the probability of two random strings colliding is about ≈ 1/m.

Img1

Img2

Img3

Top comments (0)