Hash function
It's a function that can map any kind of data of arbitrary size to fixed-size values, these values are called Hash Values
In our case we will talk about Rolling Hash Function that is used in Rabin-Karp Algorithm
Rolling Hash Function:
Recap:
In Rabin Karp Algorithm, we used to compare the hash values of the pattern and the substring to find the match, by using Rollling Hash Function, we can prevent the rehashing of the whole substring by only using the old hash value.
- For simplicity, I used incremental values starting from "1" to encode the english alphabet, we can use ASCII or any other encoding.
- In our next iterations, we will use the old hash value, subtract the letter that is no longer inside our window and add the newly added one.
Tips to have a good rolling hash function:
1. Prevent Frequent Collisions:
We can have two equal hash values for two different strings, like having same letters in different order for example.
To prevent collisions we can multiply each character with a constant raised to power of its position.
- For simplicity, I chose 10 as our constant raised to power of (n - 1) where n is the position of character from the right.
- Same as the previous algorithm except that we are sure that order of characters will definitely affect our hash values.
To prevent hash value collisions, choose a constant that is
greater than the largest value in the alphabet.
And here comes tip #2
2. Prevent Integer overflows:
We will add to our formula (% m), where m is a larger prime number, since the probability of two random strings colliding is about ≈ 1/m.
Top comments (0)