I was lost... Until I met Hash Functions

#hashmap #algorithimsanddatastructures #python #longestsubstring

I had been sitting on a leetcode problem for hours. I felt stupid, lost, started to questing my existence and the meaning of life. Luck had it that I was reading "Grokking Algorithms", in my opinion, the best way to learn Algorithms and Data Structures. Link to the book.

With a look-up and insert time of O(1), this data structure will change your life. Trust me, its more amazing than your spouse😂. I Am In Love, and you will be too at the end of this post.

Given a string, find the length of the longest substring without repeating characters.
For example, the longest substring without repeating letters for “abcabcbb” is “abc”, which the length is 3. For “bbbbb” the longest substring is “b”, with the length of 1.

My Approach

Traverse the string each character at a time and form substrings. By maintaining an array max we can keep track of the start position and the length of the longest substring. We also maintain a count and a start position.

class LongestSubstr():
    def _(str):
        max = [0,0]
        start = 0
        count = 0

Problem - Keeping track of characters in the substring.

This is where the hash function aka hash map comes in. A hash function takes in a key and returns a value. This is how we define and initialize a hash map in python:

#hash map defn
class SubstringMap(dict):
    def __missing__(self, key):
    return -1;

#initialization
sm = SubstringMap()

Note: We have to define missing otherwise if we try to lookup a key that does not exist the hash map throws a KeyError. The return value of missing is up to you.

For every character(n) in the string, we check if a key=n exists in our hash map. If not, we add it with a value of an array of it's position and increment the count.

if(sm[str[n]] == -1):
    #adding key and value to hash map
    sm[str[n]] = n
    count = count + 1

If n exists, then we have a repeating character on our hands. This breaks the current substring. We store the start position and the count as substring length to max if the count is greater than the max.

if(max[1] < count): max = [start, count]

We need to figure out the new start position of the substring that includes our current element and satisfies our substring rules i.e. No repeating characters. We can achieve this by moving the start to the position immediately after the last occurrence of the current character, meaning that the current character is the only occurrence of itself in our current substring.

Note. For each character we store all its occurrences in an array, the character is the key and the array is the value. To get the last occurrence of a character we get the last element in the array.

Note: We first need to check of the last occurrence of our current character is in the current substring, i.e. falls under or after the current start position. If its under, then we don't need to move our start position, its after then we move the start to the index after.
Note: We also need to reduce the count by the number of elements we cut by moving the start position.

#if original falls under the start position
#dont move the start position
if(sm[str[n]] < start):
    # print("original ", str[n], "below start")
    sm[str[n]] = n
    count = count + 1
    # print("1",str[n], count)

#original comes after the start position
#move start position
else:
    #new substring start position
    start = sm[str[n]] + 1
    # print("new start @ ", start)
    #reduce count by the number of elements before duplicate
    count = n - sm[str[n]]
    #update duplicate position
    sm[str[n]] = n
    # print("2",str[n], count)

Top comments (1)

DiM • Feb 12 '23

That was interesting read. I was curious if it has to be really that complex you had to hack "__missing__".

And I dunno, it doesnt seems so.

def longest_substring(s: str) -> str:
    bucket = set()
    word = ""
    words = set()
    for char in s:
        if char not in bucket:
            bucket.add(char)
            word += char
        else:
            words.add(word)
            bucket = set()
            word = ""
    else:
        # last char
        words.add(word)

    longest = max(words, key=len)

    print(longest, words)
    return longest

But ye, interesting anyway.