Unveiling the Magic of Hash Tables: Your Ultimate Guide to Efficient Data Lookup PT. 1

#typescript #hashtables #datastructures #algorithms

Hello, algorithm and data structure enthusiasts. Today we are going to talk about the misunderstood Hash Tables.

Hash Tables are a misunderstood data structure because we use them in our day-to-day activities, but rarely realize we are using them. A common example is the contacts list in our phones.
So, let's dive deep into this data structure.

What the Heck Is a Hash Table and Why Should You Care?

Data structure that maps keys to values for highly efficient lookup.

A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

As a software engineer, hash tables are worth your attention because they have numerous applications in our field. As I mentioned earlier, we use them extensively in our daily work. For example:

Implementation of caches to improve application performance
Storage of key-value pairs in NoSQL databases (MongoDB, Redis)
Detection of duplicates in datasets
Implementation of dictionaries and maps in programming languages
Indexing in search engines
Management of user sessions in web applications
Symbol tables in compilers and interpreters
In-memory storage of application configurations
Implementation of efficient sets
Frequency counting (such as word analysis in texts as the last exercise)
Routing tables in networks and distributed systems
Password verification (hash storage)

Creating Your Own Hash Table

For me, the best way to learn new things is by doing them myself. So, let's create our own hash table implementation with some Typescript.

To achieve our desired result, we are going to take advantage of generics in TypeScript. Let's begin by creating a new class with two generic types.

export class HashTable<K, V> {}

This approach allows us to accept any type for both our key and value parameters, which is precisely why we're using two generic types. This way our key can be a number, string, or even an object (though not recommended), and the same applies to our value. As you can see, we are going to map keys to values.

If you read my last post, you might remember we already did something similar. Remember when we mapped each character with its occurrences? Well, that was a hash map at work. However, that time we used the built-in Map class. Today, we're creating our own "Map-like" class.
Although hash tables and hash maps work differently, they follow the same principle of hashing a key and mapping it to a value. You can see a better comparison in this video: https://youtube.com/shorts/2UiA8T9qj_w

Building the constructor

We are going to store our values in buckets, where each key may correspond to different buckets.

Think of a hash table as an apartment building. The key is like the building address, each bucket is a floor, and the apartments on each floor represent our stored values.

Now that we understand the concept, let's create our table object.

private table: Array<Array<[K, V]>>;

constructor(private size: number) {
  this.table = new Array(size).fill(null).map(() => []);
}

We've just created an array of arrays, where each inner array contains key-value tuples. This structure has a fixed size that's determined when the hash table is instantiated. Each bucket is an array that stores multiple entries when collisions occur. We will explain this concept in more detail when creating the hash function.

Explaining the Hash Function

A hash function is a function that takes an input (or key) and returns a fixed-size string of bytes. The output is typically a digest that uniquely represents the input data. Hash functions are used in hash tables to compute the index for storing and retrieving values.

A good hash function minimizes the number of collisions (when two keys hash to the same index) and distributes keys uniformly across the hash table.

Now, let's create our hash function. It will receive the key (a generic of K type) and return a number (the digest).

private hash(key: K): number {}

Using the DJB2 Algorithm to hash our key

Next, we'll implement the DJB2 algorithm to hash our key. To start, we need to create a hash variable with an initial value of 5381 and convert the key to a string.

let hash = 5381;
const strKey = String(key);

Developed by Daniel J. Bernstein, this algorithm is an easy but effective way to hash strings. It works by processing each character in the string and updating the initial hash value 5381. In each iteration, the initial hash value is multiplied by 33 and then added to the ASCII value of the character. This generates our digest which we use to assign our key. Simple, right?

Now, let's loop through our strKey variable to process each character using the good old for…of loop:

for(const char of strKey){}

In each iteration, we need to calculate our digest. First, we'll get the character code using the built-in charCodeAt function. Then, we'll multiply our hash value by 33 and add the character code to it.

const charCode = strKey.charCodeAt(i);
hash = (hash * 33) + charCode;

Finally, when the loop ends, we need to obtain the absolute value of our hash variable to prevent negative keys and calculate the modular value of this hash relative to the size of our table.

return Math.abs(hash) % this.size;

Here's the complete implementation of our hash function:

private hash(key: K): number {
    let hash = 5381;
    const strKey = String(key);

    for (let i = 0; i < strKey.length; i++) {
        const char = strKey.charCodeAt(i);
        hash = (hash * 33) + char;
    }

    return Math.abs(hash) % this.size;
}

Conclusion

In this post, we've explored the fundamentals of hash tables and started implementing our own using TypeScript. We've learned what makes hash tables so powerful and efficient for lookup operations, and we've implemented the critical hash function using the DJB2 algorithm.

In the next part of this series, we'll build on our implementation by adding the essential methods to our HashTable class: set(), get(), remove(), and has(). We'll also explore how to handle collisions effectively and discuss the time and space complexity implications of our implementation choices.

Until then, I encourage you to experiment with the code we've written so far and think about how you might use hash tables in your own projects. Happy coding!