Resources
- Hash table overview
- Hashing explanation
- Hash table explanation
- Hash table overview + implementation
- In depth explanation - warning this one is a long read (~60 mins). But there is too much good, approachable, info in there to not mention it.
Takeaways:
- A hash table is essentially an array of buckets.
- Each bucket contains a key value pair (KVP).
- When inserting a KVP into a hash table, the key is hashed and modulo'd by the max size/capacity of the array. The resulting integer is the index of the array the KVP should be inserted at.
- Hashing of keys can be done using a variety of hashing functions*.
- No hashing function is perfect and collisions do happen.
- Collisions are when we try to insert a KVP at a bucket that is not empty. This can happen when the same hash is generated for different keys. When the generated hash is the same, the resulting indexes are the same.
- There are two ways to perform collision resolution: Separate Chaining and Open Addressing (also known as Closed Hashing).
- Separate Chaining means that each bucket in our array is a Linked List. When a collision occurs the KVP gets added to the Linked List. This means multiple KVPs can reside at the same index.
- Open Addressing uses a technique called probing to find the next available index to insert a KVP at. The order you search for an available bucket is the probe sequence. One type of probe sequence is linear probing - where we just increment the index by 1 until we find a free bucket in the array.
- One thing that determines how likely collisions are is the Load Factor of a hash table's underlying array. Load factor is: Number of KVPs/Number of Buckets.
- More robust hash table implementations will resize the underlying array when the Load Factor has reached a certain point. In my implementations I resize the arrays when the Load Factor reaches 0.75.
- Searching, Inserting, and Deleting are all worst case
O(n)
(linear), but for each the amortized time isO(1)
(constant). This is because collisions should be rare (if you use a good hashing function and keep the Load Factor below a certain threshold). - Space for hash tables is
O(n)
.
*The most common hashing functions I came across were: DJB2, FNV-1a, and Murmur. FNV-1a is pretty simple, as is DJB2. As it turns out FNV-1a and Murmur are probably the best hashing functions to rely on. I personally like FNV-1a for how simple, yet effective, it is.
Below is the finished implementation of two Hash Tables (with test code). One is implemented using separate chaining and the other open addressing:
As always, if you found any errors in this post please let me know!
Discussion (1)
Thank you so much for this series. The takeaways at the beginning are really helpful!