loading...

Hash Tables

jjb profile image JB Updated on ・2 min read

CS Level Up Series (30 Part Series)

1) CS Level Up Series Introduction 2) Dynamic Arrays 3 ... 28 3) Linked Lists 4) Stacks 5) Queues 6) Hash Tables 7) Binary Search Trees 8) Binary Heaps 9) Priority Queues 10) Graphs 11) Tries 12) Binary & Bit Manipulation 13) Common Sorting Algorithms 14) Searching Algorithms 15) Permutations, Combinations, & Subsets 16) NP-Complete & Fibonacci Heap 17) Detecting Graph Cycles With Depth-First Search 18) Finding Shortest Paths In Graphs (using Dijkstra's & BFS) 19) Topological Sorting of Directed Acyclic Graphs (DAGs) 20) Finding Articulation Points & Bridges in Undirected Graphs 21) Finding Strongly Connected Components in Directed Graphs using Tarjan's Algorithm 22) Checking If An Undirected Graph Is Bipartite 23) Extending An Iterator 24) Union-find (Disjoint-set) 25) Minimum Spanning Tree (Kruskal's Algorithm) 26) Sliding Window Technique 27) String Searching Using Rabin-Karp 28) Fenwick Tree (Binary Indexed Tree) 29) Dynamic Programming 30) Traveling Salesman Problem

Resources

  1. Hash table overview
  2. Hashing explanation
  3. Hash table explanation
  4. Hash table overview + implementation
  5. In depth explanation - warning this one is a long read (~60 mins). But there is too much good, approachable, info in there to not mention it.

Takeaways:

  • A hash table is essentially an array of buckets.
  • Each bucket contains a key value pair (KVP).
  • When inserting a KVP into a hash table, the key is hashed and modulo'd by the max size/capacity of the array. The resulting integer is the index of the array the KVP should be inserted at.
  • Hashing of keys can be done using a variety of hashing functions*.
  • No hashing function is perfect and collisions do happen.
  • Collisions are when we try to insert a KVP at a bucket that is not empty. This can happen when the same hash is generated for different keys. When the generated hash is the same, the resulting indexes are the same.
  • There are two ways to perform collision resolution: Separate Chaining and Open Addressing (also known as Closed Hashing).
  • Separate Chaining means that each bucket in our array is a Linked List. When a collision occurs the KVP gets added to the Linked List. This means multiple KVPs can reside at the same index.
  • Open Addressing uses a technique called probing to find the next available index to insert a KVP at. The order you search for an available bucket is the probe sequence. One type of probe sequence is linear probing - where we just increment the index by 1 until we find a free bucket in the array.
  • One thing that determines how likely collisions are is the Load Factor of a hash table's underlying array. Load factor is: Number of KVPs/Number of Buckets.
  • More robust hash table implementations will resize the underlying array when the Load Factor has reached a certain point. In my implementations I resize the arrays when the Load Factor reaches 0.75.
  • Searching, Inserting, and Deleting are all worst case O(n) (linear), but for each the amortized time is O(1) (constant). This is because collisions should be rare (if you use a good hashing function and keep the Load Factor below a certain threshold).
  • Space for hash tables is O(n).

*The most common hashing functions I came across were: DJB2, FNV-1a, and Murmur. FNV-1a is pretty simple, as is DJB2. As it turns out FNV-1a and Murmur are probably the best hashing functions to rely on. I personally like FNV-1a for how simple, yet effective, it is.

Below is the finished implementation of two Hash Tables (with test code). One is implemented using separate chaining and the other open addressing:

As always, if you found any errors in this post please let me know!

CS Level Up Series (30 Part Series)

1) CS Level Up Series Introduction 2) Dynamic Arrays 3 ... 28 3) Linked Lists 4) Stacks 5) Queues 6) Hash Tables 7) Binary Search Trees 8) Binary Heaps 9) Priority Queues 10) Graphs 11) Tries 12) Binary & Bit Manipulation 13) Common Sorting Algorithms 14) Searching Algorithms 15) Permutations, Combinations, & Subsets 16) NP-Complete & Fibonacci Heap 17) Detecting Graph Cycles With Depth-First Search 18) Finding Shortest Paths In Graphs (using Dijkstra's & BFS) 19) Topological Sorting of Directed Acyclic Graphs (DAGs) 20) Finding Articulation Points & Bridges in Undirected Graphs 21) Finding Strongly Connected Components in Directed Graphs using Tarjan's Algorithm 22) Checking If An Undirected Graph Is Bipartite 23) Extending An Iterator 24) Union-find (Disjoint-set) 25) Minimum Spanning Tree (Kruskal's Algorithm) 26) Sliding Window Technique 27) String Searching Using Rabin-Karp 28) Fenwick Tree (Binary Indexed Tree) 29) Dynamic Programming 30) Traveling Salesman Problem

Posted on Feb 11 by:

Discussion

markdown guide
 

Thank you so much for this series. The takeaways at the beginning are really helpful!