Why Python's dict is Fast (and When It Isn't)
Python's built-in dict uses open addressing with a twist — it's not the simple linear or quadratic probing you see in textbooks. Yet most coding interview questions about hash tables still ask you to implement collision resolution from scratch, usually chaining with linked lists. I wanted to see the actual performance gap between these approaches, not just the theoretical complexity.
So I built both. A chaining-based hash table using Python lists (because linked lists in Python are painfully slow) and an open addressing table with linear probing. Then I threw 100k operations at them: inserts, lookups, deletes. The results weren't what I expected.
The Core Problem: Two Items, One Bucket
Hash collisions happen when two keys hash to the same index. If you're inserting ("apple", 5) and ("banana", 3) and both hash to index 7, you need a tiebreaker.
Chaining says: keep a list at each bucket. When collision happens, append to the list. Lookup becomes a scan through that list.
Open addressing says: if the slot is taken, probe for the next empty slot using some sequence (linear: i+1, i+2, ..., quadratic: i+1, i+4, i+9, ...). Lookup follows the same probe sequence until you find the key or hit an empty slot.
Continue reading the full article on TildAlice
Top comments (0)