DEV Community

TildAlice

Posted on • Originally published at tildalice.io

HashMap vs Set: Dict Lookups Beat In-Memory Search by 47x

The 47x Performance Gap You're Ignoring

I ran a simple membership test on 100,000 integers. Linear search through a list took 2.1 seconds. Dictionary lookup? 45 milliseconds.

This isn't a marginal difference. It's the gap between a feature that works in production and one that times out under load. Yet I still see codebases doing `if target in giant_list` in hot paths, hemorrhaging CPU cycles because "it's simpler."

Here's what actually happens when you pick the wrong data structure for membership tests — and when the supposedly "slower" approach wins anyway.

Why Dict Lookups Are Fast (And When They're Not)

Python dictionaries use hash tables under the hood. The lookup process:

  1. Compute `hash(key)`: effectively $O(1)$ for most immutable types (ints hash to themselves; strings compute their hash once and cache it)
  2. Use the hash to find the bucket index: `index = hash(key) % table_size`
  3. Check whether the key stored at that index matches, probing other slots on collision (CPython uses open addressing)
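Steps 1 and 2 can be sketched in a few lines. The `table_size` of 8 here is purely illustrative; CPython actually sizes tables to powers of two and masks the hash rather than taking a modulo, but the result is the same bucket-mapping idea:

```python
key = "example"
table_size = 8  # illustrative; real tables grow as the dict fills

# Step 1: hash the key (cheap for immutable types)
h = hash(key)

# Step 2: map the hash onto a bucket index
index = h % table_size  # CPython uses h & (table_size - 1), equivalent when size is a power of two
print(index)  # always a valid slot: 0 <= index < table_size
```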

The time complexity is $O(1)$ average case and $O(n)$ worst case, when every key lands in the same bucket. In practice, CPython's hash functions and perturbed probing keep collisions rare.
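You can force the worst case deliberately. The `BadHash` class below is a contrived example (not from the benchmark above) whose constant `__hash__` sends every key to the same bucket, so each lookup degrades into a linear probe:

```python
import timeit

class BadHash:
    """Every instance hashes to the same bucket, forcing collisions."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42  # constant hash: all keys collide
    def __eq__(self, other):
        return isinstance(other, BadHash) and self.n == other.n

good = {i: None for i in range(1_000)}          # well-distributed int keys
bad = {BadHash(i): None for i in range(1_000)}  # all keys in one bucket

t_good = timeit.timeit(lambda: 999 in good, number=1_000)
t_bad = timeit.timeit(lambda: BadHash(999) in bad, number=1_000)
print(f"normal keys: {t_good:.5f}s  colliding keys: {t_bad:.5f}s")
```

The colliding version is dramatically slower because every lookup has to walk the probe sequence and call `__eq__` on candidate after candidate.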

Lists, by contrast, don't hash anything. Membership testing iterates from index 0 until it finds a match or reaches the end: $O(n)$ always.


```python
import time
import random

# Generate test data
data_size = 100_000
test_data = list(range(data_size))
random.shuffle(test_data)
test_set = set(test_data)

# Search targets: best case (first element), worst case (last), average (middle)
targets = [test_data[0], test_data[-1], test_data[data_size // 2]]

start = time.perf_counter()
for target in targets:
    _ = target in test_data  # O(n) linear scan each time
list_time = time.perf_counter() - start

start = time.perf_counter()
for target in targets:
    _ = target in test_set  # O(1) hash lookup each time
set_time = time.perf_counter() - start

print(f"list: {list_time:.5f}s  set: {set_time:.5f}s")
```

---

*Continue reading the full article on [TildAlice](https://tildalice.io/hashmap-vs-set-dict-lookup-performance-python/)*
