DEV Community

kartikay dubey

Posted on • Originally published at dubeykartikay.com

Why You Should Never Use std::unordered_set in Hot C++ Loops

Hash tables feel like the default choice for membership tests. std::unordered_set promises average O(1) lookup, so we reach for it automatically. In performance-sensitive C++ code, that habit can cost you an order of magnitude.

I ran into this while building a Vamana graph index for approximate nearest neighbor search. The algorithm needs to track visited nodes. Node ids are dense integers, and the visited check runs inside the hottest loop in the entire search path.

My first implementation used std::unordered_set<uint32_t>. It was correct, and it was slow.
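A minimal sketch of what that first version looked like. The function and variable names here are illustrative, not the actual Vamana code; the point is that every membership check pays for a hash, a bucket probe, and occasionally a rehash:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical visited-node dedup, as in the slow first implementation:
// each insert hashes the id, probes a bucket, and may trigger a rehash.
std::vector<uint32_t> dedup_hash(const std::vector<uint32_t>& candidates) {
    std::unordered_set<uint32_t> visited;
    std::vector<uint32_t> unique_ids;
    for (uint32_t id : candidates) {
        // insert().second is true only the first time an id appears
        if (visited.insert(id).second) {
            unique_ids.push_back(id);
        }
    }
    return unique_ids;
}
```

Correct, compact, and exactly what the benchmark below punishes.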

What the benchmark says

I generated 1000 vectors of random uint32_t ids and deduplicated them using three approaches: std::unordered_set, sort + unique, and boost::dynamic_bitset<>. For dense ids sampled from [0, 2n), the numbers were brutal:

| n | unordered_set (ms) | sort+unique (ms) | boost bitset (ms) |
|---|---|---|---|
| 128 | 5 | 3 | 1 |
| 32,768 | 1,649 | 1,455 | 177 |
| 500,000 | 50,302 | 26,759 | 3,423 |

At n = 500,000, the bitset was 14.7x faster. The hash table had to hash keys, grow buckets, rehash, and chase pointers through memory. The bitset did one indexed memory operation.

sort + unique also beat the hash table at scale because it walks contiguous memory, and CPUs love that.
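For reference, the sort + unique variant is a one-liner pair over contiguous storage. This is a sketch of the approach, not the exact benchmark harness: sorting puts duplicates next to each other, then std::unique compacts them in a single linear pass:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the sort + unique dedup approach: after sorting, duplicates
// are adjacent, so one sequential pass removes them. Both passes walk
// contiguous memory, which the hardware prefetcher handles well.
std::vector<uint32_t> dedup_sort_unique(std::vector<uint32_t> ids) {
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}
```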

When the hash table wins

Sparse ids change the picture. When I sampled only n ids from a universe of 100,000,000 possible values, the bitset had to clear a massive, mostly-empty array before processing each input vector:

| n | unordered_set (ms) | boost bitset (ms) |
|---|---|---|
| 128 | 6.3 | 149.7 |
| 2,048 | 91.9 | 145.5 |
| 65,536 | 4,169.3 | 985.4 |

For small sparse inputs, std::unordered_set is genuinely better. The bitset only pulls ahead once the input is large enough to amortize the fixed clearing cost.

The practical rule

Reach for std::unordered_set when ids are sparse, unbounded, or not integer-indexable. When ids are dense integers inside a hot loop, make the membership check an indexed load or store instead.
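A minimal sketch of that rule, assuming ids are dense in [0, max_id). Here std::vector<uint8_t> stands in for boost::dynamic_bitset<> to keep the example dependency-free; the principle is identical: membership is one indexed load or store, with no hashing and no pointer chasing:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical dense visited set: one byte per id, indexed directly.
// boost::dynamic_bitset<> packs this to one bit per id, but the access
// pattern (a single indexed load/store) is the same.
class DenseVisitedSet {
public:
    explicit DenseVisitedSet(uint32_t max_id) : seen_(max_id, 0) {}

    // Returns true the first time an id is marked, false on repeats.
    bool mark(uint32_t id) {
        if (seen_[id]) return false;   // indexed load
        seen_[id] = 1;                 // indexed store
        return true;
    }

private:
    std::vector<uint8_t> seen_;
};
```

If the same set is reused across queries, remember that resetting it costs O(max_id), which is exactly the fixed clearing cost the sparse benchmark above exposes.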

The CPU does not care about your Big-O notation. It cares about memory access patterns.

I wrote a longer post with the full methodology, assembly-level analysis, and raw CSV data: Why You Should Never Use a set.
