Elise Tanaka
The Nuts and Bolts of HNSW: What Works, What Doesn’t, and Why I Care

I’ve spent months stress-testing vector search algorithms, and Hierarchical Navigable Small Worlds (HNSW) consistently stands out for mid-sized datasets. But it’s no silver bullet. Here’s what I’ve learned from implementing it, benchmarking trade-offs, and seeing it fail.


Why Naive Search Fails at Scale

Calculating Euclidean distances for all vectors works for tiny datasets. At 1 million 768-dim vectors, a naive Python scan takes ~1.2 seconds per query on an A100 GPU—unacceptable for real-time applications. This collapses completely beyond 10M vectors. Graph-based indices like HNSW reduce this to milliseconds, but introduce other constraints.
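
For orientation, here is roughly what that brute-force baseline looks like as a minimal NumPy sketch; the sizes and random data are illustrative, not my benchmark harness:

import numpy as np

# Minimal brute-force k-NN scan: every query pays O(N * d) distance work.
# Sizes are illustrative; at 10M vectors this stops being interactive.
N, d, k = 1_000_000, 768, 10
dataset = np.random.rand(N, d).astype(np.float32)   # ~3 GB of stand-in embeddings
query = np.random.rand(d).astype(np.float32)

diffs = dataset - query                             # broadcast the query across all rows
dists = np.einsum('ij,ij->i', diffs, diffs)         # squared Euclidean distance per row
top_k = np.argpartition(dists, k)[:k]               # unordered k smallest
top_k = top_k[np.argsort(dists[top_k])]             # order the survivors by distance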


Navigable Small Worlds (NSW): Simple but Brittle

How NSW Builds Connections

  1. Start with an empty graph.
  2. For each new vector:
    • Find R nearest neighbors in the existing graph (greedy search from a random entry point).
    • Connect the vector to these neighbors.
  3. Prune excess edges (default R=16).
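
The greedy search in step 2 is the piece that matters most, because HNSW reuses the same routine on every layer. Here is a minimal sketch of that beam search, assuming a plain adjacency-dict graph, a vectors array, and a distance callable; the names are mine, not from any particular library:

import heapq

def greedy_search(graph, vectors, query, entry, ef, distance):
    # Beam search over an adjacency dict {node_id: [neighbor_ids]}.
    # Returns up to `ef` node ids, closest first, reachable from `entry`.
    visited = {entry}
    candidates = [(distance(query, vectors[entry]), entry)]  # min-heap: closest frontier first
    best = [(-candidates[0][0], entry)]                      # max-heap of current results

    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(best) >= ef and dist > -best[0][0]:
            break  # closest remaining candidate is worse than our worst result
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            d = distance(query, vectors[neighbor])
            if len(best) < ef or d < -best[0][0]:
                heapq.heappush(candidates, (d, neighbor))
                heapq.heappush(best, (-d, neighbor))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the current worst result

    return [node for _, node in sorted((-d, n) for d, n in best)]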

Search Limitations I’ve Observed

In my tests on 10M GloVe vectors, NSW often got stuck in local minima. Starting from 10 random entry points improved recall@10 from 72% to 88%, but doubled latency. Worse, in low dimensions (e.g., 2D embeddings), NSW’s graph became entangled, causing 30% longer search paths.


HNSW’s Hierarchy Fixes NSW’s Flaws

HNSW stacks layers on top of NSW. Each layer is a separate graph: upper layers hold only a small sample of nodes and allow long hops, while the bottom layer holds every node and refines the result.

Construction: A Top-Down Process

# Pseudocode for HNSW insertion
def insert_vector(vector, max_layers=5):
    layer = random_layer(max_layers)  # Truncated geometric distribution
    entry_point = top_layer_entry

    # Phase 1: descend from the top layer to just above the insertion layer,
    # carrying forward only the single closest node as the next entry point.
    for current_layer in reversed(range(layer + 1, max_layers)):
        nearest = greedy_search(vector, entry_point, ef=1, layer=current_layer)
        entry_point = nearest[0]

    # Phase 2: insert into the assigned layer and every layer below it,
    # searching with a wider beam (efConstruction) before wiring edges.
    for current_layer in range(layer, -1, -1):
        neighbors = greedy_search(vector, entry_point, ef=efConstruction, layer=current_layer)
        connect_to_neighbors(vector, neighbors, layer=current_layer, max_edges=32)
        entry_point = neighbors[0]

Key parameters:

  • max_layers: Balances build time vs. search speed.
  • efConstruction: Width of the candidate search during insertion; lower values index faster at the cost of recall.

Search: From Coarse to Fine

  1. Start at top layer, find nearest neighbor to query.
  2. Use this neighbor as entry point to the layer below.
  3. Repeat until the bottom layer.
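
In the same pseudocode style as the insertion routine, the query path looks roughly like this (reusing the hypothetical greedy_search helper from above):

# Pseudocode for HNSW search, mirroring the insertion sketch above
def search(query, k, ef_search=64, max_layers=5):
    entry_point = top_layer_entry

    # Coarse phase: single-candidate descent through the upper layers.
    for current_layer in reversed(range(1, max_layers)):
        nearest = greedy_search(query, entry_point, ef=1, layer=current_layer)
        entry_point = nearest[0]

    # Fine phase: widen the beam only on the bottom layer, then keep the best k.
    candidates = greedy_search(query, entry_point, ef=ef_search, layer=0)
    return candidates[:k]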

Benchmarks: Where HNSW Excels and Stumbles

I tested on 10M Cohere embeddings (768-dim), NVIDIA A100, efSearch=64:

Metric            NSW       HNSW (max_layers=5)
Avg. Latency      42 ms     9 ms
Recall@10         88%       98%
Build Time        18 min    34 min
Memory Overhead   12 GB     28 GB
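
To sanity-check numbers of this shape on your own data, hnswlib is the fastest way I know to stand up an HNSW index. A minimal sketch; the sizes and parameter values here are illustrative, not the exact settings behind the table above:

import numpy as np
import hnswlib

dim, num_elements, k = 768, 100_000, 10                  # illustrative sizes
data = np.random.rand(num_elements, dim).astype(np.float32)

index = hnswlib.Index(space='l2', dim=dim)               # 'ip' and 'cosine' also supported
index.init_index(max_elements=num_elements, M=32, ef_construction=200)
index.add_items(data, np.arange(num_elements))

index.set_ef(64)                                         # efSearch: higher = better recall, slower
labels, distances = index.knn_query(data[:100], k=k)

Validate recall against a brute-force baseline on a held-out query set; the right M and ef settings are very dataset-dependent.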

When I’d Avoid HNSW:

  • Memory-bound systems: HNSW uses ~3–5x more RAM than PQ-based indices.
  • Static datasets: if the index never needs online inserts, read-heavy workloads are often served more cheaply by disk-optimized indices like DiskANN.
  • Ultra-high dimensions (>1K): HNSW’s recall drops below ANN alternatives like ScaNN.

Implementation Pitfalls I’ve Encountered

  1. Edge Pruning: Skipping the per-node edge cap during insertion (max_edges=32) bloated memory by 40%.
  2. Layer Distribution: Skipping geometric sampling caused unbalanced graphs, increasing latency variance (see the sketch after this list).
  3. Hardware Mismatch: On CPUs, pushing efSearch above 128 often capped my throughput at around 100 QPS.
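
For the layer-distribution point, the sampling rule from the HNSW paper is essentially a one-liner; here is a sketch of what the random_layer helper in the insertion pseudocode could look like (mL = 1/ln(M) is the value the paper suggests):

import math
import random

def random_layer(M=32, max_layers=5):
    # Truncated geometric/exponential layer sampling from the HNSW paper:
    # layer = floor(-ln(U) * mL), with mL = 1 / ln(M).
    m_l = 1.0 / math.log(M)
    u = 1.0 - random.random()              # uniform in (0, 1], avoids log(0)
    layer = int(-math.log(u) * m_l)
    return min(layer, max_layers - 1)      # truncate to the configured top layer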

Is HNSW Right for Your Stack?

Opt for HNSW when:

  • Your dataset fits in memory (≤100M vectors).
  • You need <20ms latency at high recall.
  • Index build time isn’t critical (e.g., batch updates nightly).

Avoid if:

  • You’re on embedded devices/low RAM.
  • Your vectors churn in real time (HNSW handles inserts, but deletes and in-place updates degrade the graph without periodic rebuilds).
  • Recall >99% is non-negotiable (brute-force still wins).

Open-source vector databases like Milvus use HNSW as a default for good reason—but always validate against your data distribution. I once saw a 20% latency spike on medical images vs. text embeddings due to clustered vector spaces.


What I’m Exploring Next

While HNSW dominates mid-scale search, I’m testing hybrid approaches:

  • Coupling HNSW with product quantization to cut memory.
  • Layer-free hierarchies for streaming data.
  • Failure mode analysis when vectors follow power-law distributions.

No algorithm is universally optimal. HNSW trades memory and build time for speed and recall. Measure twice, implement once.
