Rhea Kapoor

Pushing Billion-Scale Vector Search Beyond RAM Limits with DiskANN

The Memory Wall Problem

Most vector indexes prioritize RAM for low latency. HNSW, for example, achieves 95% recall at <5ms for 100M vectors but requires ~500GB RAM. At 1B vectors, RAM costs exceed $10k/month on cloud instances—prohibitively expensive for many teams. DiskANN flips this model:

  • Core Innovation: Stores graph structure and full vectors on SSDs
  • Memory Footprint: Requires 15–50× less RAM than HNSW
  • Tradeoff: Accepts ~10–20ms latency (NVMe SSDs) for billion-scale searches

In my 1B vector test (768-dim), DiskANN used 32GB RAM versus HNSW’s 512GB while maintaining 95%+ recall.
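The arithmetic behind those numbers is straightforward. The sketch below is a rough model, not a measurement: the 32-byte PQ codes and the 64 neighbors × 4-byte IDs for graph links are illustrative assumptions.

# Back-of-envelope memory math behind the figures above.
DIM = 768
BYTES_PER_FLOAT = 4

# HNSW-style index: full float32 vectors plus graph links must stay in RAM.
hnsw_vectors_gb = 100_000_000 * DIM * BYTES_PER_FLOAT / 1e9
print(f"100M full-precision vectors: {hnsw_vectors_gb:.0f} GB (+ graph links, ~500 GB RAM total)")

# DiskANN: only compressed PQ codes stay in RAM; graph + full vectors live on SSD.
diskann_pq_gb = 1_000_000_000 * 32 / 1e9                              # assume 32 bytes per PQ code
diskann_ssd_tb = 1_000_000_000 * (DIM * BYTES_PER_FLOAT + 64 * 4) / 1e12
print(f"1B PQ codes in RAM:             {diskann_pq_gb:.0f} GB")
print(f"1B full vectors + graph on SSD: {diskann_ssd_tb:.1f} TB")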

How DiskANN Breaks the Tradeoff

DiskANN’s performance stems from two key innovations:

1. Vamana Graph Construction

Unlike HNSW's incremental, layer-by-layer insertion, DiskANN's Vamana algorithm starts from a deliberately random graph and prunes it into a directional, low-hop structure:

  • Step 1: Build an initial graph with random edges
  • Step 2: Run greedy searches starting from the medoid (the dataset's most central point)
  • Step 3: Prune neighbors that lack angular diversity, keeping edges that cover distinct directions
  • Result: Fewer long-range hops → fewer SSD reads

My observation: Vamana reduced average search path length by 60% versus unpruned graphs, cutting SSD access.
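To make the "fewer hops" point concrete, here is a minimal in-memory sketch of the greedy search that Vamana-style graphs are built to serve, assuming a plain adjacency-list dict. It is illustrative only, not DiskANN's actual SSD-resident implementation.

import numpy as np

def greedy_search(graph, vectors, medoid, query, L=32, k=10):
    # Greedy best-first search over an adjacency-list graph, starting at the medoid.
    # graph:   dict mapping node id -> list of neighbor ids
    # vectors: float32 array of shape (n, dim)
    # L:       candidate-list size; larger L = higher recall but more hops
    dist = lambda i: float(np.linalg.norm(vectors[i] - query))

    results = {medoid: dist(medoid)}   # every node whose distance we have computed
    expanded = set()                   # nodes whose neighbors we already followed
    hops = 0

    while True:
        frontier = [n for n in results if n not in expanded]
        if not frontier:
            break
        node = min(frontier, key=results.get)   # closest unexpanded node

        # Stop once the best unexpanded node cannot improve the current top-k.
        top = sorted(results.values())[:k]
        if len(top) == k and results[node] > top[-1]:
            break

        expanded.add(node)
        hops += 1                       # in DiskANN, each hop costs one SSD read
        for nb in graph[node]:
            if nb not in results:
                results[nb] = dist(nb)

        # Keep only the L best candidates seen so far (bounds memory and work).
        if len(results) > L:
            best = sorted(results, key=results.get)[:L]
            results = {n: results[n] for n in best}

    return sorted(results, key=results.get)[:k], hops

The hops counter is exactly the quantity Vamana's pruning drives down; on the SSD-resident graph, every hop is a disk read.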

2. Quantization-Guided Search

DiskANN combines two search phases:

  1. In-Memory (PQ-compressed vectors)
    • Fast approximate scoring
    • Identifies candidate nodes
  2. SSD (Full-precision vectors)
    • Validates candidates with exact distances
    • Final ranking

During tests, Phase 1 filtered out roughly 90% of candidate nodes, reducing SSD reads to 10–20 per query.
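A toy version of the two phases is below. It uses crude int8 scalar quantization as a stand-in for PQ to keep the sketch short; real DiskANN uses product quantization and interleaves the two phases along the graph traversal rather than running them back to back.

import numpy as np

rng = np.random.default_rng(0)
n, dim = 100_000, 768

full_vectors = rng.standard_normal((n, dim)).astype(np.float32)   # lives on SSD in DiskANN
scale = np.abs(full_vectors).max() / 127.0
compressed = np.round(full_vectors / scale).astype(np.int8)       # in-RAM stand-in for PQ codes

def two_phase_search(query, shortlist=200, k=10):
    # Phase 1: cheap approximate scoring against the compressed in-memory codes.
    approx = (compressed.astype(np.float32) * scale) @ query
    candidates = np.argpartition(-approx, shortlist)[:shortlist]

    # Phase 2: exact inner-product re-ranking of the shortlist only.
    # In DiskANN this is the step that actually touches the SSD.
    exact = full_vectors[candidates] @ query
    order = np.argsort(-exact)[:k]
    return candidates[order], exact[order]

query = rng.standard_normal(dim).astype(np.float32)
ids, scores = two_phase_search(query)
print(ids, scores)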

Deployment Reality Check

Deploying DiskANN demands hardware awareness:

Factor         Recommendation                      Impact if Ignored
SSD Type       NVMe (≥1GB/s read)                  Latency spikes to >100ms with SATA SSDs
Memory         ≥32GB for PQ data                   OOM crashes at query time
Graph Tuning   max_degree=64, beamwidth_ratio=4    Recall drops below 90%

Sample Python Configuration:

index_params = {
    "index_type": "DISKANN",
    "params": {
        "metric_type": "IP",  # Inner Product
        "max_degree": 64,      # Max neighbors per node
        "beamwidth_ratio": 4.0 # Search breadth vs depth 
    }
}
client.create_index("my_collection", index_params)
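
Index parameters are only half of the tuning surface; at query time the candidate-list size is the main recall/latency knob. The hypothetical search call below assumes the same client as above; the search_list name follows Milvus-style DiskANN conventions and may differ in your engine, so verify it against your documentation.

search_params = {
    "metric_type": "IP",
    "params": {"search_list": 100}   # candidate-list size: raise for recall, lower for latency
}
results = client.search(
    "my_collection",
    data=[query_vector],             # query_vector: a 768-dim embedding (placeholder)
    limit=10,
    search_params=search_params,
)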

When Not to Use DiskANN

DiskANN underperforms in:

  • Small datasets (<50M vectors): Everything fits in RAM, so HNSW's sub-5ms latency wins
  • Write-heavy workloads: Index rebuilds take hours
  • Slow storage: SATA SSDs add 5–10ms of latency per query

FreshDiskANN partially addresses updates but adds 30% latency overhead in my tests.

Key Takeaways

  1. Cost Efficiency: DiskANN cuts memory costs 15× for billion-scale search
  2. Latency Budget: Plan for 10–20ms per query (NVMe required)
  3. Optimal Workloads: Read-heavy, static datasets (e.g., historical archives)

What I’m Exploring Next

  • Integrating disk caching for warmer queries
  • Testing GraphANN (DiskANN’s successor) for dynamic data
  • Cold-start optimization for ephemeral instances

Final Thought

DiskANN isn’t a universal replacement but a specialized tool for extreme-scale search. Its SSD-centric design democratizes billion-vector workloads—provided engineers architect for its disk-bound nature. For teams with SATA SSDs or sub-millisecond requirements, HNSW/NSSG remain preferable.
