Rhea Kapoor

Pushing Billion-Scale Vector Search Beyond RAM Limits with DiskANN

The Memory Wall Problem

Most vector indexes prioritize RAM for low latency. HNSW, for example, achieves 95% recall at <5ms for 100M vectors but requires ~500GB RAM. At 1B vectors, RAM costs exceed $10k/month on cloud instances—prohibitively expensive for many teams. DiskANN flips this model:

  • Core Innovation: Stores graph structure and full vectors on SSDs
  • Memory Footprint: Requires 15–50× less RAM than HNSW
  • Tradeoff: Accepts ~10–20ms latency (NVMe SSDs) for billion-scale searches

In my 1B vector test (768-dim), DiskANN used 32GB RAM versus HNSW’s 512GB while maintaining 95%+ recall.
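The arithmetic behind those numbers is straightforward. The sketch below is a rough model, not a measurement: the 32-byte PQ codes and the 64 neighbors × 4-byte IDs for graph links are illustrative assumptions.

# Back-of-envelope memory math behind the figures above.
DIM = 768
BYTES_PER_FLOAT = 4

# HNSW-style index: full float32 vectors plus graph links must stay in RAM.
hnsw_vectors_gb = 100_000_000 * DIM * BYTES_PER_FLOAT / 1e9
print(f"100M full-precision vectors: {hnsw_vectors_gb:.0f} GB (+ graph links, ~500 GB RAM total)")

# DiskANN: only compressed PQ codes stay in RAM; graph + full vectors live on SSD.
diskann_pq_gb = 1_000_000_000 * 32 / 1e9                              # assume 32 bytes per PQ code
diskann_ssd_tb = 1_000_000_000 * (DIM * BYTES_PER_FLOAT + 64 * 4) / 1e12
print(f"1B PQ codes in RAM:             {diskann_pq_gb:.0f} GB")
print(f"1B full vectors + graph on SSD: {diskann_ssd_tb:.1f} TB")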

How DiskANN Breaks the Tradeoff

DiskANN’s performance stems from two key innovations:

1. Vamana Graph Construction

Unlike HNSW's incremental, layer-by-layer insertion, DiskANN's Vamana algorithm starts from a deliberately random graph and prunes it into a directional, low-hop structure:

  • Step 1: Build an initial graph with random edges
  • Step 2: Run greedy searches starting from the medoid (the dataset's most central point)
  • Step 3: Prune neighbors that lack angular diversity, keeping edges that cover distinct directions
  • Result: Fewer long-range hops → fewer SSD reads

My observation: Vamana reduced average search path length by 60% versus unpruned graphs, cutting SSD access.
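To make the "fewer hops" point concrete, here is a minimal in-memory sketch of the greedy search that Vamana-style graphs are built to serve, assuming a plain adjacency-list dict. It is illustrative only, not DiskANN's actual SSD-resident implementation.

import numpy as np

def greedy_search(graph, vectors, medoid, query, L=32, k=10):
    # Greedy best-first search over an adjacency-list graph, starting at the medoid.
    # graph:   dict mapping node id -> list of neighbor ids
    # vectors: float32 array of shape (n, dim)
    # L:       candidate-list size; larger L = higher recall but more hops
    dist = lambda i: float(np.linalg.norm(vectors[i] - query))

    results = {medoid: dist(medoid)}   # every node whose distance we have computed
    expanded = set()                   # nodes whose neighbors we already followed
    hops = 0

    while True:
        frontier = [n for n in results if n not in expanded]
        if not frontier:
            break
        node = min(frontier, key=results.get)   # closest unexpanded node

        # Stop once the best unexpanded node cannot improve the current top-k.
        top = sorted(results.values())[:k]
        if len(top) == k and results[node] > top[-1]:
            break

        expanded.add(node)
        hops += 1                       # in DiskANN, each hop costs one SSD read
        for nb in graph[node]:
            if nb not in results:
                results[nb] = dist(nb)

        # Keep only the L best candidates seen so far (bounds memory and work).
        if len(results) > L:
            best = sorted(results, key=results.get)[:L]
            results = {n: results[n] for n in best}

    return sorted(results, key=results.get)[:k], hops

The hops counter is exactly the quantity Vamana's pruning drives down; on the SSD-resident graph, every hop is a disk read.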

2. Quantization-Guided Search

DiskANN combines two search phases:

  1. In-Memory (PQ-compressed vectors)
    • Fast approximate scoring
    • Identifies candidate nodes
  2. SSD (Full-precision vectors)
    • Validates candidates with exact distances
    • Final ranking

During tests, Phase 1 filtered out roughly 90% of candidate nodes, reducing SSD reads to 10–20 per query.
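A toy version of the two phases is below. It uses crude int8 scalar quantization as a stand-in for PQ to keep the sketch short; real DiskANN uses product quantization and interleaves the two phases along the graph traversal rather than running them back to back.

import numpy as np

rng = np.random.default_rng(0)
n, dim = 100_000, 768

full_vectors = rng.standard_normal((n, dim)).astype(np.float32)   # lives on SSD in DiskANN
scale = np.abs(full_vectors).max() / 127.0
compressed = np.round(full_vectors / scale).astype(np.int8)       # in-RAM stand-in for PQ codes

def two_phase_search(query, shortlist=200, k=10):
    # Phase 1: cheap approximate scoring against the compressed in-memory codes.
    approx = (compressed.astype(np.float32) * scale) @ query
    candidates = np.argpartition(-approx, shortlist)[:shortlist]

    # Phase 2: exact inner-product re-ranking of the shortlist only.
    # In DiskANN this is the step that actually touches the SSD.
    exact = full_vectors[candidates] @ query
    order = np.argsort(-exact)[:k]
    return candidates[order], exact[order]

query = rng.standard_normal(dim).astype(np.float32)
ids, scores = two_phase_search(query)
print(ids, scores)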

Deployment Reality Check

Deploying DiskANN demands hardware awareness:

Factor         Recommendation                      Impact if Ignored
SSD Type       NVMe (≥1GB/s read)                  Latency spikes to >100ms with SATA SSDs
Memory         ≥32GB for PQ data                   OOM crashes at query time
Graph Tuning   max_degree=64, beamwidth_ratio=4    Recall drops below 90%

Sample Python Configuration:

index_params = {
    "index_type": "DISKANN",
    "params": {
        "metric_type": "IP",  # Inner Product
        "max_degree": 64,      # Max neighbors per node
        "beamwidth_ratio": 4.0 # Search breadth vs depth 
    }
}
client.create_index("my_collection", index_params)
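
Index parameters are only half of the tuning surface; at query time the candidate-list size is the main recall/latency knob. The hypothetical search call below assumes the same client as above; the search_list name follows Milvus-style DiskANN conventions and may differ in your engine, so verify it against your documentation.

search_params = {
    "metric_type": "IP",
    "params": {"search_list": 100}   # candidate-list size: raise for recall, lower for latency
}
results = client.search(
    "my_collection",
    data=[query_vector],             # query_vector: a 768-dim embedding (placeholder)
    limit=10,
    search_params=search_params,
)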

When Not to Use DiskANN

DiskANN underperforms in:

  • Small datasets (<50M vectors): Everything fits in RAM, so HNSW's sub-5ms latency wins
  • Write-heavy workloads: Index rebuilds take hours
  • Slow storage: SATA SSDs add 5–10ms of latency per query

FreshDiskANN partially addresses updates but adds 30% latency overhead in my tests.

Key Takeaways

  1. Cost Efficiency: DiskANN cuts memory costs 15× for billion-scale search
  2. Latency Budget: Plan for 10–20ms per query (NVMe required)
  3. Optimal Workloads: Read-heavy, static datasets (e.g., historical archives)

What I’m Exploring Next

  • Integrating disk caching for warmer queries
  • Testing GraphANN (DiskANN’s successor) for dynamic data
  • Cold-start optimization for ephemeral instances

Final Thought

DiskANN isn’t a universal replacement but a specialized tool for extreme-scale search. Its SSD-centric design democratizes billion-vector workloads—provided engineers architect for its disk-bound nature. For teams with SATA SSDs or sub-millisecond requirements, HNSW/NSSG remain preferable.
