
Elise Tanaka

Unpacking DiskANN: My Technical Journey Through Billion-Scale Vector Search

What happens when vector datasets exceed what RAM can handle? This question drove my investigation into DiskANN, an SSD-optimized approach to massive-scale similarity search. Unlike in-memory methods such as HNSW, which hit scalability ceilings around 100M vectors, DiskANN achieves billion-scale indexing by strategically leveraging disk storage. I’ll share how it balances latency, recall, and cost through architectural innovations.

Core Architecture: Marrying SSD and RAM

DiskANN’s design acknowledges a fundamental tradeoff: SSDs offer affordable capacity but slower access than RAM. Here’s how it navigates this:

Index Storage

The full vector index and raw embeddings live on SSD. Each node’s data – vector and neighbor IDs – occupies a fixed-size block (e.g., 4KB). When searching, the system calculates block offsets via simple arithmetic: address = node_id * block_size. This enables predictable access patterns critical for SSD efficiency.
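
To make that layout concrete, here's a minimal sketch of locating and reading one node's block. The file name, record layout, and constants are my own assumptions for illustration, not DiskANN's actual on-disk format.

import struct

BLOCK_SIZE = 4096   # assumed fixed-size block per node (4KB)
DIM = 128           # assumed embedding dimensionality

def read_node(index_file, node_id):
    """Read one node's raw vector and neighbor IDs from the SSD-resident index file."""
    index_file.seek(node_id * BLOCK_SIZE)      # address = node_id * block_size
    block = index_file.read(BLOCK_SIZE)
    vector = struct.unpack(f"{DIM}f", block[:DIM * 4])
    (n_neighbors,) = struct.unpack("I", block[DIM * 4:DIM * 4 + 4])
    start = DIM * 4 + 4
    neighbors = struct.unpack(f"{n_neighbors}I", block[start:start + 4 * n_neighbors])
    return vector, neighbors

# Usage (hypothetical file): with open("diskann.index", "rb") as f: vec, nbrs = read_node(f, 42)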

Memory Optimization

Compressed embeddings using product quantization (PQ) reside in RAM. During my tests on a 10M Wikipedia dataset, PQ reduced memory usage by 8× versus raw embeddings. This allows:

  1. Rapid approximate distance calculations (see the sketch after this list)
  2. Intelligent prefetching of relevant SSD blocks
  3. Filtering which neighbors merit full-precision validation
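
To make the RAM-side math concrete, here's a minimal NumPy sketch of PQ-based approximate distances. The subvector split, codebook shapes, and variable names are illustrative assumptions, and the random arrays stand in for codebooks trained offline.

import numpy as np

M, K, DIM = 16, 256, 128                     # assumed: 16 subvectors, 256 centroids each, 128-dim vectors
SUB = DIM // M                               # dimensions per subvector
codebooks = np.random.rand(M, K, SUB).astype(np.float32)              # stand-in for trained codebooks
pq_codes = np.random.randint(0, K, size=(10_000, M), dtype=np.uint8)  # stand-in compressed vectors

def approx_distances(query, pq_codes, codebooks):
    """Asymmetric distance: precompute query-to-centroid tables, then sum M table lookups per vector."""
    q_sub = query.reshape(M, SUB)
    tables = ((codebooks - q_sub[:, None, :]) ** 2).sum(axis=2)   # shape (M, K)
    return tables[np.arange(M), pq_codes].sum(axis=1)             # one approximate distance per stored vector

query = np.random.rand(DIM).astype(np.float32)
dists = approx_distances(query, pq_codes, codebooks)
prefetch_ids = np.argsort(dists)[:10]        # these nodes' SSD blocks are worth fetching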

The Vamana Graph Construction Algorithm

DiskANN uses a purpose-built graph-construction algorithm called Vamana. My benchmarking revealed its advantages:

Phase 1: Candidate Generation

For each node p, a greedy search starting at the graph medoid (a stand-in for the global centroid) collects roughly 100 of p’s closest points as candidate neighbors. At billion-vector scale this step requires partitioning: in one experiment, sharding 1B vectors into 16 clusters reduced peak memory by 73%.
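
As a rough sketch of that candidate collection for a single node, assuming an in-memory graph (node ID to neighbor IDs), a vectors array, and a beam width L that I've named for illustration:

def greedy_candidates(p_vec, medoid_id, vectors, graph, L=100):
    """Beam search of width L starting at the medoid; returns ~L candidate neighbors for p."""
    dist = lambda nid: sum((a - b) ** 2 for a, b in zip(vectors[nid], p_vec))
    visited = {medoid_id: dist(medoid_id)}    # node ID -> distance to p
    frontier = {medoid_id}                    # discovered but not yet expanded
    while frontier:
        node = min(frontier, key=visited.get) # expand the closest unexpanded node
        frontier.remove(node)
        for nbr in graph[node]:
            if nbr not in visited:
                visited[nbr] = dist(nbr)
                frontier.add(nbr)
        # keep only the L best nodes seen so far; anything worse is never expanded
        frontier &= set(sorted(visited, key=visited.get)[:L])
    return sorted(visited, key=visited.get)[:L]

The sketch keeps everything in memory; at billion scale the same idea runs per shard, as noted above.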

Phase 2: Edge Pruning

Two pruning passes ensure edge diversity:

  1. Long-range connections: keep a few long edges so searches can cross the graph in few hops
  2. Local links: retain close neighbors to preserve precision
# Pseudo-pruning logic: scan candidates from nearest to farthest,
# retaining only edges that add a sufficiently new direction from p
for candidate in sorted(candidates, key=lambda x: distance_to_p(x)):
    if angle_with_selected(candidate) > threshold:  # diverse enough vs. already-kept edges?
        retain_edge(p, candidate)

This angular diversity is key – my simulations showed 12% faster convergence vs. unpruned graphs.
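
To show what that angular check might look like in runnable form, here's a NumPy sketch for pruning one node's candidate list. The 60° threshold, max degree, and helper names are assumptions of mine for illustration, not Vamana's published pruning rule.

import numpy as np

def prune_neighbors(p_vec, cand_ids, vectors, max_degree=64, min_angle_deg=60.0):
    """Scan candidates nearest-first; keep an edge only if its direction differs enough from kept edges."""
    cos_limit = np.cos(np.radians(min_angle_deg))
    ordered = sorted(cand_ids, key=lambda c: np.linalg.norm(vectors[c] - p_vec))
    kept, kept_dirs = [], []
    for c in ordered:
        direction = np.asarray(vectors[c], dtype=float) - p_vec
        direction /= np.linalg.norm(direction) + 1e-12
        # Reject c if it points nearly the same way as an already-retained edge
        if all(np.dot(direction, d) < cos_limit for d in kept_dirs):
            kept.append(c)
            kept_dirs.append(direction)
        if len(kept) >= max_degree:
            break
    return kept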

Search Execution: Minimizing Disk Thrashing

DiskANN’s search alternates between RAM and SSD (see the sketch after this list):

  1. RAM phase: Use PQ embeddings to scout promising paths
  2. SSD phase: Retrieve top candidates’ full vectors for exact distance calculation
  3. Prefetch: Queue neighbor blocks while processing current nodes
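
Here's a condensed sketch of that alternation for one query, reusing the hypothetical approx_distances and read_node helpers from the earlier sketches. It collapses the graph traversal into a single shortlist-then-rerank pass for brevity, which is not how DiskANN's beam search actually walks the graph.

import numpy as np

def search(query, pq_codes, codebooks, index_file, k=10, rerank=50):
    """RAM phase picks a shortlist via PQ scores; SSD phase reranks it with full-precision vectors."""
    approx = approx_distances(query, pq_codes, codebooks)       # cheap, in-memory scoring
    shortlist = np.argsort(approx)[:rerank]                     # candidate node IDs

    exact = []
    for node_id in shortlist:
        vec, _neighbors = read_node(index_file, int(node_id))   # one fixed-size SSD read
        exact.append((float(np.linalg.norm(np.asarray(vec) - query)), int(node_id)))
        # A production system would prefetch the next candidates' blocks here instead of blocking

    exact.sort()
    return [node_id for _dist, node_id in exact[:k]]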

In a 100M vector test on NVMe SSDs:

  • 4KB block reads
  • 95% recall @ 8ms latency
  • SSD reads limited to 2-3 per query

Performance Tradeoffs: When To Use DiskANN

Metric            | HNSW (RAM-only) | DiskANN
Max dataset size  | 200M vectors    | 1B+ vectors
Memory footprint  | 500 GB          | 32 GB (+ SSD)
Latency (p95)     | 2 ms            | 8 ms
Cost ($/month)    | $2,000          | $400

Ideal use cases:

  • Static datasets (e.g., research corpora)
  • Cost-sensitive billion-scale deployments
  • Queries tolerant of <10ms latency

Avoid when:

  • Sub-millisecond latency required
  • Frequent real-time updates (mitigated by FreshDiskANN)

Integration Notes: Deployment Realities

Using DiskANN requires infrastructure tuning:

  • SSD specs matter: NVMe drives cut latency 45% vs SATA in my tests
  • Indexing time: Building the Vamana graph for 1B vectors took 8 hours on 32 vCPUs
  • Consistency warning: Never run queries during index rebuilds – I experienced 21% recall drops during overlap

Sample Integration (Python-like pseudocode):

index_config = {
    "type": "DISKANN",
    "metric": "IP",    # inner-product similarity
    "max_degree": 64,  # max out-edges per node; impacts graph connectivity
    "pq_bits": 8       # PQ code width; higher bits = better recall, larger in-RAM codes
}
client.create_index(collection, "vector", index_config)

Reflections and Next Steps

DiskANN proves SSDs needn’t bottleneck vector search. Yet practical limitations remain: update handling, cloud deployment complexity, and tuning sensitivity. FreshDiskANN addresses mutations, but I’ve yet to test its tradeoffs. Next, I’ll benchmark:

  1. Kubernetes deployment patterns for petabyte-scale DiskANN
  2. Hybrid indexes combining DiskANN with memory-cached hot vectors
  3. Cold-start latency implications when scaling horizontally

This isn’t a universal solution, but for massive static datasets, its cost/capacity balance is unmatched. The field moves fast – I’m watching GPU-accelerated variants that may rewrite these rules entirely.
