Rishabh Gupta

Posted on Jun 29 • Originally published at rishabh.fyi on Jun 30

The Algorithm Behind Every Vector Database Search — And Why It Matters for AI Engineers

#vectordatabases #rag #machinelearning #aiengineering

When I built AskS1.com — a tool that lets you ask questions about the SpaceX S-1 filing and get cited answers — I spent a lot of time thinking about retrieval. How do you find the right chunks of text from a 395-page document, fast enough that someone will actually wait for the answer?

Rishabh Gupta

Jun 10

How I Built a RAG System on the SpaceX S-1 in One Weekend

#aisystemdesign

7 min read

The answer turned out to be an algorithm I'd been using without fully understanding: HNSW — Hierarchical Navigable Small World graphs. It's the engine inside Qdrant, Weaviate, Pinecone, Milvus and most modern vector databases. If you're building anything with RAG, embeddings, or semantic search, HNSW is quietly doing the hardest part for you.

This post is what I wish I'd read before building AskS1.

What's a Vector Database, and Why Do You Need One?

Before HNSW makes sense, you need the problem it solves.

Modern AI applications — RAG systems, semantic search, recommendation engines — work by converting text (or images, or audio) into vectors: arrays of floating-point numbers that represent meaning. Two pieces of text that mean similar things will have vectors that are numerically close to each other. "Starlink revenue in 2025" and "Connectivity segment financial results" will be neighbors in vector space even though they share no words.

A vector database stores these vectors and answers one question efficiently: given a query vector, which stored vectors are most similar? That's the retrieval step in RAG — embed the user's question, find the most similar chunks, feed them to the language model.

The naive approach is obvious: compare the query vector against every stored vector, rank by similarity, return the top K. This works fine at 1,000 vectors. At 1,000,000 vectors, it's too slow. At 100,000,000 vectors (the scale of production recommendation systems), it's completely infeasible.

This is the problem HNSW was designed to solve.

The Algorithm: How HNSW Actually Works

The original HNSW paper was published in 2016 by Malkov and Yashunin. The core idea is elegant enough to explain in three paragraphs.

The amber path shows HNSW search: enter at Layer 2 (A→B), drop to Layer 1 (B→E), drop to Layer 0, find the nearest neighbor (★) to the query vector (Q).

The structure. HNSW builds a multi-layer graph over your stored vectors. Each vector is a node. Nodes are connected to their nearest neighbors, but the connections are separated by scale across layers:

Layer 2 (top) — sparse, long-range connections — coarse "zoom out"
Layer 1 — medium-range connections
Layer 0 (base) — dense, short-range connections — full precision

Only a small fraction of vectors appear at the top layers. Every vector appears at Layer 0.

The search. When a query arrives, search starts at the top layer. The algorithm greedily hops toward the query — at each step, it moves to whichever neighbor is closest to the query vector. When it can't get any closer (local minimum), it drops to the next layer and repeats, starting from where it stopped. By the time it reaches Layer 0, it's already in the right neighborhood and finds the true nearest neighbors quickly.

Why this is fast. Without the hierarchy, you'd need to scan many nodes to find the right neighborhood. The hierarchy acts like a map zoom: start at country level to find the right region, zoom to city level to find the right neighborhood, then zoom to street level to find the exact address. Each zoom-in starts from a much better position than random.

The result: logarithmic complexity — O(log n) search instead of O(n). At 1 million vectors, that's roughly 20 hops instead of 1,000,000 comparisons.

The Name Unpacked

"Hierarchical Navigable Small World" is a mouthful. Each word earns its place:

Hierarchical — the multi-layer structure that gives it logarithmic complexity. Without this, you get NSW (the predecessor algorithm), which is only polylogarithmic — still too slow at scale.

Navigable — greedy routing through the graph converges to the right answer. Not all graphs have this property. The specific way HNSW constructs edges ensures that following the "closest neighbor at each step" rule actually leads you somewhere useful.

Small World — any two nodes in the graph can be reached from each other in a small number of hops, regardless of graph size. This is the same "six degrees of separation" phenomenon studied in social network theory — Milgram's famous experiment. HNSW deliberately engineers this property into its graph structure.

Why Previous Approaches Failed

It helps to understand what HNSW replaced:

Brute-force / flat search — compare the query against every vector. O(n) — too slow at scale. Still used for tiny collections where speed doesn't matter.

kd-trees — the classic algorithm for nearest neighbor search in low-dimensional spaces. Works well up to maybe 20 dimensions. Above that, the "curse of dimensionality" kicks in: the tree structure degrades and you end up scanning most of the tree anyway. Modern embedding models produce 384 to 1536-dimensional vectors — kd-trees are useless here.

Locality-sensitive hashing (LSH) — hash similar vectors to the same bucket, search within the bucket. Works, but requires tuning many parameters and tends to need high memory for good recall.

NSW (non-hierarchical) — the direct predecessor to HNSW. Good idea, but polylogarithmic complexity: as the dataset grows, each search requires evaluating an increasingly large number of nodes. HNSW's hierarchy adds a second log factor that brings it to true logarithmic scaling.

The Parameters You'll Actually Configure

When you use a vector database like Qdrant, you don't implement HNSW — you configure it. Three parameters matter:

M — the number of connections per node per layer. Higher M means better recall (more neighbors to navigate through) at the cost of more memory and slower index build time. Qdrant's default is 16, which works well for most embedding dimensions.

efConstruction — the candidate list size during index building. Higher values build a better quality index at the cost of build time. Default is 100.

ef — the candidate list size during search. This is the one you actually tune at query time. Higher ef means HNSW explores more candidates before returning results — better recall, slightly slower. If your RAG system is missing relevant chunks, increasing ef (or the equivalent limit parameter in your vector database client) is the first thing to try.

In AskS1, the retrieve() function passes limit=15 to Qdrant's query_points(). HNSW finds 15 candidates, then a re-ranking step applies a penalty to summary pages and returns the top 5. Increasing the limit to 20-25 would give HNSW more candidates to work with — potentially improving the quality of retrieved chunks at marginal latency cost for an 871-chunk collection.

One More Idea Worth Understanding: The Neighbor Selection Heuristic

The paper introduces a heuristic for selecting which nodes to connect during index construction that's worth understanding if you're working with clustered data.

The naive approach connects each new node to its M closest existing neighbors. This works most of the time, but fails on highly clustered data: if all your nearest neighbors are in the same cluster, you have no long-range connections to other clusters. The retriever gets stuck.

HNSW's heuristic (Figure 2 in the paper, also shown below) deliberately selects diverse neighbors — it prefers candidates that extend connectivity in new directions, even if they're not the absolute closest. The result is a graph that maintains global connectivity across clusters.

This matters for RAG specifically. In a filing like the SpaceX S-1, chunks about "Starlink revenue" form a dense cluster. So do chunks about "governance" and "risk factors." Without cross-cluster connectivity, a query about "Elon Musk's voting power and its revenue implications" might only retrieve governance chunks, missing the revenue context entirely. HNSW's heuristic makes cross-cluster retrieval work.

Reading the Paper

The HNSW paper is accessible without a deep algorithms background if you read it selectively. The sections worth your time:

Abstract — the entire algorithm in 15 lines
Section 1 — why naive search fails; no math required
Section 3 — the zoom-out/zoom-in intuition; the best explanatory section
Figure 1 — the layered structure, visually
Figure 2 — the neighbor selection heuristic
Algorithm 5 — the actual search procedure, only 8 lines
Section 4.1 — what M, mL, and efConstruction actually control

Skip Section 2 (prior work survey), Algorithms 1-4 (implementation detail), and the experiments section (the finding is just "HNSW wins"). The math-heavy parts aren't necessary for understanding how to use it.

Total reading time at this depth: 45-60 minutes.

Why This Matters for AI Engineers

Vector databases are now a standard component in AI engineering — RAG pipelines, semantic search, recommendation systems, and anything using embeddings routes through one. Understanding HNSW doesn't mean you'll implement it (you won't — Qdrant, Weaviate, and others handle that). But it tells you:

Why limit in your vector database query is actually a recall parameter, not just a count
Why retrieval quality degrades on clustered data and what to do about it
Why building a 10-million-vector index takes a long time but search is fast
What tradeoffs you're making when you adjust M and efConstruction

The algorithm was published in 2016. It's been powering production systems for almost a decade. If you're building with vector databases in 2026, it's worth spending an hour understanding what's actually happening when you call query_points().

DEV Community