I Built a Vector Search Engine from Scratch — Here's What I Learned
Implementing HNSW (Hierarchical Navigable Small World) graphs, hybrid BM25 + dense retrieval, HyDE query rewriting, and atomic index persistence — achieving recall@10 = 0.984.
Why Build Your Own Vector Search?
When I started building Vektr — a RAG (Retrieval-Augmented Generation) engine — I had a choice: use an existing vector database like Pinecone, Weaviate, or FAISS, or build my own.
I chose to build my own. Not because existing solutions are bad (they're excellent), but because you don't truly understand a system until you've built it.
This post is about what I learned building HNSW from scratch.
What is HNSW?
HNSW (Hierarchical Navigable Small World) is the algorithm powering most modern vector databases. It achieves near-linear search time with high recall by organizing vectors into a hierarchical graph.
The key insight: approximate nearest neighbor search is fast enough, and "approximate" is closer to exact than you'd think.
My implementation achieves recall@10 = 0.984 — meaning for 98.4% of queries, all 10 true nearest neighbors appear in the top 10 results.
The HNSW Structure
Layer 2 (sparse): 1 ──────────── 5
│ │
Layer 1 (medium): 1 ── 3 ── 4 ── 5
│ │ │ │
Layer 0 (dense): 1─2─3─4─5─6─7─8─9
Each vector is inserted at layer 0. With probability 1/ln(M), it also appears in layer 1, and so on. This creates a highway network — you navigate quickly through sparse upper layers, then zoom in at the dense bottom layer.
Building the Index
public class HNSWIndex {
private final int M; // Max connections per node
private final int efConstruction; // Search width during construction
private final int maxLayer;
private final Map<Integer, Node> nodes;
private final Random random;
private Node entryPoint;
public void insert(int id, float[] vector) {
int level = getRandomLevel();
Node newNode = new Node(id, vector, level);
if (entryPoint == null) {
entryPoint = newNode;
nodes.put(id, newNode);
return;
}
// Start from entry point, navigate down to insertion level
Node current = entryPoint;
for (int l = entryPoint.level; l > level; l--) {
current = greedySearch(current, vector, 1, l).get(0);
}
// Insert at each layer from level down to 0
for (int l = Math.min(level, entryPoint.level); l >= 0; l--) {
List<Node> candidates = searchLayer(current, vector, efConstruction, l);
List<Node> neighbors = selectNeighbors(candidates, M, vector);
newNode.setConnections(l, neighbors);
// Add backlinks
for (Node neighbor : neighbors) {
neighbor.addConnection(l, newNode);
// Prune if over capacity
if (neighbor.getConnections(l).size() > M) {
List<Node> pruned = selectNeighbors(
neighbor.getConnections(l), M, neighbor.vector
);
neighbor.setConnections(l, pruned);
}
}
}
nodes.put(id, newNode);
if (level > entryPoint.level) {
entryPoint = newNode;
}
}
private int getRandomLevel() {
// Level distribution: P(level = l) = (1/ln(M))^l
double r = -Math.log(random.nextDouble()) * (1.0 / Math.log(M));
return (int) Math.min(r, maxLayer);
}
}
Searching the Index
public List<SearchResult> search(float[] query, int k, int ef) {
// Navigate from entry point down to layer 1
Node current = entryPoint;
for (int l = entryPoint.level; l > 0; l--) {
current = greedySearch(current, query, 1, l).get(0);
}
// Beam search at layer 0 with ef candidates
List<Node> candidates = searchLayer(current, query, ef, 0);
// Return top-k by distance
return candidates.stream()
.sorted(Comparator.comparingDouble(n -> cosineSimilarity(query, n.vector)))
.limit(k)
.map(n -> new SearchResult(n.id, cosineSimilarity(query, n.vector)))
.collect(Collectors.toList());
}
private List<Node> searchLayer(Node entry, float[] query, int ef, int layer) {
Set<Node> visited = new HashSet<>();
PriorityQueue<Node> candidates = new PriorityQueue<>(
Comparator.comparingDouble(n -> -cosineSimilarity(query, n.vector))
);
PriorityQueue<Node> results = new PriorityQueue<>(
Comparator.comparingDouble(n -> cosineSimilarity(query, n.vector))
);
candidates.add(entry);
results.add(entry);
visited.add(entry);
while (!candidates.isEmpty()) {
Node candidate = candidates.poll();
// Termination condition: best candidate is worse than worst result
if (results.size() >= ef &&
cosineSimilarity(query, candidate.vector) <
cosineSimilarity(query, results.peek().vector)) {
break;
}
for (Node neighbor : candidate.getConnections(layer)) {
if (!visited.contains(neighbor)) {
visited.add(neighbor);
candidates.add(neighbor);
results.add(neighbor);
if (results.size() > ef) results.poll();
}
}
}
return new ArrayList<>(results);
}
Hybrid Retrieval: BM25 + Dense Search
Pure vector search misses exact keyword matches. Pure BM25 misses semantic similarity. The solution: combine both.
public List<SearchResult> hybridSearch(String query, int k) {
// Dense retrieval
float[] queryEmbedding = embedder.embed(query);
List<SearchResult> denseResults = index.search(queryEmbedding, k * 2, efSearch);
// Sparse retrieval (BM25)
List<SearchResult> sparseResults = bm25.search(query, k * 2);
// Reciprocal Rank Fusion
return reciprocalRankFusion(denseResults, sparseResults, k);
}
private List<SearchResult> reciprocalRankFusion(
List<SearchResult> dense,
List<SearchResult> sparse,
int k
) {
Map<Integer, Double> scores = new HashMap<>();
int k_rrf = 60; // RRF constant
// Dense scores
for (int i = 0; i < dense.size(); i++) {
int id = dense.get(i).id;
scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum);
}
// Sparse scores
for (int i = 0; i < sparse.size(); i++) {
int id = sparse.get(i).id;
scores.merge(id, 1.0 / (k_rrf + i + 1), Double::sum);
}
return scores.entrySet().stream()
.sorted(Map.Entry.<Integer, Double>comparingByValue().reversed())
.limit(k)
.map(e -> new SearchResult(e.getKey(), e.getValue()))
.collect(Collectors.toList());
}
RRF (Reciprocal Rank Fusion) is elegant: each result gets a score of 1 / (k + rank) from each retriever. Results appearing in both lists get combined scores, naturally surfacing the best matches.
HyDE: Hypothetical Document Embeddings
Query: "What is the capital of France?"
The problem: this query, embedded, looks nothing like a Wikipedia article about Paris. Dense retrieval fails.
HyDE solution: Generate a hypothetical answer first, then embed that.
public float[] hydeEmbed(String query) {
// Generate hypothetical answer
String hypothetical = llm.generate(
"Write a short factual answer to: " + query
);
// Embed the hypothetical answer instead of the query
return embedder.embed(hypothetical);
}
Query: "What is the capital of France?"
Hypothetical: "The capital of France is Paris, located in northern France along the Seine River..."
Now the embedding actually matches relevant documents.
Impact: +8% recall@10 on my test set.
Atomic Index Persistence
The naive approach to saving the index:
// DANGEROUS — if the process dies here, the file is corrupted
try (FileOutputStream fos = new FileOutputStream("index.bin")) {
serialize(index, fos);
}
The safe approach — write-to-tmp + rename (atomic on POSIX systems):
public void saveIndex() throws IOException {
Path tempFile = Files.createTempFile("index-", ".tmp");
try {
// Write to temp file
try (ObjectOutputStream oos = new ObjectOutputStream(
new BufferedOutputStream(Files.newOutputStream(tempFile))
)) {
oos.writeObject(this.nodes);
oos.writeObject(this.entryPoint);
}
// Atomic rename — either succeeds completely or fails completely
Files.move(tempFile, indexPath,
StandardCopyOption.ATOMIC_MOVE,
StandardCopyOption.REPLACE_EXISTING
);
} catch (Exception e) {
Files.deleteIfExists(tempFile);
throw e;
}
}
ATOMIC_MOVE is a single filesystem operation — it either completes or doesn't happen at all. No corrupted state.
Result: Index loads in <15ms on restart, matching LevelDB's durability pattern.
Performance Results
Tested on 1,000 vectors (sentence embeddings, 384 dimensions):
| Metric | Result |
|---|---|
| recall@10 | 0.984 |
| Cold query latency | 35ms |
| Cached query latency | <1ms |
| Index load time | <15ms |
| Index build time (1K vectors) | ~200ms |
The cold vs cached gap shows the LRU cache working: 35ms first query, sub-millisecond repeat queries.
Key Lessons
1. The probabilistic layer structure is brilliant.
O(log n) search complexity comes naturally from the exponential decay of upper layers. You don't need a balanced tree — randomness does the work.
2. ef and M are the critical parameters.
-
M: max connections per node. Higher = better recall, more memory. -
efConstruction: search width during insertion. Higher = better index quality, slower build. -
efSearch: search width at query time. Higher = better recall, slower queries.
3. Hybrid retrieval almost always beats pure dense retrieval.
BM25 catches exact matches that dense embeddings miss. RRF fusion requires no tuning.
4. Atomic writes are non-negotiable for any persistent data structure.
Write-to-tmp + rename is the standard pattern — use it everywhere.
5. HyDE is underrated.
Generating a hypothetical answer before embedding significantly improves recall for factoid queries with minimal overhead.
Source Code
I'm a 3rd year CS student at MJCET, Hyderabad — building distributed systems from scratch.
Top comments (0)