Traditional search engines match keywords. If you search for "dog shelters around Gurgaon" and the indexed page says "animal shelters near Delhi," you get no results. The distinguishing words do not overlap.
Semantic search fixes this by converting text into vectors. Similar ideas end up close together in vector space, even when the words differ.
From words to vectors
An embedding model takes a word or sentence and produces a high-dimensional vector. The key property: semantically similar inputs produce vectors that are close to each other. "Dog" and "animal" sit near each other. "Dog" and "car" do not.
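You can see this directly with an off-the-shelf embedding model. A minimal sketch using the sentence-transformers library (the specific model name is an illustrative choice, not one the original prescribes):

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np

# Any sentence-embedding model works; all-MiniLM-L6-v2 is a small, common choice.
model = SentenceTransformer("all-MiniLM-L6-v2")

vectors = model.encode(["dog", "animal", "car"], normalize_embeddings=True)

# With unit-length vectors, the dot product is cosine similarity.
print("dog vs animal:", np.dot(vectors[0], vectors[1]))
print("dog vs car:   ", np.dot(vectors[0], vectors[2]))
# Expect dog/animal to score noticeably higher than dog/car.
```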
For a search engine, the pipeline is straightforward (a brute-force version is sketched after the list):
1. Convert every document in the corpus into a vector and store it.
2. Convert the user's query into a vector using the same model.
3. Find the documents whose vectors are closest to the query vector.
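Here is the whole pipeline end to end, again with an illustrative model choice and exact (not yet approximate) search:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

docs = [
    "Animal shelters near Delhi",
    "Best car dealerships in Mumbai",
    "Adopt a rescued puppy today",
]

# Step 1: embed and store every document (in production, in a vector index).
doc_vectors = model.encode(docs, normalize_embeddings=True)

# Step 2: embed the query with the same model.
query_vector = model.encode(["dog shelters around Gurgaon"],
                            normalize_embeddings=True)[0]

# Step 3: rank documents by similarity to the query (exact brute force here).
scores = doc_vectors @ query_vector
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {docs[i]}")
```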
The hard part is step 3. With a million documents and 768-dimensional vectors, that is roughly 3 GB of float32 data, and a single brute-force query costs on the order of 768 million multiply-adds. Computing the exact distance from the query to every document is too slow for interactive search.
Approximate Nearest Neighbors
Exact search is O(n). ANN algorithms trade a small amount of accuracy for massive speedups. The metric is recall@k: out of the true k closest vectors, how many does the approximation find? A 95% recall@100 means 95 of the 100 true nearest neighbors are returned.
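Concretely, recall@k is just the overlap between the approximate result set and a brute-force ground truth:

```python
def recall_at_k(true_neighbors, approx_neighbors, k):
    """Fraction of the true k nearest neighbors found by the approximation."""
    true_set = set(true_neighbors[:k])
    found = sum(1 for idx in approx_neighbors[:k] if idx in true_set)
    return found / k

# 95 of the 100 true neighbors recovered -> recall@100 = 0.95
print(recall_at_k(list(range(100)),
                  list(range(95)) + [1000, 1001, 1002, 1003, 1004],
                  100))
```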
Graph-based ANN builds a navigable graph over the dataset. Search starts at an entry point and greedily walks toward the query. Each step moves to the neighbor closest to the query, expanding the frontier until the best candidates are found.
DiskANN and Vamana
Microsoft Research developed DiskANN and the Vamana index to make graph-based ANN work at scale. The algorithm has three pieces:
Greedy Search maintains a candidate list and a visited set. It repeatedly expands the closest unvisited candidate, adds its graph neighbors, and keeps the best candidates bounded by a search-list size.
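A minimal Python sketch of that loop, where the graph is just a dict of adjacency lists and the parameter names follow the paper's GreedySearch(s, q, k, L) convention:

```python
import numpy as np

def greedy_search(graph, points, entry, query, k, L):
    """Beam search over a proximity graph.

    graph:  dict mapping point index -> list of out-neighbor indices
    points: (n, d) numpy array of vectors
    L:      search-list size (L >= k); larger L trades speed for recall
    Returns (k closest indices found, set of visited indices).
    """
    dist = lambda i: float(np.linalg.norm(points[i] - query))
    pool = {entry: dist(entry)}               # candidate list with distances
    visited = set()
    while True:
        unvisited = [p for p in pool if p not in visited]
        if not unvisited:
            break
        node = min(unvisited, key=pool.get)   # closest unexpanded candidate
        visited.add(node)
        for nbr in graph[node]:               # expand its graph neighbors
            pool.setdefault(nbr, dist(nbr))
        # Keep the candidate list bounded by the search-list size.
        pool = dict(sorted(pool.items(), key=lambda kv: kv[1])[:L])
    return sorted(pool, key=pool.get)[:k], visited
```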
Robust Prune builds the graph edges. For each point, it considers possible neighbors and keeps a bounded set of useful outgoing edges. An alpha parameter controls how aggressively candidates are pruned.
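A sketch of that pruning rule, mirroring the paper's RobustPrune(p, V, alpha, R), where R caps the out-degree:

```python
import numpy as np

def robust_prune(p, candidates, points, alpha, R):
    """Pick up to R diverse out-neighbors for point p.

    Candidates are taken closest-first; a candidate is discarded once a
    kept neighbor is alpha-times closer to it than p is. alpha > 1 retains
    some longer edges, which helps greedy search cross the graph quickly.
    """
    dist = lambda a, b: float(np.linalg.norm(points[a] - points[b]))
    pool = sorted(set(candidates) - {p}, key=lambda c: dist(p, c))
    kept = []
    while pool and len(kept) < R:
        closest = pool.pop(0)
        kept.append(closest)
        # Drop candidates that the edge to `closest` already covers.
        pool = [c for c in pool if alpha * dist(closest, c) > dist(p, c)]
    return kept
```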
Vamana Construction iterates over the dataset in random order. For each point, it runs greedy search, prunes the visited set into outgoing edges, adds backlinks, and repairs any degree violations.
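Putting the two together, a sketch of the construction loop, reusing greedy_search and robust_prune from the sketches above. This simplifies the paper in places: points is assumed to be a numpy array, and the paper's two passes over the data (first with alpha = 1) are collapsed into one.

```python
import numpy as np

def build_vamana(points, R=32, L=64, alpha=1.2, seed=0):
    """Build a Vamana-style graph over an (n, d) numpy array of points."""
    n = len(points)
    rng = np.random.default_rng(seed)
    # Start from a random graph with about R out-edges per point.
    graph = {i: list(rng.choice([j for j in range(n) if j != i],
                                size=min(R, n - 1), replace=False))
             for i in range(n)}
    # Entry point: the medoid (point closest to the dataset centroid).
    medoid = int(np.argmin(np.linalg.norm(points - points.mean(axis=0), axis=1)))

    for p in rng.permutation(n):                      # random insertion order
        p = int(p)
        _, visited = greedy_search(graph, points, medoid, points[p], 1, L)
        graph[p] = robust_prune(p, visited, points, alpha, R)
        for nbr in graph[p]:                          # add backlinks
            if p not in graph[nbr]:
                graph[nbr].append(p)
                if len(graph[nbr]) > R:               # repair degree violations
                    graph[nbr] = robust_prune(nbr, graph[nbr], points, alpha, R)
    return graph, medoid
```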
The result is a sparse graph where greedy search finds high-recall neighbors quickly.
Why this matters
Vector databases like Pinecone, Weaviate, and Milvus package these ideas into production systems. They handle indexing, query routing, replication, and metadata filtering. If you are building semantic search, recommendation, or retrieval-augmented generation, you are probably using these algorithms whether you know it or not.
For the full mathematical walkthrough with pseudocode, LaTeX equations, and diagrams: How Google Search Actually Works.