DEV Community: Dilip V P

How Vector Databases Search a Million Vectors Without Checking a Million

Dilip V P — Sat, 18 Jul 2026 18:53:10 +0000

Take the word "king." Your database does not store the word. It stores a vector: a list of 768 numbers that place king at a point in space, where words used in similar ways sit nearby.

That one move changes the whole problem. "Find something similar" becomes "find the nearest point." And finding the nearest point in a space of hundreds of dimensions turns out to be the search your normal database index genuinely cannot do.

Meaning becomes a place

An embedding model reads enormous amounts of text and learns to place each word (or sentence, or image) at a point, so that things used in similar contexts land near each other. Closeness is usually measured with cosine similarity: how aligned two vectors are, ignoring their length.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")  # 768 dimensions
vecs = model.encode(["king", "queen", "bicycle"], normalize_embeddings=True)

print(util.cos_sim(vecs[0], vecs[1]))  # king vs queen   -> ~0.63
print(util.cos_sim(vecs[0], vecs[2]))  # king vs bicycle -> ~0.16

king lands near queen and far from bicycle. "Similar meaning" is now just "small distance."
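To see what "ignoring their length" means, here is a minimal sketch of cosine similarity in plain Python. The vectors are made-up toy values, not real embeddings, and `cosine_similarity` is my own helper name, not a library call.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot product over the product of lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors (made up for illustration, not real embeddings).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]     # same direction as a, twice the length
c = [-3.0, 0.0, 1.0]    # perpendicular to a

print(cosine_similarity(a, b))  # very close to 1: length does not matter
print(cosine_similarity(a, c))  # 0: no alignment at all
```

When vectors are normalized to length 1 up front, the two norms are both 1 and the whole thing collapses to a plain dot product.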

Why brute force scores every vector

To find the closest vector to a query, the simple approach compares the query to every stored vector and keeps the best:

import numpy as np

best, best_score = None, -1.0
for v in all_vectors:            # N stored vectors, each with d numbers
    s = np.dot(query, v)         # normalized vectors, so dot product = cosine
    if s > best_score:           # keep the highest score seen so far
        best_score, best = s, v

That is O(N times d). With 768 dimensions, a million vectors is roughly a million dot products for a single search. On my machine a simple CPU scan of 1,000,000 vectors took about 38 ms per search, which extrapolates to a few seconds at 100,000,000. A service doing thousands of searches a second cannot live there.
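A rough sketch of the same scan returning the top k instead of the single best, plus the linear extrapolation behind the "few seconds" figure. The helper names and the tiny 2-D vectors are mine, for illustration only; the 38 ms input is the measurement quoted above.

```python
import heapq

def brute_force_top_k(query, vectors, k):
    """Score every vector against the query and keep the k best (score, index) pairs."""
    scored = ((sum(q * x for q, x in zip(query, v)), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)

def linear_scan_estimate_ms(measured_ms, measured_n, target_n):
    """Linear extrapolation: scan time grows in proportion to the number of vectors."""
    return measured_ms * target_n / measured_n

# Tiny made-up, already-normalized 2-D vectors, just to exercise the function.
vectors = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8), (-1.0, 0.0)]
query = (0.8, 0.6)
print(brute_force_top_k(query, vectors, k=2))   # indices 2 and 0 score highest

# About 38 ms per search at 1,000,000 vectors, scaled to 100,000,000.
print(linear_scan_estimate_ms(38, 1_000_000, 100_000_000))  # 3800.0 ms, i.e. 3.8 s
```

The estimate assumes nothing changes but N: same hardware, same dimensions, everything still in memory.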

Why your B-tree index can't save you

The instinct from the earlier posts in this series is "add an index." But a B-tree gives you exactly one sorted order. Sort every vector by dimension 0 and you are blind to the other 767. Two vectors can be neighbors in the full space and sit thousands of positions apart in that single sort.

In one run, the true nearest word to "coffee" was "starbucks" (similarity 0.69). Sorted by a single dimension, starbucks sat 3,333 positions away from coffee. The closest thing inside a small window of that sort was "meals" at 0.44: wrong, and weaker.

A composite index does not fix it either. Multiple indexed columns still produce one lexicographic order, not an ordering by distance. Sorted order answers ranges. Similarity is not a range.
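Here is a small sketch of that failure with made-up 2-D points (the labels and coordinates are hypothetical, chosen so the effect is easy to see). It sorts by dimension 0 only, looks at a small window around the query's slot in that order, and compares the result with the true nearest point.

```python
import bisect
import math

# Made-up 2-D points; "near" is the true nearest neighbor of the query.
points = {
    "a": (0.11, 2.0),
    "b": (0.12, -1.5),
    "c": (0.20, 3.0),
    "d": (0.30, -2.0),
    "near": (0.60, 0.5),
}
query = (0.10, 0.5)

def dist(name):
    """Euclidean distance from a stored point to the query, using both dimensions."""
    return math.dist(points[name], query)

true_nearest = min(points, key=dist)

# What a single sorted order gives you: position along dimension 0 only.
by_dim0 = sorted(points, key=lambda name: points[name][0])
dim0_values = [points[name][0] for name in by_dim0]
slot = bisect.bisect_left(dim0_values, query[0])     # where the query would sit
window = by_dim0[max(0, slot - 2): slot + 2]         # a small window around it
best_in_window = min(window, key=dist)

print(true_nearest, round(dist(true_nearest), 2))
print(best_in_window, round(dist(best_in_window), 2))
print(by_dim0.index(true_nearest) - slot, "positions away in the sort")
```

The window picks a point that matches on dimension 0 and is far off on dimension 1, while the real neighbor sits at the other end of the sort.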

The small-world idea

Here is the turn. Think of six degrees of separation: most of your friends are local, but a few know someone far away, and those few long-range links let a message cross the world in about six hops.

Do the same to vectors. Give each one a short list of neighbors, including a few distant ones. Now search is a walk: start somewhere, and keep hopping to whichever neighbor is closer to your query. On a flat graph this already works, but it can crawl, taking many small steps or getting stuck one hop short of the best answer.
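A toy sketch of that walk, assuming 20 points on a line so the distances are easy to follow. The graph, the single long-range link, and `greedy_search` are all illustrative, not how any particular library builds its graph.

```python
def greedy_search(positions, neighbors, start, query):
    """Hop to whichever neighbor is closest to the query; stop when none is closer."""
    current, hops = start, 0
    while True:
        best = min(neighbors[current], key=lambda n: abs(positions[n] - query))
        if abs(positions[best] - query) >= abs(positions[current] - query):
            return current, hops
        current, hops = best, hops + 1

# 20 points on a line: node i sits at position i.
positions = [float(i) for i in range(20)]

# Local links only: each node knows its immediate left and right neighbor.
local_only = {i: [j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)}

# Same graph plus one long-range link between node 0 and node 10.
with_long_links = {i: list(ns) for i, ns in local_only.items()}
with_long_links[0].append(10)
with_long_links[10].append(0)

print(greedy_search(positions, local_only, start=0, query=13.2))       # (13, 13)
print(greedy_search(positions, with_long_links, start=0, query=13.2))  # (13, 4)
```

Both walks end on node 13. The one long link cuts the trip from 13 hops to 4.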

HNSW: jump, then refine

HNSW (Hierarchical Navigable Small World, Malkov and Yashunin, 2016) fixes the crawl by adding layers, like a skip list for a graph.

Top layer: a sparse "airport" map. A few vectors, long jumps across the whole space.
Middle layers: "highways."
Bottom layer: every vector, the "streets," for the final precise step.

You enter at the top, take big jumps to get roughly there, then drop down a layer and refine, again and again, until the streets put you on the exact neighbor. A single search touches a tiny fraction of the data.
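The descent can be sketched on the same kind of toy line. This is a simplification under loud assumptions: layers here are evenly spaced (every 16th, every 4th, every node), whereas real HNSW assigns layers randomly and tracks a list of candidates rather than a single current node.

```python
def greedy(positions, neighbors, start, query):
    """Greedy walk on one layer: move to the closest neighbor until nothing improves."""
    current, hops = start, 0
    while True:
        best = min(neighbors[current], key=lambda n: abs(positions[n] - query))
        if abs(positions[best] - query) >= abs(positions[current] - query):
            return current, hops
        current, hops = best, hops + 1

def build_layer(n, step):
    """Toy layer: every `step`-th node, linked to the previous and next node on that layer."""
    return {i: [j for j in (i - step, i + step) if 0 <= j < n] for i in range(0, n, step)}

def layered_search(positions, layers, entry, query):
    """Walk each layer from sparsest to densest, reusing the result as the next entry point."""
    current, total_hops = entry, 0
    for neighbors in layers:
        current, hops = greedy(positions, neighbors, current, query)
        total_hops += hops
    return current, total_hops

N = 64
positions = [float(i) for i in range(N)]
layers = [build_layer(N, 16), build_layer(N, 4), build_layer(N, 1)]  # airports, highways, streets

print(layered_search(positions, layers, entry=0, query=45.3))  # (45, 5)
print(greedy(positions, layers[-1], 0, 45.3))                  # (45, 45) on streets alone
```

Three hops on the top layer, one on the middle, one on the bottom: 5 hops instead of 45 for the same answer.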

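import hnswlib

# vectors: float32 array of shape (N, 768); ids: N integer labels; query: one 768-dim vector
index = hnswlib.Index(space="cosine", dim=768)
index.init_index(max_elements=len(vectors), M=16, ef_construction=200)  # M: links per node
index.add_items(vectors, ids)

index.set_ef(50)   # the accuracy vs speed dial (must be at least k)
labels, distances = index.knn_query(query, k=10)  # cosine distance = 1 - similarity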

The honest trade

HNSW is approximate. It can miss, and misses are silent, so you tune it against a recall target. ef_search (set with index.set_ef in hnswlib) is the dial: higher looks at more candidates, so recall goes up and speed goes down.

On my run, on the same 20,000 real embeddings:

brute force, exact: 1.23 ms per search
HNSW, ef_search = 50: 0.62 ms per search, recall@10 = 99.9%
HNSW, ef_search = 200: 2.06 ms per search, recall@10 = 100%

Notice the last line: at 20,000 vectors, cranking ef past a point makes HNSW slower than brute force. Approximate search wins because it scales differently: an exact scan is linear in N, HNSW is roughly logarithmic, so the gap widens as your dataset grows. At a million or a hundred million, brute force is hopeless and HNSW is still fast.
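Recall@10 is simple to compute once you have an exact result to compare against. A minimal sketch; the id lists are hypothetical stand-ins for what an exact scan and an approximate index would return.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k that the approximate search also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def mean_recall(pairs, k):
    """Average recall@k over many (approx, exact) result pairs, one pair per query."""
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)

exact = [7, 3, 9, 1, 4, 8, 2, 6, 5, 0]     # hypothetical ids from an exact scan
approx = [7, 3, 9, 1, 4, 8, 2, 6, 5, 11]   # hypothetical approximate result: one miss

print(recall_at_k(approx, exact, k=10))                       # 0.9
print(mean_recall([(approx, exact), (exact, exact)], k=10))   # 0.95
```

Order inside the top k is ignored here. Only membership counts, which is the usual definition.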

Who runs this

HNSW is one of the most common indexes under semantic search and the retrieval step of AI chatbots:

pgvector (inside Postgres you already run)
Weaviate (dedicated engine, self-hosted or managed)
Pinecone (managed)

One rule of thumb: start with pgvector in the database you already operate, and graduate to a dedicated engine when RAM, traffic, filtering, or ops actually force you. HNSW is not the only vector index (IVF, PQ, and DiskANN exist, and hybrid search combines vector and keyword retrieval), and not every AI search uses it.

Watch it happen

The full episode builds the meaning-map, shows why the B-tree fails, and descends through the HNSW layers.

Numbers are from a demo on my machine, one model everywhere (all-mpnet-base-v2, 768 dims, 20,000 real words). The one-by-one counter is an animation; the per-check cost is measured and the totals are arithmetic. The million and hundred-million scan times are a linear estimate from a simple CPU run, not a universal speed. The HNSW vs brute-force numbers are measured on the 20,000 vectors with hnswlib. Also worth knowing: "similar" here means "used in similar contexts," not synonyms, so opposites like good and bad can score high.

Which vector database are you using, and what pushed you to it?

The N+1 Query Problem: How One Page Fires 2,101 Queries (and How to Get Back to 3)

Dilip V P — Sat, 11 Jul 2026 11:08:00 +0000

I put a query counter on a single page and loaded it. The page rendered fine. The counter said 101.

That's the N+1 problem. And the reason it survives code review, testing, and staging is the twist most explanations bury: every one of those queries is fast.

Where the 101 comes from

One innocent loop (pseudo-code, parameters skipped for brevity; never build SQL by string concat in real code):

posts = db.query("SELECT * FROM posts LIMIT 100")
for post in posts:
    author = db.query("SELECT ... FROM users WHERE id = ?", post.author_id)
    render(post, author)

One query for the list, plus N queries for the rows. 1 + 100 = 101. Turn on your query log and you can watch it happen: the same SELECT, over and over, one id at a time.
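A runnable version of that loop against an in-memory SQLite database, with a counter on every query. The schema, the seed data, and the `CountingDB` wrapper are assumptions made up for this sketch.

```python
import sqlite3

class CountingDB:
    """Thin wrapper (hypothetical helper) that counts every query it sends."""
    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 11)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, i % 10 + 1, f"post{i}") for i in range(1, 101)])

db = CountingDB(conn)
posts = db.query("SELECT id, author_id, title FROM posts LIMIT 100")   # 1 query
for post_id, author_id, title in posts:
    # One more query per post: this is the "N" in N+1.
    author = db.query("SELECT name FROM users WHERE id = ?", (author_id,))

print(db.count)  # 101
```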

Why "every query is fast" doesn't save you

Each of those queries is indexed. Each takes about a millisecond. The problem is that every query is a round trip to the database, and a round trip costs a few milliseconds on a real network. Sometimes more.

The cost of an N+1 is not query speed. It is count times round trip. A hundred round trips at a few milliseconds each, and one page waits half a second.

It nests

Each comment also loads its author. Now the loop hides another loop:

100 posts x 20 comments each
1 (posts) + 100 (authors) + 2,000 (comment authors) = 2,101 queries

That is 2,101 queries for one page. Not an estimate, just the arithmetic of a nested loop. N+1 fires one query for every node in your object tree. Comments, authors, tags: add a relation and it multiplies again.
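The arithmetic, written out. The query count follows the tally above (post list, post authors, comment authors). The 5 ms round trip is my assumption standing in for "a few milliseconds", and the wait assumes the queries run one after another.

```python
def n_plus_one_queries(posts, comments_per_post):
    """Query count as tallied above: the post list, one author lookup per post,
    and one author lookup per comment."""
    return 1 + posts + posts * comments_per_post

def page_wait_ms(queries, round_trip_ms):
    """Time spent purely on round trips when queries run sequentially."""
    return queries * round_trip_ms

ROUND_TRIP_MS = 5  # assumption: stands in for "a few milliseconds"

total = n_plus_one_queries(posts=100, comments_per_post=20)
print(total)                               # 2101
print(page_wait_ms(101, ROUND_TRIP_MS))    # 505 ms for the single-level case
print(page_wait_ms(total, ROUND_TRIP_MS))  # 10505 ms for the nested case
```

Swap in your own latency. The shape of the result does not change: the count dominates.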

The fix: one query per level

Stop asking one row at a time. Collect the ids at each level, fire one batched query per level, and stitch the results in code:

SELECT * FROM posts LIMIT 100
SELECT ... FROM comments WHERE post_id IN (...)
SELECT ... FROM users WHERE id IN (...)

2,101 queries becomes 3. The count now follows the DEPTH of the tree, not the size of it. A million rows? Still 3 queries.
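A sketch of the three batched queries with SQLite, including the stitching step. The schema and seed data are made up for the example. Only the `?` placeholders are formatted into the SQL; the ids themselves still travel as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, author_id INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 51)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, i % 50 + 1, f"post{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO comments (post_id, author_id) VALUES (?, ?)",
                 [(p, (p + c) % 50 + 1) for p in range(1, 101) for c in range(20)])

stats = {"queries": 0}

def query(sql, params=()):
    """Run one query and count it."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()

def placeholders(values):
    """One '?' per value, for use inside IN (...)."""
    return ",".join("?" * len(values))

# Level 1: the posts.
posts = query("SELECT id, author_id, title FROM posts LIMIT 100")
post_ids = [p[0] for p in posts]

# Level 2: every comment for those posts, in one query.
comments = query(
    f"SELECT id, post_id, author_id FROM comments WHERE post_id IN ({placeholders(post_ids)})",
    post_ids)

# Level 3: every author (of posts and comments), in one query.
user_ids = sorted({p[1] for p in posts} | {c[2] for c in comments})
users = dict(query(f"SELECT id, name FROM users WHERE id IN ({placeholders(user_ids)})", user_ids))

# Stitch in code: group comments under their post, with author names resolved.
comments_by_post = {}
for comment_id, post_id, author_id in comments:
    comments_by_post.setdefault(post_id, []).append((comment_id, users[author_id]))

print(stats["queries"], len(comments))  # 3 2000
```

Databases cap the number of bound parameters per statement, so very large id lists need chunking. The query count then grows with the number of chunks, not the number of rows.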

In most frameworks the batch fix is one line, the eager-load option:

JPA/Hibernate: JOIN FETCH (to-one), @BatchSize or @EntityGraph (to-many)
Rails: includes / preload
Django: select_related (to-one) / prefetch_related (to-many)
GraphQL: DataLoader
EF Core: Include + AsSplitQuery

The JOIN trap

"Why not just JOIN everything?" For to-one relations, yes: a post and its one author JOIN into one clean row.

But JOIN a post to its 50 comments and the post comes back 50 times, once per comment row. Worse, pagination breaks: LIMIT 100 counts joined rows, not posts, so you ask for 100 posts and get 2.

The rule: to-one, JOIN. To-many, batch with IN.
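The pagination break is easy to reproduce. A minimal sketch with a made-up schema: 200 posts, 50 comments each, then a JOIN with LIMIT 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
""")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(i, f"post{i}") for i in range(1, 201)])
conn.executemany("INSERT INTO comments (post_id, body) VALUES (?, ?)",
                 [(p, "hi") for p in range(1, 201) for _ in range(50)])

# LIMIT applies to the joined rows, one per (post, comment) pair.
rows = conn.execute("""
    SELECT p.id, c.id
    FROM posts p JOIN comments c ON c.post_id = p.id
    ORDER BY p.id, c.id
    LIMIT 100
""").fetchall()

distinct_posts = {post_id for post_id, _ in rows}
print(len(rows), len(distinct_posts))  # 100 2
```

A hundred rows come back, and they describe two posts.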

Why nobody notices until production

The ORM hides the queries. You write post.author and a SELECT fires behind your back. In development you have 10 rows and the page is instant. In production you have 10,000, one page fires thousands of queries, each query holds a database connection while it runs, the pool empties, every request waits, and the page you tested a hundred times goes down.

A lot of "it worked yesterday" outages are exactly this.

Catch it in 60 seconds

Turn on query logging in development, load ONE page, read the log:

Hibernate: spring.jpa.show-sql=true
Rails: config.log_level = :debug
Django: enable logging for django.db.backends
Prisma: log: ["query"] (PrismaClient option)

The same SELECT repeating with different ids means that page has an N+1. If the query count grows when your data grows, same disease.
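If the log is long, a few lines of code can spot the repeats for you. A rough sketch: the log lines are invented, and `query_shape` is a naive normalizer of my own that swaps literals for `?`, so treat it as a starting point rather than a parser.

```python
import re
from collections import Counter

def query_shape(sql):
    """Replace quoted strings and bare numbers with ? so repeated queries collapse together."""
    sql = re.sub(r"'[^']*'", "?", sql)
    sql = re.sub(r"\b\d+\b", "?", sql)
    return re.sub(r"\s+", " ", sql).strip()

def find_repeats(log_lines, threshold=10):
    """Return (shape, count) for every query shape seen at least `threshold` times."""
    shapes = Counter(query_shape(line) for line in log_lines)
    return [(shape, n) for shape, n in shapes.most_common() if n >= threshold]

# Invented log for one page load: one list query, then one lookup per row.
log = ["SELECT * FROM posts LIMIT 100"] + [
    f"SELECT name FROM users WHERE id = {i}" for i in range(1, 101)
]

print(find_repeats(log))  # [('SELECT name FROM users WHERE id = ?', 100)]
```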

When it's fine

Not every N+1 is worth fixing. Three items on a dashboard? Leave it. Don't over-engineer. Fix it when N grows with your data. The senior move isn't fixing every one, it's knowing which ones matter.

Fast pages aren't about fast queries. They're about fewer of them.

Watch it happen

The full episode shows the counter climbing to 2,101 and collapsing to 3.

Numbers are from the demo in the video (generic posts/comments/authors schema), with the round trip simulated to model a remote database. Same-datacenter latency is sub-millisecond; cross-network can be tens of ms. The lesson is count times YOUR latency.

What's the worst N+1 you've found in production?

Why Your Database Index Gets Ignored (and How to Design One That Isn't)

Dilip V P — Sat, 04 Jul 2026 10:33:38 +0000

TL;DR: An index can exist and still do nothing for your query. A multi-column index only serves queries that use its columns from the left, in the index's order. Fix it by putting the column you filter on first. Go further by putting every column the query needs inside the index (a covering index) so the database never touches the table. But every index taxes every write, so design them, don't collect them.

The setup

Last time I showed what happens with no index: the database reads every row. This is the sneakier version.

You added the index. EXPLAIN still says the table is being scanned. The index isn't broken, and the database isn't being dumb. The index just cannot serve that query.

CREATE INDEX idx_name ON users(last_name, first_name);

EXPLAIN QUERY PLAN SELECT * FROM users WHERE first_name = 'Martha';  -- SQLite form; Postgres and MySQL use plain EXPLAIN
-- SCAN users        <- ignored

The left-prefix rule

A multi-column index keeps rows sorted by its first column, then the next: exactly like a phone book, last name then first name.

Search by last name and you jump straight to the page. Search by first name alone and the sorting cannot help you: the Marthas are scattered across every page. There is no way to jump, so the database reads the whole thing.

That is the left-prefix rule: an index serves a query only when the filter starts from the index's first column.

WHERE last_name = ? -> uses the index
WHERE last_name = ? AND first_name = ? -> uses the index
WHERE first_name = ? -> full scan

One thing people get wrong: the order of conditions in your SQL means nothing. WHERE a = ? AND b = ? and WHERE b = ? AND a = ? produce the identical plan. Only the column order inside the index counts.
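You can check all of this from Python with SQLite's EXPLAIN QUERY PLAN. A minimal sketch on an empty table (the planner does not need data to pick a plan); the `plan` helper and the extra `city` column are assumptions for the example, and the exact plan wording varies a little between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_name ON users(last_name, first_name)")

def plan(sql):
    """Return the readable plan lines SQLite reports for a query."""
    params = ("x",) * sql.count("?")             # dummy values, one per placeholder
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]             # the last column holds the plan text

by_last = plan("SELECT * FROM users WHERE last_name = ?")
by_both = plan("SELECT * FROM users WHERE last_name = ? AND first_name = ?")
swapped = plan("SELECT * FROM users WHERE first_name = ? AND last_name = ?")
by_first = plan("SELECT * FROM users WHERE first_name = ?")

print(by_last)    # a SEARCH that names idx_name
print(by_both)    # a SEARCH that names idx_name
print(swapped)    # same plan as by_both: condition order in the SQL is irrelevant
print(by_first)   # a SCAN: the index cannot help
```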

The fix, measured

Rebuild the index so the column you filter on comes first:

CREATE INDEX idx_first_last ON users(first_name, last_name);

On my test table (SQLite, in-memory, 1M rows), that same query flipped from SCAN users at 26.4 ms to SEARCH users USING INDEX at 0.02 ms. Roughly 1,300x. Your absolute numbers will differ; the plan flip is the point.

Covering indexes: never touch the table

A normal index holds just two things: the columns it is sorted on, and a pointer to the row. So after it finds your match, it takes a second hop to the table to grab the other columns you asked for.

But if every column your query needs is already in the index, that second hop disappears. The database answers from the index alone:

-- index: (first_name, last_name, city)
SELECT last_name, city FROM users WHERE first_name = 'Martha';

The plan names it differently per database:

Database	Covering hit looks like
SQLite	`USING COVERING INDEX`
Postgres	`Index Only Scan`
MySQL	`Extra: Using index`

In MySQL/InnoDB the saving is doubled: secondary indexes store the primary key as the row pointer, so the "second hop" is itself another index lookup. (Postgres uses heap tables, so the mechanics differ; Index Only Scan is the thing to look for.)
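A quick way to see the covering hit in SQLite. This sketch assumes a `users` table with one extra column (`email`) that is not in the index, so the two cases can be compared side by side; `plan` is a small helper of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, city TEXT, email TEXT)""")
conn.execute("CREATE INDEX idx_cover ON users(first_name, last_name, city)")

def plan(sql):
    """Return the readable plan lines SQLite reports for a query."""
    params = ("x",) * sql.count("?")             # dummy values, one per placeholder
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# Every requested column lives in the index: answered from the index alone.
covered = plan("SELECT last_name, city FROM users WHERE first_name = ?")

# email is not in the index: the index finds the row, then the table is read too.
not_covered = plan("SELECT last_name, city, email FROM users WHERE first_name = ?")

print(covered)
print(not_covered)
```

One extra column in the SELECT list is enough to lose the covering hit, which is a good argument against SELECT * on hot paths.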

What every index costs your writes

An index is a B-tree kept in perfect sorted order. That order is what makes reads fast, and it is exactly what writes have to pay for.

Every insert walks down the tree and slots the new value into its exact place. When a node fills up, it splits in two and promotes a key to its parent. That split can ripple upward and grow the whole tree a level. One small insert can rewrite several nodes, and every index on the table is another tree the database keeps balanced on every single write.
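The split itself fits in a few lines. This is a toy sketch of one node only, assuming a capacity of 4 keys; a real B-tree also handles child pointers, the parent update, and the ripple upward.

```python
import bisect

def insert_into_node(keys, key, capacity):
    """Insert key into a sorted node. If the node overflows, split it in two and
    return the median key that has to be promoted to the parent."""
    keys = list(keys)
    bisect.insort(keys, key)            # slot the new value into its exact place
    if len(keys) <= capacity:
        return keys, None, None         # it fit: no split, nothing promoted
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

print(insert_into_node([10, 20], 15, capacity=4))          # ([10, 15, 20], None, None)
print(insert_into_node([10, 20, 30, 40], 25, capacity=4))  # ([10, 20], 25, [30, 40])
```

The second insert touches three things: two new half-nodes and the parent that receives 25. If the parent is full as well, the same split repeats one level up.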

When NOT to add an index

Low-selectivity columns. An order status or a yes/no flag matches half the table. Jumping saves nothing, so the planner scans anyway. The index just sits there taxing writes.
Write-heavy, read-light tables. If you write far more often than you read, the tax outweighs the benefit.
Redundant prefixes. An index on (last_name, first_name) already serves last_name queries. Adding another index on last_name alone is write cost for nothing.

Index the columns you actually filter, join, and sort on: the ones that narrow to a few rows.

The principle

An index isn't a box you tick. It is a structure you design: the right column order so it gets used, the right columns inside so it covers, and only where reads outweigh writes.

Watch it visually

I made a short, visual breakdown of all of this: the phone book, the real EXPLAIN flip, the covering hit, and the B-tree split animation.

It is episode two of Inside the Database, part of The Leap, a series explaining the systems we build on from first principles. Next up: how loading one page can quietly fire a hundred queries, and nobody notices.

What is the most confusing plan EXPLAIN has ever shown you?

How Database Indexes Actually Work (and When They Backfire)

Dilip V P — Mon, 29 Jun 2026 06:58:15 +0000

TL;DR: Without an index, your database finds a row by reading every row (a full table scan). An index is a sorted structure that lets it jump straight to the row instead. But indexes are a trade, not free speed: they only help selective queries, and they slow down every write. Use EXPLAIN to see what your database is actually doing.

The setup

Your query was instant in development. In production, it crawls. Same code. The only thing that changed is the amount of data.

Nine times out of ten, this is why: without an index, the database has only one way to find your row, which is to read every row, one at a time, until it matches. That is a full table scan.

On 10,000 rows you don't notice. On 10,000,000, it is painful.

Why an index exists (first principles)

An index exists to avoid that work. Scanning the whole table doesn't scale: a query that filters on one column shouldn't have to read every row to find a handful.

So the database keeps a separate, sorted directory of one column's values, stored alongside the table, like the index at the back of a textbook. Instead of flipping through every page, you look up the term and jump straight to the page.

CREATE INDEX idx_users_email ON users(email);

Why it's fast

Because the directory is sorted, the database doesn't read it top to bottom either. It navigates straight toward the value: narrow the range, discard the half that can't contain it, repeat. That is the same halving idea as binary search, and the structure that makes it work on disk is a B-tree (the disk-friendly generalization, not literally a binary search over rows).

The payoff: finding one row among a million takes about 20 halving steps (log2 of 1,000,000 is roughly 19.9), not a million.
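The halving count is easy to check. A small sketch that counts probes in a plain binary search over a million sorted values; this illustrates the halving idea only, since a real B-tree reads whole pages with many keys each and needs far fewer page reads.

```python
import math

def binary_search_steps(sorted_values, target):
    """Classic halving search. Returns (index or None, number of probes made)."""
    lo, hi, steps = 0, len(sorted_values) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_values[mid] == target:
            return mid, steps
        if sorted_values[mid] < target:
            lo = mid + 1        # discard the lower half
        else:
            hi = mid - 1        # discard the upper half
    return None, steps

values = range(1_000_000)       # a range is already sorted and indexable

print(math.ceil(math.log2(1_000_000)))   # 20
for target in (0, 123_456, 999_999):
    index, steps = binary_search_steps(values, target)
    print(target, "found in", steps, "probes")
```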

The part most engineers miss: it's a trade

Indexes are not free performance. They are a trade-off:

They only help when your query is selective, when it returns a small fraction of the table. WHERE id = ? (one row) flies. WHERE active = true (half the table) can be slower with the index than a plain scan, because there is no shortcut when you are returning most of the rows. A plan can literally say "using index" and still be slow; what matters is how many rows it touches.
Every index has a write cost. Inserts, updates, and deletes all have to keep every index current. The more indexes you have, the slower your writes.
They cost storage.

So the rule isn't "add indexes everywhere." It is: index the columns your queries actually filter, join, and sort on, and only where it is selective enough to help.
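Selectivity is just the fraction of the table a predicate matches, so you can measure it before deciding on an index. A minimal sketch on made-up data; `selectivity` is my own helper, and the WHERE text passed to it is a trusted literal while the values still go through parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER)")
conn.executemany("INSERT INTO users (email, active) VALUES (?, ?)",
                 [(f"user{i}@example.com", i % 2) for i in range(10_000)])

def selectivity(where_sql, params=()):
    """Fraction of the table a predicate matches (smaller means more selective)."""
    total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    matched = conn.execute(
        f"SELECT COUNT(*) FROM users WHERE {where_sql}", params).fetchone()[0]
    return matched / total

print(selectivity("email = ?", ("user42@example.com",)))  # 0.0001: worth indexing
print(selectivity("active = 1"))                          # 0.5: an index saves little
```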

Stop guessing: read the plan

You never have to guess at any of this. Put EXPLAIN in front of your query and the database shows you its plan without running the query. (In SQLite, use EXPLAIN QUERY PLAN; plain EXPLAIN prints low-level bytecode instead.)

EXPLAIN SELECT * FROM users WHERE email = ?;

What you are looking for is "reads the whole table" turning into "uses the index":

Database	Slow (full scan)	Fast (uses index)
SQLite	`SCAN users`	`SEARCH users USING INDEX`
Postgres	`Seq Scan`	`Index Scan` / `Index Only Scan`
MySQL	`type: ALL`	`type: ref` / `range`

(In Postgres, EXPLAIN ANALYZE also shows the real row counts versus the planner's estimates, which is where a lot of "why didn't it use my index?" mysteries get solved. Unlike plain EXPLAIN, it actually executes the query, so be careful with writes.)
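Here is the flip from "reads the whole table" to "uses the index", reproduced with SQLite from Python. The table layout and the `plan` helper are assumptions for this sketch, and the exact plan wording differs slightly between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

def plan(sql):
    """Return the readable plan lines SQLite reports for a query."""
    params = ("x",) * sql.count("?")             # dummy values, one per placeholder
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]             # the last column holds the plan text

sql = "SELECT * FROM users WHERE email = ?"

before = plan(sql)                                        # no index yet
conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(sql)                                         # same query, new plan

print(before)   # a SCAN of users
print(after)    # a SEARCH using idx_users_email
```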

Get in the habit of reading the plan, and you will catch the slow query before it pegs your database and takes the app down.

Watch it visually

I made a short, visual breakdown of all of this, from the full table scan to the index, to why it is fast, to when it backfires, to reading EXPLAIN.

It is the first episode of The Leap, a series explaining the systems we build on, from databases and networking to memory and distributed systems, from first principles. Next up: the dark side of indexes, and when adding one is the wrong move.

What's the nastiest slow query you have had to debug in production?