Most semantic search tutorials start the same way: add a vector database.
The feature request sounded simple: type a question, get the right internal doc back.
A few hundred documents. Support notes and wiki pages.
Nothing exotic. The kind of thing that should take a week, maybe less.
They did what most of us would do today.
They watched a couple of LangChain tutorials, skimmed the OpenAI docs, and followed the same architecture every example seemed to use.
Documents were chunked, embeddings generated, and everything went into a hosted vector database.
An ingestion pipeline kept the index in sync.
Queries hit the vector store first, then the app database.
It looked like the modern, correct way to build search.
Three weeks later, the feature worked — technically.
But updating a single document meant re-running the embedding pipeline.
The vector index and the app database could drift out of sync silently.
It needed API keys just to run locally.
Every deployment waited on background indexing to finish before results were reliable.
The system was fragile in ways that would keep compounding.
A Postgres full-text search would have solved the original problem in an afternoon.
The vector database wasn't wrong. It was just answering a question nobody had asked yet.
This article is about how to ask the right question before you start building — and what the answer looks like in practice.
## What a Vector Database Is Actually For
Most developers working with embeddings already know what a vector database does.
Fewer stop to ask whether that specific capability is what their problem actually requires.
Before arguing when not to reach for one, it's worth being precise about what the tool is actually built for.
When you generate embeddings for text, images, or other data, you end up with arrays of floating-point numbers.
Finding the most similar item means comparing one vector against many others.
For small datasets, you can do this with a simple scan.
As the number of vectors grows, brute-force comparison becomes too slow, and you need specialized indexes designed for approximate nearest neighbor search.
That’s the problem vector databases are optimized to solve.
Under the hood, most of them rely on approximate nearest neighbor (ANN) algorithms.
HNSW builds a navigable graph over the vectors; IVF partitions them into clusters and searches only the most relevant ones.
They trade a small amount of recall accuracy for dramatically faster queries.
For semantic search, that trade-off is almost always acceptable — you don't need the single most similar document, you need several good ones, fast.
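pgvector, covered below, implements both families. A sketch of each index type, with parameter values that are illustrative defaults rather than tuned recommendations:

```sql
-- Graph-based: HNSW (better recall/speed tradeoff, slower builds, more memory)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Cluster-based: IVFFlat (faster builds, recall depends on how many clusters are probed)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
</imports>
```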
pgvector exposes this same choice directly in SQL — the query is identical with or without the index:
```sql
-- Without IVFFlat: PostgreSQL scans every row
SELECT content,
       embedding <=> query_vector AS distance
FROM documents
ORDER BY distance
LIMIT 5;

-- With IVFFlat: PostgreSQL searches only relevant clusters
-- Same query, dramatically different performance at scale
SELECT content,
       embedding <=> query_vector AS distance
FROM documents
ORDER BY distance
LIMIT 5;
```
The query looks identical — the difference is entirely in the index. This is the performance decision pgvector hands back to you.
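To see which path the planner actually takes, EXPLAIN works the same way it does for any other query. A sketch; the exact plan output varies by Postgres and pgvector version, and `:query_vector` is a placeholder:

```sql
EXPLAIN
SELECT content,
       embedding <=> :query_vector::vector AS distance
FROM documents
ORDER BY distance
LIMIT 5;
-- With the IVFFlat index in place, expect an index scan in the plan;
-- without it, a sequential scan over every row.
```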
Everything else people associate with vector databases — metadata filtering, hybrid search, multi-tenant indexes, reranking — sits on top of that core capability.
Here's where the confusion starts.
The term "vector database" bundles several distinct concerns — storing embeddings, searching them, filtering results, and running the infrastructure — into what looks like a single decision.
The tooling reinforces it.
When every tutorial wires all four together in the same five lines of code, it stops looking like a choice and starts looking like a requirement.
As soon as a project involves embeddings, it can seem like a dedicated vector database is the only correct design.
It isn’t.
Embeddings are just data.
They can live in Postgres, SQLite, or even memory.
A vector database becomes the right tool when approximate nearest neighbor search is the bottleneck — not when embeddings first appear in the architecture.
Until that point, it’s often extra complexity you don’t need.
## Why Developers Reach for It Too Early
### 1. Tutorial monoculture
Most examples of semantic search, RAG, or LLM-powered features follow the same pattern: chunk documents, generate embeddings, store them in a vector database, query with similarity search.
LangChain demos do it. LlamaIndex demos do it. OpenAI examples do it.
### 2. The scalability trap
Once embeddings enter the design, it’s easy to assume the system will eventually need fast similarity search at scale, so the vector database gets added early to avoid rewriting things later.
This is the same instinct that leads teams to introduce Kafka for a service that sends ten emails a day. The future problem might be real.
But solving it before it exists adds complexity immediately, with no corresponding benefit.
### 3. Tooling and marketing
Modern vector databases have excellent documentation, polished SDKs, and tutorials that get you from zero to similarity search in under an hour.
That ease of setup is genuinely impressive — and it's also exactly what makes the tool feel mandatory before you've decided whether you need it.
Great onboarding has a way of skipping the step where you ask whether you should be onboarding at all.
Developers don't reach for vector databases too early because they don't understand the technology.
They do it because the ecosystem makes it look like the obvious first step.
At some point, the vector database became the new Redis of the AI stack: added by default, before anyone confirmed it was actually needed.
The result isn't broken systems. It's systems that are harder to run, slower to change, and more expensive to maintain than the problem ever required. The complexity arrives on day one. The scale that would justify it may never come.
## The Simpler Stack You’re Ignoring
Two tools get overlooked almost every time: pgvector, which runs inside the Postgres instance you already have, and plain keyword search, which still solves more problems than people want to admit.
### 1. pgvector — When Your Database Is Already Postgres
If your application already runs on PostgreSQL — and most do — adding pgvector gives you similarity search without introducing a new service.
No new deployment. No additional failure mode.
pgvector adds a VECTOR column type and similarity operators directly to PostgreSQL.
Embeddings live alongside the rest of your data, queryable with SQL, inside the same transactional system your application already depends on.
No new monitoring, no separate backups, no second system to explain to the next engineer on the team.
The setup starts with enabling the extension and creating the table:
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id                   BIGSERIAL PRIMARY KEY,
    title                TEXT NOT NULL,
    content              TEXT NOT NULL,
    metadata             JSONB,
    embedding            VECTOR(1536),
    status               TEXT NOT NULL DEFAULT 'PENDING',  -- PENDING / READY / FAILED
    embedding_error      TEXT,
    embedding_updated_at TIMESTAMPTZ,
    created_at           TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at           TIMESTAMPTZ NOT NULL DEFAULT now()
);
```
The status column is worth noting.
Because embedding is an external API call that can fail, documents move through a lifecycle — PENDING when first saved, READY once the embedding succeeds, FAILED if the API returns an error.
This means a failed embedding never silently corrupts search results. The status is always visible in the database.
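A sketch of those transitions, assuming the schema above; the `:id`, `:embedding`, and `:error` names are bind-parameter placeholders:

```sql
-- New or edited document: mark stale until re-embedded
UPDATE documents
SET status = 'PENDING', updated_at = now()
WHERE id = :id;

-- Embedding API call succeeded
UPDATE documents
SET embedding = :embedding,
    status = 'READY',
    embedding_updated_at = now()
WHERE id = :id;

-- Embedding API call failed: record the error, keep the row visible
UPDATE documents
SET status = 'FAILED', embedding_error = :error
WHERE id = :id;
```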
For similarity search to scale beyond a few thousand documents, pgvector needs an index. The IVFFlat index is what makes this production-ready:
```sql
CREATE INDEX documents_embedding_ivfflat_idx
    ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
```
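With the index in place, one statement can combine all three concerns at once. A sketch, assuming the schema above; the JSONB category filter is illustrative:

```sql
SELECT id, title, content,
       embedding <=> :query_vector::vector AS distance
FROM documents
WHERE status = 'READY'                       -- lifecycle filter
  AND embedding IS NOT NULL
  AND metadata @> '{"category": "support"}'  -- JSONB metadata filter
ORDER BY distance
LIMIT 5;
```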
Vector similarity search, lifecycle filtering, and metadata filtering can all run in a single SQL query.
A dedicated vector database handles each of those concerns separately — often requiring application-level joins or multiple round trips to combine them.
In Postgres, everything runs inside one query, in one database, with full ACID guarantees.
The <=> operator is pgvector's cosine distance operator. It returns a value between 0 and 2 — lower means more similar. Results are ordered ascending so the closest matches come first.
For most products, this goes further than people expect. pgvector handles millions of vectors without meaningful performance degradation for typical query patterns.
If you're building an internal tool, a document search API, or a RAG feature for a product that isn't at serious scale yet, you're almost certainly in that range.
There are real limits. pgvector won't give you distributed indexing, automatic sharding, or sub-10ms latency under very high query volume.
If you're storing tens of millions of vectors and serving high-QPS queries, a dedicated vector database will outperform it.
But by the time you reach that point, you'll know exactly why you're making the switch — because you'll have measured the problem, not imagined it.
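Before switching, it's worth knowing the recall/speed tradeoff is tunable in place. With IVFFlat, `lists` is fixed at index build time, but the number of clusters probed per query is a session setting. A sketch; good values are workload-dependent:

```sql
-- More probes = better recall, slower queries (the default is 1)
SET ivfflat.probes = 10;
```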
The full implementation — including all three Flyway migrations, the IVFFlat index configuration, lifecycle tracking, and metadata filtering — is available on GitHub.
You don't need a new database to add semantic search. You need pgvector and a migration file.
### 2. BM25 and Keyword Search — The Tool Nobody Wants to Admit Still Works
Before you generate a single embedding, it's worth asking whether your users actually need semantic search — or whether they just need search that works.
A lot of features labeled “AI search” are really just keyword lookup with better marketing.
If your users know the words they’re looking for, traditional full-text search is often faster, simpler, and more predictable than embeddings.
BM25, the ranking algorithm used by most dedicated full-text engines, is extremely good at matching short, precise queries. PostgreSQL's built-in ts_rank is a simpler scoring function, but it fills the same role for the same class of queries.
```sql
-- Standard PostgreSQL full-text search — no new infrastructure needed
SELECT title, content,
       ts_rank(to_tsvector('english', content),
               plainto_tsquery('english', 'reset password')) AS rank
FROM documents
WHERE to_tsvector('english', content)
      @@ plainto_tsquery('english', 'reset password')
ORDER BY rank DESC
LIMIT 5;
```
This runs inside the same PostgreSQL instance as your pgvector queries — no new service, no new failure mode.
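To keep that query fast as the table grows, the standard move is a GIN expression index, so Postgres doesn't have to re-parse every row's text at query time. A sketch; the index name is illustrative:

```sql
-- GIN expression index backing the full-text predicate above
CREATE INDEX documents_content_fts_idx
    ON documents USING gin (to_tsvector('english', content));
```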
Searches like “reset password”, “invoice template”, or a specific error message often perform better with keyword scoring than with vector similarity.
In domains with strict terminology — legal references, product codes, medical terms — exact matches matter more than semantic closeness.
Embeddings shine when meaning matters more than wording. If users are asking “show me something like this” or “what document explains this idea”, vector search makes sense.
If they’re typing the name of the thing they want, it usually doesn’t.
You also don’t have to choose one or the other. Postgres supports full-text search, pgvector supports similarity search, and combining the two often gives better results than either alone.
A hybrid query looks like this — no new infrastructure, no new service:
```sql
SELECT id, title, content,
       embedding <=> ?::vector AS cosine_distance,
       ts_rank(
           to_tsvector('english', content),
           plainto_tsquery('english', ?)
       ) AS text_rank
FROM documents
WHERE status = 'READY'
  AND embedding IS NOT NULL
ORDER BY
    (embedding <=> ?::vector) * 0.7
    - ts_rank(
          to_tsvector('english', content),
          plainto_tsquery('english', ?)
      ) * 0.3
    ASC
LIMIT 10;
```
A simple hybrid query like this weighs semantic distance against keyword rank in a single ORDER BY, without adding any new infrastructure.
Before adding a vector database, answer a simpler question first: can keyword search solve 80% of this?
If the answer is yes, start there. You can always add embeddings later. You can't easily remove infrastructure you didn't need.
## When You Actually Need a Vector Database
Vector databases aren't the villain here. They're the right tool when similarity search becomes a real, measured performance problem — not a projected one. The question is how to recognize that moment before you've already over-built.
These are the thresholds where teams consistently start feeling the limits of a general-purpose setup:
| Signal | Typical threshold |
|---|---|
| Vector count | Millions of embeddings |
| Query latency | Sub-50 ms p99 |
| Filtering complexity | Multi-tenant filters |
| Query volume | High QPS |
| Infrastructure maturity | Dedicated team |
| Use case | Recommendation, RAG, personalization |
The numbers aren’t rules. They’re patterns. The real signal is when the simple approach stops being simple.
If your search feature is core to the product and needs predictable latency under load, the tradeoffs of a dedicated vector database start to make sense.
You know you need a vector database when brute-force similarity becomes your actual bottleneck — not your imagined one.
## A Simple Decision Flowchart
Most projects don't need a new datastore. They need a clear decision process.
If you're starting a new feature and asking whether a vector database belongs in the design, walk through the questions below, from "I need search" to the right tool for your current scale.
A simple way to think about the decision:

1. Do users know the words they're looking for? Start with Postgres full-text search.
2. Do queries need to match meaning rather than exact wording? Add pgvector to the Postgres you already run.
3. Have you measured approximate nearest neighbor search as a real bottleneck (millions of vectors, high QPS, strict latency targets)? Only then reach for a dedicated vector database.

No decision list covers every edge case.
But if your design jumps straight to a managed vector database before working through these questions, you're probably solving a scaling problem you don't have yet.
The cost of that mistake shows up slowly, in complexity that compounds before the scale ever arrives.
## Architecture Should Follow the Problem
The best architecture isn't the most modern one.
It's the one that matches the problem you actually have, at the scale you're actually at — maintained by the team you actually have, not the one you might hire later.
Vector databases are powerful tools, but they come with real operational cost — another service to run, another datastore to keep in sync, another place where performance and correctness can drift apart.
That cost only makes sense when the problem demands it. Before that point, simpler designs are usually easier to build, easier to reason about, and easier to change when requirements shift.
Starting with pgvector or full-text search doesn’t lock you in. If you outgrow it, the path to a dedicated vector database is well understood. The reverse is harder.
Removing infrastructure you didn’t need is almost always more work than adding it later.
The full pgvector implementation, including schemas, index configuration, and the search query shown above, is available on GitHub.
Most systems don’t fail because they chose the wrong tool. They fail because they chose the right tool too early.
The real skill isn’t knowing how to use a vector database. It’s knowing when not to.