Rene Zander

Your Vector Database Decision Is Simpler Than You Think

Every week someone asks which vector database they should use. The answer is almost always "it depends on three things," and none of them are throughput benchmarks.

I run semantic search in production on a single VPS. Over a thousand items indexed, embeddings generated on the same machine, queries return in under a second. But that setup only works because of the constraints I'm operating in. Change the constraints and the answer changes completely.

Here's how I think about it.

The overchoice problem

There are dozens of vector databases now. Every one of them publishes benchmarks showing millions of vectors queried in milliseconds. That's great if you're building a search engine for the entire internet. Most of us aren't.

The benchmarks test throughput at scale. What they don't test is: can this thing run on the same box as your application without eating all the memory? Can you set it up in ten minutes? Does it need a cluster?

Those are the questions that actually matter when you're picking one.

Scenario 1: Local device, small dataset, ephemeral

You have a CLI tool or a local application. Your data is a few hundred markdown files or JSON documents. The user runs it on their laptop.

You don't need a database. Load the vectors into memory on startup, compute cosine similarity, done. A flat array of float32 embeddings and a brute-force search will outperform any database at this scale because there's zero overhead. No process to manage, no port to configure, no persistence to worry about.
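The in-memory approach is small enough to sketch in full. A minimal version with NumPy, assuming the embeddings are L2-normalized (so cosine similarity reduces to a dot product) and noting that the names here are mine, not from any library:

```python
import numpy as np

def top_k(query: np.ndarray, embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar rows, best first.

    embeddings: (n_items, dim) float32 array, rows L2-normalized.
    query: (dim,) float32 vector, L2-normalized.
    Assumes k is smaller than the number of items.
    """
    # For normalized vectors, cosine similarity is just a dot product.
    scores = embeddings @ query
    # argpartition finds the k best without fully sorting all n scores,
    # then we sort just those k by descending score.
    idx = np.argpartition(-scores, k)[:k]
    return idx[np.argsort(-scores[idx])]
```

At a few hundred items this is a single matrix-vector product; no index structure will beat it.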

Pre-compute your embeddings at build time or on first run, store them alongside the source files. When the data changes, regenerate. At a few hundred items this takes seconds.
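A sketch of that build-time step; `build_index`, the `embed` callable, and the file layout are illustrative assumptions, not a prescribed format:

```python
import numpy as np
from pathlib import Path

def build_index(sources: dict, embed, out_path: str) -> None:
    """Embed every document once and persist the matrix next to the sources.

    sources: {doc_id: text}. embed: callable mapping text -> vector.
    Writes <out_path>.npy (the matrix) and <out_path>.ids (row order).
    """
    ids = sorted(sources)
    matrix = np.stack([embed(sources[i]) for i in ids]).astype(np.float32)
    np.save(out_path, matrix)  # np.save appends .npy automatically
    Path(out_path + ".ids").write_text("\n".join(ids))
```

On startup, load the `.npy` file and the id list, and you have everything the in-memory search needs.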

The mistake people make here is reaching for a database because it feels like the "proper" way. It's not. It's unnecessary complexity for a problem that fits in a single array.

Scenario 2: Small VPS, thousands of items, needs persistence

Now things change. Your data lives behind an API. It updates throughout the day. You need search results to reflect changes within minutes, not hours. The whole thing runs on a VPS with maybe 2GB of RAM, shared with other services.

This is where a lightweight vector database process makes sense. Something that runs as a single binary, stores vectors on disk, serves queries over a local HTTP API. You don't need clustering or replication. You need something that starts fast, uses a few hundred MB of RAM, and doesn't crash when you restart other services on the same box.

The key decision here: does the embedding model run locally or behind an API? If your VPS has enough RAM, running a small embedding model locally saves you per-request API costs and latency. If RAM is tight, call an embedding API and only store the resulting vectors locally.

Change detection matters at this scale. You don't want to re-embed everything on every sync. Hash the source content, compare to what's stored, only embed what changed. This keeps your sync jobs fast and your API costs predictable.
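That hashing scheme fits in a few lines. A sketch, assuming documents arrive as an `{id: text}` dict and the hashes from the last sync are persisted between runs; the function name and shapes are my own:

```python
import hashlib

def diff_for_embedding(docs: dict, stored_hashes: dict) -> list:
    """Return ids that are new or changed since the last sync.

    docs: {doc_id: text}. stored_hashes: {doc_id: sha256 hex} from the
    previous run; updated in place so the next sync sees current state.
    """
    changed = []
    for doc_id, text in docs.items():
        h = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if stored_hashes.get(doc_id) != h:
            changed.append(doc_id)
            stored_hashes[doc_id] = h
    return changed
```

Only the ids this returns need to be re-embedded; an unchanged corpus costs you zero embedding calls per sync.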

Scenario 3: Multi-service, millions of vectors, high availability

This is where the benchmarks actually apply. Multiple services querying the same vector index. Data measured in millions of items. Uptime requirements that mean you can't tolerate a single-process restart.

At this scale you need a managed service or a self-hosted cluster. Replication, sharding, automatic failover. The operational overhead is real but justified because downtime now costs money.

Most teams jump here first because it looks professional. But if your dataset fits in 2GB of RAM and you have one service querying it, you're paying for complexity you don't need.

The three questions

Before looking at any product page, answer these:

Where does the data live? If it's local files that rarely change, stay in-memory. If it's behind an API that updates constantly, you need a persistent store.

How much RAM can you spare? This determines whether you run embeddings locally or call an API, and whether your database runs in-process or as a separate service.

Do you need persistence or is ephemeral fine? If you can regenerate everything from source in seconds, skip the database. If regeneration takes minutes or hours, persist.

These three questions eliminate 80% of the options before you read a single benchmark.
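The three questions collapse into a rough rule of thumb. The thresholds below are illustrative, not hard limits from the post:

```python
def pick_vector_store(ephemeral_ok: bool, regen_seconds: float,
                      services: int, n_vectors: int) -> str:
    """Map the three questions to a deployment shape (rule of thumb)."""
    # Q3: if you can regenerate everything quickly, skip the database.
    if ephemeral_ok and regen_seconds < 60:
        return "in-memory array, brute-force search"
    # Scenario 3: multiple consumers or millions of vectors.
    if services > 1 or n_vectors > 1_000_000:
        return "managed service or self-hosted cluster"
    # Scenario 2: one service, persistent data, modest scale.
    return "lightweight on-disk vector store on the same box"
```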

Start from the environment

The pattern I keep seeing is people evaluating vector databases by features and benchmarks, then trying to fit the winner into their deployment. It works the other way around. Start from your environment, your constraints, your data volume. The right answer usually becomes obvious.

The comparison articles won't tell you this because they can't. They don't know if you're running on a laptop, a $5 VPS, or a Kubernetes cluster. You do.

If you're building something with semantic search and want to think through which approach fits your setup, I'm always up for a quick conversation: cal.eu/reneza.
