Every RAG pipeline I've reviewed this year hits the same decision point: which vector store do you actually ship? The wrong choice compounds — it shapes your architecture, your operational overhead, and how painful a future migration will be. I've run all four of these in production or near-production contexts. Here's what actually matters for the decision.
What you actually need from a vector database
Before benchmarking anything, answer these:
- Scale: how many vectors today, and in 12 months?
- Filtering: do you need metadata filters applied before the ANN search, not after?
- Hybrid search: do you need BM25 + dense vector recall blended together?
- Operational budget: how many new things can your team be on call for?
Most teams over-optimize for a scale they won't reach for 18 months and under-weight the day-one operational cost of a new infrastructure component.
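A quick back-of-the-envelope calculation helps with the scale question. The sketch below estimates RAM for float32 vectors held in memory. The function name and the `overhead_factor` multiplier are assumptions for illustration, not measured values for any of the four engines, and the vector counts are made-up inputs.

```python
def estimate_index_memory_gb(num_vectors: int, dims: int,
                             bytes_per_value: int = 4,
                             overhead_factor: float = 1.5) -> float:
    """Rough RAM estimate for keeping vectors in memory.

    overhead_factor is an assumed multiplier for index structures and
    metadata. It is a placeholder, not a benchmark of any engine.
    """
    raw_bytes = num_vectors * dims * bytes_per_value
    return raw_bytes * overhead_factor / 1024**3


# Hypothetical workload: 500k vectors today, 5M in 12 months, 384 dims.
today = estimate_index_memory_gb(500_000, 384)
in_a_year = estimate_index_memory_gb(5_000_000, 384)
print(f"today: {today:.2f} GB, in 12 months: {in_a_year:.2f} GB")
```

With these assumed numbers the answer is roughly 1 GB today and roughly 11 GB in a year, both of which fit on a single modest machine. That is the kind of result that should make you question whether you need a distributed store yet.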
ChromaDB: zero friction for prototyping
ChromaDB requires no server, no Docker, no schema definition upfront. It's embedded in Python, and you can have a working vector store in a few lines:
import chromadb
from chromadb.utils import embedding_functions

# Persist the collection to local disk; no server process is needed.
client = chromadb.PersistentClient(path="./chroma_db")

# Embeddings are computed locally with a sentence-transformers model.
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

collection = client.get_or_create_collection(
    name="documents",
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"},  # use cosine distance for the HNSW index
)

collection.add(
    documents=[
        "FastAPI is great for building REST APIs",
        "Go outperforms Python on CPU-bound tasks",
        "Vector databases enable semantic search at scale",
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"source": "blog", "year": 2026},
        {"source": "blog", "year": 2025},
        {"source": "docs", "year": 2026},
    ],
)

# Semantic query restricted to documents from 2025 onward.
results = collection.query(
    query_texts=["which backend language is fastest?"],
    n_results=2,
    where={"year": {"$gte": 2025}},
)
print(results["documents"])
The critical limitation: ChromaDB applies metadata filters after the ANN search. It over-fetches internally to compensate, which degrades recall as collections grow. Its distributed mode remains underdeveloped as of mid-2026. The practical ceiling is roughly 2–5M vectors before these limits become noticeable.
Best for: local dev, internal tools, demos, early-stage products.
Qdrant: the production default
Qdrant is written in Rust and applies payload filters before the ANN search, which is the technically correct behavior. This matters when you have multi-tenant data or narrow filter conditions. A filter applied after the search wastes work on candidates that are then discarded, and when fewer than top_k of the retrieved candidates pass the filter, you get a short result list even though matching vectors exist elsewhere in the collection.
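To make the difference concrete, here is a toy sketch in plain Python. It uses exact brute-force ranking as a stand-in for ANN, and the corpus, tenant names, and function names are all hypothetical. It illustrates the ordering of filter and search only, not how any particular engine implements it.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy corpus of (id, vector, payload). Tenant "b" sits far from the query.
corpus = [
    (1, [1.0, 0.0], {"tenant": "a"}),
    (2, [0.9, 0.1], {"tenant": "a"}),
    (3, [0.8, 0.2], {"tenant": "a"}),
    (4, [0.1, 0.9], {"tenant": "b"}),
    (5, [0.0, 1.0], {"tenant": "b"}),
]


def post_filter(query, predicate, top_k):
    # Rank everything, keep top_k, then drop rows that fail the filter.
    ranked = sorted(corpus, key=lambda row: cosine(query, row[1]), reverse=True)
    return [row[0] for row in ranked[:top_k] if predicate(row[2])]


def pre_filter(query, predicate, top_k):
    # Restrict the candidate set first, then rank only eligible rows.
    eligible = [row for row in corpus if predicate(row[2])]
    ranked = sorted(eligible, key=lambda row: cosine(query, row[1]), reverse=True)
    return [row[0] for row in ranked[:top_k]]


def is_tenant_b(payload):
    return payload["tenant"] == "b"


query = [1.0, 0.0]
print(post_filter(query, is_tenant_b, top_k=2))  # [] : both nearest rows belong to tenant "a"
print(pre_filter(query, is_tenant_b, top_k=2))   # [4, 5]
```

Over-fetching (asking for more than top_k before filtering) narrows this gap but cannot close it when the filter is selective enough, which is the multi-tenant case.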
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue,
)

client = QdrantClient(url="http://localhost:6333")

# Note: recreate_collection drops any existing collection with this name.
# Recent qdrant-client releases mark it deprecated in favor of
# collection_exists + create_collection.
client.recreate_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

points = [
    PointStruct(
        id=1,
        vector=[0.05] * 384,  # replace with real embeddings
        payload={"source": "blog", "year": 2026, "text": "FastAPI for REST APIs"},
    ),
    PointStruct(
        id=2,
        vector=[0.12] * 384,
        payload={"source": "docs", "year": 2025, "text": "HNSW index internals"},
    ),
]
client.upsert(collection_name="documents", points=points)

# The payload filter restricts the search to points with year == 2026.
# Recent qdrant-client releases mark search deprecated in favor of query_points.
results = client.search(
    collection_name="documents",
    query_vector=[0.08] * 384,
    query_filter=Filter(
        must=[FieldCondition(key="year", match=MatchValue(value=2026))]
    ),
    limit=5,
)
for r in results:
    print(r.payload["text"], round(r.score, 4))
Qdrant also supports sparse + dense hybrid search natively, which is useful when you want BM25 recall blended with semantic similarity — a common pattern for RAG over heterogeneous corpora. It handles concurrent writes well, exposes both REST and gRPC, and its Python SDK is actively maintained. The managed cloud tier is straightforward to size.
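If "blended" sounds abstract, reciprocal rank fusion (RRF) is one common way to merge a keyword ranking with a dense ranking. The sketch below is a generic, engine-independent illustration and makes no claim about how Qdrant fuses results internally. The document ids and both input rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears
    in, with rank starting at 1. k=60 is the constant commonly used
    with RRF; treat it as a tunable assumption.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


bm25_hits = ["doc3", "doc1", "doc7"]   # hypothetical keyword ranking
dense_hits = ["doc1", "doc5", "doc3"]  # hypothetical embedding ranking

print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# ['doc1', 'doc3', 'doc5', 'doc7']
```

Documents that appear in both lists rise to the top, which is the behavior you want when exact terms and semantic similarity each catch things the other misses. RRF only needs ranks, so it sidesteps the problem of BM25 and cosine scores living on different scales.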
Best for: production RAG pipelines, multi-tenant SaaS, datasets above 5M vectors.
Weaviate: feature-rich, complex to operate
Weaviate offers the largest feature set in this list: GraphQL querying, multi-tenancy, built-in hybrid search, modules for text and images, and a schema-based data model. If you genuinely need multi-modal search or a GraphQL interface over your vector data, it's the only option here that delivers it cleanly.
The operational cost is real. Weaviate ships frequent releases and requires careful memory tuning on self-hosted deployments. Its schema-first approach adds friction during the exploration phase when your embedding model is still changing. The managed tier (Weaviate Cloud) is generous at small scale but cost climbs fast past 1M objects.
It's also the most complex to reason about internally: its ANN implementation is HNSW, and it layers BM25 on top for hybrid search. When things behave unexpectedly, the debugging surface is wide.
Best for: product search with image embeddings, teams that need GraphQL, complex multi-modal use cases.
pgvector: the underrated option for Postgres teams
If your application already runs on Postgres, pgvector eliminates an entire infrastructure dependency. Version 0.5 added HNSW index support, which closed most of the performance gap with dedicated solutions at moderate scale.
import psycopg2
import numpy as np

conn = psycopg2.connect("dbname=mydb user=postgres host=localhost")
cur = conn.cursor()

# Enable pgvector and create a table with a 384-dimensional vector column.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        source TEXT,
        year INT,
        embedding vector(384)
    )
""")

# HNSW index using cosine distance.
cur.execute(
    "CREATE INDEX IF NOT EXISTS idx_doc_embedding "
    "ON documents USING hnsw (embedding vector_cosine_ops)"
)
conn.commit()

# Random placeholder vector; replace with a real embedding.
embedding = np.random.rand(384).tolist()
cur.execute(
    "INSERT INTO documents (content, source, year, embedding) VALUES (%s, %s, %s, %s)",
    ("pgvector HNSW makes semantic search viable in Postgres", "blog", 2026, embedding)
)
conn.commit()

# <=> is cosine distance, so 1 - distance gives cosine similarity.
query_vec = np.random.rand(384).tolist()
cur.execute("""
    SELECT content, 1 - (embedding <=> %s::vector) AS similarity
    FROM documents
    WHERE year >= 2025
    ORDER BY embedding <=> %s::vector
    LIMIT 5
""", (query_vec, query_vec))

for row in cur.fetchall():
    print(f"{row[0]} -- similarity: {round(row[1], 4)}")

cur.close()
conn.close()
Your existing Postgres tooling — backups, monitoring, migrations, access control — carries over. No new service to operate, no new runbook to write. The tradeoffs: no native hybrid search yet (you can approximate with tsvector + cosine distance, but it's glue code), HNSW index builds are slower than Qdrant's, and at 10M+ vectors with high QPS, dedicated hardware starts to matter.
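For a sense of what that glue code looks like, here is one possible sketch against the `documents` table from the example above. It pulls a pool of nearest neighbors first, so the vector ordering can still use the HNSW index, then reranks that pool with a blend of cosine similarity and Postgres full-text `ts_rank`. The function name, the 0.7 weight, and the pool size are all assumptions, and because only the vector candidates are reranked, a document that matches the keywords but falls outside the pool is missed.

```python
def build_hybrid_query(query_text, query_vec, min_year=2025,
                       pool_size=50, top_k=5, dense_weight=0.7):
    """Return (sql, params) for a rerank-style hybrid query via psycopg2.

    dense_weight and pool_size are illustrative defaults, not tuned values.
    ts_rank is not normalized to the same range as cosine similarity, so
    the blend is a heuristic.
    """
    sql = """
        WITH candidates AS (
            SELECT content, embedding <=> %(vec)s::vector AS distance
            FROM documents
            WHERE year >= %(min_year)s
            ORDER BY embedding <=> %(vec)s::vector
            LIMIT %(pool)s
        )
        SELECT content,
               %(w)s * (1 - distance)
               + (1 - %(w)s) * ts_rank(
                     to_tsvector('english', content),
                     plainto_tsquery('english', %(q)s)
                 ) AS score
        FROM candidates
        ORDER BY score DESC
        LIMIT %(k)s
    """
    params = {
        "vec": query_vec,
        "min_year": min_year,
        "pool": pool_size,
        "w": dense_weight,
        "q": query_text,
        "k": top_k,
    }
    return sql, params


sql, params = build_hybrid_query("hnsw index", [0.1] * 384)
# With an open psycopg2 cursor: cur.execute(sql, params)
```

Computing `to_tsvector` per row is fine for a pool of 50 candidates. For a true two-sided hybrid you would also run a separate full-text query and merge the two result lists in application code.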
Best for: teams already on Postgres, datasets under 5M vectors, early-to-mid-stage RAG where operational simplicity matters.
Comparison at a glance
| | ChromaDB | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|
| Setup | Embedded | Docker / Cloud | Docker / Cloud | PG Extension |
| Pre-filter ANN | No | Yes | Yes | Partial |
| Hybrid search | No | Yes | Yes | No |
| Scale ceiling | ~5M | 100M+ | 50M+ | ~10M |
| Operational cost | Very low | Low | High | Low (on PG) |
| Managed option | No | Yes | Yes | Via PG providers |
The takeaway
Default path: start with ChromaDB to ship fast, migrate to Qdrant when you need pre-filter correctness or hit scale, use pgvector if you're already on Postgres and your dataset stays under a few million vectors. Reach for Weaviate only when you specifically need its feature set.
The biggest mistake I see is teams optimizing for a scale they won't reach for 18 months while ignoring the operational burden of a new database they'll feel on day one. Pick the simplest option that fits your actual current requirements, and design a migration path before you need it.
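One inexpensive way to keep a migration path open is to have application code depend on a thin interface instead of a vendor SDK. The sketch below is a minimal illustration: the `VectorStore` protocol, its two methods, and the in-memory backend are hypothetical names of my own, and a real adapter for ChromaDB, Qdrant, or pgvector would wrap the client calls shown earlier behind the same two methods.

```python
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical minimal interface the application depends on."""

    def upsert(self, doc_id: str, vector: list[float], payload: dict) -> None: ...

    def query(self, vector: list[float], top_k: int) -> list[str]: ...


class InMemoryStore:
    """Stand-in backend for tests, ranking by exact dot product."""

    def __init__(self):
        self._rows = {}

    def upsert(self, doc_id, vector, payload):
        self._rows[doc_id] = (vector, payload)

    def query(self, vector, top_k):
        def score(item):
            stored_vector = item[1][0]
            return sum(a * b for a, b in zip(vector, stored_vector))

        ranked = sorted(self._rows.items(), key=score, reverse=True)
        return [doc_id for doc_id, _ in ranked[:top_k]]


def retrieve(store: VectorStore, query_vec, top_k=3):
    # Application code sees only the protocol, so the backend can change.
    return store.query(query_vec, top_k)


store = InMemoryStore()
store.upsert("a", [1.0, 0.0], {"year": 2026})
store.upsert("b", [0.0, 1.0], {"year": 2025})
print(retrieve(store, [0.9, 0.1], top_k=1))  # ['a']
```

Swapping stores then means writing one new adapter class and re-indexing, not hunting for SDK calls scattered across the codebase.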
For regulated deployments — healthcare, finance, government — verify encryption-at-rest guarantees and data residency options for each managed offering before committing. We track the right questions to ask in our security evaluation checklists.
I run AYI NEDJIMI Consultants, a cybersecurity consulting firm. We publish free security hardening checklists — PDF and Excel.