linou518

Posted on Mar 23

Choosing the Foundation for Your RAG System: pgvector vs Qdrant vs Milvus (2026)

#openclaw #ai

Every team building a RAG (Retrieval-Augmented Generation) system faces the same question: which vector database should we use?

pgvector, Qdrant, and Milvus are three of the most widely used options today, and they represent three distinct philosophies: lightweight integration, high-performance specialization, and distributed scale. Choosing wrong means an expensive migration once your data grows.

This guide covers the core trade-offs to help you decide once and get it right.

Why Vector DB Selection Matters So Much

A vector database does one core job: find the most similar vectors in high-dimensional space, fast.

When a user asks a question, the retrieval layer has to find, say, the 5 most relevant passages among 100,000 documents and hand them to the LLM. This isn't exact matching; it's Approximate Nearest Neighbor (ANN) search, which trades a small amount of recall for speed. Your choice of vector DB determines how fast, how accurate, and how scalable that search will be.
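To make the "nearest vectors" idea concrete, here is a minimal sketch of the exact (brute-force) search that ANN indexes approximate. The toy 3-dimensional vectors and the names `cosine_similarity`, `exact_top_k`, and `corpus` are illustrative assumptions, not part of any of the three databases.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, corpus, k=5):
    """Score every stored vector against the query and keep the k best.

    This is the O(n) full scan that ANN indexes such as HNSW avoid.
    """
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings"; real ones have hundreds or thousands of dimensions.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
top = exact_top_k([1.0, 0.0, 0.0], corpus, k=2)
print([doc_id for doc_id, _ in top])
```

The scan touches every vector on every query, which is fine for a few thousand documents and the reason an index becomes necessary as the corpus grows.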

Three reasons the choice matters deeply:

Deep data coupling: Vector embeddings, index structures, and metadata all live inside the database, so migrating means re-exporting and re-indexing everything
Massive performance variance: At the same data volume, query latency can differ by roughly 10x between systems
Dramatic ops complexity difference: The spectrum runs from "install a PostgreSQL extension" to "maintain a distributed cluster"

The Three Schools

┌────────────────────────────────────────────────────┐
│            Vector DB Philosophies                  │
├──────────────┬────────────────┬────────────────────┤
│   pgvector   │    Qdrant      │      Milvus        │
│  Lightweight │  High-perf     │   Distributed      │
│  Integration │  Specialized   │   Scale            │
├──────────────┼────────────────┼────────────────────┤
│ PostgreSQL   │ Rust-native    │ Cloud-native arch  │
│ extension    │ standalone svc │ independent cluster│
│ Small-medium │ Medium-large   │ Hyperscale         │
└──────────────┴────────────────┴────────────────────┘

pgvector: Vector Search as a PostgreSQL Column Type

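-- Enable the extension once per database
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(1536)  -- must match your embedding model's output dimension
);
-- HNSW index using cosine distance
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- <=> is cosine distance, so 1 - distance is cosine similarity; $1 is the query embedding
SELECT content, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 5;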

Strengths: Zero extra ops overhead, full SQL ecosystem, native hybrid search (vector + SQL filters in one query)
Weaknesses: Performance degrades beyond ~5M rows; no real-time multi-tenant index isolation
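As a sketch of what "vector + SQL filters in one query" can look like from application code, the helper below builds a parameterized statement. It assumes psycopg-style `%s` placeholders and a hypothetical `source` column that the `documents` table above does not define; the function name `build_hybrid_query` is also made up for illustration.

```python
def build_hybrid_query(query_vector, filters, k=5):
    """Return (sql, params) for a filtered nearest-neighbour query.

    `filters` maps column names to required values. Column names are
    interpolated into the SQL text, so they must come from trusted code,
    never from user input; values are passed as bind parameters.
    """
    # pgvector accepts a text literal such as '[0.1,0.2,0.3]' cast to vector.
    vector_literal = "[" + ",".join(repr(float(x)) for x in query_vector) + "]"
    where = " AND ".join(f"{column} = %s" for column in filters)
    sql = (
        "SELECT content, 1 - (embedding <=> %s::vector) AS similarity\n"
        "FROM documents\n"
        + (f"WHERE {where}\n" if where else "")
        + "ORDER BY embedding <=> %s::vector\n"
        "LIMIT %s;"
    )
    params = [vector_literal, *filters.values(), vector_literal, k]
    return sql, params

# `source` is a hypothetical metadata column used only for this example.
sql, params = build_hybrid_query([0.1, 0.2, 0.3], {"source": "blog"}, k=5)
print(sql)
print(params)
```

One caveat worth testing on your own data: with an approximate index, filtering can leave you with fewer than `k` rows, so check result counts for selective filters.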

Qdrant: A Rust Engine Born for Vector Search

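from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # assumes a local Qdrant instance
# Note: newer qdrant-client releases deprecate search() in favor of query_points().
results = client.search(
    collection_name="documents",
    query_vector=[0.1, 0.2, ...],  # truncated; length must match the collection's dimension
    query_filter=models.Filter(
        must=[models.FieldCondition(key="source", match=models.MatchValue(value="blog"))]
    ),
    limit=5,
)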

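Strengths: Low latency from a Rust engine purpose-built for vector search, filtering that is co-optimized with the ANN index, multi-vector support for multimodal data
Weaknesses: Extra service to deploy; distributed mode is available in the open-source build but you run the cluster yourself, and managed clusters are a paid Qdrant Cloud offering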

Milvus: Industrial-Grade Distributed Vector DB

Strengths: True horizontal scaling (separate data/query/index nodes), billion-scale support, GPU-accelerated indexing
Weaknesses: Complex architecture (etcd + MinIO + Pulsar + multiple node types), steep learning curve

Core Metrics Comparison

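Dimension	pgvector	Qdrant	Milvus
Typical scale (vectors)	<5M	1M–100M	100M+
P99 latency (indicative)	10–100 ms	1–10 ms	5–50 ms
Ops complexity	★☆☆☆☆	★★☆☆☆	★★★★☆
Filtered queries	Excellent (SQL)	Strong	Medium
Horizontal scaling	Limited	Medium (self-managed cluster or paid cloud)	Native
Transactions / consistency	Full ACID	No ACID transactions	No ACID transactions; tunable consistency levels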
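The scale figures above are easier to reason about with a rough size estimate. The sketch below assumes 4-byte float32 components and counts raw vector data only; index structures, metadata, and replication add to it. The function name is illustrative.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_component=4):
    """Raw storage for the vectors alone, assuming float32 components."""
    return num_vectors * dimensions * bytes_per_component

# 5M vectors at 1536 dimensions, the sizes used elsewhere in this article.
size_bytes = raw_vector_bytes(5_000_000, 1536)
size_gb = size_bytes / 1e9
print(f"{size_gb:.2f} GB of raw vector data")
```

At 5M vectors of 1536 dimensions this comes to about 30.7 GB before any index overhead, which helps explain why a single-node setup starts to feel tight around that range.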

Decision Tree

How many vectors do you expect?
│
├─ < 1M
│   └─ Already using PostgreSQL? → YES: pgvector / NO: Qdrant
│
├─ 1M – 50M
│   └─ Complex filtering + joins? → YES: pgvector (tuned) or Qdrant
│                                  → NO: Qdrant (preferred)
│
└─ 50M+ / need horizontal scale → Milvus (strongest fit; Qdrant can stretch to ~100M)
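The tree above can be written down as a small function, which makes the thresholds explicit and easy to adjust. This is only a sketch of the article's rule of thumb; the function and parameter names are hypothetical.

```python
def recommend_vector_db(expected_vectors, uses_postgres=False,
                        complex_filtering=False, needs_horizontal_scale=False):
    """Encode the decision tree: vector count first, then context."""
    if needs_horizontal_scale or expected_vectors >= 50_000_000:
        return "Milvus"
    if expected_vectors < 1_000_000:
        return "pgvector" if uses_postgres else "Qdrant"
    # 1M to 50M vectors
    if complex_filtering:
        return "pgvector (tuned) or Qdrant"
    return "Qdrant"

print(recommend_vector_db(200_000, uses_postgres=True))
print(recommend_vector_db(10_000_000, complex_filtering=True))
print(recommend_vector_db(300_000_000))
```

Treat the cut-offs as starting points to validate against your own latency and recall measurements, not as hard limits.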

Production War Stories

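pgvector traps: HNSW build parameters (m, ef_construction) cannot be changed after the index is created, so changing them means rebuilding the index; always add ORDER BY ... LIMIT N, because the index is only used for that query shape and anything else falls back to a full scan; HNSW and IVFFlat indexes on a vector column both top out at 2,000 dimensions, so for larger embeddings consider halfvec (indexable up to 4,000 dimensions) or dimensionality reduction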
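Because the HNSW build parameters are fixed at creation time, changing them in practice means dropping and recreating the index. The sketch below generates those two statements; the index name `documents_embedding_hnsw` and the helper name are assumptions, while `m = 16` and `ef_construction = 64` are pgvector's documented defaults.

```python
def hnsw_rebuild_statements(index_name, table, column, m=16, ef_construction=64):
    """Return the SQL needed to rebuild an HNSW index with new build parameters.

    Identifiers are interpolated directly, so pass only trusted names.
    """
    return [
        f"DROP INDEX IF EXISTS {index_name};",
        (
            f"CREATE INDEX {index_name} ON {table} "
            f"USING hnsw ({column} vector_cosine_ops) "
            f"WITH (m = {m}, ef_construction = {ef_construction});"
        ),
    ]

statements = hnsw_rebuild_statements(
    "documents_embedding_hnsw", "documents", "embedding", m=24, ef_construction=128
)
for statement in statements:
    print(statement)
```

Query-time recall is a separate knob: `SET hnsw.ef_search = 100;` can be changed per session without rebuilding anything.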

Qdrant traps: Collection vector dimension is immutable; payload indexes aren't auto-created; tune hnsw_config.m for your recall requirements
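Since payload indexes are not created automatically, each field you filter on needs an explicit index. The sketch below only builds the REST request (`PUT /collections/{name}/index`) without sending it; the base URL assumes a local instance on Qdrant's default port, and the `source` field mirrors the filter example earlier. The helper name is hypothetical.

```python
import json
import urllib.request

def payload_index_request(collection, field_name, field_schema="keyword",
                          base_url="http://localhost:6333"):
    """Build (but do not send) the request that creates a payload index."""
    body = {"field_name": field_name, "field_schema": field_schema}
    return urllib.request.Request(
        url=f"{base_url}/collections/{collection}/index",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

request = payload_index_request("documents", "source")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

If you use the Python client instead, `client.create_payload_index(...)` covers the same operation.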

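Milvus traps: Benchmark nlist/nprobe (the number of IVF clusters and how many of them each query probes) before production; space from deleted entities is only reclaimed by compaction, so verify that automatic compaction is enabled or call compact() yourself; never use Lite mode in production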
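Benchmarking index parameters such as nlist/nprobe usually comes down to measuring recall against an exact search at each setting. A minimal sketch of that metric follows; the ID lists are made-up sample data and the function name is illustrative.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k results that the ANN search also returned."""
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    return len(exact & set(approx_ids[:k])) / len(exact)

# Hypothetical results for one query: ANN output vs. brute-force ground truth.
approx = [1, 2, 4, 7, 9]
exact = [1, 2, 3, 4, 5]
recall = recall_at_k(approx, exact, k=5)
print(f"recall@5 = {recall:.2f}")
```

Averaging this over a representative query set at several parameter values, alongside latency, gives the recall/latency curve you need before settling on production settings.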

Conclusion: There's No Best, Only Most Appropriate

pgvector: Default choice for PostgreSQL shops, fully sufficient under 5M vectors
Qdrant: Recommended for most standalone AI applications requiring high performance
Milvus: Industrial-grade solution for genuinely large-scale distributed scenarios

Selection principle: Build first, optimize later. Don't spin up a Milvus cluster today for data you might have three years from now.

References: pgvector 0.7.0 docs | Qdrant Official Benchmarks | Milvus 2.4 docs | Knowledge Card W12D5

DEV Community