DEV Community

linou518

Choosing the Foundation for Your RAG System: pgvector vs Qdrant vs Milvus (2026)

Every team building a RAG (Retrieval-Augmented Generation) system faces the same question: which vector database should we use?

pgvector, Qdrant, and Milvus are the three dominant options today, representing three distinct philosophies: lightweight integration, high-performance specialization, and distributed scale. Choosing wrong means expensive migrations when your data grows.

This guide covers the core trade-offs to help you decide once and get it right.


Why Vector DB Selection Matters So Much

A vector database does one core job: find the most similar vectors in high-dimensional space, fast.

When a user asks a question, the retrieval step has to fetch, say, the 5 most relevant passages from 100,000 documents before the LLM generates an answer. This isn't exact matching; it's Approximate Nearest Neighbor (ANN) search. Your choice of vector DB determines how fast, how accurate, and how scalable that search will be.
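To make "most similar" concrete, here is a minimal sketch of the exact (brute-force) search that ANN indexes approximate. The function names and the tiny 3-dimensional corpus are hypothetical, chosen only for illustration, and the code assumes non-zero vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, corpus, k=5):
    """Score every stored vector against the query and keep the k best.

    Cost is O(N * d) per query; avoiding that full scan is what ANN
    indexes such as HNSW or IVF are for.
    """
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Toy "embeddings" (hypothetical data, for illustration only).
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
    "doc_d": [0.0, 0.0, 1.0],
}
top = exact_top_k([1.0, 0.05, 0.0], corpus, k=2)
print(top)  # ['doc_a', 'doc_b']
```

An ANN index trades a small amount of recall against this exact result for a large reduction in the number of vectors compared per query.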

Three reasons the choice matters deeply:

  1. Deep data coupling: Embeddings, index structures, and metadata all live inside the database, so migrating means re-exporting the data and rebuilding every index
  2. Large performance variance: At the same data volume, query latency can differ by roughly 10x between systems
  3. Wide gap in ops complexity: From "install a PostgreSQL extension" to "maintain a distributed cluster"

The Three Schools

┌────────────────────────────────────────────────────┐
│            Vector DB Philosophies                  │
├──────────────┬────────────────┬────────────────────┤
│   pgvector   │    Qdrant      │      Milvus        │
│  Lightweight │  High-perf     │   Distributed      │
│  Integration │  Specialized   │   Scale            │
├──────────────┼────────────────┼────────────────────┤
│ PostgreSQL   │ Rust-native    │ Cloud-native arch  │
│ extension    │ standalone svc │ independent cluster│
│ Small-medium │ Medium-large   │ Hyperscale         │
└──────────────┴────────────────┴────────────────────┘

pgvector: Vector Search as a PostgreSQL Column Type

CREATE EXTENSION vector;
CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(1536)
);
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

SELECT content, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 5;
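The `$1` parameter in the query above has to arrive as a vector. One way to supply it from application code, sketched here under assumptions, is to send pgvector's text form (`[x1,x2,...]`) and cast it with `::vector`. The helper names are hypothetical, and the `%s` placeholders assume a psycopg-style driver; other drivers use different placeholder syntax.

```python
def to_pgvector_literal(embedding):
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2,3.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# Same query as above, with the parameter cast from text to vector.
# Placeholder style (%s) is an assumption about the driver in use.
SIMILARITY_QUERY = """
SELECT content, 1 - (embedding <=> %s::vector) AS similarity
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""

def build_query_params(embedding):
    """The query references the vector twice, so the literal is passed twice."""
    literal = to_pgvector_literal(embedding)
    return (literal, literal)

params = build_query_params([0.1, 0.2, 3])
print(params[0])  # [0.1,0.2,3.0]
```

With a psycopg-style cursor this would be run as `cursor.execute(SIMILARITY_QUERY, params)`; the connection setup is omitted because it depends on your environment.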

Strengths: No extra service to operate if you already run PostgreSQL, full SQL ecosystem, native hybrid search (vector + SQL filters in one query)
Weaknesses: Performance degrades beyond ~5M rows; no real-time multi-tenant index isolation

Qdrant: A Rust Engine Born for Vector Search

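from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # assumed local instance

results = client.search(
    collection_name="documents",
    query_vector=[0.1, 0.2, ...],  # placeholder: pass the full query embedding
    query_filter=models.Filter(  # only return points whose payload has source == "blog"
        must=[models.FieldCondition(key="source", match=models.MatchValue(value="blog"))]
    ),
    limit=5,
)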

Strengths: Lowest latency (Rust), best filtering performance co-optimized with ANN, multi-vector support for multimodal
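Weaknesses: Extra service to deploy; distributed mode (sharding and replication) is available in the open-source build, but you operate the cluster yourself unless you pay for the managed cloud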
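Why filtering performance matters can be seen with a toy comparison of two naive strategies. This is an illustrative sketch, not how Qdrant works internally; the function names and sample points are hypothetical. Filtering after the nearest-neighbor step can return fewer than `limit` results, while filtering before it forces a scan of every matching point. Engines that integrate the filter into the index traversal aim to avoid both problems.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def post_filter_search(query, points, predicate, k):
    """Take the k nearest points first, then drop those failing the filter."""
    nearest = sorted(points, key=lambda p: dot(query, p["vector"]), reverse=True)[:k]
    return [p["id"] for p in nearest if predicate(p["payload"])]

def pre_filter_search(query, points, predicate, k):
    """Apply the filter first, then take the k nearest among the survivors."""
    allowed = [p for p in points if predicate(p["payload"])]
    nearest = sorted(allowed, key=lambda p: dot(query, p["vector"]), reverse=True)[:k]
    return [p["id"] for p in nearest]

# Hypothetical points: the two closest to the query are not from "blog".
points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"source": "docs"}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"source": "docs"}},
    {"id": 3, "vector": [0.8, 0.2], "payload": {"source": "blog"}},
    {"id": 4, "vector": [0.1, 0.9], "payload": {"source": "blog"}},
]

def is_blog(payload):
    return payload["source"] == "blog"

query = [1.0, 0.0]
post = post_filter_search(query, points, is_blog, k=2)
pre = pre_filter_search(query, points, is_blog, k=2)
print(post)  # []
print(pre)   # [3, 4]
```

Post-filtering loses both matching results here because neither made it into the unfiltered top 2.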

Milvus: Industrial-Grade Distributed Vector DB

Strengths: True horizontal scaling (separate data/query/index nodes), billion-scale support, GPU-accelerated indexing
Weaknesses: Complex architecture (etcd + MinIO + Pulsar + multiple node types), steep learning curve


Core Metrics Comparison

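┌──────────────────┬─────────────────┬──────────┬─────────┐
│ Dimension        │ pgvector        │ Qdrant   │ Milvus  │
├──────────────────┼─────────────────┼──────────┼─────────┤
│ Scale            │ <5M rows        │ 1M–100M  │ 100M+   │
│ P99 latency      │ 10-100 ms       │ 1-10 ms  │ 5-50 ms │
│ Ops complexity   │ ★☆☆☆☆           │ ★★☆☆☆    │ ★★★★☆   │
│ Filter queries   │ Excellent (SQL) │ Strong   │ Medium  │
│ Horizontal scale │ Limited         │ Medium   │ Native  │
│ Consistency      │ Full ACID       │ Eventual │ Tunable │
└──────────────────┴─────────────────┴──────────┴─────────┘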
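The latency figures above are rough ranges, so it is worth measuring P99 on your own workload. A minimal sketch of the nearest-rank percentile calculation follows; the function name and the sample latencies are hypothetical.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical measurements: 98 fast queries and two slow outliers.
latencies_ms = [4.0] * 98 + [40.0, 120.0]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(p50, p99)  # 4.0 40.0
```

The median hides the outliers entirely, which is why tail latency is the more useful number when comparing systems.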

Decision Tree

How many vectors do you expect?
│
├─ < 1M
│   └─ Already using PostgreSQL? → YES: pgvector / NO: Qdrant
│
├─ 1M – 50M
│   └─ Complex filtering + joins? → YES: pgvector (tuned) or Qdrant
│                                  → NO: Qdrant (preferred)
│
└─ 50M+ / need horizontal scale → Milvus (strongest fit)
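The same tree can be written as a small function, which is handy for capacity-planning notes or scripts. This is a sketch: the function and parameter names are hypothetical, and treating exactly 50M vectors as the Milvus branch is an assumption, since the tree leaves that boundary open.

```python
def recommend_vector_db(expected_vectors, uses_postgres=False,
                        complex_filtering=False, needs_horizontal_scale=False):
    """Encode the decision tree above; thresholds are the article's rules of thumb."""
    # 50M+ vectors or an explicit need for horizontal scale.
    # Assumption: the boundary value 50M itself falls in this branch.
    if needs_horizontal_scale or expected_vectors >= 50_000_000:
        return "Milvus"
    # Under 1M vectors: stay on PostgreSQL if it is already in the stack.
    if expected_vectors < 1_000_000:
        return "pgvector" if uses_postgres else "Qdrant"
    # 1M to 50M vectors: filtering and joins decide.
    if complex_filtering:
        return "pgvector (tuned) or Qdrant"
    return "Qdrant"

print(recommend_vector_db(200_000, uses_postgres=True))            # pgvector
print(recommend_vector_db(10_000_000, complex_filtering=True))     # pgvector (tuned) or Qdrant
print(recommend_vector_db(80_000_000))                             # Milvus
```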

Production War Stories

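pgvector traps: HNSW build parameters (m, ef_construction) are fixed at index creation, so changing them means rebuilding the index (hnsw.ef_search can still be tuned at query time); the index is only used for queries shaped as ORDER BY <distance operator> ... LIMIT N, and anything else falls back to a full scan; HNSW and IVFFlat indexes on the vector type both cap at 2,000 dimensions, so for larger embeddings consider halfvec (indexable up to 4,000 dimensions) or reducing the embedding dimension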

Qdrant traps: Collection vector dimension is immutable; payload indexes aren't auto-created; tune hnsw_config.m for your recall requirements

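Milvus traps: nlist/nprobe apply to IVF-family indexes, so benchmark them against your recall target before production; deleted data only frees storage after compaction, so confirm automatic compaction is enabled or call compact() manually; never use Milvus Lite in production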
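To see why nlist/nprobe need benchmarking, here is a toy inverted-file (IVF) index in pure Python. It is a sketch under simplifying assumptions, not Milvus code: centroids are sampled from the data rather than trained with k-means, the data is random, and all names are hypothetical. Probing more buckets raises recall at the cost of scanning more vectors.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, nlist, rng):
    """Assign each vector to the nearest of nlist centroids.

    Simplification: centroids are sampled from the data, not k-means trained.
    """
    centroids = rng.sample(vectors, nlist)
    buckets = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: sq_dist(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets

def ivf_search(query, vectors, centroids, buckets, nprobe, k):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [idx for c in probe_order[:nprobe] for idx in buckets[c]]
    candidates.sort(key=lambda idx: sq_dist(query, vectors[idx]))
    return candidates[:k]

def recall_at_k(query, vectors, centroids, buckets, nprobe, k):
    """Fraction of the exact top-k that the IVF search also returns."""
    exact = sorted(range(len(vectors)), key=lambda idx: sq_dist(query, vectors[idx]))[:k]
    approx = ivf_search(query, vectors, centroids, buckets, nprobe, k)
    return len(set(exact) & set(approx)) / k

rng = random.Random(42)  # fixed seed so runs are repeatable
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
centroids, buckets = build_ivf(vectors, nlist=16, rng=rng)
query = [rng.random() for _ in range(8)]

recalls = {
    nprobe: recall_at_k(query, vectors, centroids, buckets, nprobe, k=10)
    for nprobe in (1, 4, 16)
}
for nprobe, recall in recalls.items():
    print(f"nprobe={nprobe:2d} recall@10={recall:.2f}")
```

With nprobe equal to nlist every bucket is scanned, so the search is exhaustive and recall reaches 1.0; the useful setting is the smallest nprobe that still meets your recall target on real data.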


Conclusion: There's No Best, Only Most Appropriate

  • pgvector: Default choice for PostgreSQL shops, fully sufficient under 5M vectors
  • Qdrant: Recommended for most standalone AI applications requiring high performance
  • Milvus: Industrial-grade solution for genuinely large-scale distributed scenarios

Selection principle: Build first, optimize later. Don't spin up a Milvus cluster today for data you might have three years from now.


References: pgvector 0.7.0 docs | Qdrant Official Benchmarks | Milvus 2.4 docs | Knowledge Card W12D5
