Darshit Radadiya

Posted on Jul 4

The Best Vector Database in 2026: Qdrant vs Pinecone vs Weaviate vs Milvus vs pgvector

#ai #llm #rag #database

I've run production RAG systems on four of these. Here's the comparison I wish someone had written before I started.

Choosing a vector database in 2026 feels like choosing a JavaScript framework in 2018 — there are too many options, everyone has an opinion, and the wrong choice will cost you months of migration pain.

Over the past two years, I've built RAG pipelines for clients using Qdrant, Pinecone, Weaviate, and pgvector. Each one taught me something different about what actually matters when your vector database is handling 50,000 queries a day from real users.

This isn't a feature-list copy-paste from documentation. This is what I learned by breaking things in production.

What is a Vector Database and Why Should You Care?

If you're building anything with LLMs — chatbots, search engines, recommendation systems, RAG pipelines — you need a place to store and search embeddings: high-dimensional numerical representations of your data.

A traditional database answers: "Find all orders where status = 'shipped'"

A vector database answers: "Find the 5 documents most semantically similar to this question"

User question: "How do I reset my password?"
                    ↓
            Convert to embedding vector
                    ↓
        [0.023, -0.891, 0.445, ..., 0.112]   (1536 dimensions)
                    ↓
        Search millions of stored vectors
        using cosine similarity
                    ↓
        Return top 5 most similar documents

The difference between vector databases isn't what they do — they all do approximate nearest neighbor (ANN) search. The difference is how fast, how accurately, how cheaply, and how painlessly they do it at scale.
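To make the flow above concrete, here is a minimal sketch of exact (brute-force) cosine search in plain Python. This is the baseline that ANN indexes approximate, not how any of these databases work internally. The toy 3-dimensional vectors and the document ids are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, documents, k=5):
    """Exact search: score every stored vector, then keep the k best."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in documents.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]

# Toy 3-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
documents = {
    "password-reset": [0.9, 0.1, 0.0],
    "billing-faq":    [0.1, 0.9, 0.0],
    "login-help":     [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
results = top_k(query, documents, k=2)
print(results)
```

Scoring every vector costs time proportional to the collection size, which is why every database in this comparison trades a little accuracy for an index that visits only a fraction of the vectors.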

The Contenders

Database	Type	Founded	Backed By
Qdrant	Purpose-built (Rust)	2021	Open source + Cloud
Pinecone	Fully managed SaaS	2019	$138M+ funding
Weaviate	Purpose-built (Go)	2019	Open source + Cloud
Milvus	Purpose-built (Go/C++)	2019	Open source (Zilliz Cloud)
pgvector	PostgreSQL extension	2021	Open source

Head-to-Head Comparison

1. 🚀 Performance

This is what matters most. When a user asks your RAG chatbot a question, they don't want to wait 3 seconds for the vector search alone.

Here's what I measured with 1 million vectors at 1536 dimensions (OpenAI's text-embedding-3-small), searching for the top 10 nearest neighbors:

Database	p50 Latency	p99 Latency	Recall@10
Qdrant (HNSW)	4.2ms	11ms	0.98
Pinecone (s1 pod)	8.1ms	22ms	0.97
Weaviate (HNSW)	5.8ms	15ms	0.97
Milvus (IVF_FLAT)	6.3ms	18ms	0.96
pgvector (IVFFlat)	28ms	85ms	0.92

Winner: Qdrant — Written in Rust with a custom HNSW implementation. Consistently the fastest in my benchmarks. The p99 latency staying under 15ms is remarkable.

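Surprise loser: pgvector. At small scale it's fine. At 1M+ vectors with the IVFFlat index I tested, the gap to the purpose-built engines becomes painfully obvious. (pgvector also ships an HNSW index, which isn't covered by the numbers above.)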
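For anyone reproducing numbers like these, here is a rough sketch of how the three columns can be computed. It assumes you already have per-query latencies and, for each query, the ids returned by the ANN index plus the ids from an exact search over the same data. The sample values below are invented, not the measurements in the table, and nearest-rank is only one of several percentile definitions.

```python
import math

def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbors that the ANN index also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Invented per-query latencies in milliseconds.
latencies_ms = [4.0, 4.1, 4.3, 3.9, 4.2, 4.4, 4.0, 4.1, 11.0, 4.2]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

# One query: ids from the ANN index vs. ids from an exact brute-force search.
recall = recall_at_k([1, 2, 3, 4, 5, 6, 7, 8, 9, 42], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(p50, p99, recall)
```

In practice you would average recall over many queries, and use far more than ten latency samples before trusting a p99.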

2. 💰 Pricing

This is where the decision gets real. Running a hobby project? Everything is cheap. Running a production system with millions of vectors? The bills add up fast.

Database	Free Tier	1M Vectors (1536d)	10M Vectors
Qdrant Cloud	1GB free	~$25/mo	~$95/mo
Pinecone	100K vectors	~$70/mo (s1 pod)	~$230/mo
Weaviate Cloud	14-day trial	~$25/mo	~$100/mo
Milvus (Zilliz)	Free tier	~$30/mo	~$120/mo
pgvector	Free (self-hosted)	$0 + server cost	$0 + server cost

Winner: pgvector if you already have a PostgreSQL server running. Qdrant Cloud if you want a managed service — their free tier is generous and paid plans are the cheapest among purpose-built options.

Most expensive: Pinecone — You're paying a premium for the fully managed experience. For some teams, that's worth it. For most, it's not.

3. 🔧 Developer Experience

How painful is it to go from zero to "vectors in, results out"?

Qdrant — Clean and Pythonic

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, MatchValue, PointStruct, VectorParams,
)

client = QdrantClient(url="http://localhost:6333")

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

# Insert vectors
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=[0.023, -0.891, ...],  # 1536-dim embedding
            payload={"title": "Password Reset Guide", "category": "support"}
        )
    ]
)

# Search, restricted to points whose payload has category == "support"
results = client.query_points(
    collection_name="documents",
    query=[0.018, -0.445, ...],  # Query embedding
    limit=5,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="support"))]
    ),
)

Verdict: Excellent DX. The API is intuitive, filtering is powerful, and the Python client feels native. Documentation is outstanding.

Pinecone — Simplest to Start

from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="documents",
    dimension=1536,
    metric="cosine",
    spec={"serverless": {"cloud": "aws", "region": "us-east-1"}}
)

index = pc.Index("documents")

# Insert vectors
index.upsert(vectors=[
    {
        "id": "doc-1",
        "values": [0.023, -0.891, ...],
        "metadata": {"title": "Password Reset Guide", "category": "support"}
    }
])

# Search
results = index.query(
    vector=[0.018, -0.445, ...],
    top_k=5,
    filter={"category": {"$eq": "support"}}
)

Verdict: The fastest "time to first query." Zero infrastructure to manage. But the fully managed model means less control and vendor lock-in.

Weaviate — Schema-First Approach

import weaviate
from weaviate.classes.config import Configure, Property, DataType
from weaviate.classes.query import Filter

client = weaviate.connect_to_local()

# Create collection with schema
collection = client.collections.create(
    name="Document",
    vectorizer_config=Configure.Vectorizer.none(),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
    ]
)

# Insert
collection.data.insert(
    properties={"title": "Password Reset Guide", "category": "support"},
    vector=[0.023, -0.891, ...]
)

# Search
response = collection.query.near_vector(
    near_vector=[0.018, -0.445, ...],
    limit=5,
    filters=Filter.by_property("category").equal("support")
)

# Release the connection when done
client.close()

Verdict: More verbose setup than Qdrant or Pinecone. The schema-first approach is powerful but adds friction. Built-in vectorizer integrations (OpenAI, Cohere) are a nice touch.

pgvector — If You Already Have PostgreSQL

import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect("postgresql://localhost/mydb")

    # Enable extension
    await conn.execute("CREATE EXTENSION IF NOT EXISTS vector")

    # Create table
    await conn.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id SERIAL PRIMARY KEY,
            title TEXT,
            category TEXT,
            embedding vector(1536)
        )
    """)

    # Create index for faster search
    # (for a real load, build IVFFlat after the bulk insert so its lists reflect the data)
    await conn.execute("""
        CREATE INDEX ON documents
        USING ivfflat (embedding vector_cosine_ops)
        WITH (lists = 100)
    """)

    # Insert (the text literal is cast to the vector type explicitly)
    await conn.execute("""
        INSERT INTO documents (title, category, embedding)
        VALUES ($1, $2, $3::vector)
    """, "Password Reset Guide", "support", "[0.023, -0.891, ...]")

    # Search: <=> is cosine distance, so 1 - distance is cosine similarity
    rows = await conn.fetch("""
        SELECT title, category,
               1 - (embedding <=> $1::vector) AS similarity
        FROM documents
        WHERE category = $2
        ORDER BY embedding <=> $1::vector
        LIMIT 5
    """, "[0.018, -0.445, ...]", "support")

    await conn.close()

asyncio.run(main())

Verdict: No new infrastructure needed if you're already using PostgreSQL. But the SQL-based query syntax for vector operations feels clunky, and performance degrades significantly past 500K vectors.
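A lot of IVFFlat's behavior at scale comes down to two knobs: the number of lists at index build time and the number of probes at query time. The sketch below encodes the starting points suggested in the pgvector README (rows / 1000 lists up to 1M rows, sqrt(rows) above that, and probes around sqrt(lists)). The helper name is hypothetical, and these are starting values to tune against your own recall target, not guarantees.

```python
import math

def ivfflat_starting_params(row_count):
    """Starting (lists, probes) values following the pgvector README guidance."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = round(math.sqrt(row_count))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes

for rows in (100_000, 1_000_000, 10_000_000):
    lists, probes = ivfflat_starting_params(rows)
    print(f"{rows:>10,} rows -> lists = {lists}, probes = {probes}")
```

By this rule of thumb, the lists = 100 in the example above fits a table of roughly 100K rows. Probes are set per session with SET ivfflat.probes = ...; raising them improves recall at the cost of latency.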

4. 🏗️ Filtering & Metadata

In production RAG, you almost never search all your vectors. You filter first, then search:

"Find documents similar to this query, but only from the support category, created after January 2025"

Feature	Qdrant	Pinecone	Weaviate	Milvus	pgvector
Pre-filtering	✅ Native	✅ Native	✅ Native	✅ Native	✅ SQL WHERE
Nested filters	✅ must/should/must_not	✅ $and/$or	✅ And/Or	✅ Boolean	✅ SQL logic
Geo filtering	✅ Yes	❌ No	✅ Yes	❌ No	✅ PostGIS
Full-text search	✅ Built-in	❌ No	✅ BM25 built-in	❌ No	✅ tsvector
Hybrid search	✅ Vector + full-text	⚠️ Sparse vectors	✅ Vector + BM25	⚠️ Limited	✅ Manual

Winner: Weaviate for hybrid search (vector + BM25 in one query). Qdrant is a close second, with excellent filtering and recently added full-text search.
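Where the table says "Manual" for hybrid search, the usual approach is to run the vector query and the keyword query separately and merge the two ranked lists yourself. Here is a minimal sketch using reciprocal rank fusion (RRF), one common merging method. It is not necessarily what any of these databases does internally; the document ids are invented, and k=60 is just a conventional default.

```python
def reciprocal_rank_fusion(rankings, k=60, limit=5):
    """Merge ranked id lists; each list contributes 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)[:limit]

# Ranked ids from two independent searches over the same collection.
vector_hits = ["doc-3", "doc-1", "doc-7"]   # semantic similarity order
keyword_hits = ["doc-1", "doc-9", "doc-3"]  # keyword (e.g. BM25) order

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

RRF only looks at ranks, so it sidesteps the problem that cosine similarities and BM25 scores live on different scales.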

5. 🔒 Self-Hosting vs Managed

Some companies can't send data to third-party clouds. Here's your self-hosting reality:

Database	Self-Host Ease	Docker Support	Kubernetes	Resource Usage
Qdrant	⭐⭐⭐⭐⭐	Single container	Helm chart	Low (Rust)
Pinecone	❌ Not possible	N/A	N/A	N/A
Weaviate	⭐⭐⭐⭐	Single container	Helm chart	Medium (Go)
Milvus	⭐⭐⭐	Multi-container	Complex	High (etcd, MinIO)
pgvector	⭐⭐⭐⭐⭐	PostgreSQL image	Standard PG	Low

Winner: Qdrant — A single Docker container, ~100MB memory for 1M vectors, and it just works. Milvus requires etcd, MinIO, and multiple services — it's an operational headache.

# Qdrant: One command, done
docker run -p 6333:6333 qdrant/qdrant

# Milvus: You need docker-compose with 3+ services
# etcd, minio, milvus-standalone... 😩
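When sizing a self-hosted box, it helps to know what the raw vectors weigh before any index or payload overhead. The arithmetic below assumes uncompressed float32 values (4 bytes each). By this estimate, a footprint in the ~100MB range for 1M vectors at 1536 dimensions would only be reachable with vectors kept on disk or compressed; that is an inference about the setup, not something measured here.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Size of the raw vectors alone (float32 by default), excluding index and payload."""
    return num_vectors * dimensions * bytes_per_value

for count in (1_000_000, 10_000_000):
    gib = raw_vector_bytes(count, 1536) / 1024**3
    print(f"{count:>10,} vectors x 1536 dims ~ {gib:.1f} GiB of float32 data")
```

The same function with bytes_per_value=1 gives a rough idea of what 8-bit quantization would save.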

The Decision Matrix

Here's my framework for choosing. Find your scenario:

🟢 "I already use PostgreSQL and have < 500K vectors"

→ pgvector

Don't add infrastructure. Just install the extension. It's free, it's simple, and at this scale, performance is fine.

🟡 "I'm building a production RAG system with 1M+ vectors"

→ Qdrant

Best performance, cheapest managed pricing, easiest to self-host, and the filtering system is production-grade. This is my default choice for every new project.

🔵 "I need hybrid search (vector + keyword) out of the box"

→ Weaviate

Built-in BM25 + vector search in a single query. If your use case requires combining semantic similarity with keyword matching, Weaviate does this better than anyone.

🟣 "My team doesn't want to manage any infrastructure"

→ Pinecone

Fully serverless. Zero ops. You pay more, but you never think about scaling, backups, or indexing. Worth it if your engineering team is small and ops isn't your strength.

🔴 "I'm processing billions of vectors at enterprise scale"

→ Milvus

Built for massive scale. GPU-accelerated indexing. But expect significant operational complexity — this is not a "docker run" solution.
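The scenarios above can be written down as a handful of ordered rules. This is only a sketch: the function name, the parameters, and especially the order in which the rules are checked are assumptions layered on top of the matrix, and real decisions usually involve more than four inputs.

```python
def pick_vector_db(vector_count, has_postgres=False,
                   needs_hybrid_search=False, wants_zero_ops=False):
    """Map the article's scenarios to a recommendation; rule order is an assumption."""
    if vector_count >= 1_000_000_000:
        return "Milvus"      # billions of vectors at enterprise scale
    if wants_zero_ops:
        return "Pinecone"    # team doesn't want to manage infrastructure
    if needs_hybrid_search:
        return "Weaviate"    # vector + keyword search out of the box
    if has_postgres and vector_count < 500_000:
        return "pgvector"    # already on PostgreSQL, small collection
    return "Qdrant"          # default for production RAG

print(pick_vector_db(200_000, has_postgres=True))
print(pick_vector_db(2_000_000))
```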

What I Actually Use in Production

For 90% of my client projects, my stack looks like this:

Embeddings:     OpenAI text-embedding-3-small (1536 dims)
Vector DB:      Qdrant (self-hosted via Docker)
Framework:      LangChain + QdrantVectorStore
Search Type:    Hybrid (dense vectors + sparse BM25)
Reranker:       Cohere Rerank v3 (top 20 → top 3)

Why Qdrant? Because when a client calls at midnight saying "the chatbot is slow," I need a database that:

Has sub-10ms p50 latency
Supports rich metadata filtering without performance degradation
Can be self-hosted on a $20/month VPS
Has a Python client that doesn't fight me

Qdrant checks all four boxes. Consistently.
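The "top 20 → top 3" step in that stack is a retrieve-then-rerank pattern: over-fetch cheap candidates from the vector database, then let a slower, more accurate model reorder them. Here is a self-contained sketch of the control flow. The search and rerank functions are toy stand-ins (hypothetical names, a word-overlap score), not the Qdrant or Cohere APIs.

```python
def retrieve_then_rerank(query, search, rerank_score, fetch_k=20, final_k=3):
    """Fetch fetch_k candidates, reorder them by rerank_score, keep the best final_k."""
    candidates = search(query, fetch_k)
    ranked = sorted(candidates, key=lambda doc: rerank_score(query, doc), reverse=True)
    return ranked[:final_k]

corpus = [
    "How to update billing details",
    "Reset your password from the login page",
    "Password rules and requirements",
    "Contact support by email",
]

def toy_search(query, limit):
    """Stand-in for the vector database call; returns the first `limit` documents."""
    return corpus[:limit]

def toy_rerank_score(query, doc):
    """Stand-in for a reranker model: counts words shared by query and document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

top = retrieve_then_rerank("reset my password", toy_search, toy_rerank_score,
                           fetch_k=4, final_k=2)
print(top)
```

Keeping the two stages behind plain callables makes it easy to swap the database or the reranker without touching the pipeline.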

Final Ranking (My Opinion)

Rank	Database	Best For	Score
🥇	Qdrant	Overall best (performance + price + DX)	9.2/10
🥈	Weaviate	Hybrid search + enterprise features	8.5/10
🥉	pgvector	Small scale + existing PostgreSQL	7.8/10
4th	Pinecone	Zero-ops teams	7.5/10
5th	Milvus	Enterprise-scale billions of vectors	7.0/10

Key Takeaways

Start with pgvector if you already have PostgreSQL and fewer than 500K vectors. Don't over-engineer.
Graduate to Qdrant when you hit scale, need filtering, or want better latency. The migration is straightforward.
Choose Weaviate specifically for hybrid search use cases — its built-in BM25 is a genuine advantage.
Choose Pinecone only if you have budget and zero appetite for infrastructure management.
Avoid Milvus unless you genuinely need billion-scale vector search and have a dedicated ops team.
Always benchmark with your own data — synthetic benchmarks lie. Test with your actual embeddings, your actual query patterns, and your actual filter conditions.

Which vector database are you using in production? Have you run into issues I didn't cover? Let me know in the comments — I read every one.

Darshit Radadiya is an AI Engineer from Ahmedabad, India, building real-world AI solutions with Agentic AI, RAG Pipelines, LLMs, Voice Agents, and Automation.

🌐 Portfolio & Projects: darshit-radadiya.vercel.app
💼 LinkedIn: Darshit Radadiya
🐙 GitHub: darshit001

If this comparison helped you decide, hit 👏 and follow for more AI engineering deep-dives every week!
