Krunal Kanojiya

Posted on Jun 28

Pinecone vs Weaviate vs Milvus vs Qdrant: Which Vector DB in 2026?

#database #vectordatabase #machinelearning #python

I have helped a number of teams pick a vector database in the last year. The conversation always starts the same way: four logos, one Slack message, one question — which one?

The honest answer is that all four are good. The useful answer is that each one is built for a different set of constraints. Getting this decision wrong does not break your application on day one. It shows up six months later when your bill is three times your compute budget, or your filtered search recall starts degrading under load.

This post gives you the architecture overview, latency benchmarks, filtering quality comparison, hybrid search comparison, real cost numbers, and code examples for each. The full 4,000-word breakdown with detailed cost formulas and every edge case is at krunalkanojiya.com.

The Short Answer First

Skip the full read if you already know your constraints:

Pinecone — Fully managed, zero infrastructure, best for datasets under 10M vectors
Qdrant — Best filtering, native hybrid search, lowest cost at scale, best default for most RAG pipelines in 2026
Weaviate — Built-in vectorization, multi-modal, most mature BM25 + dense hybrid search
Milvus — Only real option above 100M vectors, GPU-accelerated indexing, needs Kubernetes

For everything else, keep reading.

Architecture and Deployment at a Glance

How each database handles deployment defines most of the tradeoffs that follow.

Pinecone is a black box. You get an API key. No infrastructure decisions, no index tuning, no ops. The 2026 default is serverless: pay per read unit, write unit, and storage. No idle charges.

Weaviate uses HNSW indexing with a modular architecture that lets you swap vectorizers, rerankers, and embedding models without rebuilding your schema. In April 2026, v1.37 shipped a native MCP Server, making it the first vector database where LLMs and agents can query and write directly without custom integration code.

Milvus is the most architecturally complex. It separates compute from storage at the infrastructure level with specialized node types (query nodes, data nodes, index nodes) on Kubernetes. Milvus 2.6 replaced its Kafka/Pulsar dependency with Woodpecker, its own WAL system built on object storage.

Qdrant is written in Rust and runs as a single binary or Docker container. No Kubernetes required for most workloads. v1.14 (April 2026) shipped GPU-accelerated HNSW indexing (4x faster index builds on AWS) and Multi-AZ clusters with 99.95% uptime SLAs.

Deployment complexity, simplest to hardest:
Pinecone (no ops) → Qdrant (single binary) → Weaviate (Docker Compose) → Milvus (Kubernetes)

Performance Benchmarks

Benchmarks from Salt Technologies AI's Vector Database Performance Benchmark 2026, covering 1M vectors at 1536 dimensions:

Database	p50 Latency	p99 Latency	Notes
Qdrant (self-hosted)	4ms	8–12ms	Lowest latency of all purpose-built vector DBs
Milvus (with GPU)	6ms	12–18ms	CPU-only is 15–25ms
Pinecone Serverless	20–30ms	40–80ms (cold)	Warm queries 10–15ms
Weaviate Cloud	50–70ms	100–150ms	Self-hosted is significantly faster

A few things worth knowing about these numbers. Qdrant's 4ms p50 is consistent across warm and cold queries because it runs on infrastructure you control. Pinecone's cold query spike matters for applications with bursty or overnight-quiet traffic. Weaviate self-hosted with binary quantization reaches 20–40ms p99, much better than the managed numbers.
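If you want to check numbers like these against your own traffic, p50 and p99 are easy to compute from raw latency samples. The sketch below uses the nearest-rank method; the `percentile` helper and the sample latencies are made up for illustration and are not taken from the benchmark above.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, which avoids float rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: 98 warm queries plus 2 cold-start outliers.
latencies_ms = [12.0] * 98 + [75.0] * 2

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50}ms p99={p99}ms")
```

Two slow queries out of a hundred leave the p50 untouched and still set the p99, which is why cold starts show up in the tail column rather than the median.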

Filtering: The Most Underrated Difference

Filtering is the most important practical difference for production applications. Almost every real RAG query includes filters — user ID, date range, category, tenant ID.

The core problem: post-filtering retrieves candidates from the index first, then applies the filter. If only 1% of your vectors match the filter, about 99% of the retrieved candidates are thrown away, so a top-10 query can return far fewer than 10 results unless the engine over-fetches. Recall drops significantly under selective filters.
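A toy example makes the effect concrete. This is a brute-force sketch on a made-up 1-D corpus, not how any of these databases implement filtering; the `top_k` helper, the `tenant` field, and the data are all assumptions for illustration.

```python
def top_k(items, query, k):
    """Brute-force nearest neighbours by absolute distance on a 1-D 'embedding'."""
    return sorted(items, key=lambda it: abs(it["vec"] - query))[:k]


def matches(item):
    """The selective filter: only tenant 'a' is allowed."""
    return item["tenant"] == "a"


# Toy corpus: 1,000 items, only 1% of which (ids 0, 100, ..., 900) belong to tenant "a".
items = [
    {"id": i, "vec": float(i), "tenant": "a" if i % 100 == 0 else "b"}
    for i in range(1000)
]
query, k = 550.3, 10

# Post-filtering: search first, then drop everything that fails the filter.
post_filtered = [it for it in top_k(items, query, k) if matches(it)]

# Filtering before ranking: only matching items compete for the top-k slots.
pre_filtered = top_k([it for it in items if matches(it)], query, k)

print(len(post_filtered), len(pre_filtered))
```

The ten nearest neighbours of the query all belong to tenant "b", so post-filtering returns nothing while the filtered search still fills all ten slots.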

Qdrant applies the payload filter during HNSW graph traversal, not after. Recall stays high even when filters reduce the dataset to a tiny fraction of the total.

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

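client = QdrantClient(url="http://localhost:6333")

# query_vector is assumed to be the query embedding (a list of floats).
# All conditions under `must` have to match, so the three filters are ANDed.
results = client.query_points(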
    collection_name="documents",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(key="user_id", match=MatchValue(value=42)),
            FieldCondition(key="category", match=MatchValue(value="technical")),
            FieldCondition(
                key="created_at",
                range=Range(gte=1700000000)
            ),
        ]
    ),
    limit=10,
)

Weaviate also applies filters during index traversal using an inverted index alongside the HNSW graph. Recall under selective filters is strong and comparable to Qdrant.

Milvus uses a scalar index built alongside the vector index. It handles complex filters well at billion-scale and under high QPS through distributed query execution.
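For comparison, Milvus takes filters as a boolean expression string. Here is a minimal sketch of the same three conditions from the Qdrant example; `milvus_filter_expr` is a hypothetical helper, and the field names are assumed to exist as scalar fields in your collection.

```python
import json


def milvus_filter_expr(user_id, category, created_after):
    """Build a Milvus-style boolean filter expression (hypothetical helper)."""
    # json.dumps wraps the string in double quotes and escapes embedded quotes.
    return (
        f"user_id == {int(user_id)} "
        f"and category == {json.dumps(category)} "
        f"and created_at >= {int(created_after)}"
    )


expr = milvus_filter_expr(42, "technical", 1700000000)
print(expr)

# With a running Milvus instance, the expression would be passed along these lines:
# from pymilvus import MilvusClient
# client = MilvusClient(uri="http://localhost:19530")
# hits = client.search(collection_name="documents", data=[query_vector],
#                      filter=expr, limit=10)
```

The search call is commented out because it needs a live server; the URI and collection name are placeholders.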

Pinecone handles metadata filtering well for common equality and range filters. Under highly selective filters on large datasets, recall can degrade. The filter syntax is also less expressive than the other three.
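Pinecone expresses filters as a MongoDB-style dictionary. The sketch below shows the same three conditions, plus a toy evaluator to make the semantics explicit. The `matches` function is purely illustrative and says nothing about how Pinecone evaluates filters internally.

```python
import operator

# The same three conditions, in Pinecone's dictionary filter syntax.
pinecone_filter = {
    "user_id": {"$eq": 42},
    "category": {"$eq": "technical"},
    "created_at": {"$gte": 1700000000},
}

# Toy mapping from comparison operators to Python functions.
_OPS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$gte": operator.ge,
    "$lt": operator.lt, "$lte": operator.le,
}


def matches(metadata, flt):
    """Toy evaluator: every condition on every field must hold (implicit AND)."""
    for field, conditions in flt.items():
        if field not in metadata:
            return False
        for op, expected in conditions.items():
            if not _OPS[op](metadata[field], expected):
                return False
    return True


doc = {"user_id": 42, "category": "technical", "created_at": 1700000500}
print(matches(doc, pinecone_filter))

# Against a real index the dictionary is passed straight to the query call:
# index.query(vector=query_embedding, top_k=10,
#             filter=pinecone_filter, include_metadata=True)
```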

Filtering winner: Qdrant and Weaviate for correctness under selective filters. Milvus for filtering at billion-scale.

Hybrid Search

Hybrid search combines dense vectors with sparse BM25 to handle both semantic queries and exact keyword matches. It consistently outperforms pure semantic search for technical docs, product catalogs, and anything with specific identifiers like model numbers or proper nouns.

Weaviate has the most mature implementation. BM25 + dense vector search is processed in a single unified query, not two separate calls you merge in application code.

import weaviate
from weaviate.classes.query import HybridFusion

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="YOUR_CLUSTER_URL",
    auth_credentials=weaviate.auth.AuthApiKey("YOUR_API_KEY"),
)

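try:
    collection = client.collections.get("Documents")

    # BM25 runs on `query`; the dense search uses `vector` (your own embedding).
    results = collection.query.hybrid(
        query="vector database filtering performance",
        vector=query_embedding,
        fusion_type=HybridFusion.RELATIVE_SCORE,
        limit=10,
    )
finally:
    # Release the connection even if the query raises.
    client.close()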
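To see roughly what a relative score fusion does, here is a minimal sketch: min-max normalize each result set to the 0..1 range, then take a weighted sum. The `alpha` weighting and the handling of documents missing from one result set are my assumptions; Weaviate's actual implementation may differ in the details.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping so the best score is 1.0 and the worst 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def relative_score_fusion(dense, keyword, alpha=0.5):
    """Weighted sum of normalized scores; alpha is the weight of the dense side."""
    d, kw = min_max(dense), min_max(keyword)
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(d) | set(kw)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical raw scores: cosine similarities and BM25 scores live on different scales.
dense_scores = {"a": 1.0, "b": 0.75, "c": 0.5}
bm25_scores = {"b": 12.0, "c": 7.0, "d": 2.0}

ranked = relative_score_fusion(dense_scores, bm25_scores)
print([doc for doc, _ in ranked])
```

Document "b" wins because it scores well on both sides, even though "a" has the best dense score on its own.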

Qdrant supports native sparse vectors alongside dense vectors in named vector collections. Query them together with Reciprocal Rank Fusion.

from qdrant_client import QdrantClient
from qdrant_client.models import SparseVector, Prefetch, FusionQuery, Fusion

client = QdrantClient(url="http://localhost:6333")

results = client.query_points(
    collection_name="documents",
    prefetch=[
        Prefetch(query=dense_vector, using="dense", limit=20),
        Prefetch(
            query=SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=10,
)
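Reciprocal Rank Fusion itself is simple enough to sketch in a few lines: each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant k = 60 is the value commonly used in the literature; whether Qdrant uses the same constant internally is an assumption here, and the `rrf` helper and document IDs are illustrative.

```python
def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists, best match first.
dense_hits = ["d1", "d2", "d3"]
sparse_hits = ["d3", "d1", "d4"]

fused = rrf([dense_hits, sparse_hits])
print(fused)
```

RRF only looks at ranks, never at raw scores, so it needs no normalization between the dense and sparse sides.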

Milvus 2.6 ships BM25 full-text search built-in, benchmarked at 400% higher throughput than Elasticsearch on equivalent hardware. Teams can collapse a two-system stack (Elasticsearch + vector DB) into a single Milvus deployment.

Pinecone supports hybrid search through sparse-dense vectors. The implementation is clean, but it is less flexible than Qdrant's named vector design when you need to combine more than two vector types.

Hybrid search winner: Weaviate for the most integrated implementation. Qdrant for flexibility. Milvus for teams replacing Elasticsearch.

Real Cost Numbers at 10M Vectors

This is where teams get surprised the most. The number on the pricing page is rarely the number you pay in production.

Database	10M vectors managed	10M vectors self-hosted	Best scenario
Pinecone Serverless	$70–$100/month	Not available	Low query volume, small scale
Weaviate Cloud	~$135/month (BQ enabled)	$80–$120/month	Hybrid search with no extra cost
Milvus (Zilliz)	$65–$120/month	$60–$100/month	Billion-scale with quantization
Qdrant Cloud	~$65/month	$20–$40/month	Best self-hosted economics

A few notes worth calling out.

Pinecone bills on read units, write units, and storage, so at high query volume read-unit costs add up fast. Weaviate's dimension-based billing multiplies by replication factor, so enabling Binary Quantization is essential above 1M vectors or costs blow up. Milvus 2.6's RaBitQ 1-bit quantization stores one bit per dimension instead of a 32-bit float, compressing indexes to 1/32 of their original size at 95% recall, which materially changes the hardware required at billion-scale. Self-hosted Qdrant on a $40/month VPS with binary quantization handles 10M vectors comfortably with sub-10ms p99.
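The 1/32 figure is plain arithmetic: a float32 takes 32 bits per dimension, a 1-bit code takes one. Here is a back-of-the-envelope sketch for 10M vectors at 1536 dimensions. It counts raw vector storage only and ignores graph overhead, payloads, and replication, so treat it as a lower bound; the `vector_storage_gb` helper is illustrative.

```python
def vector_storage_gb(num_vectors, dims, bits_per_dim):
    """Raw vector storage in decimal gigabytes (vectors only, no index overhead)."""
    return num_vectors * dims * bits_per_dim / 8 / 1e9


full_precision = vector_storage_gb(10_000_000, 1536, 32)  # float32
one_bit = vector_storage_gb(10_000_000, 1536, 1)          # 1-bit quantized

print(f"float32: {full_precision:.2f} GB")
print(f"1-bit:   {one_bit:.2f} GB")
print(f"ratio:   {full_precision / one_bit:.0f}x")
```

At roughly 61 GB versus under 2 GB, the quantized vectors fit in the RAM of a small VPS while the full-precision ones do not, which is where most of the self-hosted savings come from.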

Cost winner: Self-hosted Qdrant for lowest total cost at any scale. Pinecone Serverless for small workloads where zero-ops is worth the premium.

Quick Start Code for Each

Pinecone: fastest to get started, with index creation, upsert, and query in a handful of calls:

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
index = pc.Index("my-index")
index.upsert(vectors=[{"id": "doc-1", "values": embedding, "metadata": {"text": "..."}}])
results = index.query(vector=query_embedding, top_k=10, include_metadata=True)

Qdrant — clean defaults, sane out-of-the-box behavior:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
client.upsert(
    collection_name="documents",
    points=[PointStruct(id=1, vector=embedding, payload={"content": "...", "user_id": 42})]
)
results = client.query_points(collection_name="documents", query=query_embedding, limit=10)

The Decision Matrix

Requirement	Pick
Zero infrastructure, fastest setup	Pinecone
Best filtering recall under selective filters	Qdrant
Native hybrid search out of the box	Weaviate
Built-in vectorization (no separate embedding call)	Weaviate
Dataset above 100M vectors	Milvus
GPU-accelerated index builds	Milvus or Qdrant Cloud
Lowest self-hosted cost	Qdrant
Replace Elasticsearch with vector search	Milvus
LLM agent integration via MCP	Weaviate
Most teams building RAG in 2026	Qdrant

Where to Go Deeper

This post covers the key differences. The full comparison on my blog goes deeper on Pinecone's Dedicated Read Nodes and BYOC setup, Weaviate's Diversity Search using MMR, Milvus's tiered hot/cold storage, Qdrant's audit logging, the full cost formulas for each pricing model, and the Lambda vs Kappa architecture question for teams running both batch and streaming ingestion alongside vector search.

Read the full guide here:
Pinecone vs Weaviate vs Milvus vs Qdrant (2026): Full Comparison

Which vector database are you currently using in production? And what made you pick it? Drop it in the comments.