
Krunal Kanojiya


Pinecone vs Weaviate vs Milvus vs Qdrant: Which Vector DB in 2026?

I have helped a number of teams pick a vector database in the last year. The conversation always starts the same way: four logos, one Slack message, one question — which one?

The honest answer is that all four are good. The useful answer is that each one is built for a different set of constraints. Getting this decision wrong does not break your application on day one. It shows up six months later when your bill is three times your compute budget, or your filtered search recall starts degrading under load.

This post gives you the architecture overview, latency benchmarks, filtering quality comparison, hybrid search comparison, real cost numbers, and code examples for each. The full 4,000-word breakdown with detailed cost formulas and every edge case is at krunalkanojiya.com.


The Short Answer First

Skip the full read if you already know your constraints:

  • Pinecone — Fully managed, zero infrastructure, best for datasets under 10M vectors
  • Qdrant — Best filtering, native hybrid search, lowest cost at scale, best default for most RAG pipelines in 2026
  • Weaviate — Built-in vectorization, multi-modal, most mature BM25 + dense hybrid search
  • Milvus — Only real option above 100M vectors, GPU-accelerated indexing, needs Kubernetes

For everything else, keep reading.


Architecture and Deployment at a Glance

How each database handles deployment defines most of the tradeoffs that follow.

Pinecone is a black box. You get an API key. No infrastructure decisions, no index tuning, no ops. The 2026 default is serverless: pay per read unit, write unit, and storage. No idle charges.

Weaviate uses HNSW indexing with a modular architecture that lets you swap vectorizers, rerankers, and embedding models without rebuilding your schema. In April 2026, v1.37 shipped a native MCP Server, making it the first vector database where LLMs and agents can query and write directly without custom integration code.

Milvus is the most architecturally complex. It separates compute from storage at the infrastructure level with specialized node types (query nodes, data nodes, index nodes) on Kubernetes. Milvus 2.6 replaced its Kafka/Pulsar dependency with Woodpecker, its own WAL system built on object storage.

Qdrant is written in Rust and runs as a single binary or Docker container. No Kubernetes required for most workloads. v1.14 (April 2026) shipped GPU-accelerated HNSW indexing (4x faster index builds on AWS) and Multi-AZ clusters with 99.95% uptime SLAs.

Deployment complexity, simplest to hardest:
Pinecone (no ops) → Qdrant (single binary) → Weaviate (Docker Compose) → Milvus (Kubernetes)


Performance Benchmarks

Benchmarks from Salt Technologies AI's Vector Database Performance Benchmark 2026, covering 1M vectors at 1536 dimensions:

Database | p50 Latency | p99 Latency | Notes
Qdrant (self-hosted) | 4ms | 8–12ms | Lowest latency of all purpose-built vector DBs
Milvus (with GPU) | 6ms | 12–18ms | CPU-only is 15–25ms
Pinecone Serverless | 20–30ms | 40–80ms (cold) | Warm queries 10–15ms
Weaviate Cloud | 50–70ms | 100–150ms | Self-hosted is significantly faster

A few things worth knowing about these numbers. Qdrant's 4ms p50 is consistent across warm and cold queries because it runs on infrastructure you control. Pinecone's cold query spike matters for applications with bursty or overnight-quiet traffic. Weaviate self-hosted with binary quantization reaches 20–40ms p99, much better than the managed numbers.
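Published numbers rarely match your own hardware and traffic, so it is worth measuring p50 and p99 yourself. Here is a minimal sketch using only the standard library. `fake_query` is a hypothetical stand-in for a real client call, and the sample count of 200 is an arbitrary choice.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(run_query: Callable[[], object], n: int = 200) -> Dict[str, float]:
    """Time n calls to run_query and return p50/p99 latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 98 is p99.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98]}


def fake_query() -> int:
    # Hypothetical placeholder: swap in a real call such as client.query_points(...).
    return sum(range(1000))


stats = measure_latency_ms(fake_query)
print(f"p50={stats['p50']:.3f}ms p99={stats['p99']:.3f}ms")
```

To see a cold-start effect like the one described for Pinecone, run the measurement once after a quiet period and once on a warmed-up index, then compare the two p99 values.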


Filtering: The Most Underrated Difference

Filtering is the most important practical difference for production applications. Almost every real RAG query includes filters — user ID, date range, category, tenant ID.

The core problem: post-filtering retrieves candidates from the index first, then applies the filter. If only 1% of your vectors match the filter, the traversal wastes 99% of its work. Recall drops significantly under selective filters.
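The effect is easy to reproduce with a toy brute-force search. The sketch below is not HNSW and uses made-up data (a hypothetical `tenant_id` field where about 1% of items belong to tenant 7), but it shows the same failure mode: filtering after retrieval leaves you with far fewer than the 10 results you asked for.

```python
import random

random.seed(0)

DIM, N, K = 8, 2000, 10

# Synthetic corpus: every 100th item belongs to tenant 7, so the filter
# matches 1% of the data. All other tenant ids fall in 100..200.
corpus = [
    {
        "id": i,
        "vector": [random.gauss(0, 1) for _ in range(DIM)],
        "tenant_id": 7 if i % 100 == 0 else random.randint(100, 200),
    }
    for i in range(N)
]
query = [random.gauss(0, 1) for _ in range(DIM)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(items, k):
    """Exact nearest neighbours by dot product (brute force)."""
    return sorted(items, key=lambda it: dot(it["vector"], query), reverse=True)[:k]


def matches(item):
    return item["tenant_id"] == 7


# Post-filtering: take the top K overall, then drop items that fail the filter.
post_filtered = [it for it in top_k(corpus, K) if matches(it)]

# Filter-aware search: only matching items are ever considered as candidates.
filter_first = top_k([it for it in corpus if matches(it)], K)

print(len(post_filtered), "results from post-filtering")
print(len(filter_first), "results from filter-aware search")
```

Over-fetching (asking for many more than K candidates before filtering) narrows the gap, but the required over-fetch grows as the filter gets more selective.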

Qdrant applies the payload filter during HNSW graph traversal, not after. Recall stays high even when filters reduce the dataset to a tiny fraction of the total.

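from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

client = QdrantClient(url="http://localhost:6333")

# query_vector is assumed to be a list of floats from your embedding model,
# with the same dimension as the "documents" collection.
results = client.query_points(
    collection_name="documents",
    query=query_vector,
    # Every condition under `must` has to match (logical AND).
    query_filter=Filter(
        must=[
            FieldCondition(key="user_id", match=MatchValue(value=42)),
            FieldCondition(key="category", match=MatchValue(value="technical")),
            # created_at is assumed to be stored as a Unix timestamp in seconds.
            FieldCondition(
                key="created_at",
                range=Range(gte=1700000000)
            ),
        ]
    ),
    limit=10,
)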

Weaviate also applies filters during index traversal using an inverted index alongside the HNSW graph. Recall under selective filters is strong and comparable to Qdrant.

Milvus uses a scalar index built alongside the vector index. It handles complex filters well at billion-scale under high QPS through distributed query execution.

Pinecone handles metadata filtering well for common equality and range filters. Under highly selective filters on large datasets, recall can degrade. The filter syntax is also less expressive than the other three.

Filtering winner: Qdrant and Weaviate for correctness under selective filters. Milvus for filtering at billion-scale.


Hybrid Search

Hybrid search combines dense vectors with sparse BM25 to handle both semantic queries and exact keyword matches. It consistently outperforms pure semantic search for technical docs, product catalogs, and anything with specific identifiers like model numbers or proper nouns.

Weaviate has the most mature implementation. BM25 + dense vector search is processed in a single unified query, not two separate calls you merge in application code.

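import weaviate
from weaviate.classes.init import Auth
from weaviate.classes.query import HybridFusion

client = weaviate.connect_to_weaviate_cloud(
    cluster_url="YOUR_CLUSTER_URL",
    auth_credentials=Auth.api_key("YOUR_API_KEY"),
)

try:
    collection = client.collections.get("Documents")

    # query drives the BM25 side; vector drives the dense side.
    # query_embedding is assumed to come from your own embedding model.
    results = collection.query.hybrid(
        query="vector database filtering performance",
        vector=query_embedding,
        fusion_type=HybridFusion.RELATIVE_SCORE,
        limit=10,
    )
finally:
    # Always release the connection, even if the query raises.
    client.close()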
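To make `HybridFusion.RELATIVE_SCORE` less of a black box, here is a rough sketch of the idea behind relative score fusion: min-max normalize each result list, then blend the two with a weight. This is my simplified illustration, not Weaviate's actual implementation. The `alpha` value of 0.5 and the sample scores are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores so the best hit is 1.0 and the worst is 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def relative_score_fusion(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Blend normalized scores: alpha weights the vector side, 1 - alpha the keyword side."""
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(vec) | set(kw)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores: cosine similarities on one side, BM25 scores on the other.
vector_scores = {"a": 0.92, "b": 0.80, "c": 0.60}
keyword_scores = {"b": 12.0, "c": 7.0, "d": 2.0}

ranked = relative_score_fusion(vector_scores, keyword_scores)
print([doc_id for doc_id, _ in ranked])
```

Document "b" ranks first because it scores well on both sides, even though "a" is the top vector hit. That is the behavior you want for queries that mix meaning with exact identifiers.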

Qdrant supports native sparse vectors alongside dense vectors in named vector collections. Query them together with Reciprocal Rank Fusion.

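from qdrant_client import QdrantClient
from qdrant_client.models import SparseVector, Prefetch, FusionQuery, Fusion

client = QdrantClient(url="http://localhost:6333")

# Assumes the collection was created with a named dense vector "dense" and a
# named sparse vector "sparse". dense_vector, sparse_indices, and sparse_values
# are assumed to come from your own dense and sparse encoders.
results = client.query_points(
    collection_name="documents",
    # Each prefetch runs its own search and returns up to 20 candidates.
    prefetch=[
        Prefetch(query=dense_vector, using="dense", limit=20),
        Prefetch(
            query=SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=20,
        ),
    ],
    # The two candidate lists are merged with Reciprocal Rank Fusion.
    query=FusionQuery(fusion=Fusion.RRF),
    limit=10,
)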
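If Reciprocal Rank Fusion is new to you, the idea fits in a few lines: each document earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The sketch below is a generic illustration with made-up document ids. The constant k=60 is the commonly cited default, and I am not claiming it is the value Qdrant uses internally.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked id lists: each appearance contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Made-up result lists, best hit first.
dense_hits = ["doc-3", "doc-1", "doc-7"]
sparse_hits = ["doc-1", "doc-9", "doc-3"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

RRF only looks at ranks, never raw scores, so you do not have to reconcile cosine similarities with BM25 values. The tradeoff is that it ignores how large the gap between two adjacent hits actually is.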

Milvus 2.6 ships BM25 full-text search built-in, benchmarked at 400% higher throughput than Elasticsearch on equivalent hardware. Teams can collapse a two-system stack (Elasticsearch + vector DB) into a single Milvus deployment.

Pinecone supports hybrid search through sparse-dense vectors. The implementation is clean, but it is less flexible than Qdrant's named vector design when combining more than two vector types.

Hybrid search winner: Weaviate for the most integrated implementation. Qdrant for flexibility. Milvus for teams replacing Elasticsearch.


Real Cost Numbers at 10M Vectors

This is where teams get surprised the most. The number on the pricing page is rarely the number you pay in production.

Database | Managed (10M vectors) | Self-hosted (10M vectors) | Best scenario
Pinecone Serverless | $70–$100/month | Not available | Low query volume, small scale
Weaviate Cloud | ~$135/month (BQ enabled) | $80–$120/month | Hybrid search with no extra cost
Milvus (Zilliz) | $65–$120/month | $60–$100/month | Billion-scale with quantization
Qdrant Cloud | ~$65/month | $20–$40/month | Best self-hosted economics

A few notes worth calling out.

  • Pinecone bills on read units, write units, and storage. At high query volume, read unit costs stack up fast.
  • Weaviate's dimension-based billing multiplies by replication factor, so enabling Binary Quantization is essential above 1M vectors or costs blow up.
  • Milvus 2.6's RaBitQ 1-bit quantization compresses indexes to 1/32 their original size at 95% recall, which materially changes the hardware required at billion-scale.
  • Self-hosted Qdrant on a $40/month VPS with binary quantization handles 10M vectors comfortably with sub-10ms p99.
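The 1/32 figure follows directly from the arithmetic: a float32 dimension takes 32 bits and a 1-bit quantized dimension takes 1. Here is a back-of-the-envelope sketch for 10M vectors at 1536 dimensions. It counts raw vector storage only and ignores graph links, payloads, and any full-precision copies kept for rescoring, so treat the results as a lower bound rather than a sizing guide.

```python
def raw_vector_size_gb(num_vectors: int, dims: int, bits_per_dim: int) -> float:
    """Raw vector storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bits = num_vectors * dims * bits_per_dim
    return total_bits / 8 / 1e9


NUM_VECTORS = 10_000_000
DIMS = 1536

float32_gb = raw_vector_size_gb(NUM_VECTORS, DIMS, bits_per_dim=32)
one_bit_gb = raw_vector_size_gb(NUM_VECTORS, DIMS, bits_per_dim=1)

print(f"float32: {float32_gb:.2f} GB")  # 61.44 GB
print(f"1-bit:   {one_bit_gb:.2f} GB")  # 1.92 GB
print(f"ratio:   {float32_gb / one_bit_gb:.0f}x")  # 32x
```

Roughly 61 GB of float32 vectors will not fit in RAM on a small VPS, while about 2 GB of 1-bit codes will. That gap is why quantization shows up in every cost discussion above.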

Cost winner: Self-hosted Qdrant for lowest total cost at any scale. Pinecone Serverless for small workloads where zero-ops is worth the premium.


Quick Start Code for Pinecone and Qdrant

Pinecone is the fastest to get started. Creating an index, upserting, and querying takes only a few calls:

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# One-time setup: the dimension must match your embedding model's output size.
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
index = pc.Index("my-index")

# embedding and query_embedding are assumed to be lists of 1536 floats.
index.upsert(vectors=[{"id": "doc-1", "values": embedding, "metadata": {"text": "..."}}])
results = index.query(vector=query_embedding, top_k=10, include_metadata=True)

Qdrant — clean defaults, sane out-of-the-box behavior:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")

# One-time setup: size must match your embedding model's output dimension.
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# embedding and query_embedding are assumed to be lists of 1536 floats.
# Payload fields such as user_id can be used in filters later.
client.upsert(
    collection_name="documents",
    points=[PointStruct(id=1, vector=embedding, payload={"content": "...", "user_id": 42})]
)
results = client.query_points(collection_name="documents", query=query_embedding, limit=10)

The Decision Matrix

Requirement | Pick
Zero infrastructure, fastest setup | Pinecone
Best filtering recall under selective filters | Qdrant
Native hybrid search out of the box | Weaviate
Built-in vectorization (no separate embedding call) | Weaviate
Dataset above 100M vectors | Milvus
GPU-accelerated index builds | Milvus or Qdrant Cloud
Lowest self-hosted cost | Qdrant
Replace Elasticsearch with vector search | Milvus
LLM agent integration via MCP | Weaviate
Most teams building RAG in 2026 | Qdrant

Where to Go Deeper

This post covers the key differences. The full comparison at my blog goes deeper on Pinecone's Dedicated Read Nodes and BYOC setup, Weaviate's Diversity Search using MMR, Milvus's tiered hot/cold storage, Qdrant's audit logging, the full cost formulas for each pricing model, and the Lambda vs Kappa architecture question for teams running both batch and streaming ingestion alongside vector search.

Read the full guide here:
Pinecone vs Weaviate vs Milvus vs Qdrant (2026): Full Comparison


Which vector database are you currently using in production? And what made you pick it? Drop it in the comments.
