toolfreebie

Posted on May 28 • Originally published at toolfreebie.com

#hosting #devops

Qdrant vs Pinecone vs Chroma: Free Vector Database for RAG

If you are building a retrieval-augmented generation (RAG) pipeline in 2026, the vector database is the load-bearing piece nobody talks about until it breaks. Embeddings are commoditised — Cohere, OpenAI, Voyage, and a dozen open models will turn your text into vectors for free or near-free. The harder question is where those vectors live, how fast you can search them, and how much you have to pay before the bill becomes scary.

Three names dominate the free end of that market: Qdrant, Pinecone, and Chroma. All three give you a real way to start a RAG project at zero cost. None of them require a credit card on day one. But they sit on fundamentally different points on the open-source-vs-managed and local-vs-cloud spectrums, and the right pick depends entirely on what you are building and how far you expect it to scale.

This guide compares all three on the metrics that actually matter for a free RAG stack — what the free tier really lets you do, what happens when you outgrow it, performance numbers from third-party benchmarks, and the engineering trade-offs that hit you a month into the project. Every number cited links back to the provider’s own docs, GitHub repo, or a public benchmark; nothing here is fabricated.

The 30-Second Answer

Database	Free path	License	Free ceiling	Best for
Qdrant	1 GB managed cloud cluster, free forever, no card	Apache 2.0	1 GB RAM + ~4 GB disk on managed; unlimited self-host	Production RAG with hybrid search, payload filters, no vendor lock-in
Pinecone	Starter plan: 2 GB storage, 5 indexes, no card	Closed-source SaaS	2 GB storage, 2M read units, 1M write units per month	Zero-ops managed RAG, fastest first-vector-to-production
Chroma	100% local — `pip install chromadb`	Apache 2.0	Bounded by your laptop’s RAM and disk	Local prototypes, notebooks, single-tenant desktop apps

If you want the smallest possible step from idea to working RAG with five lines of Python and no signup, Chroma wins. If you want a managed service that just exists at a URL with no servers to babysit, Pinecone is the easiest. If you want a real free tier that can carry a small production app, plus the option to self-host the exact same binary later when you outgrow it, Qdrant is the only one of the three with both at the same time.

The rest of this article unpacks why.

Why You Need a Vector Database for RAG at All

RAG, at its core, is one cheap trick: instead of stuffing your entire knowledge base into every LLM prompt, you embed your documents once, store the vectors, and at query time you embed the user’s question, look up the most similar document chunks by cosine similarity, and paste only those chunks into the prompt. The LLM never sees your full corpus — it only ever sees the few passages that matter for the current question.
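The retrieval step described above can be sketched in a few lines of plain Python. This is a brute-force illustration, not how any of the three databases work internally: the three-dimensional vectors, the chunk texts, and the helper names `cosine_similarity` and `top_k` are all made up for the example, and a real pipeline would get its vectors from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=2):
    """Return the k chunk texts most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy 3-dimensional "embeddings"; real models emit hundreds of dimensions.
chunks = [
    ("Refunds are processed within 5 days.", [0.9, 0.1, 0.0]),
    ("Our office is in Berlin.", [0.0, 0.2, 0.9]),
    ("Refund requests need an order number.", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded user question

# Only the best-matching chunks are pasted into the prompt.
context = top_k(query_vec, chunks, k=2)
prompt = "Answer using only this context:\n" + "\n".join(context)
```

The two refund chunks end up in the prompt and the Berlin chunk does not. A vector database does the same job, but with an index that avoids scoring every stored vector on every query.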

This makes the vector-search step the bottleneck. Three properties decide whether your RAG app is good:

Recall: does the retriever actually return the relevant chunk? (Approximate-nearest-neighbour algorithms are tunable — you can trade speed for recall.)
Latency: how long does a single query take? If your RAG round trip is 800 ms before the LLM even starts streaming, the UX is dead.
Cost: how much do you pay per million vectors stored, per million queries served, and per million tokens re-embedded when you change models?
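Recall is the easiest of the three to measure yourself. A minimal sketch, assuming you have ground-truth neighbours from an exact brute-force search to compare against; the function name `recall_at_k` and the sample ids are hypothetical:

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the first k results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top = set(retrieved_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)

exact = ["a", "b", "c", "d"]        # ground truth from a brute-force search
approx = ["a", "c", "x", "b", "y"]  # what a hypothetical ANN index returned

# Three of the four true neighbours appear in the first four results.
r = recall_at_k(approx, exact, k=4)
```

Running this over a sample of real queries before and after changing index parameters shows how much recall a speed-up actually costs.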

A flat-array brute-force search through Python lists works for ten thousand vectors. It falls over at a million. The vector databases below all use some flavour of HNSW (Hierarchical Navigable Small World) graphs to get sub-linear search complexity, plus a binary protocol that does not melt under load. The free tiers exist because every provider knows that the marginal cost of carrying a small project is rounding error, and the developer who built their hobby app on your stack is the developer who buys the production plan later.

What “Free” Actually Means in Vector Database Land

There are three meaningfully different shapes of “free” on offer:

Self-host open source: the code is Apache 2.0, you run it on your own hardware, you pay only for the box. Qdrant, Chroma, Weaviate, Milvus, and pgvector all live here. Free as in you do the work.
Managed free tier: a permanent free quota on the vendor’s own cloud, refilled monthly or capped at storage. Pinecone and Qdrant Cloud both offer this. Free as in they do the work, within limits.
Trial credits: a one-time wallet of paid-rate credit ($50–$300). Weaviate Cloud, Zilliz, and some others use this model. Useful for evaluation, not for shipping.

This guide focuses on the first two, because they are the only paths that let a real project keep running for free past the first month.

Qdrant: Open-Source Rust + Generous Managed Free Tier

Qdrant is a Rust-written vector database under the Apache 2.0 license. It is the rare project that gives you a credible production-grade open-source binary and a generous managed cloud free tier from the same team — which means you can prototype on the free cloud, migrate the exact same data to a self-hosted instance later, and never touch a different query language.

Free cloud cluster (no card)

The Qdrant Cloud free tier gives you one 1 GB cluster, free forever, with no credit card required. That is not a trial. It does not auto-convert to paid. The cluster is region-pinned, has full TLS, and exposes both REST and gRPC. You get:

1 GB RAM cluster (enough for roughly 1–3 million 384-dimensional vectors with default HNSW parameters)
Full HNSW indexing with all distance metrics (cosine, dot, Euclidean, Manhattan)
Payload filtering (Qdrant’s headline feature — filter by metadata during the ANN search, not after)
Hybrid search (dense + sparse vectors in the same query) since Qdrant 1.10
Snapshots, backups, monitoring dashboard
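To sanity-check whether your own corpus fits, a back-of-the-envelope estimate helps. The sketch below assumes float32 vectors (4 bytes per dimension) and a 1.5x allowance for index and metadata overhead. That multiplier is a rule-of-thumb assumption, not a measured figure, and the function names are hypothetical.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Bytes needed for the vector values alone (float32 by default)."""
    return num_vectors * dim * bytes_per_value

def estimate_gb(num_vectors, dim, overhead=1.5):
    """Rough footprint in GB; `overhead` is an assumed allowance for index and metadata."""
    return raw_vector_bytes(num_vectors, dim) * overhead / 1e9

for n in (250_000, 1_000_000, 3_000_000):
    raw_gb = raw_vector_bytes(n, 384) / 1e9
    print(f"{n:>9,} x 384d: raw {raw_gb:.2f} GB, with overhead ~{estimate_gb(n, 384):.2f} GB")
```

By this arithmetic one million full-precision 384-dimensional vectors take about 1.5 GB before any overhead, so counts at the upper end of the quoted range would depend on quantization or on-disk vector storage (both of which Qdrant supports) rather than on keeping everything uncompressed in RAM. Treat the output as a planning aid, not a guarantee.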

Self-hosting

One Docker command and you have a running Qdrant on your laptop or a VPS:

docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

That is the complete install. There is no separate metadata store, no Zookeeper, no Kafka. The binary is ~20 MB, the disk format is portable, and Qdrant ships an official REST + gRPC schema plus first-party clients for Python, JavaScript/TypeScript, Go, Rust, Java, and .NET.

Python in 10 lines

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Replace the URL and key with your own cluster's values.
client = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="...")
client.create_collection(
    "docs",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)
vector = [0.1] * 1024  # placeholder; use a real 1024-dimensional embedding
client.upsert("docs", points=[
    PointStruct(id=1, vector=vector, payload={"title": "Hello"}),
])
# query_points supersedes the deprecated client.search in recent qdrant-client releases.
hits = client.query_points("docs", query=vector, limit=5).points

What pushes you off the free tier

Storage. One gigabyte is enough for a personal knowledge base, an internal company FAQ, or a side project’s documentation — but a SaaS that ingests user content will hit the ceiling fast. The next step is the Free Trial credit (currently $25) on a larger cluster, then paid tiers that start around $0.014/hour for a 4 GB cluster. Or you migrate to self-host.

Pinecone: The Managed-First Default

Pinecone was the first venture-funded managed vector database and remains the easiest one to get a production-shaped URL out of. The product is closed-source — you cannot run a Pinecone binary on your own hardware — but the trade-off is that you cannot break anything either. There is no cluster to size, no HNSW parameters to tune, no replicas to provision.

Starter plan free tier

The Pinecone Starter plan gives every account a permanent free allowance:

2 GB storage
5 serverless indexes
2 million read units per month
1 million write units per month
Up to 100 namespaces per index
No credit card required

The free tier is serverless — there are no nodes to pay for when idle. You pay (or use free units) per read and per write, where a read unit roughly equals a single small query and a write unit roughly equals one vector upserted. For a typical chatbot, 2 million read units is on the order of hundreds of thousands of user queries a month, which is more than enough for any prototype and many small production apps.
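The budget arithmetic is easy to run for your own traffic. In the sketch below the monthly allowance is the figure quoted in this article, while the read units consumed per query is purely an illustrative assumption; the real value depends on index size and result count, so check Pinecone's docs before relying on it.

```python
def monthly_read_units(queries_per_day, read_units_per_query, days=30):
    """Read units consumed in a month at a steady query rate."""
    return queries_per_day * read_units_per_query * days

def max_queries_per_day(monthly_budget, read_units_per_query, days=30):
    """Largest steady daily query volume that stays inside the budget."""
    return monthly_budget // (read_units_per_query * days)

FREE_READ_UNITS = 2_000_000  # Starter allowance as quoted in this article
RU_PER_QUERY = 5             # ASSUMPTION: illustrative only, not a Pinecone figure

used = monthly_read_units(10_000, RU_PER_QUERY)                # 10k queries a day
ceiling = max_queries_per_day(FREE_READ_UNITS, RU_PER_QUERY)   # daily limit under the cap
```

With these assumed numbers, 10,000 queries a day uses 1.5 million read units a month, and the allowance runs out at a little over 13,000 queries a day.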

Python in 10 lines

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="...")  # replace with your API key
pc.create_index(
    name="docs",
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("docs")
vector = [0.1] * 1024  # placeholder; use a real 1024-dimensional embedding
# Each record is an (id, values, metadata) tuple.
index.upsert(vectors=[("doc-1", vector, {"title": "Hello"})])
hits = index.query(vector=vector, top_k=5, include_metadata=True)

What pushes you off the free tier

The first wall is usually concurrent users, not storage. A B2C app that does any meaningful traffic will burn through 2 million read units quickly, and once you exceed the monthly allowance the index is paused (Starter plan) or you pay overage (Standard plan starts at $50/month minimum). The second wall is features: namespaces above 100, hybrid search beyond serverless’s current support window, and on-prem deployment all push you to Enterprise.

Chroma: The Local-First Default

Chroma is the lightest possible vector database. It is also Apache 2.0, but its philosophy is the opposite of Pinecone’s: it expects to live inside your Python application as an embedded library, the way SQLite lives inside your application as a file. There is a server mode, but the default getting-started path is pip install chromadb and you have a working vector database in the same process as your script.

Free path

The local install is the free tier. There is no signup, no cluster, no API key: just a directory on disk where Chroma persists its SQLite-backed storage (releases before 0.4 used DuckDB). Chroma Cloud is in paid private preview as of late 2025, so for free-tier purposes Chroma is a pure self-host story.

Python in 5 lines

import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("docs")
collection.add(ids=["doc-1"], documents=["Hello world"], metadatas=[{"src": "readme"}])
hits = collection.query(query_texts=["What is hello?"], n_results=5)

Note the API difference: Chroma can embed text for you using a default sentence-transformer (downloads on first use), so you can pass query_texts instead of pre-computed vectors. That is brilliant for prototypes and a footgun in production — the bundled embedder is small, English-only, and not what you want for a real product. For anything serious, plug in OpenAI, Cohere Embed v3, or a custom embedding function.

What pushes you off Chroma

Concurrency, scale, and operations. Chroma’s in-process mode is single-writer. Its server mode (chroma run) exists and works, but the operational story — backups, replication, monitoring, multi-region — is far less mature than Qdrant’s. Chroma is the best default for “I want a working RAG demo in five minutes” and “I want a local notebook to find similar items in my CSV.” It becomes a liability the moment you have ten concurrent users hitting the same index from a deployed web app.

Head-to-Head: Free Tier Limits Compared

Limit	Qdrant Cloud Free	Pinecone Starter	Chroma (Local)
Storage	1 GB RAM (~1–3M vectors at 384d)	2 GB	Your disk
Indexes / collections	Multiple in 1 cluster	5 indexes	Unlimited (your file system)
Reads per month	No hard cap (RAM-bound)	2 M read units	Unlimited (CPU-bound)
Writes per month	No hard cap	1 M write units	Unlimited
Hybrid (dense + sparse)	Yes	Partial (sparse-dense indexes, region-limited)	No (dense only)
Metadata filtering during ANN	Yes (payload filter inside HNSW walk)	Yes	Yes (post-filter)
Persistence	Cloud-managed	Cloud-managed	Local DuckDB / SQLite
Backups	Snapshots	Collection backups	Copy the directory
Self-host option	Yes (Apache 2.0)	No	Yes (Apache 2.0)
Credit card to start	No	No	No (no account needed)

Two things jump out. First, Chroma does not really compete on the same axis — it is a library, not a service. Second, between the two services, Qdrant’s free tier is the only one whose cap is storage only, not query volume. Pinecone will pause your index if you blow the read-unit budget. Qdrant Cloud will simply slow down if you saturate the 1 GB cluster, but the queries keep flowing.

Performance: What the Public Benchmarks Say

The vector-database performance picture changes every quarter, and most vendor benchmarks are theatre. Two public third-party datasets are worth looking at:

The Qdrant vector-db-benchmark repo — open-source, reproducible, runs every major engine through the same ANN-Benchmarks dataset with default and tuned configurations. Yes, it is published by Qdrant, but the harness is open and you can re-run it. Qdrant generally tops latency and RPS in their published runs; Chroma is not in the comparison set because it is single-node.
The ann-benchmarks.com leaderboard — the canonical academic benchmark for ANN libraries (not full databases), useful for comparing the underlying index algorithms (HNSW, IVF, ScaNN).

For a small free-tier project, the takeaway is that all three engines will answer a top-5 query in under 50 ms with healthy recall at the dataset sizes you can actually fit in their free quotas. Latency-per-dollar starts to matter at higher scale; at the free tier, pick on developer experience and lock-in, not on a 5 ms difference in p99 latency.

Embedding Compatibility

None of these databases generate embeddings on their own (Chroma’s default model aside). You bring vectors in, and the database stores and searches them. That means your embedding choice is independent — and worth thinking about, because the bill on embeddings can dwarf the bill on the vector DB itself.

Embedder	Dimension	Free tier	Plays well with
Cohere Embed v3	1024 (or 384 light)	Trial key, no card	Multilingual RAG, +Rerank in one stack
OpenAI text-embedding-3-small	1536 (or shrinkable)	Pay-as-you-go ($0.02/1M tokens)	Ubiquitous defaults, every library supports it
Voyage AI voyage-3-lite	512	$50 trial credit	Lowest latency, strong on code
BGE / E5 (open source)	varies	Free (self-host)	Air-gapped deployments, zero per-token cost
Sentence-Transformers (open source)	384 / 768	Free (self-host)	Local notebooks, Chroma’s default

All three vector databases accept any of these; they are agnostic about where the vectors came from as long as the dimension matches what you declared at index creation.
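Dimension mismatches are the most common way this goes wrong, typically after swapping embedding models. A small client-side guard, sketched here with a hypothetical helper `check_dimensions`, catches the problem before a batch reaches the database:

```python
def check_dimensions(vectors, expected_dim):
    """Raise ValueError if any vector's length differs from the index dimension."""
    for position, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, "
                f"index expects {expected_dim}"
            )
    return len(vectors)

INDEX_DIM = 1024  # must equal the size declared when the index was created
batch = [[0.0] * 1024, [0.0] * 1024]
checked = check_dimensions(batch, INDEX_DIM)

try:
    # A vector from a 1536-dimensional model does not fit a 1024-dimensional index.
    check_dimensions([[0.0] * 1536], INDEX_DIM)
    mismatch_caught = False
except ValueError:
    mismatch_caught = True
```

The databases reject mismatched vectors on their own, but failing early in your ingestion code gives a clearer error and avoids half-written batches.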

When to Choose Which: Decision Tree

You want a notebook-based RAG demo today, with no signup. → Chroma. `pip install chromadb`, five lines of Python, done. Move on.
You are building a real product and want managed infrastructure with zero ops. → Pinecone. The starter plan covers prototypes, the upgrade path is clean, the docs are the best in the category. You pay the price of vendor lock-in.
You want a real free tier you can leave running, with an exit door to self-host when traffic grows. → Qdrant. The 1 GB cloud cluster carries a small production app, and when you outgrow it the migration to a self-hosted Docker container is one snapshot restore away.
You need hybrid search (BM25 + dense) without paying for a premium tier. → Qdrant. It is the only one of the three that ships full sparse-dense hybrid in its free tier.
You need to filter by tens of metadata fields during retrieval. → Qdrant. Payload filtering happens inside the HNSW walk, not as a post-filter, which preserves recall when the filter is selective.
You are deploying to a customer’s air-gapped environment. → Qdrant or Chroma. Pinecone is not an option here.
Your team has zero appetite for running a database. → Pinecone. The serverless model is the closest thing to “vector DB as an HTTP function” in the market.
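The filtering point in the list above is easier to see with a toy example. The sketch below uses brute-force search over four hand-made points, so it only illustrates the effect on results; it is not how Qdrant implements filtering inside its HNSW graph, and the function names and payloads are invented for the illustration.

```python
def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def post_filter_search(query, points, predicate, k):
    """Take the top-k by similarity first, then drop points that fail the filter."""
    ranked = sorted(points, key=lambda p: dot(query, p["vector"]), reverse=True)
    return [p["id"] for p in ranked[:k] if predicate(p["payload"])]

def pre_filter_search(query, points, predicate, k):
    """Restrict to matching points first, then take the top-k among them."""
    allowed = [p for p in points if predicate(p["payload"])]
    ranked = sorted(allowed, key=lambda p: dot(query, p["vector"]), reverse=True)
    return [p["id"] for p in ranked[:k]]

points = [
    {"id": 1, "vector": [0.9, 0.1], "payload": {"lang": "en"}},
    {"id": 2, "vector": [0.8, 0.2], "payload": {"lang": "en"}},
    {"id": 3, "vector": [0.7, 0.3], "payload": {"lang": "de"}},
    {"id": 4, "vector": [0.1, 0.9], "payload": {"lang": "de"}},
]
query = [1.0, 0.0]

def is_german(payload):
    return payload["lang"] == "de"

post = post_filter_search(query, points, is_german, k=2)
pre = pre_filter_search(query, points, is_german, k=2)
```

Here `post` comes back empty because both of the two nearest points are English, while `pre` returns the two German points. The more selective the filter, the more results a post-filter silently loses.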

The Self-Host vs Managed Trade-Off

This is the question that decides 80% of the choice between Qdrant/Chroma and Pinecone. Self-hosting is free in money and expensive in attention. A small VPS — Oracle Cloud’s always-free ARM tier gives you four cores and 24 GB of RAM for $0 forever — can comfortably run Qdrant or Chroma serving a small RAG app, and the marginal cost of growth is just whatever extra RAM you buy.

What self-hosting does not give you for free is:

Automatic snapshot-and-restore on a schedule you trust
Multi-region replication for HA
An on-call rotation when the disk fills up at 3 a.m.
A vendor support contract when something subtle breaks

For a hobby app or an MVP, those things do not matter — the cost of an outage is your own time. For anything with revenue attached, the managed option starts to look cheap. Qdrant’s strength is that the same query interface works on both, so the migration story is straightforward when the project’s stakes change.

Integration with LangChain, LlamaIndex, and the LLM Layer

All three databases have first-class connectors in the major orchestration libraries — there is no reason to pick on integration coverage:

LangChain: langchain-qdrant, langchain-pinecone, langchain-chroma are all official packages with active maintenance.
LlamaIndex: Same story — QdrantVectorStore, PineconeVectorStore, ChromaVectorStore all live in the core repo or first-party plugins.
Haystack, LlamaCpp, Semantic Kernel: All three databases are first-tier choices.

On the LLM side, the vector database is independent of the model you use to generate answers. Free-tier RAG stacks I see most often in 2026:

Embeddings: Cohere Embed v3 (free trial key)
Reranker: Cohere Rerank v3 (same key)
Vector store: Qdrant Cloud free or local Chroma
LLM: Groq Llama 3.3, Gemini 2.5 Flash, or Together AI’s free model tier

That entire pipeline costs $0 up to the point where any single component’s free quota runs out, which for most personal projects is essentially never.

FAQ

Is pgvector a better choice than these three?

If you already run PostgreSQL and your collection fits in a single Postgres box, pgvector is a serious option — one fewer service to operate, transactional consistency with your other tables, mature backups. It loses to Qdrant on filtering performance at scale and on hybrid search, and it tops out earlier on throughput. For a RAG project where Postgres is already in the stack, start there. For a new project, the specialised databases are easier to reason about.

What about Weaviate, Milvus, Zilliz, Vespa?

All worth knowing. Weaviate has the most ambitious built-in module system (it ships its own embedders, rerankers, multi-tenancy, generative search), but the managed free tier is a 14-day trial, not permanent. Milvus is the heavyweight open-source choice for hundred-million-vector deployments; overkill for a starting project. Zilliz is the managed Milvus, with a serverless free tier that competes with Pinecone. Vespa is Yahoo’s open-source search engine that also does vectors well, and is the right pick if you need full text + vectors + structured filters at search-engine scale. For free-tier RAG, the three covered here are the most popular for a reason — they have the lowest activation energy.

Can I use these databases without an LLM at all?

Yes — vector databases are useful any time you have items and want similarity search. Recommendation systems, semantic search across product catalogues, duplicate detection, image similarity (with image embeddings), code search. RAG is the headline use case but not the only one.

How big does my vector index have to be before I need a real database?

Rule of thumb: under 100 K vectors, a flat numpy array with cosine similarity is faster than any database and zero ops. From 100 K to a few million, an in-process library like Chroma or FAISS is fine. Past 10 M vectors, you want a real database with persistence, snapshots, and a binary protocol — Qdrant, Pinecone, or Weaviate. The crossover is fuzzy; the gradient is real.

Do I need to re-embed everything when I change my embedding model?

Yes. Embeddings from different models are not interoperable — a query vector from OpenAI cannot be searched against documents embedded with Cohere. This is the single biggest hidden cost of RAG. When you change embedding models, you re-embed your entire corpus, which is also a re-write of every vector in the database. Plan migrations.
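A rough cost estimate makes the planning concrete. The sketch below uses the $0.02 per million tokens quoted for OpenAI text-embedding-3-small in the table earlier in this article; the corpus size and average chunk length are hypothetical inputs you would replace with your own.

```python
def reembed_cost_usd(num_chunks, avg_tokens_per_chunk, usd_per_million_tokens):
    """Embedding-API cost of re-embedding a whole corpus."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    return total_tokens / 1_000_000 * usd_per_million_tokens

# ASSUMPTION: 500k chunks averaging 400 tokens each, purely for illustration.
num_chunks = 500_000
cost = reembed_cost_usd(num_chunks, 400, 0.02)

# Every re-embedded chunk is also one vector written back to the database.
writes_needed = num_chunks
```

The API bill in this example is only about $4, but the migration also rewrites all 500,000 vectors, which counts against any write quota and usually means running old and new indexes side by side until the switch is complete.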

What is a “write unit” or “read unit” in Pinecone’s pricing?

Pinecone’s serverless billing splits operations into read units and write units, where one read unit roughly equals one similarity query that returns up to 10 results from a small index, and one write unit roughly equals one vector upserted. The actual conversion depends on index size and result count — the Pinecone docs have the exact formula. For most chatbot workloads, 2 M read units a month covers far more queries than you would expect.
