Qdrant vs Pinecone vs Chroma: Free Vector Database for RAG
If you are building a retrieval-augmented generation (RAG) pipeline in 2026, the vector database is the load-bearing piece nobody talks about until it breaks. Embeddings are commoditised — Cohere, OpenAI, Voyage, and a dozen open models will turn your text into vectors for free or near-free. The harder question is where those vectors live, how fast you can search them, and how much you have to pay before the bill becomes scary.
Three names dominate the free end of that market: Qdrant, Pinecone, and Chroma. All three give you a real way to start a RAG project at zero cost. None of them requires a credit card on day one. But they sit at fundamentally different points on the open-source-vs-managed and local-vs-cloud spectrums, and the right pick depends entirely on what you are building and how far you expect it to scale.
This guide compares all three on the metrics that actually matter for a free RAG stack — what the free tier really lets you do, what happens when you outgrow it, performance numbers from third-party benchmarks, and the engineering trade-offs that hit you a month into the project. Every number cited links back to the provider’s own docs, GitHub repo, or a public benchmark; nothing here is fabricated.
The 30-Second Answer
| Database | Free path | License | Free ceiling | Best for |
|---|---|---|---|---|
| Qdrant | 1 GB managed cloud cluster, free forever, no card | Apache 2.0 | 1 GB RAM + ~4 GB disk on managed; unlimited self-host | Production RAG with hybrid search, payload filters, no vendor lock-in |
| Pinecone | Starter plan: 2 GB storage, 5 indexes, no card | Closed-source SaaS | 2 GB storage, 2M read units, 1M write units per month | Zero-ops managed RAG, fastest first-vector-to-production |
| Chroma | 100% local (pip install chromadb) | Apache 2.0 | Bounded by your laptop’s RAM and disk | Local prototypes, notebooks, single-tenant desktop apps |
If you want the smallest possible step from idea to working RAG with three lines of Python and no signup, Chroma wins. If you want a managed service that just exists at a URL with no servers to babysit, Pinecone is the easiest. If you want a real free tier that can carry a small production app, plus the option to self-host the exact same binary later when you outgrow it, Qdrant is the only one of the three with both at the same time.
The rest of this article unpacks why.
Why You Need a Vector Database for RAG at All
RAG, at its core, is one cheap trick: instead of stuffing your entire knowledge base into every LLM prompt, you embed your documents once, store the vectors, and at query time you embed the user’s question, look up the most similar document chunks by cosine similarity, and paste only those chunks into the prompt. The LLM never sees your full corpus — it only ever sees the few passages that matter for the current question.
This makes the vector-search step the bottleneck. Three properties decide whether your RAG app is good:
- Recall: does the retriever actually return the relevant chunk? (Approximate-nearest-neighbour algorithms are tunable — you can trade speed for recall.)
- Latency: how long does a single query take? If your RAG round trip is 800 ms before the LLM even starts streaming, the UX is dead.
- Cost: how much do you pay per million vectors stored, per million queries served, and per million tokens re-embedded when you change models?
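Recall is the easiest of the three to measure yourself. A minimal sketch, assuming you have a small set of queries for which you already know the truly relevant chunk ids (the ids below are made up for illustration):

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant ids that show up in the top-k retrieved ids."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical data: an exact search says chunks 3, 7 and 9 are the true neighbours,
# while the approximate index returned them in a slightly different order.
exact = [3, 7, 9]
approximate = [3, 9, 12, 7, 1]

recall_top3 = recall_at_k(approximate, exact, k=3)  # chunks 3 and 9 found, 7 missed
recall_top5 = recall_at_k(approximate, exact, k=5)  # all three found
```

Tracking this number while you tune index parameters is what makes the speed-for-recall trade visible.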
A flat-array brute-force search through Python lists works for ten thousand vectors. It falls over at a million. The vector databases below all use some flavour of HNSW (Hierarchical Navigable Small World) graphs to get sub-linear search complexity, plus a binary protocol that does not melt under load. The free tiers exist because every provider knows that the marginal cost of carrying a small project is rounding error, and the developer who built their hobby app on your stack is the developer who buys the production plan later.
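For reference, the brute-force baseline mentioned above looks roughly like this. It is a pure-Python sketch with toy 3-dimensional vectors and made-up document ids, not something any of the three databases runs internally:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, vectors, k):
    """Score every stored vector against the query: linear work per search."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy store with hypothetical ids; real embeddings have hundreds of dimensions.
store = {
    "intro": [1.0, 0.0, 0.0],
    "pricing": [0.0, 1.0, 0.0],
    "setup": [0.9, 0.1, 0.0],
}
top = brute_force_top_k([1.0, 0.05, 0.0], store, k=2)  # ["intro", "setup"]
```

Every query touches every vector, which is why this approach stops being practical as the collection grows and an ANN index takes over.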
What “Free” Actually Means in Vector Database Land
There are three meaningfully different shapes of “free” on offer:
- Self-host open source: the code is Apache 2.0, you run it on your own hardware, you pay only for the box. Qdrant, Chroma, Weaviate, Milvus, and pgvector all live here. Free as in you do the work.
- Managed free tier: a permanent free quota on the vendor’s own cloud, refilled monthly or capped at storage. Pinecone and Qdrant Cloud both offer this. Free as in they do the work, within limits.
- Trial credits: a one-time wallet of paid-rate credit ($50–$300). Weaviate Cloud, Zilliz, and some others use this model. Useful for evaluation, not for shipping.
This guide focuses on the first two, because they are the only paths that let a real project keep running for free past the first month.
Qdrant: Open-Source Rust + Generous Managed Free Tier
Qdrant is a Rust-written vector database under the Apache 2.0 license. It is the rare project that gives you a credible production-grade open-source binary and a generous managed cloud free tier from the same team — which means you can prototype on the free cloud, migrate the exact same data to a self-hosted instance later, and never touch a different query language.
Free cloud cluster (no card)
The Qdrant Cloud free tier gives you one 1 GB cluster, free forever, with no credit card required. That is not a trial. It does not auto-convert to paid. The cluster is region-pinned, has full TLS, and exposes both REST and gRPC. You get:
- 1 GB RAM cluster (enough for roughly 1–3 million 384-dimensional vectors with default HNSW parameters)
- Full HNSW indexing with all distance metrics (cosine, dot, Euclidean, Manhattan)
- Payload filtering (Qdrant’s headline feature — filter by metadata during the ANN search, not after)
- Hybrid search (dense + sparse vectors in the same query) since Qdrant 1.10
- Snapshots, backups, monitoring dashboard
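Hybrid search means merging two ranked lists, one from the dense index and one from the sparse index. Reciprocal rank fusion (RRF) is one common way to do that merge. The sketch below is a generic illustration of the technique, not Qdrant's internal code; the document ids and the conventional constant k=60 are assumptions:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked id lists: each appearance adds 1 / (k + rank) to the id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # hypothetical dense (embedding) results
sparse_hits = ["c", "a", "d"]  # hypothetical sparse (keyword) results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])  # ["a", "c", "b", "d"]
```

Documents that rank well in both lists rise to the top, which is the behaviour you want when keyword matches and semantic matches disagree.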
Self-hosting
One Docker command and you have a running Qdrant on your laptop or a VPS:
docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant
That is the complete install. There is no separate metadata store, no Zookeeper, no Kafka. The binary is ~20 MB, the disk format is portable, and Qdrant ships an official REST + gRPC schema plus first-party clients for Python, JavaScript/TypeScript, Go, Rust, Java, and .NET.
Python in 10 lines
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
client = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="...")
client.create_collection(
    "docs",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)
client.upsert("docs", points=[
    PointStruct(id=1, vector=[0.1] * 1024, payload={"title": "Hello"}),  # placeholder vector; use a real embedding
])
hits = client.query_points("docs", query=[0.1] * 1024, limit=5).points  # client.search is deprecated in recent clients
What pushes you off the free tier
Storage. One gigabyte is enough for a personal knowledge base, an internal company FAQ, or a side project’s documentation — but a SaaS that ingests user content will hit the ceiling fast. The next step is the Free Trial credit (currently $25) on a larger cluster, then paid tiers that start around $0.014/hour for a 4 GB cluster. Or you migrate to self-host.
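To see how close you are to the ceiling, a back-of-the-envelope estimate helps. The sketch below assumes float32 vectors held fully in RAM and a 1.5x overhead multiplier for index and metadata, which is a commonly quoted rule of thumb and not an exact figure; both function names are hypothetical helpers:

```python
def estimate_ram_bytes(num_vectors, dimension, bytes_per_value=4, overhead=1.5):
    """Rough RAM needed for float32 vectors plus an assumed 1.5x index overhead."""
    return int(num_vectors * dimension * bytes_per_value * overhead)


def max_vectors_in_ram(ram_bytes, dimension, bytes_per_value=4, overhead=1.5):
    """Rough number of vectors that fit in a given RAM budget under the same assumptions."""
    return int(ram_bytes // (dimension * bytes_per_value * overhead))


ONE_GIB = 1024 ** 3

fits_384 = max_vectors_in_ram(ONE_GIB, 384)    # 466033 vectors
fits_1024 = max_vectors_in_ram(ONE_GIB, 1024)  # 174762 vectors
need_1m = estimate_ram_bytes(1_000_000, 384)   # 2304000000 bytes, about 2.3 GB
```

Under these assumptions a fully in-RAM setup holds fewer vectors than the headline estimates suggest; higher counts would presumably rely on quantization or on keeping vectors on disk, so measure with your own data before relying on either number.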
Pinecone: The Managed-First Default
Pinecone was the first venture-funded managed vector database and remains the easiest one to get a production-shaped URL out of. The product is closed-source — you cannot run a Pinecone binary on your own hardware — but the trade-off is that you cannot break anything either. There is no cluster to size, no HNSW parameters to tune, no replicas to provision.
Starter plan free tier
The Pinecone Starter plan gives every account a permanent free allowance:
- 2 GB storage
- 5 serverless indexes
- 2 million read units per month
- 1 million write units per month
- Up to 100 namespaces per index
- No credit card required
The free tier is serverless — there are no nodes to pay for when idle. You pay (or use free units) per read and per write, where a read unit roughly equals a single small query and a write unit roughly equals one vector upserted. For a typical chatbot, 2 million read units is on the order of hundreds of thousands of user queries a month, which is more than enough for any prototype and many small production apps.
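The budget arithmetic is simple enough to sketch. The read-unit cost per query below is an assumption picked for illustration; the real cost depends on index size and result count, so check your own usage dashboard:

```python
def monthly_query_capacity(read_unit_budget, read_units_per_query):
    """How many queries a monthly read-unit budget covers at a given per-query cost."""
    if read_units_per_query <= 0:
        raise ValueError("read_units_per_query must be positive")
    return read_unit_budget // read_units_per_query


FREE_READ_UNITS = 2_000_000  # monthly allowance quoted above

# Hypothetical per-query costs: 1 unit for a tiny index, 5 or 10 for a larger one.
best_case = monthly_query_capacity(FREE_READ_UNITS, 1)    # 2000000 queries
typical = monthly_query_capacity(FREE_READ_UNITS, 5)      # 400000 queries
heavy = monthly_query_capacity(FREE_READ_UNITS, 10)       # 200000 queries
per_day = typical // 30                                   # 13333 queries per day
```

If your measured cost per query lands in the single digits, the "hundreds of thousands of queries" estimate holds.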
Python in 10 lines
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="...")
pc.create_index(
    name="docs",
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("docs")
index.upsert(vectors=[("doc-1", [0.1] * 1024, {"title": "Hello"})])  # placeholder vector; use a real embedding
hits = index.query(vector=[0.1] * 1024, top_k=5, include_metadata=True)
What pushes you off the free tier
The first wall is usually query volume, not storage. A B2C app with meaningful traffic will burn through 2 million read units quickly, and once you exceed the monthly allowance the index is paused (Starter plan) or you pay overage (the Standard plan has a $50/month minimum). The second wall is features: more than 100 namespaces per index, hybrid search beyond what serverless currently supports, and on-prem deployment all push you to Enterprise.
Chroma: The Local-First Default
Chroma is the lightest possible vector database. It is also Apache 2.0, but its philosophy is the opposite of Pinecone’s: it expects to live inside your Python application as an embedded library, the way SQLite lives inside your application as a file. There is a server mode, but the default getting-started path is pip install chromadb and you have a working vector database in the same process as your script.
Free path
The local install is the free tier. There is no signup, no cluster, no API key: just a directory on disk where Chroma persists its SQLite-backed storage (releases before 0.4 used DuckDB). Chroma Cloud is in paid private preview as of late 2025, so for free-tier purposes Chroma is a pure self-host story.
Python in 5 lines
import chromadb
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("docs")
collection.add(ids=["doc-1"], documents=["Hello world"], metadatas=[{"src": "readme"}])
hits = collection.query(query_texts=["What is hello?"], n_results=5)
Note the API difference: Chroma can embed text for you using a default sentence-transformer (downloads on first use), so you can pass query_texts instead of pre-computed vectors. That is brilliant for prototypes and a footgun in production — the bundled embedder is small, English-only, and not what you want for a real product. For anything serious, plug in OpenAI, Cohere Embed v3, or a custom embedding function.
What pushes you off Chroma
Concurrency, scale, and operations. Chroma’s in-process mode is single-writer. Its server mode (chroma run) exists and works, but the operational story — backups, replication, monitoring, multi-region — is far less mature than Qdrant’s. Chroma is the best default for “I want a working RAG demo in five minutes” and “I want a local notebook to find similar items in my CSV.” It becomes a liability the moment you have ten concurrent users hitting the same index from a deployed web app.
Head-to-Head: Free Tier Limits Compared
| Limit | Qdrant Cloud Free | Pinecone Starter | Chroma (Local) |
|---|---|---|---|
| Storage | 1 GB RAM (~1–3M vectors at 384d) | 2 GB | Your disk |
| Indexes / collections | Multiple in 1 cluster | 5 indexes | Unlimited (your file system) |
| Reads per month | No hard cap (RAM-bound) | 2 M read units | Unlimited (CPU-bound) |
| Writes per month | No hard cap | 1 M write units | Unlimited |
| Hybrid (dense + sparse) | Yes | Partial (sparse-dense indexes, region-limited) | No (dense only) |
| Metadata filtering during ANN | Yes (payload filter inside HNSW walk) | Yes | Yes (post-filter) |
| Persistence | Cloud-managed | Cloud-managed | Local DuckDB / SQLite |
| Backups | Snapshots | Collection backups | Copy the directory |
| Self-host option | Yes (Apache 2.0) | No | Yes (Apache 2.0) |
| Credit card to start | No | No | No (no account needed) |
Two things jump out. First, Chroma does not really compete on the same axis — it is a library, not a service. Second, between the two services, Qdrant’s free tier is the only one whose cap is storage only, not query volume. Pinecone will pause your index if you blow the read-unit budget. Qdrant Cloud will simply slow down if you saturate the 1 GB cluster, but the queries keep flowing.
Performance: What the Public Benchmarks Say
The vector-database performance picture changes every quarter, and most vendor benchmarks are theatre. Two public, reproducible benchmarks are worth looking at:
- The Qdrant vector-db-benchmark repo — open-source, reproducible, runs every major engine through the same ANN-Benchmarks dataset with default and tuned configurations. Yes, it is published by Qdrant, but the harness is open and you can re-run it. Qdrant generally tops latency and RPS in their published runs; Chroma is not in the comparison set because it is single-node.
- The ann-benchmarks.com leaderboard — the canonical academic benchmark for ANN libraries (not full databases), useful for comparing the underlying index algorithms (HNSW, IVF, ScaNN).
For a small free-tier project, the takeaway is that all three engines will return a top-5 query in under 50 ms with healthy recall at the dataset sizes you can actually fit in their free quotas. Latency-per-dollar starts to matter at higher scale; at the free tier, pick on developer experience and lock-in, not on a 5 ms difference in p99 latency.
Embedding Compatibility
None of these databases generate embeddings on their own (Chroma’s default model aside). You bring vectors in, and the database stores and searches them. That means your embedding choice is independent — and worth thinking about, because the bill on embeddings can dwarf the bill on the vector DB itself.
| Embedder | Dimension | Free tier | Plays well with |
|---|---|---|---|
| Cohere Embed v3 | 1024 (or 384 light) | Trial key, no card | Multilingual RAG, +Rerank in one stack |
| OpenAI text-embedding-3-small | 1536 (or shrinkable) | Pay-as-you-go ($0.02/1M tokens) | Ubiquitous defaults, every library supports it |
| Voyage AI voyage-3-lite | 512 | $50 trial credit | Lowest latency, strong on code |
| BGE / E5 (open source) | varies | Free (self-host) | Air-gapped deployments, zero per-token cost |
| Sentence-Transformers (open source) | 384 / 768 | Free (self-host) | Local notebooks, Chroma’s default |
All three vector databases accept any of these; they are agnostic about where the vectors came from as long as the dimension matches what you declared at index creation.
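Because a dimension mismatch only surfaces when the database rejects a write, a cheap pre-flight check can save a failed batch. A minimal sketch, with a hypothetical helper name and toy data:

```python
def find_dimension_mismatches(vectors, expected_dim):
    """Return the ids whose vector length differs from the declared index dimension."""
    return [doc_id for doc_id, vec in vectors.items() if len(vec) != expected_dim]


# Hypothetical batch: one vector came from a different embedding model.
batch = {
    "doc-1": [0.1] * 1024,
    "doc-2": [0.2] * 1024,
    "doc-3": [0.3] * 384,
}
bad_ids = find_dimension_mismatches(batch, expected_dim=1024)  # ["doc-3"]
```

Running this before an upsert is a quick way to catch a batch that was embedded with the wrong model.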
When to Choose Which: Decision Tree
- You want a notebook-based RAG demo today, with no signup. → Chroma. pip install chromadb, three lines, done. Move on.
- You are building a real product and want managed infrastructure with zero ops. → Pinecone. The starter plan covers prototypes, the upgrade path is clean, the docs are the best in the category. You pay the price of vendor lock-in.
- You want a real free tier you can leave running, with an exit door to self-host when traffic grows. → Qdrant. The 1 GB cloud cluster carries a small production app, and when you outgrow it the migration to a self-hosted Docker container is one snapshot restore away.
- You need hybrid search (BM25 + dense) without paying for a premium tier. → Qdrant. It is the only one of the three that ships full sparse-dense hybrid in its free tier.
- You need to filter by tens of metadata fields during retrieval. → Qdrant. Payload filtering happens inside the HNSW walk, not as a post-filter, which preserves recall when the filter is selective.
- You are deploying to a customer’s air-gapped environment. → Qdrant or Chroma. Pinecone is not an option here.
- Your team has zero appetite for running a database. → Pinecone. The serverless model is the closest thing to “vector DB as an HTTP function” in the market.
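The filtering point is easier to see with a toy example. The sketch below uses brute-force dot-product search as a stand-in for an ANN index, with made-up points and a made-up "lang" field; it illustrates why filtering after the search loses results, not how Qdrant's filterable HNSW is implemented:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query, points, k):
    """Rank points by dot-product similarity to the query and keep the best k."""
    ranked = sorted(points, key=lambda p: dot(query, p["vector"]), reverse=True)
    return ranked[:k]


def post_filter_search(query, points, k, predicate):
    """Search first, then drop hits that fail the filter (may return fewer than k)."""
    return [p["id"] for p in top_k(query, points, k) if predicate(p)]


def filtered_search(query, points, k, predicate):
    """Apply the filter as part of the search, so k matching hits can still be found."""
    candidates = [p for p in points if predicate(p)]
    return [p["id"] for p in top_k(query, candidates, k)]


points = [
    {"id": 1, "vector": [1.0, 0.0], "lang": "en"},
    {"id": 2, "vector": [0.9, 0.1], "lang": "en"},
    {"id": 3, "vector": [0.8, 0.2], "lang": "de"},
    {"id": 4, "vector": [0.1, 0.9], "lang": "de"},
]


def is_german(point):
    return point["lang"] == "de"


query = [1.0, 0.0]
post = post_filter_search(query, points, 2, is_german)  # [] because both top hits are "en"
pre = filtered_search(query, points, 2, is_german)      # [3, 4]
```

The more selective the filter, the more often the post-filter version comes back empty or short.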
The Self-Host vs Managed Trade-Off
This is the question that decides 80% of the choice between Qdrant/Chroma and Pinecone. Self-hosting is free in money and expensive in attention. A small VPS — Oracle Cloud’s always-free ARM tier gives you four cores and 24 GB of RAM for $0 forever — can comfortably run Qdrant or Chroma serving a small RAG app, and the marginal cost of growth is just whatever extra RAM you buy.
What self-hosting does not give you for free is:
- Automatic snapshot-and-restore on a schedule you trust
- Multi-region replication for HA
- An on-call rotation when the disk fills up at 3 a.m.
- A vendor support contract when something subtle breaks
For a hobby app or an MVP, those things do not matter — the cost of an outage is your own time. For anything with revenue attached, the managed option starts to look cheap. Qdrant’s strength is that the same query interface works on both, so the migration story is straightforward when the project’s stakes change.
Integration with LangChain, LlamaIndex, and the LLM Layer
All three databases have first-class connectors in the major orchestration libraries — there is no reason to pick on integration coverage:
- LangChain: langchain-qdrant, langchain-pinecone, and langchain-chroma are all official packages with active maintenance.
- LlamaIndex: the same story. QdrantVectorStore, PineconeVectorStore, and ChromaVectorStore all live in the core repo or first-party plugins.
- Haystack, LlamaCpp, Semantic Kernel: All three databases are first-tier choices.
On the LLM side, the vector database is independent of the model you use to generate answers. Free-tier RAG stacks I see most often in 2026:
- Embeddings: Cohere Embed v3 (free trial key)
- Reranker: Cohere Rerank v3 (same key)
- Vector store: Qdrant Cloud free or local Chroma
- LLM: Groq Llama 3.3, Gemini 2.5 Flash, or Together AI’s free model tier
That entire pipeline costs $0 up to the point where any single component’s free quota runs out, which for most personal projects is essentially never.
FAQ
Is pgvector a better choice than these three?
If you already run PostgreSQL and your collection fits in a single Postgres box, pgvector is a serious option — one fewer service to operate, transactional consistency with your other tables, mature backups. It loses to Qdrant on filtering performance at scale and on hybrid search, and it tops out earlier on throughput. For a RAG project where Postgres is already in the stack, start there. For a new project, the specialised databases are easier to reason about.
What about Weaviate, Milvus, Zilliz, Vespa?
All worth knowing. Weaviate has the most ambitious built-in module system (it ships its own embedders, rerankers, multi-tenancy, generative search), but the managed free tier is a 14-day trial, not permanent. Milvus is the heavyweight open-source choice for hundred-million-vector deployments; overkill for a starting project. Zilliz is the managed Milvus, with a serverless free tier that competes with Pinecone. Vespa is Yahoo’s open-source search engine that also does vectors well, and is the right pick if you need full text + vectors + structured filters at search-engine scale. For free-tier RAG, the three covered here are the most popular for a reason — they have the lowest activation energy.
Can I use these databases without an LLM at all?
Yes — vector databases are useful any time you have items and want similarity search. Recommendation systems, semantic search across product catalogues, duplicate detection, image similarity (with image embeddings), code search. RAG is the headline use case but not the only one.
How big does my vector index have to be before I need a real database?
Rule of thumb: under 100 K vectors, a flat numpy array with cosine similarity is faster than any database and zero ops. From 100 K to a few million, an in-process library like Chroma or FAISS is fine. Past 10 M vectors, you want a real database with persistence, snapshots, and a binary protocol — Qdrant, Pinecone, or Weaviate. The crossover is fuzzy; the gradient is real.
Do I need to re-embed everything when I change my embedding model?
Yes. Embeddings from different models are not interoperable — a query vector from OpenAI cannot be searched against documents embedded with Cohere. This is the single biggest hidden cost of RAG. When you change embedding models, you re-embed your entire corpus, which is also a re-write of every vector in the database. Plan migrations.
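A rough cost estimate makes migration planning concrete. The sketch below uses the $0.02 per million tokens price listed for text-embedding-3-small in the embedder table; the corpus size and average chunk length are made-up inputs:

```python
def reembedding_cost_usd(num_chunks, avg_tokens_per_chunk, price_per_million_tokens):
    """Approximate API cost of re-embedding an entire corpus."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    return total_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical corpus: 500,000 chunks averaging 400 tokens each.
cost = reembedding_cost_usd(500_000, 400, price_per_million_tokens=0.02)  # about 4.00 USD
```

The API bill is often the small part; the full rewrite of every vector in the database, and the write quota it consumes, is usually what needs scheduling.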
What is a “write unit” or “read unit” in Pinecone’s pricing?
Pinecone’s serverless billing splits operations into read units and write units, where one read unit roughly equals one similarity query that returns up to 10 results from a small index, and one write unit roughly equals one vector upserted. The actual conversion depends on index size and result count — the Pinecone docs have the exact formula. For most chatbot workloads, 2 M read units a month covers far more queries than you would expect.
Related Reads
- Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026 — the embedding/reranker layer that pairs naturally with any of the three databases above.
- Together AI Free API: Run Llama 3.3, DeepSeek R1, and FLUX — the LLM side of a complete free RAG stack.
- 10 Best Free AI APIs in 2026 — broader survey of the free AI API landscape.
- Oracle Cloud Always Free: 4-Core 24GB ARM VPS — where to host a self-managed Qdrant or Chroma for free.
- Vercel vs Netlify vs Cloudflare Pages — frontend hosting to deploy the RAG app on top.
Originally published at toolfreebie.com.
