DEV Community

Serhii Kalyna

Posted on • Originally published at kalyna.pro

Vector Databases Comparison: Pinecone vs Chroma vs Weaviate (2026)

Choosing a vector database is one of the first decisions in any RAG project. Three options dominate developer usage: Pinecone, Chroma, and Weaviate. Here's a practical comparison.


Quick Overview

|  | Pinecone | Chroma | Weaviate |
| --- | --- | --- | --- |
| Hosting | Managed cloud only | Local or cloud | Self-hosted or cloud |
| Setup time | 5 minutes | 1 minute | 10–20 minutes |
| Free tier | Yes (1 index) | Yes (local) | Yes (self-hosted) |
| Best for | Production scale | Development, prototyping | Hybrid search, complex queries |

Chroma: Start Here

Chroma runs in-process — no server needed. Perfect for development and small projects.

```bash
pip install chromadb
```
```python
import chromadb
from chromadb.utils import embedding_functions

# Persist data to disk so it survives restarts
client = chromadb.PersistentClient("./chroma_db")

ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

collection = client.get_or_create_collection("docs", embedding_function=ef)

collection.add(
    documents=[
        "RAG stands for Retrieval-Augmented Generation.",
        "Fine-tuning updates model weights on new data.",
        "Vector search finds semantically similar content.",
    ],
    ids=["doc1", "doc2", "doc3"],
)

results = collection.query(query_texts=["how does RAG work?"], n_results=2)
print(results["documents"])
```

Pros: Zero setup, runs locally, great for prototyping.
Cons: Not designed for multi-node scale, no built-in auth.


Pinecone: Managed Scale

Pinecone is a fully managed vector database. No infrastructure to run.

```bash
pip install pinecone sentence-transformers
```

(The older `pinecone-client` package has been deprecated in favor of `pinecone`.)
```python
from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Create the index once; creating an index that already exists raises an error
if not pc.has_index("kalyna-docs"):
    pc.create_index(
        name="kalyna-docs",
        dimension=384,  # must match the embedding model's output dimension
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index("kalyna-docs")
model = SentenceTransformer("all-MiniLM-L6-v2")

texts = ["RAG is a retrieval technique.", "Fine-tuning changes model weights."]
embeddings = model.encode(texts).tolist()

index.upsert(vectors=[
    {"id": "v1", "values": embeddings[0], "metadata": {"text": texts[0]}},
    {"id": "v2", "values": embeddings[1], "metadata": {"text": texts[1]}},
])

query_vec = model.encode(["how to retrieve documents?"]).tolist()[0]
results = index.query(vector=query_vec, top_k=2, include_metadata=True)
for match in results.matches:
    print(match.metadata["text"], "| score:", round(match.score, 3))

Pros: Zero ops, scales automatically, fast at large scale.
Cons: Vendor lock-in, can get expensive, free tier limits.
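A quick aside on the `metric="cosine"` setting above: it tells Pinecone to rank matches by the cosine of the angle between vectors, so direction matters and magnitude doesn't. A minimal pure-Python sketch of what that score means (illustrative only, not Pinecone's internal implementation):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0 regardless of magnitude
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # → 1.0 (up to float error)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

This is also why `dimension=384` in `create_index`: it must equal the output size of the embedding model, and `all-MiniLM-L6-v2` produces 384-dimensional vectors.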


Weaviate: Hybrid Search

Weaviate supports both vector search and keyword (BM25) search — called hybrid search.

```bash
pip install weaviate-client
# The v4 Python client also talks gRPC, so expose port 50051 alongside 8080
docker run -p 8080:8080 -p 50051:50051 cr.weaviate.io/semitechnologies/weaviate:latest
```

Note: the `text2vec_transformers` vectorizer used below additionally requires the transformers module and its inference container to be enabled on the Weaviate server; Docker Compose is the easiest way to set that up.
```python
import weaviate
from weaviate.classes.config import Configure, Property, DataType

client = weaviate.connect_to_local()

client.collections.create(
    "Document",
    vectorizer_config=Configure.Vectorizer.text2vec_transformers(),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
    ],
)

collection = client.collections.get("Document")
collection.data.insert({"content": "RAG retrieves documents at inference time.", "source": "guide"})

results = collection.query.hybrid(
    query="retrieval augmented generation",
    alpha=0.5,  # 0 = pure keyword, 1 = pure vector
    limit=2,
)
for obj in results.objects:
    print(obj.properties["content"])

client.close()
```

Pros: Hybrid search, strong filtering, active community.
Cons: More complex setup, steeper learning curve.
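To build intuition for the `alpha` parameter in the hybrid query above: conceptually it blends the keyword (BM25) score with the vector-similarity score. The toy function below illustrates that blend only; it is not Weaviate's actual fusion algorithm (which normalizes and fuses ranked result sets), just the mental model:

```python
def blend_scores(keyword_score: float, vector_score: float, alpha: float) -> float:
    """alpha=0 -> pure keyword ranking, alpha=1 -> pure vector ranking."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1 - alpha) * keyword_score + alpha * vector_score

# alpha=0.5 weights both signals equally
print(blend_scores(keyword_score=0.8, vector_score=0.4, alpha=0.5))  # → 0.6
print(blend_scores(keyword_score=0.8, vector_score=0.4, alpha=1.0))  # → 0.4 (vector only)
```

The practical takeaway: tune `alpha` down when exact terms (IDs, product names) matter, up when paraphrased meaning matters.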


Which One to Pick?

Chroma — prototyping, demos, local development. You can be up and querying in minutes.

Pinecone — production workloads where you don't want to manage infrastructure. You pay for convenience.

Weaviate — hybrid search (semantic + keyword), complex filters, multi-tenant systems.


With Claude (RAG Example)

```python
import anthropic

def rag_answer(question: str) -> str:
    # Retrieve with the Chroma `collection` from earlier; this is the step to swap
    results = collection.query(query_texts=[question], n_results=3)
    context = "\n".join(results["documents"][0])

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": f"Answer based on this context only:\n{context}\n\nQuestion: {question}"
        }]
    )
    return response.content[0].text
```

This pattern works identically with Pinecone or Weaviate — just swap the retrieval step.
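One way to make that swap explicit is to pass the retrieval step in as a parameter. In this sketch, `retrieve` is any callable returning a list of text chunks (a hypothetical seam, not part of any of the three SDKs), and `rag_answer` returns the assembled prompt as a placeholder so the sketch runs without an API key — in real code you'd send it to Claude exactly as above:

```python
from typing import Callable

def build_prompt(chunks: list[str], question: str) -> str:
    """Assemble the context-grounded prompt sent to the model."""
    context = "\n".join(chunks)
    return f"Answer based on this context only:\n{context}\n\nQuestion: {question}"

def rag_answer(question: str, retrieve: Callable[[str], list[str]]) -> str:
    # retrieve() hides which vector database is in use:
    #   Chroma:   lambda q: collection.query(query_texts=[q], n_results=3)["documents"][0]
    #   Pinecone: embed q, call index.query(...), pull metadata["text"] from matches
    chunks = retrieve(question)
    prompt = build_prompt(chunks, question)
    # ...then the anthropic.Anthropic().messages.create(...) call goes here...
    return prompt  # placeholder: real code returns response.content[0].text

# Stub retriever: no database or API key needed to inspect the assembled prompt
fake_retrieve = lambda q: ["RAG retrieves documents at inference time."]
print(rag_answer("what is RAG?", fake_retrieve))
```

Dependency injection like this also makes the RAG pipeline trivial to unit-test with a stub retriever, as in the last two lines.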


