DEV Community

Serhii Kalyna

Posted on • Originally published at kalyna.pro

ChromaDB Tutorial: Build Your First Vector Database with Python (2026)

ChromaDB is an open-source vector database designed for AI applications. It stores embeddings locally, runs without a server, and integrates with any Python project in minutes.

What Is ChromaDB?

A vector database stores data as high-dimensional vectors (embeddings) and lets you search by semantic similarity — not exact keyword match.
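The core idea fits in a few lines of plain Python. This is a toy sketch with made-up 3-dimensional vectors (real embedding models output hundreds of dimensions), showing why "close in vector space" means "similar in meaning":

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": imagine an embedding model produced these
query   = [0.9, 0.1, 0.0]  # "What is a vector database?"
doc_db  = [0.8, 0.2, 0.1]  # a sentence about databases
doc_cat = [0.0, 0.1, 0.9]  # a sentence about cats

print(cosine_similarity(query, doc_db))   # high: semantically close
print(cosine_similarity(query, doc_cat))  # low:  unrelated
```

A vector database does exactly this comparison, but over millions of vectors with an index that avoids scanning them all.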

ChromaDB stands out because:

  • No server needed — runs in-process or as a local server
  • Persistent storage — data survives restarts
  • Built-in embeddings — a default MiniLM sentence-embedding model works out of the box
  • Simple API — add, query, update, delete in a few lines

Installation

pip install chromadb
pip install sentence-transformers  # optional: for SentenceTransformerEmbeddingFunction

Your First Collection

import chromadb

client = chromadb.Client()  # in-memory; data is lost when the process exits
collection = client.create_collection("my_docs")

collection.add(
    documents=[
        "Python is a high-level programming language.",
        "ChromaDB is an open-source vector database.",
        "Machine learning models require large datasets.",
        "Docker containers package apps with their dependencies.",
        "Claude is an AI assistant built by Anthropic.",
    ],
    ids=["doc1", "doc2", "doc3", "doc4", "doc5"],
)

results = collection.query(
    query_texts=["What is a vector database?"],
    n_results=2,
)

for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f"[{dist:.3f}] {doc}")

Output:

[0.312] ChromaDB is an open-source vector database.
[0.578] Python is a high-level programming language.

Persistent Storage

client = chromadb.PersistentClient(path="./chroma_db")

collection = client.get_or_create_collection("my_docs")
collection.add(
    documents=["This data will survive a restart."],
    ids=["persistent_doc1"],
)

Embeddings: Default vs Custom

Sentence Transformers (local, free)

from chromadb.utils import embedding_functions

embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
collection = client.get_or_create_collection("docs", embedding_function=embed_fn)

OpenAI Embeddings

embed_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key="sk-...",
    model_name="text-embedding-3-small",
)

Always open a collection with the same embedding function it was created with: vectors from different models have different dimensions and geometry, so mixing them breaks queries or returns meaningless results.

Metadata and Filtering

collection.add(
    documents=[
        "GPT-4 is a large language model by OpenAI.",
        "Claude Sonnet is fast and affordable.",
        "Llama 3 is an open-source model by Meta.",
    ],
    ids=["gpt4", "claude", "llama3"],
    metadatas=[
        {"company": "OpenAI",    "open_source": False},
        {"company": "Anthropic", "open_source": False},
        {"company": "Meta",      "open_source": True},
    ],
)

# Only open-source models
results = collection.query(
    query_texts=["free models"],
    n_results=2,
    where={"open_source": True},
)

Filter Operators

where={"company": "Anthropic"}                              # exact match
where={"company": {"$ne": "OpenAI"}}                        # not equal
where={"company": {"$in": ["Anthropic", "Meta"]}}           # in list
where={"$and": [{"open_source": True}, {"company": {"$ne": "Meta"}}]}  # combine

Getting, Updating, and Deleting

# Get by ID
result = collection.get(ids=["doc1", "doc2"])

# Update
collection.update(
    ids=["doc1"],
    documents=["Updated text goes here."],
    metadatas=[{"updated": True}],
)

# Delete by ID or filter
collection.delete(ids=["doc4"])
collection.delete(where={"company": "OpenAI"})

print(f"Documents: {collection.count()}")

Integration with Claude

import anthropic
import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path="./chroma_db")
embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
col = client.get_collection("knowledge_base", embedding_function=embed_fn)
claude = anthropic.Anthropic()


def answer(question: str, n_results: int = 4) -> str:
    chunks = col.query(query_texts=[question], n_results=n_results)["documents"][0]
    context = "\n\n".join(chunks)

    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="Answer using ONLY the context. If not found, say so.",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return response.content[0].text
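The `get_collection("knowledge_base")` call assumes you already populated that collection. A common way to do so is to split source text into overlapping chunks before adding them — a minimal sketch (the chunk size and overlap are arbitrary choices for illustration, not ChromaDB requirements):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for retrieval."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("some long document " * 100, chunk_size=200, overlap=20)
print(len(chunks), len(chunks[0]))
```

Each chunk then gets a stable ID and goes into the collection, e.g. `col.add(documents=chunks, ids=[f"chunk_{i}" for i in range(len(chunks))])`, after which `answer()` can retrieve it.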

Running ChromaDB as a Server

chroma run --path ./chroma_db --port 8000

import chromadb
client = chromadb.HttpClient(host="localhost", port=8000)

Summary

ChromaDB gives you a full vector database in a single pip install:

  • Create collections, add documents, query by semantic similarity
  • Persist data to disk with PersistentClient
  • Filter results with metadata using where
  • Plug in any embedding model — local or API-based
  • Combine with Claude to build RAG applications

Next: RAG Tutorial with Python or Build a RAG Chatbot.
