Chroma is an open-source vector database designed for AI applications. It makes building RAG (Retrieval-Augmented Generation) pipelines as simple as adding documents and querying.
## Why Chroma Matters

RAG needs a vector store. Most options are either operationally complex (Pinecone, Weaviate) or low-level (FAISS gives you an index, not a database). Chroma is batteries-included: embed, store, and search in 4 lines of code.
What you get for free:
- 4-line setup: create collection, add, query
- Built-in embedding (no separate embedding service needed)
- Document and metadata storage alongside vectors
- Python and JavaScript clients
- Runs embedded or as a server
- `where` filters for metadata
- Multi-modal support
## Quick Start

```bash
pip install chromadb
```
```python
import chromadb

client = chromadb.Client()

# Create a collection (auto-embeds with sentence-transformers by default)
collection = client.create_collection("docs")

# Add documents; Chroma embeds them automatically
collection.add(
    documents=[
        "AI is transforming healthcare with diagnostic tools",
        "Machine learning models predict stock market trends",
        "Neural networks process images for self-driving cars",
        "NLP enables chatbots to understand human language",
    ],
    metadatas=[
        {"topic": "healthcare"},
        {"topic": "finance"},
        {"topic": "automotive"},
        {"topic": "nlp"},
    ],
    ids=["doc1", "doc2", "doc3", "doc4"],
)

# Query; returns the most similar documents
results = collection.query(
    query_texts=["How is AI used in medicine?"],
    n_results=2,
)
print(results["documents"])  # the healthcare doc ranks first
```
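Since `query` accepts multiple query texts at once, every field in the results dict is a list of lists: one inner list per query, ranked by similarity. A minimal sketch of unpacking the top hit; the `results` dict here is a hand-written stand-in mimicking Chroma's return shape, not live output:

```python
# Hand-written stand-in for the dict Chroma's query() returns
# (outer list = one entry per query text, inner list = ranked hits).
results = {
    "ids": [["doc1", "doc4"]],
    "documents": [[
        "AI is transforming healthcare with diagnostic tools",
        "NLP enables chatbots to understand human language",
    ]],
    "metadatas": [[{"topic": "healthcare"}, {"topic": "nlp"}]],
    "distances": [[0.31, 0.52]],
}

def top_hit(results: dict, query_index: int = 0) -> dict:
    """Pair up the parallel lists for a single query's best match."""
    return {
        "id": results["ids"][query_index][0],
        "document": results["documents"][query_index][0],
        "metadata": results["metadatas"][query_index][0],
        "distance": results["distances"][query_index][0],
    }

print(top_hit(results))
```

Lower distance means a closer match, so index 0 of each inner list is always the best hit for that query.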
## RAG with LangChain

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# Create the vector store (assumes `docs` is a list of LangChain Documents)
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)

# Create the RAG chain
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
)

answer = qa.invoke({"query": "What are the latest AI trends?"})
print(answer["result"])
```
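Under the hood, a chain like this retrieves the top-k documents and stuffs them into the LLM prompt alongside the question. A rough sketch of that stuffing step in plain Python; the template wording is illustrative, not LangChain's actual prompt:

```python
def build_rag_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Stuff retrieved documents into a single prompt for the LLM."""
    context = "\n\n".join(
        f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What are the latest AI trends?",
    ["AI is transforming healthcare", "NLP powers modern chatbots"],
)
print(prompt)
```

This is why retrieval quality matters so much in RAG: the LLM only sees what the vector search puts into the context.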
## Metadata Filtering

```python
# Restrict results to selected topics
results = collection.query(
    query_texts=["AI applications"],
    n_results=5,
    where={"topic": {"$in": ["healthcare", "finance"]}},
)

# Combine conditions with logical operators
results = collection.query(
    query_texts=["machine learning"],
    where={
        "$and": [
            {"topic": {"$eq": "tech"}},
            {"year": {"$gte": 2024}},
        ]
    },
)
```
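Filters like the `$and` block above are just nested dicts, so you can assemble them programmatically. A small helper that turns keyword arguments into a Chroma-style filter; `and_filter` is my own sketch, not part of Chroma's API:

```python
def and_filter(**conditions) -> dict:
    """Build a Chroma-style where filter: plain values become $eq,
    operator dicts (e.g. {"$gte": 2024}) pass through unchanged."""
    clauses = [
        {key: value if isinstance(value, dict) else {"$eq": value}}
        for key, value in conditions.items()
    ]
    # A single condition is a bare clause; several are wrapped in $and
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}

where = and_filter(topic="tech", year={"$gte": 2024})
print(where)
# usage: collection.query(query_texts=["machine learning"], where=where)
```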
## Persistent Storage

```python
# Persist to disk
client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection("docs")
```

Or run Chroma as a standalone server:

```bash
chroma run --path ./chroma_data
```
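The three deployment modes differ only in which client you construct; everything downstream uses the same collection API. A sketch of picking one at startup; the `make_client` helper and its mode names are my own, not Chroma's:

```python
def make_client(mode: str, path: str = "./chroma_data",
                host: str = "localhost", port: int = 8000):
    """Return a Chroma client for the chosen deployment mode."""
    if mode not in {"memory", "persistent", "http"}:
        raise ValueError(f"unknown mode: {mode!r}")
    import chromadb  # deferred so validation alone needs no install
    if mode == "memory":
        return chromadb.Client()  # ephemeral, in-process
    if mode == "persistent":
        return chromadb.PersistentClient(path=path)  # on-disk, in-process
    return chromadb.HttpClient(host=host, port=port)  # talks to `chroma run`
```

`HttpClient` connects to a server started with `chroma run`, which is the usual step up once multiple processes need the same data.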
## JavaScript Client

```javascript
import { ChromaClient } from "chromadb";

const client = new ChromaClient();
const collection = await client.createCollection({ name: "docs" });

await collection.add({
  ids: ["doc1", "doc2"],
  documents: ["AI in healthcare", "ML in finance"],
  metadatas: [{ topic: "health" }, { topic: "finance" }],
});

const results = await collection.query({
  queryTexts: ["medical AI"],
  nResults: 2,
});
```
## Links

Building RAG applications? Check out my developer tools on Apify or email spinov001@gmail.com for custom solutions.