## Why Chroma?

Chroma is one of the simplest vector databases for AI apps. It runs embedded in your Python process: no server setup, no Docker, no infrastructure. Just `pip install` and go. (A JavaScript client is also available; it talks to a Chroma server.)

Chroma is free and open source (Apache 2.0). Chroma Cloud, a managed hosting option, is coming soon.
## Getting Started

### Python (Embedded, Zero Setup)

```bash
pip install chromadb
```
```python
import chromadb

client = chromadb.Client()  # In-memory
# Or persistent:
# client = chromadb.PersistentClient(path="./chroma_data")

# Create a collection (auto-embeds with the default model)
collection = client.create_collection(name="articles")

# Add documents: Chroma embeds them automatically
collection.add(
    documents=[
        "Machine learning is transforming how we build software",
        "React Server Components change the way we think about rendering",
        "Docker containers simplify deployment and scaling",
        "GraphQL provides a flexible alternative to REST APIs",
        "Rust's ownership model prevents memory bugs at compile time",
    ],
    ids=["ml-1", "react-1", "docker-1", "graphql-1", "rust-1"],
    metadatas=[
        {"category": "AI", "author": "Alice"},
        {"category": "Frontend", "author": "Bob"},
        {"category": "DevOps", "author": "Charlie"},
        {"category": "API", "author": "Alice"},
        {"category": "Systems", "author": "Diana"},
    ],
)
```
```python
# Semantic query: finds documents by meaning, not keywords
results = collection.query(
    query_texts=["artificial intelligence and neural networks"],
    n_results=3,
)

for doc, meta, dist in zip(
    results["documents"][0], results["metadatas"][0], results["distances"][0]
):
    print(f"[{meta['category']}] {doc[:60]}... (distance: {dist:.3f})")
```
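Lower distances mean closer matches. Chroma's default index uses squared L2 distance, but collections can be configured for cosine distance instead. As a rough sketch of what a cosine distance is (`cosine_distance` here is a hypothetical helper, not part of Chroma's API):

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (identical direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```

Either way, the ordering is what matters: the first result is always the closest match.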
```python
# Filtered query: restrict results by metadata
results = collection.query(
    query_texts=["modern web development"],
    where={"category": "Frontend"},
    n_results=3,
)
```
```python
# Update an existing document (it gets re-embedded automatically)
collection.update(
    ids=["ml-1"],
    documents=["Deep learning and neural networks are revolutionizing AI"],
    metadatas=[{"category": "AI", "author": "Alice", "updated": True}],
)
```
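IDs must be unique within a collection, so how you generate them matters once you ingest real data. One common pattern (not Chroma-specific; `content_id` is a hypothetical helper) is deriving IDs from a content hash, so re-ingesting the same document always produces the same ID:

```python
import hashlib

def content_id(text: str, prefix: str = "doc") -> str:
    """Derive a stable ID from document text, so re-ingestion is deterministic."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}-{digest}"

docs = [
    "Machine learning is transforming how we build software",
    "Machine learning is transforming how we build software",  # duplicate
]
ids = [content_id(d) for d in docs]
print(ids[0] == ids[1])  # True: same content, same ID
```

With stable IDs you can use `upsert` on re-runs instead of worrying about duplicate inserts.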
### JavaScript (Client)

The JavaScript client connects to a running Chroma server (start one with `chroma run --path ./chroma_data`):

```javascript
import { ChromaClient } from "chromadb";

const client = new ChromaClient(); // defaults to http://localhost:8000
const collection = await client.createCollection({ name: "docs" });

// Add documents
await collection.add({
  ids: ["doc1", "doc2", "doc3"],
  documents: [
    "How to deploy a Next.js app to Vercel",
    "Building REST APIs with Express.js",
    "Introduction to TypeScript generics"
  ],
  metadatas: [
    { topic: "deployment" },
    { topic: "backend" },
    { topic: "typescript" }
  ]
});

// Query
const results = await collection.query({
  queryTexts: ["hosting web applications"],
  nResults: 2
});
console.log(results.documents);
```
## RAG (Retrieval Augmented Generation)

```python
import chromadb
import openai

# 1. Store your knowledge base in Chroma
client = chromadb.PersistentClient(path="./knowledge")
kb = client.get_or_create_collection("knowledge_base")

# Add your docs
kb.add(
    documents=["Your company docs...", "Product specs...", "FAQ..."],
    ids=["doc1", "doc2", "doc3"],
)
```
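In practice you'd split long documents into chunks before adding them, so each embedding covers a focused passage. A minimal word-based sliding-window chunker (a sketch; `chunk_text` is a hypothetical helper, and real pipelines often chunk by tokens or sentences instead):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word-based chunks for embedding."""
    words = text.split()
    if len(words) <= chunk_size:
        return [" ".join(words)] if words else []
    step = chunk_size - overlap
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words) - overlap, step)
    ]

chunks = chunk_text("word " * 500, chunk_size=200, overlap=40)
print(len(chunks))  # 3 overlapping chunks
```

You would then add each chunk with its own ID, e.g. `kb.add(documents=chunks, ids=[f"doc1-{i}" for i in range(len(chunks))])`.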
```python
# 2. Query Chroma for relevant context
def ask(question):
    results = kb.query(query_texts=[question], n_results=3)
    context = "\n".join(results["documents"][0])

    # 3. Feed the retrieved context to the LLM
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What are the product specifications?"))
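One thing the minimal version above glosses over: with larger `n_results` or longer chunks, the retrieved context can overflow the model's context window. A crude character-budget trim (hypothetical helper; a real app would count tokens, e.g. with `tiktoken`) keeps passages in retrieval order, so the closest matches survive:

```python
def trim_context(passages: list[str], max_chars: int = 8000) -> str:
    """Keep passages in retrieval order until the character budget is hit."""
    kept, used = [], 0
    for p in passages:
        if used + len(p) > max_chars:
            break
        kept.append(p)
        used += len(p) + 1  # +1 for the joining newline
    return "\n".join(kept)

print(trim_context(["a" * 10, "b" * 10, "c" * 10], max_chars=25))
```

Inside `ask`, you would build the context with `context = trim_context(results["documents"][0])` instead of joining everything.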
## Chroma vs Alternatives

| Feature | Chroma | Pinecone | Qdrant |
|---|---|---|---|
| Embedded mode | Yes | No | Python client only (local mode) |
| Setup required | None | Account | Docker (or local mode) |
| Auto-embedding | Yes | No | No |
| License | Apache 2.0 | Proprietary | Apache 2.0 |
| Best for | Prototyping + RAG | Managed production | Self-hosted production |
Need to scrape data for your AI app? I build production-ready scrapers. Check out my Apify actors or email spinov001@gmail.com for custom data pipelines.
Building RAG apps? What's your vector DB choice? Share below!