ChromaDB is an open-source embedding database — the simplest way to add AI-powered search, RAG, and recommendations to your Python or JavaScript app.
## Why ChromaDB?
- Simplest API: 4 functions to learn
- Auto-embedding: Built-in embedding models
- Python + JS: Both SDKs available
- Self-hosted: No cloud dependency
- Metadata filtering: Combine semantic + keyword search
- Persistent storage: SQLite backend
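At its core, semantic search means embedding texts as vectors and ranking them by similarity to an embedded query. Here is a minimal, dependency-free sketch of that idea — the three-dimensional vectors and document IDs are made up for illustration (real embedding models produce hundreds of dimensions), and this is not ChromaDB's internal implementation:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for two documents
corpus = {
    'doc1': [0.9, 0.1, 0.0],  # an ML article
    'doc2': [0.1, 0.9, 0.2],  # an API article
}
query = [0.2, 0.8, 0.1]       # a question about APIs

# The best match is the document whose vector points in the most similar direction
best = max(corpus, key=lambda doc_id: cosine_similarity(query, corpus[doc_id]))
print(best)  # doc2
```

ChromaDB handles the embedding step for you, which is why the API below never asks you for a vector.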
## Install

```shell
pip install chromadb
```
## Basic Usage

```python
import chromadb

client = chromadb.Client()

# Create collection (auto-embeds with default model)
collection = client.create_collection('articles')

# Add documents — ChromaDB embeds them automatically
collection.add(
    documents=[
        'Introduction to machine learning with Python',
        'Building REST APIs with FastAPI',
        'Docker containers for beginners',
        'React hooks explained simply',
    ],
    ids=['doc1', 'doc2', 'doc3', 'doc4'],
    metadatas=[
        {'category': 'AI', 'level': 'beginner'},
        {'category': 'Backend', 'level': 'intermediate'},
        {'category': 'DevOps', 'level': 'beginner'},
        {'category': 'Frontend', 'level': 'intermediate'},
    ],
)

# Search
results = collection.query(
    query_texts=['How do I build an API?'],
    n_results=2,
)
print(results['documents'])
# [['Building REST APIs with FastAPI', 'React hooks explained simply']]
```
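The query result is a dict of parallel lists — one inner list per query text — keyed by `ids`, `documents`, `metadatas`, and `distances`. A hand-written stand-in (the values below are made up, echoing the example above) shows how to pull out the top hit:

```python
# Stand-in for what collection.query returns; values are illustrative
results = {
    'ids': [['doc2', 'doc4']],
    'documents': [['Building REST APIs with FastAPI',
                   'React hooks explained simply']],
    'metadatas': [[{'category': 'Backend', 'level': 'intermediate'},
                   {'category': 'Frontend', 'level': 'intermediate'}]],
    'distances': [[0.31, 0.58]],  # lower distance = closer match
}

# Index [0] selects the first (here, only) query; [0] again selects the top hit
top_id = results['ids'][0][0]
top_doc = results['documents'][0][0]
print(top_id, '->', top_doc)
```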
## Persistent Storage

```python
client = chromadb.PersistentClient(path='./chroma-data')
```
## Server Mode

```shell
# Start the server
chroma run --host 0.0.0.0 --port 8000
```

```python
# Connect from a client
client = chromadb.HttpClient(host='localhost', port=8000)
```
## Filtered Search

```python
results = collection.query(
    query_texts=['backend tutorial'],
    n_results=5,
    where={'category': 'Backend'},
    where_document={'$contains': 'API'},
)
```
## With Custom Embeddings (OpenAI)

```python
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key='sk-...',
    model_name='text-embedding-3-small',
)

collection = client.create_collection(
    name='articles',
    embedding_function=openai_ef,
)
```
## RAG in 10 Lines

```python
import chromadb
import openai

db = chromadb.PersistentClient(path='./docs')
collection = db.get_collection('knowledge-base')

def ask(question: str) -> str:
    docs = collection.query(query_texts=[question], n_results=3)
    context = '\n'.join(docs['documents'][0])
    return openai.chat.completions.create(
        model='gpt-4',
        messages=[{'role': 'user', 'content': f'Based on: {context}\n\nAnswer: {question}'}],
    ).choices[0].message.content
```
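The heart of `ask` is the retrieval-then-prompt step: join the retrieved chunks into one context string and place it ahead of the question. That step can be tested in isolation without either API — `build_prompt` is a hypothetical helper, not part of ChromaDB or OpenAI:

```python
def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Join retrieved chunks into a single grounding context for the LLM."""
    context = '\n'.join(retrieved_docs)
    return f'Based on: {context}\n\nAnswer: {question}'

prompt = build_prompt(
    'How do I run a server?',
    ['Use chroma run to start a server.', 'HttpClient connects over HTTP.'],
)
print(prompt)
```

Keeping prompt assembly in its own function makes it easy to swap in a different template later without touching the retrieval code.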
## Update and Delete

```python
# Update an existing document in place
collection.update(
    ids=['doc1'],
    documents=['Updated: Advanced ML with Python and scikit-learn'],
    metadatas=[{'category': 'AI', 'level': 'advanced'}],
)

# Delete by ID
collection.delete(ids=['doc4'])

# Delete by metadata filter
collection.delete(where={'category': 'Frontend'})
```
## Real-World Use Case

A support team embedded 5,000 help articles into ChromaDB. When customers typed questions, the system surfaced relevant articles in under 50 ms — and ticket volume dropped by 40% because customers found answers themselves.
Need to automate data collection? Check out my Apify actors for ready-made scrapers, or email spinov001@gmail.com for custom solutions.