ChromaDB is the simplest open-source vector database — designed to make AI applications easy to build. It takes 4 lines of code to create a collection, add documents, and query.
Free, open source, embeddable in Python. The SQLite of vector databases.
Why Use ChromaDB?
- Dead simple — 4 lines to get started
- Embeddable — runs in your Python process, no separate server needed
- Auto-embeddings — built-in embedding functions (Sentence Transformers, OpenAI, etc.)
- Client-server mode — scale up with Chroma Server when needed
- Metadata filtering — combine vector search with metadata filters
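The last two bullets work together: a `where` filter narrows candidates by metadata, then vector distance ranks whatever survives. Here is a toy pure-Python sketch of that filter-then-rank idea — an illustration of the concept only, not Chroma's actual implementation:

```python
import math

# Tiny in-memory "collection": (id, embedding, metadata)
docs = [
    ("doc1", [1.0, 0.0], {"difficulty": "beginner"}),
    ("doc2", [0.9, 0.1], {"difficulty": "intermediate"}),
    ("doc3", [0.0, 1.0], {"difficulty": "intermediate"}),
]

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query(embedding, where, n_results=2):
    # 1) keep only docs whose metadata matches every where clause
    candidates = [d for d in docs if all(d[2].get(k) == v for k, v in where.items())]
    # 2) rank the survivors by distance to the query embedding
    candidates.sort(key=lambda d: l2(embedding, d[1]))
    return [d[0] for d in candidates[:n_results]]

print(query([1.0, 0.0], where={"difficulty": "intermediate"}))  # ['doc2', 'doc3']
```

Real Chroma does the same two steps, just with an HNSW index instead of a linear scan.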
Quick Setup
1. Install
```bash
pip install chromadb

# Or run as a standalone server
chroma run --host 0.0.0.0 --port 8000
```
2. In-Memory (4 Lines!)
```python
import chromadb

client = chromadb.Client()  # in-memory, nothing to configure
collection = client.create_collection("articles")
collection.add(documents=["Web scraping extracts data from websites", "APIs provide structured data access"], ids=["doc1", "doc2"])
results = collection.query(query_texts=["how to get data from websites"], n_results=2)
print(results["documents"])
```
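Note the shape of `results`: every field is a list of lists, one inner list per query text (you can pass several at once), which is why the examples below index with `[0]` first. A mock with made-up values to show the indexing:

```python
# Shape of a Chroma query result for ONE query text with n_results=2
# (values below are illustrative, not real output)
results = {
    "ids": [["doc1", "doc2"]],
    "documents": [["Web scraping extracts data from websites",
                   "APIs provide structured data access"]],
    "distances": [[0.31, 0.87]],
}

# Index [0] first to select the query, then iterate the hits
for doc_id, doc, dist in zip(results["ids"][0], results["documents"][0], results["distances"][0]):
    print(doc_id, round(dist, 2), doc[:30])
```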
3. Persistent Storage
```python
import chromadb

client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection("knowledge")

# Add documents with metadata
collection.add(
    documents=[
        "BeautifulSoup parses HTML for data extraction",
        "Scrapy is a full-featured web crawling framework",
        "Playwright automates browsers for dynamic content",
    ],
    metadatas=[
        {"category": "parsing", "difficulty": "beginner"},
        {"category": "framework", "difficulty": "intermediate"},
        {"category": "browser", "difficulty": "intermediate"},
    ],
    ids=["bs4", "scrapy", "playwright"],
)

# Query with a metadata filter
results = collection.query(
    query_texts=["extract data from JavaScript-heavy websites"],
    n_results=3,
    where={"difficulty": "intermediate"},
)

for doc, meta, dist in zip(
    results["documents"][0], results["metadatas"][0], results["distances"][0]
):
    print(f"{doc[:50]}... | Category: {meta['category']} | Distance: {dist:.4f}")
```
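A note on those distances: lower means closer, and the metric depends on how the collection was created. Chroma defaults to squared L2; you can switch to cosine distance by passing `metadata={"hnsw:space": "cosine"}` to `create_collection` (verify the exact key against the docs for your version). To build intuition for what a cosine distance value means, here is the formula in plain Python:

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cos(angle): 0 = same direction, 1 = orthogonal, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(round(cosine_distance([1, 0], [1, 0]), 4))   # 0.0
print(round(cosine_distance([1, 0], [0, 1]), 4))   # 1.0
print(round(cosine_distance([1, 0], [-1, 0]), 4))  # 2.0
```

With cosine distance, a familiar similarity score is simply `1 - distance`.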
4. Client-Server Mode
```bash
# Start the server
chroma run --port 8000
```

```python
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("articles")
```
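Before pointing a client at a server, it's handy to check the server is actually up. The v1 heartbeat endpoint returns a JSON object with a nanosecond timestamp (key name per my reading of the docs — verify for your version). A small stdlib-only helper, assuming the default port:

```python
import json
import urllib.request
import urllib.error

def chroma_alive(base_url: str = "http://localhost:8000", timeout: float = 2.0):
    """Return the server's heartbeat value, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/v1/heartbeat", timeout=timeout) as resp:
            return json.load(resp).get("nanosecond heartbeat")
    except (urllib.error.URLError, OSError):
        return None

print(chroma_alive() is not None)
```

The Python client exposes the same check as `client.heartbeat()`.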
5. REST API
These examples target the classic v1 HTTP API (newer Chroma releases also expose a v2 API under `/api/v2` with tenant/database scoping, so check your server version). One gotcha: embedding functions run in the *client*, not the server, so the raw HTTP endpoints exchange precomputed vectors — the embeddings below are illustrative placeholders.

```bash
CHROMA="http://localhost:8000"

# List collections
curl -s "$CHROMA/api/v1/collections" | jq '.[].name'

# Create a collection
curl -s -X POST "$CHROMA/api/v1/collections" \
  -H "Content-Type: application/json" \
  -d '{"name": "articles"}' | jq

# Add documents (embeddings must be computed client-side)
curl -s -X POST "$CHROMA/api/v1/collections/{collection_id}/add" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": ["doc1", "doc2"],
    "embeddings": [[0.1, 0.2], [0.3, 0.4]],
    "documents": ["Web scraping guide", "API tutorial"],
    "metadatas": [{"type": "tutorial"}, {"type": "guide"}]
  }'

# Query -- the raw API takes "query_embeddings", not "query_texts"
curl -s -X POST "$CHROMA/api/v1/collections/{collection_id}/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query_embeddings": [[0.1, 0.2]],
    "n_results": 5
  }' | jq
```
RAG with ChromaDB + OpenAI
```python
import chromadb
from openai import OpenAI

chroma = chromadb.PersistentClient()
openai = OpenAI()
collection = chroma.get_or_create_collection("knowledge")

# Add your docs
collection.add(
    documents=["Your domain knowledge here..."],
    ids=["doc1"],
)

# RAG query: retrieve relevant context, then let the LLM answer from it
query = "How do I scrape dynamic websites?"
results = collection.query(query_texts=[query], n_results=3)
context = "\n".join(results["documents"][0])

answer = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": f"Answer based on: {context}"},
        {"role": "user", "content": query},
    ],
)
print(answer.choices[0].message.content)
```
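One practical detail the snippet glosses over: long documents should be split into chunks before `collection.add`, so retrieval returns focused passages rather than whole files. A minimal word-window chunker with overlap — the sizes here are arbitrary starting points, not tuned values:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `chunk_size` words, adjacent windows sharing `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, max(len(words) - overlap, 1), step)]

doc = " ".join(f"word{i}" for i in range(500))
chunks = chunk_text(doc)
print(len(chunks), len(chunks[0].split()))  # 3 200
```

Add each chunk with its own id (e.g. `f"doc1-{i}"`) so query hits point back to the source document. The overlap keeps sentences that straddle a boundary retrievable from at least one chunk.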
Key REST Endpoints
| Endpoint | Description |
|---|---|
| /api/v1/collections | List/create collections |
| /api/v1/collections/{id}/add | Add documents |
| /api/v1/collections/{id}/query | Vector search |
| /api/v1/collections/{id}/get | Get documents |
| /api/v1/collections/{id}/update | Update documents |
| /api/v1/collections/{id}/delete | Delete documents |
| /api/v1/heartbeat | Health check |
Need a custom data extraction or scraping solution? I build production-grade scrapers for any website. Email: Spinov001@gmail.com | My Apify Actors