ChromaDB is one of the simplest vector databases to get started with: pip install, a handful of lines of code, and you have working semantic search. It runs embedded (in-process) or as a server, with Python and JavaScript clients.
Why ChromaDB?
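- Minimal setup: a few lines of code to a working search
- Embedded mode: runs in-process, no server needed
- Auto-embedding: a default local model (all-MiniLM-L6-v2) out of the box, plus embedding functions for OpenAI, Sentence Transformers, Cohere and others
- Multi-modal: text and, with a multi-modal embedding function such as OpenCLIP, images
- Free: open source, Apache 2.0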
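Auto-embedding means you hand Chroma raw text and it produces the vectors. If you'd rather plug in your own model, collections accept an embedding_function argument. Here's a toy sketch of the shape such a function takes, assuming the callable protocol of recent Chroma versions (a `__call__` that receives `input`, a list of strings, and returns one vector per string); newer releases may expect extra methods, so check the docs for your version. `HashingEmbeddingFunction` is a hypothetical name, and the hashing trick only captures shared words, not meaning, so treat it as a stand-in for a real model.

```python
import math
import zlib


class HashingEmbeddingFunction:
    """Toy bag-of-words embedder using the hashing trick (illustration only)."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def __call__(self, input: list[str]) -> list[list[float]]:
        vectors = []
        for text in input:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # crc32 is stable across runs, unlike Python's salted hash()
                bucket = zlib.crc32(token.encode("utf-8")) % self.dim
                vec[bucket] += 1.0
            # L2-normalise so texts of different lengths are comparable
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0
            vectors.append([v / norm for v in vec])
        return vectors


ef = HashingEmbeddingFunction(dim=16)
vectors = ef(["Hello world", "hello WORLD", ""])
print(len(vectors), len(vectors[0]))
```

You would then pass an instance via get_or_create_collection("docs", embedding_function=HashingEmbeddingFunction()), assuming your Chroma version accepts a plain callable there.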
Install
pip install chromadb
Minimal Example
import chromadb
client = chromadb.Client()  # in-memory client: data is lost when the process exits
collection = client.create_collection("docs")
# Documents are embedded with the default model (downloaded on first use)
collection.add(
    documents=["AI is transforming search", "Vector databases store embeddings", "RAG combines retrieval and generation"],
    ids=["doc1", "doc2", "doc3"],
)
results = collection.query(query_texts=["how does AI search work?"], n_results=2)
print(results['documents'])  # one list per query text, most relevant first
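Under the hood, query embeds the query text with the collection's embedding function and returns the closest stored vectors. The brute-force sketch below shows that idea on hand-made 2-D vectors (not real embeddings), ranking by squared L2 distance, which is Chroma's default distance space. Chroma itself uses an HNSW index rather than a linear scan, and `brute_force_query` is a hypothetical helper.

```python
def brute_force_query(query_vec, ids, docs, vectors, n_results=2):
    """Toy nearest-neighbour search: rank stored vectors by squared L2 distance."""
    scored = []
    for doc_id, doc, vec in zip(ids, docs, vectors):
        dist = sum((q - v) ** 2 for q, v in zip(query_vec, vec))
        scored.append((dist, doc_id, doc))
    scored.sort(key=lambda item: item[0])  # smallest distance = most similar
    top = scored[:n_results]
    # Mirror Chroma's result layout: one inner list per query text
    return {
        "ids": [[doc_id for _, doc_id, _ in top]],
        "documents": [[doc for _, _, doc in top]],
        "distances": [[dist for dist, _, _ in top]],
    }


# Hand-made 2-D vectors standing in for real embeddings
ids = ["doc1", "doc2", "doc3"]
docs = ["AI is transforming search", "Vector databases store embeddings", "RAG combines retrieval and generation"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
results = brute_force_query([1.0, 0.0], ids, docs, vectors, n_results=2)
print(results["ids"])  # prints [['doc1', 'doc3']]
```

Note the outer list: each field holds one inner list per query text, so the top document for the first query is results['documents'][0][0].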
With Metadata Filtering
import chromadb
client = chromadb.PersistentClient(path="./chroma_data")  # stored on disk, survives restarts
collection = client.get_or_create_collection("articles")
# Add with metadata
collection.add(
    documents=[
        "Introduction to machine learning algorithms",
        "Best practices for REST API design",
        "How transformers revolutionized NLP",
        "GraphQL vs REST: A comparison",
    ],
    metadatas=[
        {"category": "AI", "year": 2024},
        {"category": "Backend", "year": 2024},
        {"category": "AI", "year": 2023},
        {"category": "Backend", "year": 2023},
    ],
    ids=["a1", "a2", "a3", "a4"],
)
# Semantic search restricted to AI articles
results = collection.query(
    query_texts=["neural networks"],
    n_results=3,
    where={"category": "AI"},
)
# Combined filter: Backend articles from 2024 or later
results = collection.query(
    query_texts=["API design"],
    n_results=5,
    where={"$and": [{"category": "Backend"}, {"year": {"$gte": 2024}}]},
)
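Filters combine with the vector search: only records whose metadata passes the where clause are candidates for the nearest-neighbour results. To make the operator semantics concrete, here's a pure-Python sketch covering the shapes used above (shorthand equality, comparison operators like $gte, plus $and/$or). It illustrates the semantics only and is not Chroma's implementation; `matches` and `OPS` are hypothetical names, $in/$nin are left out, and treating a missing key as a non-match is an assumption of this sketch.

```python
# Comparison operators: (stored metadata value, value from the filter) -> bool
OPS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
}


def matches(where: dict, metadata: dict) -> bool:
    """Return True if a metadata dict satisfies a where-style filter."""
    for key, cond in where.items():
        if key == "$and":
            if not all(matches(sub, metadata) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(sub, metadata) for sub in cond):
                return False
        else:
            if key not in metadata:
                return False  # assumption: a missing key never matches
            value = metadata[key]
            if isinstance(cond, dict):
                # Operator form, e.g. {"year": {"$gte": 2024}}
                if not all(OPS[op](value, operand) for op, operand in cond.items()):
                    return False
            elif value != cond:
                # Shorthand equality, e.g. {"category": "AI"}
                return False
    return True


metadatas = {
    "a1": {"category": "AI", "year": 2024},
    "a2": {"category": "Backend", "year": 2024},
    "a3": {"category": "AI", "year": 2023},
    "a4": {"category": "Backend", "year": 2023},
}
combined = {"$and": [{"category": "Backend"}, {"year": {"$gte": 2024}}]}
print([doc_id for doc_id, meta in metadatas.items() if matches(combined, meta)])  # prints ['a2']
```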
Server Mode + REST API
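# Start server (0.0.0.0 accepts connections on every network interface)
chroma run --host 0.0.0.0 --port 8000
# REST API (v1 paths; newer Chroma releases serve /api/v2/... instead, so check the API docs for your version)
curl http://localhost:8000/api/v1/collections
curl -X POST http://localhost:8000/api/v1/collections \
  -H 'Content-Type: application/json' \
  -d '{"name": "my_collection"}'
# Connect from Python
import chromadb
client = chromadb.HttpClient(host="localhost", port=8000)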
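If you can't install a client library, the curl POST above maps directly onto Python's standard library. This sketch only builds the request so it can run without a server; the /api/v1 path is copied from the curl example, so adjust it for your server version. `create_collection_request` is a hypothetical helper.

```python
import json
import urllib.request


def create_collection_request(name: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) the same POST the curl example issues."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/collections",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = create_collection_request("my_collection")
print(req.get_method(), req.full_url)
# To send it to a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```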
JavaScript
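import { ChromaClient } from 'chromadb';
// The JS client has no embedded mode: it connects to a running Chroma server (http://localhost:8000 by default)
const client = new ChromaClient();
const collection = await client.getOrCreateCollection({ name: 'docs' });
// Raw documents and queryTexts go through the client's default embedding function,
// which may require an extra package depending on your client version
await collection.add({
  documents: ['AI is amazing', 'Vectors power search'],
  ids: ['id1', 'id2'],
});
const results = await collection.query({
  queryTexts: ['artificial intelligence'],
  nResults: 2,
});
console.log(results.documents); // one array per query text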
RAG with ChromaDB + OpenAI
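import chromadb
from openai import OpenAI
chroma = chromadb.PersistentClient(path="./rag_data")
collection = chroma.get_or_create_collection("knowledge")  # retrieval uses Chroma's default embedding model
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment; used only for generation
def add_knowledge(texts: list[str], ids: list[str]):
    collection.add(documents=texts, ids=ids)
def ask(question: str) -> str:
    # Retrieve: the 3 chunks closest to the question
    results = collection.query(query_texts=[question], n_results=3)
    context = "\n".join(results['documents'][0])  # [0] = results for the first (only) query
    # Generate: answer grounded in the retrieved context
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
# Example usage
add_knowledge(
    ["Chroma stores embeddings and metadata.", "RAG retrieves context before generating.", "HNSW is an approximate nearest-neighbour index."],
    ["k1", "k2", "k3"],
)
print(ask("What does RAG do before generating an answer?"))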
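ask pastes every retrieved chunk straight into the system prompt. With long chunks or a larger n_results that can overflow the model's context window, so you may want a cap. A minimal sketch, assuming a character budget as a rough stand-in for a token count; `build_messages` is a hypothetical helper and the 4000-character default is arbitrary.

```python
def build_messages(question: str, passages: list[str], max_context_chars: int = 4000) -> list[dict]:
    """Assemble chat messages, keeping passages in rank order until the budget is spent."""
    kept, used = [], 0
    for passage in passages:
        # +1 accounts for the newline separator that join() adds between passages
        cost = len(passage) + (1 if kept else 0)
        if used + cost > max_context_chars:
            break
        kept.append(passage)
        used += cost
    context = "\n".join(kept)
    return [
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": question},
    ]


messages = build_messages("q", ["aaaa", "bbbb", "cccc"], max_context_chars=9)
print(messages[0]["content"])
```

Inside ask you would pass results['documents'][0] as passages and hand the returned list to messages=. Keeping passages in rank order means the most relevant chunks survive the cut.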
Key Features
| Feature | Details |
|---|---|
| Modes | Embedded, client-server, cloud |
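| Embedding | Default local model (all-MiniLM-L6-v2), integrations (OpenAI, Sentence Transformers, Cohere), or bring your own vectors |
| Filtering | Metadata (where) + document text (where_document) |
| Persistence | In-memory or on disk |
| Languages | Python, JavaScript/TypeScript (official); community clients for Ruby, Go and others |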
| Multi-modal | Text, images |
Resources
Building AI apps? Check my Apify actors or email spinov001@gmail.com.