DEV Community

Alex Spinov

ChromaDB Has a Free API — The Simplest Vector Database for AI Developers

ChromaDB is the simplest open-source vector database — designed to make AI applications easy to build. It takes 4 lines of code to create a collection, add documents, and query.

Free, open source, embeddable in Python. The SQLite of vector databases.

Why Use ChromaDB?

  • Dead simple — 4 lines to get started
  • Embeddable — runs in your Python process, no separate server needed
  • Auto-embeddings — built-in embedding functions (Sentence Transformers, OpenAI, etc.)
  • Client-server mode — scale up with Chroma Server when needed
  • Metadata filtering — combine vector search with metadata filters

Quick Setup

1. Install

pip install chromadb

# Or run as server
chroma run --host 0.0.0.0 --port 8000

2. In-Memory (4 Lines!)

import chromadb

client = chromadb.Client()
collection = client.create_collection("articles")
collection.add(documents=["Web scraping extracts data from websites", "APIs provide structured data access"], ids=["doc1", "doc2"])
results = collection.query(query_texts=["how to get data from websites"], n_results=2)
print(results["documents"])
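One gotcha with `query()`: every field in the result is a list of lists, one inner list per query text, so index with `[0]` before iterating. A sketch using a hand-written dict shaped like Chroma's response (the distances are made-up illustrative values):

```python
# Shaped like a Chroma query() response for a single query text
results = {
    "ids": [["doc1", "doc2"]],
    "documents": [["Web scraping extracts data from websites",
                   "APIs provide structured data access"]],
    "distances": [[0.31, 0.78]],  # illustrative values; lower = closer
}

# Index [0] to get the hits for the first (and only) query text
for doc_id, doc, dist in zip(results["ids"][0], results["documents"][0], results["distances"][0]):
    print(f"{doc_id}: {doc} (distance={dist})")
```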

3. Persistent Storage

import chromadb

client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection("knowledge")

# Add documents with metadata
collection.add(
    documents=[
        "BeautifulSoup parses HTML for data extraction",
        "Scrapy is a full-featured web crawling framework",
        "Playwright automates browsers for dynamic content"
    ],
    metadatas=[
        {"category": "parsing", "difficulty": "beginner"},
        {"category": "framework", "difficulty": "intermediate"},
        {"category": "browser", "difficulty": "intermediate"}
    ],
    ids=["bs4", "scrapy", "playwright"]
)

# Query with metadata filter
results = collection.query(
    query_texts=["extract data from JavaScript-heavy websites"],
    n_results=3,
    where={"difficulty": "intermediate"}
)

for doc, meta, dist in zip(results["documents"][0], results["metadatas"][0], results["distances"][0]):
    print(f"{doc[:50]}... | Category: {meta['category']} | Distance: {dist:.4f}")

4. Client-Server Mode

# Start server
chroma run --port 8000

import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("articles")

5. REST API

CHROMA="http://localhost:8000"

# List collections
curl -s "$CHROMA/api/v1/collections" | jq '.[].name'

# Create collection
curl -s -X POST "$CHROMA/api/v1/collections" \
  -H "Content-Type: application/json" \
  -d '{"name": "articles"}' | jq

# Add documents ({collection_id} is the collection's "id" from the create/list response)
curl -s -X POST "$CHROMA/api/v1/collections/{collection_id}/add" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": ["doc1", "doc2"],
    "documents": ["Web scraping guide", "API tutorial"],
    "metadatas": [{"type": "tutorial"}, {"type": "guide"}]
  }'

# Query
curl -s -X POST "$CHROMA/api/v1/collections/{collection_id}/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query_texts": ["data extraction"],
    "n_results": 5
  }' | jq
Note: the raw HTTP API generally expects precomputed vectors. The `documents` and `query_texts` conveniences are handled client-side by the official libraries, which embed text before sending. If a raw call returns a validation error, pass `embeddings` / `query_embeddings` instead, or use a client library.

RAG with ChromaDB + OpenAI

import chromadb
from openai import OpenAI

chroma = chromadb.PersistentClient()
openai = OpenAI()
collection = chroma.get_or_create_collection("knowledge")

# Add your docs
collection.add(
    documents=["Your domain knowledge here..."],
    ids=["doc1"]
)

# RAG query
query = "How do I scrape dynamic websites?"
results = collection.query(query_texts=[query], n_results=3)
context = "\n".join(results["documents"][0])

answer = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": f"Answer based on: {context}"},
        {"role": "user", "content": query}
    ]
)
print(answer.choices[0].message.content)
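Real documents are usually too long to embed whole, so a common pre-step to the RAG flow above is splitting text into overlapping chunks before `collection.add`. A minimal word-based sketch (the chunk and overlap sizes are arbitrary assumptions; libraries like LangChain offer more sophisticated splitters):

```python
def chunk_text(text: str, chunk_words: int = 150, overlap_words: int = 30) -> list[str]:
    """Split text into overlapping word-based chunks."""
    words = text.split()
    if len(words) <= chunk_words:
        return [text]
    step = chunk_words - overlap_words
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words) - overlap_words, step)]

doc = ("word " * 400).strip()  # a 400-word stand-in document
chunks = chunk_text(doc)
print(len(chunks))  # 4

# Stable per-chunk ids so re-ingesting upserts cleanly instead of duplicating
ids = [f"doc1-chunk{i}" for i in range(len(chunks))]
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.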

Key REST Endpoints

Endpoint                            Description
/api/v1/collections                 List/create collections
/api/v1/collections/{id}/add        Add documents
/api/v1/collections/{id}/query      Vector search
/api/v1/collections/{id}/get        Get documents
/api/v1/collections/{id}/update     Update documents
/api/v1/collections/{id}/delete     Delete documents
/api/v1/heartbeat                   Health check

Need a custom data extraction or scraping solution? I build production-grade scrapers for any website. Email: Spinov001@gmail.com | My Apify Actors
