Chroma is one of the most widely used open-source vector databases in the AI ecosystem. It's the backbone of countless RAG pipelines, semantic search systems, and AI agent memory layers — including projects I've built myself. So when I found issue #3111 on their GitHub — a request to update their Haystack integration documentation — I decided to take it on.
The issue was straightforward: the existing docs were outdated, lacked clear installation steps, and had almost no practical examples for ChromaDocumentStore. But when I looked closer, I found two already-open PRs. That's where it got interesting.
The Existing PRs
Before writing anything, I checked what others had already done. There were two open PRs for this exact issue:
PR #3112: Deleted the entire documentation file. The contributor decided the best way to "update" the docs was to remove them entirely. This doesn't help users — it just creates a dead link.
PR #7229 (manikanta7cheruku): Fixed a Path bug in the code example and swapped the LLM from HuggingFace (Mixtral) to OpenAI (GPT-3.5-turbo). The Path fix was good. The LLM swap was questionable — it replaced a free, self-hosted model with one that requires a paid API key, which is a worse default for open-source documentation.
Neither PR actually addressed the core request: comprehensive, clear examples for ChromaDocumentStore. So I wrote my own.
What I Changed
I rewrote docs/mintlify/integrations/frameworks/haystack.mdx from 80 lines to 187 lines. Here's what was added:
Clearer Installation
The original docs only had pip install chroma-haystack. I added haystack-ai as an explicit dependency since the integration package doesn't always pull it in correctly.
Quick Start Example
A minimal 10-line example showing Document creation, writing, and counting — the fastest way to verify your setup works:
from haystack import Document
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
document_store = ChromaDocumentStore()
docs = [
Document(content="Chroma stores documents as vectors."),
Document(content="Haystack provides the orchestration layer."),
]
document_store.write_documents(docs)
print(document_store.count_documents()) # Output: 2
All Three Deployment Modes
The original docs only showed in-memory mode. I documented all three: in-memory (development), persistent storage (disk-backed), and remote client-server (production with Docker or chroma run).
Fixed the Path Bug
The original code had "data" / Path(name) — a string divided by a Path object, which throws a TypeError. Fixed to Path("data") / name.
Retrieval Examples
Added three retrieval patterns: text query (auto-embedded by Chroma), embedding query (pre-computed vectors), and metadata filtering using Haystack's filter syntax.
Full RAG Pipeline
A complete end-to-end RAG example connecting ChromaDocumentStore → ChromaQueryRetriever → PromptBuilder → HuggingFaceTGIGenerator. I kept HuggingFace (free, self-hosted) instead of swapping to OpenAI.
Troubleshooting Section
Common issues: import errors, empty results, remote connection refused, HuggingFace token errors. Each with a one-line fix.
What I Learned
1. Check Existing PRs First
Two people had already worked on this issue before me. One deleted the file. One did minimal fixes. If I hadn't checked, I might have duplicated effort or missed context. Always review what's already in flight.
2. Import Paths Change
The old docs used from chroma_haystack import ChromaDocumentStore — that's the archived package. The current one is from haystack_integrations.document_stores.chroma import ChromaDocumentStore. Same class, different package. This is exactly the kind of thing that makes docs go stale.
3. You Can't Push Directly to Upstream
I tried git push origin my-branch and got a 403. You have to fork the repo first, push to your fork, then create the PR with gh pr create --repo chroma-core/chroma --head SaintChris:my-branch. Simple, but not obvious if you haven't contributed to a large project before.
4. Docs Are Code
A broken code example in documentation is the same as a broken function — it wastes hours of developer time. The Path bug in the original docs would have caused a TypeError for anyone who copied the example. Fixing that one line probably saves more developer hours than most code contributions.
The PR
PR #7230 is now open on the chroma-core/chroma repository. It's documentation-only — no code changes. The branch is at SaintChris/chroma.
Whether it gets merged or not, the process itself was the value. I now understand how Chroma's deployment modes work at a deeper level, I've read their documentation pipeline, and I have a contribution on one of the most widely-used vector databases in the AI ecosystem.
Why This Matters
Open-source documentation is infrastructure. When it's wrong, every developer who copies that example hits a wall. When it's incomplete, they spend hours reading source code to figure out what the docs should have said.
Contributing to docs isn't glamorous. But for a project like Chroma — which thousands of developers depend on for RAG pipelines, agent memory, and semantic search — a clear, comprehensive integration guide is worth more than most feature PRs.
Top comments (0)