ChromaDB 0.5 Silently Leaks Memory Until You Set Two Env Vars

#docker #chromadb

The TL;DR

If you run ChromaDB 0.5.x with more than a few hundred collections, set these two env vars before anything else:

CHROMA_SEGMENT_CACHE_POLICY=LRU
CHROMA_MEMORY_LIMIT_BYTES=10737418240   # 10 GiB

Without them, the segment cache in ChromaDB 0.5.x grows without bound, which behaves like an unresolved memory leak. Upstream issues #3336 and #5843 are still open. We discovered this the slow way.

HoneyChat at a glance (for context): Telegram-native AI companion bot, ~300 DAU, 17 languages. Stack: aiogram bot + FastAPI (uvicorn, 4 workers) + Celery workers (queues llm / images / gifs / voice) + Celery beat (RedBeat) + Next.js 15 web + Astro blog + React/Vite Mini App. Storage: PostgreSQL 16 + Redis + ChromaDB 0.5.x + Storj S3. Host: 32 GB / 16-core Xeon, single box.

The shape of the leak

We run 2,233 ChromaDB collections in production — one per (character_id, session_id) pair, so each conversation gets isolated semantic memory and scene context never bleeds between sessions. Mean collection size: 4.9 documents (small per-collection, large in aggregate).
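One way to derive a stable collection name from that pair is sketched below. The helper name collection_name, the "mem-" prefix, and the hashing scheme are assumptions made for illustration, not a description of our production code. The hash keeps the name short and restricted to characters that are safe in a collection name.

```python
import hashlib


def collection_name(character_id: str, session_id: str) -> str:
    """Hypothetical helper: one deterministic name per (character, session) pair."""
    # A unit separator between the parts avoids collisions such as
    # ("ab", "c") versus ("a", "bc").
    key = f"{character_id}\x1f{session_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:32]
    return f"mem-{digest}"


# With the chromadb Python client this could be used roughly as (not run here):
#   client.get_or_create_collection(name=collection_name("char-7", "sess-42"))
name = collection_name("char-7", "sess-42")
```

Because the name is a pure function of the pair, any worker can find the same collection without a lookup table.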

On 0.4 this ran fine for months. We upgraded to 0.5 for some new features, and within a week the chromadb container was getting OOM-killed nightly. The pattern was unmistakable: every time a fresh collection got queried, RSS bumped a few MiB and never came back down. With ~10K collection touches a day across that fleet of 2,233, the container budget filled in about three days. Restart, repeat.
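A rough way to turn that kind of creep into a number is a linear projection over RSS samples. The sketch below is only an illustration: days_until_limit is a hypothetical helper, and the sample values are invented for the example rather than measured.

```python
GIB = 1024 ** 3


def days_until_limit(samples, limit_bytes):
    """Project when RSS reaches limit_bytes using a least-squares line.

    samples: list of (day, rss_bytes) tuples, oldest first.
    Returns days remaining after the last sample, or None if RSS is not growing.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sum((t - mean_t) * (r - mean_r) for t, r in samples) / var_t
    if slope <= 0:
        return None  # flat or shrinking RSS: no projected OOM
    _, last_rss = samples[-1]
    return max(0.0, (limit_bytes - last_rss) / slope)


# Invented example: RSS grows 2 GiB per day against a 12 GiB limit.
growing = [(0, 2 * GIB), (1, 4 * GIB), (2, 6 * GIB)]
remaining = days_until_limit(growing, 12 * GIB)
```

A projection that keeps shrinking after each restart is a good sign that you are looking at unbounded growth rather than a warm-up plateau.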

What we tried first (and what didn't work)

Restarting the container. Buys a day, doesn't fix the cause.
Upgrading ChromaDB. The underlying behavior hasn't changed in the 0.5.x line.
Increasing the container memory limit. Just delays the OOM.
Sharding collections further. We already split per (character, session) — narrower sharding would have worsened the cache, not helped it.
Blaming the embedding model. Profiling pointed elsewhere.

Profiling pointed at the segment cache: ChromaDB caches per-collection segment metadata, and on 0.5 the cache is unbounded by default. The "fix" of "let's just give it more RAM" never converges if the cache only grows.

The fix

The env vars above tell ChromaDB to use an LRU eviction policy on the segment cache, capped at a memory limit you set. Once we set them and bounced the container, RSS stabilised in a 6-8 GiB band and has stayed there for months.

# docker-compose.yml
services:
  chromadb:
    image: chromadb/chroma:0.5.18
    environment:
      CHROMA_SEGMENT_CACHE_POLICY: "LRU"
      CHROMA_MEMORY_LIMIT_BYTES: "10737418240"   # 10 GiB
    deploy:
      resources:
        limits:
          memory: 12G

CHROMA_SEGMENT_CACHE_POLICY=LRU switches the cache from unbounded to least-recently-used eviction. CHROMA_MEMORY_LIMIT_BYTES is the budget LRU operates against: 10 GiB out of 32 GB host RAM, leaving room for Postgres, Redis, FastAPI, four Celery workers, nginx, ChromaDB's own non-cache overhead, and the OS.
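To make the difference concrete, here is a toy model of LRU eviction against a byte budget. It is a sketch of the general technique only, not ChromaDB's implementation: the class name ByteBudgetLRU and the 3 MiB per-collection size are assumptions chosen for the example.

```python
from collections import OrderedDict

MIB = 1024 ** 2


class ByteBudgetLRU:
    """Toy LRU cache that evicts by total size; a limit of None means unbounded."""

    def __init__(self, limit_bytes=None):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self._items = OrderedDict()  # key -> size in bytes, oldest first

    def touch(self, key, size_bytes):
        """Record an access to key, loading it if absent, then evict if over budget."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return
        self._items[key] = size_bytes
        self.used_bytes += size_bytes
        if self.limit_bytes is None:
            return  # nothing to evict against, so the cache only grows
        while self.used_bytes > self.limit_bytes and len(self._items) > 1:
            _, evicted = self._items.popitem(last=False)  # drop least recently used
            self.used_bytes -= evicted

    def __contains__(self, key):
        return key in self._items


unbounded = ByteBudgetLRU()
bounded = ByteBudgetLRU(limit_bytes=10 * MIB)
for i in range(100):
    unbounded.touch(f"collection-{i}", 3 * MIB)
    bounded.touch(f"collection-{i}", 3 * MIB)

print(unbounded.used_bytes // MIB)  # 300
print(bounded.used_bytes // MIB)    # 9
```

The unbounded instance ends up holding every collection it has ever seen, while the bounded one settles just under its budget and keeps only the most recently touched entries.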

Pick a CHROMA_MEMORY_LIMIT_BYTES that's well under your container's hard limit — the policy needs headroom to actually evict before the kernel kills you.
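The arithmetic is simple enough to script so the two numbers never drift apart. This is a minimal sketch; cache_limit_bytes is a hypothetical helper, and the 2 GiB of headroom simply mirrors the 12G container limit and 10 GiB cache budget used in this post.

```python
GIB = 1024 ** 3


def cache_limit_bytes(container_limit_gib: int, headroom_gib: int) -> int:
    """Return a cache budget in bytes that sits headroom_gib below the container limit."""
    if headroom_gib <= 0 or headroom_gib >= container_limit_gib:
        raise ValueError("headroom must be positive and smaller than the container limit")
    return (container_limit_gib - headroom_gib) * GIB


limit = cache_limit_bytes(container_limit_gib=12, headroom_gib=2)
print(limit)  # 10737418240
```

How much headroom you need depends on what else lives in the container's address space, so treat 2 GiB as a starting point to validate against your own RSS graphs.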

The catch (don't forget this one)

These env vars are only applied at container creation. docker compose restart chromadb is not enough — you need:

docker compose up -d --force-recreate --no-deps chromadb

We learned this the second time we changed limits while debugging: we watched RSS climb again and wondered why the fix had stopped working. It hadn't; the new env had never been picked up. If you change the limits, always recreate rather than restart.
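A quick check after recreating is to compare the container's actual environment with what you expect. The sketch below assumes you have captured the JSON printed by docker inspect --format '{{json .Config.Env}}' for your container; the helper name missing_env and the sample input are hypothetical.

```python
import json


def missing_env(inspect_env_json: str, expected: dict) -> dict:
    """Return the expected variables that are absent or different in the container."""
    entries = json.loads(inspect_env_json)  # a list of "KEY=VALUE" strings
    actual = dict(entry.split("=", 1) for entry in entries if "=" in entry)
    return {key: value for key, value in expected.items() if actual.get(key) != value}


expected = {
    "CHROMA_SEGMENT_CACHE_POLICY": "LRU",
    "CHROMA_MEMORY_LIMIT_BYTES": "10737418240",
}

# Sample input standing in for a container that was restarted, not recreated.
stale = '["PATH=/usr/bin", "CHROMA_SEGMENT_CACHE_POLICY=LRU"]'
print(missing_env(stale, expected))
```

An empty result means the running container really has the values from your compose file; anything else means it is still running with the old environment.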

Why this isn't on the docs landing page

Most ChromaDB benchmarks and getting-started guides assume one big collection, which is the documented happy path. If you partition per user or per session (multi-tenant SaaS, per-conversation memory, per-document RAG silos), you hit cache-and-eviction behaviour the docs don't warn about. The issues are real and open in the repo; the docs just haven't caught up.

This isn't a knock on the team — 0.5 was a big jump and they're shipping fast. It's just a heads-up that if your workload is "many small collections," your config has to be different from the tutorial.

Lessons

"It's a leak" is usually "it's a cache without an eviction policy." Read your dependency's cache config before chasing valgrind ghosts.
Many-small-collections is not the documented happy path. Per-user/per-session partitioning needs a config nobody's tutorial mentions.
Check open issues before assuming your config is wrong. #3336 and #5843 are community-known, not docs-known.
Set both env vars together. Without CHROMA_MEMORY_LIMIT_BYTES, the LRU policy has nothing to evict against and effectively no-ops.
Recreate, don't restart, when changing startup env. Standard Docker gotcha, doubly painful when you're debugging memory.

If you're on Chroma 0.5+ with many collections and seeing slow RSS creep — that's almost certainly it. Three lines of YAML, one container recreate, done.

This write-up is from production work at HoneyChat — a Telegram-native AI companion bot where each (character, session) pair gets its own ChromaDB collection for isolated semantic memory. The canonical version (with our other engineering notes) lives at honeychat.bot/en/blog/chromadb-lru-memory-leak-production.

— HoneyChat Engineering

Sources

ChromaDB docs — segment cache, deployment, configuration reference.
ChromaDB issue #3336 — memory leak in segment cache, open.
ChromaDB issue #5843 — many-collections behaviour, open.
Docker Compose: env vs --force-recreate — why restart doesn't pick up new env.
HoneyChat engineering notes: persistent-memory architecture · prompt caching measured.