Chasing Chroma’s Silent Data Loss: How Playwright E2E Tests Saved 72 Hours

#python #programming

At 2 AM, a message exploded in our user group: “Why does the AI assistant ask my name again? It remembered I’m Xiao Li yesterday.” I checked monitoring — memory normal, CPU stable, no errors in logs. After restarting the service, the memory came back — only to disappear again the next day. No error, no exception stack trace. Chroma had silently dropped the documents. That was when I realized: conventional unit tests are useless when it comes to vector databases.

Breaking down the problem

This LLM app uses a typical RAG memory pipeline: user conversations are summarized and written into the vector store; later queries retrieve relevant memories to inject into prompts. Memory persistence relies on Chroma’s persist(). In integration tests we mocked its response, wrote a few documents, queried them — all green. But production is a different story.

The root cause is a combination of three factors. First, Chroma’s default persist() does not flush to disk immediately; the underlying SQLite has its own caching strategy, so if the process exits unexpectedly, the last few seconds of writes can vanish. Second, we used async add_texts but returned HTTP 200 before the underlying flush completed. Third, in our containerized deployment /tmp/chroma-data sat on Kubernetes ephemeral storage, so it got reclaimed — the log simply said “Persist succeeded” while the directory was already gone. Regular unit tests can never simulate these edge cases because everything runs inside a single process, and mocking silently covers up the most dangerous parts.

Designing the solution

To catch this kind of “silent data loss”, tests must run in a real multi‑process environment with network latency and disk I/O, covering the full persistence lifecycle: write → service restart → read. This is exactly where end‑to‑end testing shines.

For tooling, we ruled out pure API tests — fast but they bypass the browser and real request chain, so you can’t verify that the whole call path from a frontend trigger truly persists data to disk. We also ruled out Selenium — too heavy and its async support is poor. We settled on Playwright + pytest‑asyncio because it can:

Simulate multi‑turn conversations in a real browser
Use page.wait_for_* to control timing and capture the window where HTTP 200 is returned before data gets asynchronously lost
Directly control service process restarts during tests, verifying whether data actually landed on disk

The architecture follows “test‑as‑a‑user”: every case patterns like “open the page → send a message → close the browser → restart the service → open the page again → check if the historical memory still exists”. Only this brute‑force approach exposes persistence issues in vector databases.

Core implementation

Let’s start with a simplified app under test — a FastAPI service with a chat frontend and Chroma for memory, started with uvicorn. The code below solves one problem: how to start/stop the service with Playwright and wrap it into a reusable fixture.

# conftest.py
import subprocess
import time
import httpx
import pytest_asyncio
from playwright.async_api import async_playwright

@pytest_asyncio.fixture(scope="function")
async def app_service(tmp_path):
    # 每个测试用例拥有独立的 Chroma 持久化目录，避免数据串扰
    chroma_dir = tmp_path / "chroma_data"
    chroma_dir.mkdir()

    # 启动被测试服务，设置环境变量指定持久化路径
    proc = subprocess.Popen(
        ["uvicorn", "main:app", "--host", "127.0.0.1", "--port", "8765"],
        env={"CHROMA_PERSIST_DIR": str(chroma_dir), **__import__("os").environ},
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE
    )
    # 等待服务就绪
    async with httpx.AsyncClient() as client:
        for _ in range(30):
            try:
                resp = await client.get("http://127.0.0.1:8765/health")
                if resp.status_code == 200:
                    break
            except Exception:
                pass
            await asyncio.sleep(0.5)
        else:
            proc.kill()
            raise RuntimeError("Service did not start")

    yield {"proc": proc, "base_url": "http://127.0.0.1:8765", "chroma_dir": chroma_dir}

    # 测试结束后强杀进程，模拟生产环境可能出现的非正常退出
    proc.kill()
    proc.wait()

This fixture manages an isolated data directory per test and the process lifecycle, ensuring a clean persistence environment while simulating unexpected termination. Next is the core test case that verifies whether memory survives a restart.


python
# test_memory_persistence.py
import asyncio
import pytest
from playwright.async_api import async_playwright

pytestmark = pytest.mark.asyncio(loop_scope="module")

async def test_memory_survives_restart(app_service):
    """
    场景：用户说自己的名字，重启服务后，AI 仍然记得。
    这个测试直接揪出了 Chroma 的异步刷盘问题。
    """
    base = app_service["base_url"]

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()

        # Step 1: 用户首次对话，留下记忆
        await page.goto(base)
        await page.fill("#user-

DEV Community

Chasing Chroma’s Silent Data Loss: How Playwright E2E Tests Saved 72 Hours

Breaking down the problem

Designing the solution

Core implementation

Top comments (0)