DEV Community

BAOFUFAN
BAOFUFAN

Posted on

Chasing Chroma’s Silent Data Loss: How Playwright E2E Tests Saved 72 Hours

At 2 AM, a message exploded in our user group: “Why does the AI assistant ask my name again? It remembered I’m Xiao Li yesterday.” I checked monitoring — memory normal, CPU stable, no errors in logs. After restarting the service, the memory came back — only to disappear again the next day. No error, no exception stack trace. Chroma had silently dropped the documents. That was when I realized: conventional unit tests are useless when it comes to vector databases.

Breaking down the problem

This LLM app uses a typical RAG memory pipeline: user conversations are summarized and written into the vector store; later queries retrieve relevant memories to inject into prompts. Memory persistence relies on Chroma’s persist(). In integration tests we mocked its response, wrote a few documents, queried them — all green. But production is a different story.

The root cause is a combination of three factors. First, Chroma’s default persist() does not flush to disk immediately; the underlying SQLite has its own caching strategy, so if the process exits unexpectedly, the last few seconds of writes can vanish. Second, we used async add_texts but returned HTTP 200 before the underlying flush completed. Third, in our containerized deployment /tmp/chroma-data sat on Kubernetes ephemeral storage, so it got reclaimed — the log simply said “Persist succeeded” while the directory was already gone. Regular unit tests can never simulate these edge cases because everything runs inside a single process, and mocking silently covers up the most dangerous parts.

Designing the solution

To catch this kind of “silent data loss”, tests must run in a real multi‑process environment with network latency and disk I/O, covering the full persistence lifecycle: write → service restart → read. This is exactly where end‑to‑end testing shines.

For tooling, we ruled out pure API tests — fast but they bypass the browser and real request chain, so you can’t verify that the whole call path from a frontend trigger truly persists data to disk. We also ruled out Selenium — too heavy and its async support is poor. We settled on Playwright + pytest‑asyncio because it can:

  • Simulate multi‑turn conversations in a real browser
  • Use page.wait_for_* to control timing and capture the window where HTTP 200 is returned before data gets asynchronously lost
  • Directly control service process restarts during tests, verifying whether data actually landed on disk

The architecture follows “test‑as‑a‑user”: every case patterns like “open the page → send a message → close the browser → restart the service → open the page again → check if the historical memory still exists”. Only this brute‑force approach exposes persistence issues in vector databases.

Core implementation

Let’s start with a simplified app under test — a FastAPI service with a chat frontend and Chroma for memory, started with uvicorn. The code below solves one problem: how to start/stop the service with Playwright and wrap it into a reusable fixture.

# conftest.py
import subprocess
import time
import httpx
import pytest_asyncio
from playwright.async_api import async_playwright

@pytest_asyncio.fixture(scope="function")
async def app_service(tmp_path):
    # 每个测试用例拥有独立的 Chroma 持久化目录,避免数据串扰
    chroma_dir = tmp_path / "chroma_data"
    chroma_dir.mkdir()

    # 启动被测试服务,设置环境变量指定持久化路径
    proc = subprocess.Popen(
        ["uvicorn", "main:app", "--host", "127.0.0.1", "--port", "8765"],
        env={"CHROMA_PERSIST_DIR": str(chroma_dir), **__import__("os").environ},
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE
    )
    # 等待服务就绪
    async with httpx.AsyncClient() as client:
        for _ in range(30):
            try:
                resp = await client.get("http://127.0.0.1:8765/health")
                if resp.status_code == 200:
                    break
            except Exception:
                pass
            await asyncio.sleep(0.5)
        else:
            proc.kill()
            raise RuntimeError("Service did not start")

    yield {"proc": proc, "base_url": "http://127.0.0.1:8765", "chroma_dir": chroma_dir}

    # 测试结束后强杀进程,模拟生产环境可能出现的非正常退出
    proc.kill()
    proc.wait()
Enter fullscreen mode Exit fullscreen mode

This fixture manages an isolated data directory per test and the process lifecycle, ensuring a clean persistence environment while simulating unexpected termination. Next is the core test case that verifies whether memory survives a restart.


python
# test_memory_persistence.py
import asyncio
import pytest
from playwright.async_api import async_playwright

pytestmark = pytest.mark.asyncio(loop_scope="module")

async def test_memory_survives_restart(app_service):
    """
    场景:用户说自己的名字,重启服务后,AI 仍然记得。
    这个测试直接揪出了 Chroma 的异步刷盘问题。
    """
    base = app_service["base_url"]

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()

        # Step 1: 用户首次对话,留下记忆
        await page.goto(base)
        await page.fill("#user-
Enter fullscreen mode Exit fullscreen mode

Top comments (0)