At 2 AM, a message exploded in our user group: “Why does the AI assistant ask my name again? It remembered I’m Xiao Li yesterday.” I checked monitoring — memory normal, CPU stable, no errors in logs. After restarting the service, the memory came back — only to disappear again the next day. No error, no exception stack trace. Chroma had silently dropped the documents. That was when I realized: conventional unit tests are useless when it comes to vector databases.
Breaking down the problem
This LLM app uses a typical RAG memory pipeline: user conversations are summarized and written into the vector store; later queries retrieve relevant memories to inject into prompts. Memory persistence relies on Chroma’s persist(). In integration tests we mocked its response, wrote a few documents, queried them — all green. But production is a different story.
The root cause is a combination of three factors. First, Chroma’s default persist() does not flush to disk immediately; the underlying SQLite has its own caching strategy, so if the process exits unexpectedly, the last few seconds of writes can vanish. Second, we used async add_texts but returned HTTP 200 before the underlying flush completed. Third, in our containerized deployment /tmp/chroma-data sat on Kubernetes ephemeral storage, so it got reclaimed — the log simply said “Persist succeeded” while the directory was already gone. Regular unit tests can never simulate these edge cases because everything runs inside a single process, and mocking silently covers up the most dangerous parts.
Designing the solution
To catch this kind of “silent data loss”, tests must run in a real multi‑process environment with network latency and disk I/O, covering the full persistence lifecycle: write → service restart → read. This is exactly where end‑to‑end testing shines.
For tooling, we ruled out pure API tests — fast but they bypass the browser and real request chain, so you can’t verify that the whole call path from a frontend trigger truly persists data to disk. We also ruled out Selenium — too heavy and its async support is poor. We settled on Playwright + pytest‑asyncio because it can:
- Simulate multi‑turn conversations in a real browser
- Use
page.wait_for_*to control timing and capture the window where HTTP 200 is returned before data gets asynchronously lost - Directly control service process restarts during tests, verifying whether data actually landed on disk
The architecture follows “test‑as‑a‑user”: every case patterns like “open the page → send a message → close the browser → restart the service → open the page again → check if the historical memory still exists”. Only this brute‑force approach exposes persistence issues in vector databases.
Core implementation
Let’s start with a simplified app under test — a FastAPI service with a chat frontend and Chroma for memory, started with uvicorn. The code below solves one problem: how to start/stop the service with Playwright and wrap it into a reusable fixture.
# conftest.py
import subprocess
import time
import httpx
import pytest_asyncio
from playwright.async_api import async_playwright
@pytest_asyncio.fixture(scope="function")
async def app_service(tmp_path):
# 每个测试用例拥有独立的 Chroma 持久化目录,避免数据串扰
chroma_dir = tmp_path / "chroma_data"
chroma_dir.mkdir()
# 启动被测试服务,设置环境变量指定持久化路径
proc = subprocess.Popen(
["uvicorn", "main:app", "--host", "127.0.0.1", "--port", "8765"],
env={"CHROMA_PERSIST_DIR": str(chroma_dir), **__import__("os").environ},
stdout=subprocess.PIPE,
stderr=subprocess.PIPE
)
# 等待服务就绪
async with httpx.AsyncClient() as client:
for _ in range(30):
try:
resp = await client.get("http://127.0.0.1:8765/health")
if resp.status_code == 200:
break
except Exception:
pass
await asyncio.sleep(0.5)
else:
proc.kill()
raise RuntimeError("Service did not start")
yield {"proc": proc, "base_url": "http://127.0.0.1:8765", "chroma_dir": chroma_dir}
# 测试结束后强杀进程,模拟生产环境可能出现的非正常退出
proc.kill()
proc.wait()
This fixture manages an isolated data directory per test and the process lifecycle, ensuring a clean persistence environment while simulating unexpected termination. Next is the core test case that verifies whether memory survives a restart.
python
# test_memory_persistence.py
import asyncio
import pytest
from playwright.async_api import async_playwright
pytestmark = pytest.mark.asyncio(loop_scope="module")
async def test_memory_survives_restart(app_service):
"""
场景:用户说自己的名字,重启服务后,AI 仍然记得。
这个测试直接揪出了 Chroma 的异步刷盘问题。
"""
base = app_service["base_url"]
async with async_playwright() as p:
browser = await p.chromium.launch()
page = await browser.new_page()
# Step 1: 用户首次对话,留下记忆
await page.goto(base)
await page.fill("#user-
Top comments (0)