I got paged at 2 AM. Our AI teaching assistant had "amnesia." A student had just explained their lab progress 30 minutes earlier, but when they asked "What should I do next?", the assistant replied, "Could you tell me your current progress?" The user was furious. I rolled out of bed, checked the logs, and saw that the memory had been written successfully in Mem0 – the add API returned a 200. Yet a subsequent search came up empty. That single bug cost me 3 days of investigation.
Breaking down the problem
Our AI Agent relies on Mem0 for long-term memory: conversation history, user preferences, and task state are all stored there. During a continuous conversation, the agent first calls search() to retrieve relevant memories, stitches them into the prompt, and then generates an answer. In theory, a memory should be immediately searchable after insertion. In reality:
- Write succeeds, but retrieval returns nothing: the API call returns a memory_id, but searching with the same query moments later yields no hits.
- Locally visible, globally gone: a memory can be found within a single user session, but disappears when queried cross-session or under a different user_id for shared memory.
- Intermittent failures: tests pass locally but turn red in CI – once again, memories are missing.
The investigation pointed to three suspects: asynchronous indexing delays, a mismatch between search parameters and written content, and default cleanup policies. Manual ad-hoc testing can't cover these timing-sensitive scenarios – you can't reasonably send dozens of messages by hand every time you deploy. We needed an automated regression suite that specifically stresses the write-then-retrieve loop.
Solution design
We built a memory verification system using pytest + Mem0 Python SDK. Here's how we weighed the options:
- unittest – not flexible enough; fixture management is cumbersome and parametrization support is weak.
- Mocking Mem0 API – it wouldn't expose real indexing behaviour, making the tests pointless.
- curl/bash scripts – poor maintainability and rudimentary assertions.
The final setup: we spin up a Mem0 service (backed by a Qdrant vector store) via docker-compose, encapsulate the client fixture in pytest's conftest.py with data isolation, and write test cases covering single writes, batch writes, retrieval after updates, and concurrent writes. Now every code push triggers a CI run that exercises the entire memory pipeline, catching previously invisible async issues before they hit production.
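As a rough sketch of that infrastructure (the image tag, volume name, and port mapping are assumptions – pin versions for reproducible CI runs), the docker-compose service for the Qdrant vector store behind Mem0 can look like:

```yaml
# docker-compose.yml – minimal sketch, adapt to your deployment
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"   # port the Mem0 client's vector_store config points at
    volumes:
      - qdrant_data:/qdrant/storage
volumes:
  qdrant_data:
```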
Core implementation
The first piece sets up the test infrastructure: it initialises a Mem0 client once per session and hands each test a unique agent_id (used as the user_id for its writes), eliminating cross-test noise.
```python
# conftest.py
import uuid

import pytest
from mem0 import Memory


@pytest.fixture(scope="session")
def mem0_client():
    """Connect to the local Mem0 service (Qdrant vector store)."""
    return Memory.from_config({
        "version": "v1.1",
        "embedder": {
            "provider": "openai",
            "config": {"model": "text-embedding-3-small"}
        },
        "vector_store": {
            "provider": "qdrant",
            "config": {"host": "localhost", "port": 6333}
        }
    })


@pytest.fixture
def fresh_agent(mem0_client: Memory):
    """
    Hand each test a brand-new agent_id and clean up afterwards,
    so tests are fully isolated from one another.
    """
    agent_id = f"test_agent_{uuid.uuid4().hex[:8]}"
    yield mem0_client, agent_id
    # Teardown: delete every memory belonging to this agent
    mem0_client.delete_all(user_id=agent_id)
```
Next comes the critical verification: after a write, you must be able to find it. We include retry logic because Mem0's vector indexing is asynchronous by default – a lesson written in blood.
```python
# test_memory_basic.py
import time


def test_add_and_search_must_find(fresh_agent):
    """Basic round trip: write one memory, then search until it appears."""
    client, agent_id = fresh_agent

    # Write: record a user preference
    payload = f"User {agent_id} prefers reading code in dark mode"
    client.add(payload, user_id=agent_id)

    # Gotcha: indexing is asynchronous, so an immediate search may come
    # back empty – retry until a deadline instead of asserting once
    deadline = time.time() + 5  # 5-second timeout
    found = False
    results = []
    while time.time() < deadline:
        results = client.search("what display mode does the user prefer",
                                user_id=agent_id)
        # At least one hit whose content contains the keyword
        if any("dark mode" in r["memory"] for r in results):
            found = True
            break
        time.sleep(0.5)

    assert found, f"Memory not retrievable within 5s, search returned: {results}"
```
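The same poll-until-found loop shows up in almost every test, so it is worth factoring out. Here is a minimal sketch – the helper name `search_until_found` and its signature are my own, not part of Mem0's API:

```python
import time


def search_until_found(search_fn, keyword, timeout=5.0, interval=0.5):
    """Poll search_fn() until a result whose "memory" field contains
    keyword shows up, or the timeout expires.

    search_fn is any zero-argument callable returning a list of dicts
    shaped like Mem0 search hits; returns the matching hits, or [].
    """
    deadline = time.time() + timeout
    while True:
        results = search_fn()
        hits = [r for r in results if keyword in r["memory"]]
        if hits:
            return hits
        if time.time() >= deadline:
            return []
        time.sleep(interval)
```

In a test this collapses the retry loop to a single line, e.g. `hits = search_until_found(lambda: client.search("display mode", user_id=agent_id), "dark mode")` followed by one assert.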
This test is the heart of the article. The mantra is: “Don't trust the API – trust the result after retries.” Below is an additional stress test that ensures no data is lost under concurrent writes:
```python
# test_concurrent_write.py
import time
from concurrent.futures import ThreadPoolExecutor


def test_concurrent_add_never_lost(fresh_agent):
    """10 threads write distinct preferences concurrently; none may be lost."""
    client, agent_id = fresh_agent
    preferences = [f"{agent_id} preference number {i}" for i in range(10)]

    with ThreadPoolExecutor(max_workers=10) as pool:
        pool.map(lambda p: client.add(p, user_id=agent_id), preferences)

    # Give asynchronous indexing some time to finish
    time.sleep(3)

    # Use get_all rather than search here: search returns only top-k hits,
    # which can mask losses when we need to verify every single write
    memories = client.get_all(user_id=agent_id)
    stored = " | ".join(m["memory"] for m in memories)
    missing = [p for p in preferences if p not in stored]
    assert not missing, f"Memories lost after concurrent writes: {missing}"
```
These tests are now part of our CI pipeline. Every commit triggers a Mem0 end-to-end check that has already saved us from chasing phantom memory loss at midnight. If you're building an AI agent that depends on reliable long-term memory, I strongly recommend stealing this approach – your sleep will thank you.
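For reference, the CI wiring can be sketched like this in GitHub Actions syntax – the job name, Python version, and file paths are assumptions, not our exact pipeline:

```yaml
# .github/workflows/memory-tests.yml – hedged sketch, adapt paths/versions
name: mem0-regression
on: [push]
jobs:
  memory-e2e:
    runs-on: ubuntu-latest
    services:
      qdrant:
        image: qdrant/qdrant:latest
        ports:
          - 6333:6333
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install pytest mem0ai
      - run: pytest tests/ -v
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```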