BAOFUFAN

Posted on Jun 6

From Mock to Real Redis: Cutting Agent Memory Test Leakage from 30% to 0

#python #programming

Woken up by PagerDuty at 2 AM. The user group was on fire — our AI customer service agent suddenly lost its memory. One message confirmed the user's phone number, the next asked "How may I address you?" Checking the logs revealed that the Redis connection pool threw a ConnectionError during a network hiccup. Our supposedly bulletproof mock tests had never simulated that exception. The code simply skipped memory persistence, and all context was lost. Even scarier: the regression suite was green.

This is a textbook failure mode of relying on mocks with no tests against real middleware. We spent two weeks rebuilding the automated verification of the agent's memory module with pytest and a real Redis instance. The result: the share of memory-related production bugs that slipped past the test suite went from 30% to zero. Here's the full blueprint, the code, and the sharp edges we found along the way.

1. Why mocks can't catch the real risks of an agent memory module

An agent’s memory isn’t simple key-value storage. It handles three things:

Short-term memory: the last N turns of dialogue, stored in a Redis List or ZSet with TTL-based eviction.
Long-term summaries: compressing long conversations into summaries via an LLM, saved in a Redis String or Hash with longer expiration.
Context assembly: fetching memories from Redis and stitching them into the prompt. A failure at any step makes the agent spout nonsense.

A typical mock test patches redis.Redis with unittest.mock.patch or builds a fake client. It’s easy to reach 80% coverage this way. But the real world doesn’t offer “always-succeeding storage”:

A network blip can exhaust the connection pool, raising redis.exceptions.ConnectionError. The mock version just returns True.
Serialization/deserialization isn’t always flawless — pickle protocol mismatches or complex nested objects with unpicklable lambdas never surface with mocks.
Redis eviction policies (like allkeys-lru) silently drop keys when memory gets tight. Your agent’s notepad vanishes. A mock only returns None when you explicitly set it.
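
To make the gap concrete, here is a minimal sketch that uses only the standard library. The save helper, the key name, and the on_done field are hypothetical and not taken from our code base. The point is only that a MagicMock client reports success for a payload that a real serializer rejects.

```python
import pickle
from unittest.mock import MagicMock

# A fake client of the kind a mock-based unit test would inject.
fake_redis = MagicMock()
fake_redis.setex.return_value = True


def save(client, key, value, ttl=3600):
    """Hypothetical save helper: hands the value straight to the client."""
    return client.setex(key, ttl, value)


# A context that accidentally carries a callable.
context = {"phone": "13800138000", "on_done": lambda reply: reply}

# The mock accepts anything and reports success.
mock_result = save(fake_redis, "agent:session:demo", context)

# Real serialization of the same payload fails.
try:
    pickle.dumps(context)
    pickle_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    pickle_ok = False

print(mock_result, pickle_ok)
```

The mock-based assertion passes while the same payload cannot be pickled at all, which is exactly the kind of disagreement a green mock suite hides.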

The root cause: a mock simulates the Redis you imagine, not real Redis. Unit tests verify logic branches but can’t expose process boundaries, network boundaries, or data consistency issues. For a module like agent memory that depends heavily on external state, integration tests must run against real Redis. Otherwise you’ll eventually pay the technical debt in production.

2. Design choices — why we skipped the alternatives

We considered three paths for testing with real Redis:

Manually maintained dev Redis instance: Clean it manually before tests, check results manually after. Cons: easy to forget, environment drift, and nobody starts Redis for you in CI.
Embedded Redis (e.g., embedded-redis): Common in the JVM world; the Python alternatives are scarce, outdated, and incompatible with Redis 7 features we rely on.
Dockerized Redis + pytest for automatic life-cycle management: This is the right path. Spin up a Redis container before tests, tear it down after. Clean, reproducible, and one CI command covers it.

We chose pytest + redis-py + testcontainers-python (with a fallback to docker-compose). The reason: testcontainers lets you declare a Redis container right in conftest, automatically waits for the port to be ready, and destroys it when tests end, with no extra scripting. A scope="session" fixture shares the container across tests, so each test then needs its own slice of Redis (a key prefix, a dedicated db number, or at minimum a flushdb before the test) to avoid cross-contamination while staying realistic.
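
The key-prefix variant of that isolation can be sketched with the standard library alone. Both helper names below are hypothetical, and the store shown later in this post hard-codes its agent:session: prefix, so it would need a prefix parameter before it could use something like this.

```python
import re
import uuid


def make_namespace(test_name: str) -> str:
    """Build a unique, Redis-friendly key prefix for one test (hypothetical helper)."""
    # Replace characters such as "[" and "-" that parametrized test names contain.
    safe = re.sub(r"[^A-Za-z0-9_]", "_", test_name)
    # A short random suffix keeps two runs of the same test apart.
    return f"test:{safe}:{uuid.uuid4().hex[:8]}"


def namespaced(namespace: str, key: str) -> str:
    """Prefix a logical key with the test's namespace."""
    return f"{namespace}:{key}"


ns_a = make_namespace("test_save[case-1]")
ns_b = make_namespace("test_save[case-1]")
key_a = namespaced(ns_a, "agent:session:1")
key_b = namespaced(ns_b, "agent:session:1")
print(key_a)
print(key_b)
```

In a fixture, request.node.name could supply the test name, and teardown could delete whatever redis-py's scan_iter(match=f"{namespace}:*") returns instead of flushing the whole database.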

Why not just run tests against a real production Redis on a dedicated db number? If you accidentally misconfigure something and flushdb isn’t blocked, you’ll have another horror story to tell.

3. Core implementation — step-by-step migration from mock to real Redis

Here are three key code blocks: the container-management fixture, the memory storage implementation, and the test cases. You should be able to drop them into your project and run them.

3.1 The pytest fixture that manages a Redis container automatically

This solves the “how to get a clean Redis instance for tests” problem. We use testcontainers to launch Redis 7, then send an explicit PING so that a container that is not actually ready fails the session immediately instead of failing the first test.

# conftest.py
import pytest
import redis
from testcontainers.redis import RedisContainer

@pytest.fixture(scope="session")
def redis_container():
    """Session-scoped Redis container, started once per test session."""
    container = RedisContainer("redis:7-alpine")
    container.with_exposed_ports(6379)
    container.start()
    try:
        # Sanity check on top of the built-in wait strategy: if Redis does not
        # answer PING, fail the session here rather than inside the first test.
        probe = redis.Redis(
            host=container.get_container_host_ip(),
            port=int(container.get_exposed_port(6379)),
        )
        probe.ping()
        probe.close()
        yield container
    finally:
        # Stop the container even if the readiness check raises.
        container.stop()

@pytest.fixture
def redis_client(redis_container):
    """Per-test Redis client. db 0 is flushed before each test, which isolates
    tests that run one after another but not tests that run in parallel."""
    client = redis.Redis(
        host=redis_container.get_container_host_ip(),
        port=int(redis_container.get_exposed_port(6379)),
        db=0,
        decode_responses=True,  # avoid manual .decode()
    )
    client.flushdb()  # Start clean
    yield client
    client.close()

Why decode_responses=True? The agent’s memory module mostly stores natural language text. Returning bytes forces a .decode() everywhere, adding noise that buries the actual assertions.
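
A single PING fails fast but does not wait. If you would rather retry for a few seconds before giving up, a small polling helper is enough. This is a sketch, not part of our conftest: wait_until and flaky_ping are hypothetical names, and flaky_ping only stands in for client.ping.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.2) -> bool:
    """Call check() until it returns True or the timeout elapses.

    Exceptions raised by check() are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # e.g. a connection error while the server is still starting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for client.ping: fails twice, then succeeds.
attempts = {"n": 0}


def flaky_ping() -> bool:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("not ready")
    return True


ready = wait_until(flaky_ping, timeout=2.0, interval=0.01)
print(ready, attempts["n"])
```

In the session fixture this would become something like wait_until(probe.ping), with the session failing if the call returns False.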

3.2 The agent memory storage module under test

This is a simplified version of production code, living in memory_store.py. It stores each session's context as a JSON string under a single Redis key with a TTL (SETEX). We deliberately serialize with json rather than pickle, and save_context reports failures through its return value instead of raising.

# memory_store.py
import json
import redis
from typing import Dict, Optional
import logging

logger = logging.getLogger(__name__)

class AgentMemoryStore:
    def __init__(self, redis_client: redis.Redis, ttl: int = 3600):
        self.redis = redis_client
        self.ttl = ttl

    def save_context(self, session_id: str, context: Dict) -> bool:
        try:
            key = f"agent:session:{session_id}"
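            # Strict serialization: passing default=str would turn unknown objects
            # into strings and hide the TypeError that the except clause handles.
            serialized = json.dumps(context)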
            self.redis.setex(key, self.ttl, serialized)
            return True
        except (redis.exceptions.ConnectionError, TypeError) as e:
            logger.error(f"Failed to save context for {session_id}: {e}")
            return False

    def load_context(self, session_id: str) -> Optional[Dict]:
        try:
            key = f"agent:session:{session_id}"
            data = self.redis.get(key)
            if data is None:
                return None
            return json.loads(data)
        except (redis.exceptions.ConnectionError, json.JSONDecodeError) as e:
            logger.error(f"Failed to load context for {session_id}: {e}")
            return None
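
One caveat about the JSON path: it is safer than pickle, but it is not lossless. The sketch below uses made-up context values to show two quiet changes (tuples come back as lists, integer keys come back as strings) and one loud failure (datetime objects are rejected). A happy-path test that only stores flat strings will not notice any of them.

```python
import json
from datetime import datetime

original = {
    "turns": ("hello", "hi there"),  # tuple
    "slots": {1: "phone"},           # integer key
}

# Round-trip through JSON, as save_context followed by load_context would.
restored = json.loads(json.dumps(original))
round_trip_equal = restored == original

# Values the json module has no encoding for raise TypeError.
try:
    json.dumps({"confirmed_at": datetime(2024, 6, 6, 2, 0)})
    datetime_ok = True
except TypeError:
    datetime_ok = False

print(restored)
print(round_trip_equal, datetime_ok)
```

If the context can contain such values, an assertion like loaded == context needs either a normalized input or an explicit encoding step in the store.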

3.3 Test cases against real Redis — including failure scenarios

This tests the happy path and the scenarios that mock tests never touch: connection failures, serialization errors, and TTL expiry on a real server.

# test_memory_store_real.py
import pytest
import redis
from memory_store import AgentMemoryStore

# Opt every test in this file into the real-Redis suite (see --real-redis in conftest.py).
pytestmark = pytest.mark.real_redis

def test_save_and_load_context_success(redis_client):
    store = AgentMemoryStore(redis_client)
    context = {"phone": "13800138000", "intent": "return_order"}
    assert store.save_context("session_1", context)
    loaded = store.load_context("session_1")
    assert loaded == context

def test_context_not_found(redis_client):
    store = AgentMemoryStore(redis_client)
    assert store.load_context("nonexistent_session") is None

def test_connection_error_graceful_failure(redis_client, mocker):
    """Simulate a ConnectionError — the store must handle it gracefully."""
    store = AgentMemoryStore(redis_client)
    # Force connection failure on setex (the mocker fixture comes from the pytest-mock plugin)
    mocker.patch.object(redis_client, 'setex', side_effect=redis.exceptions.ConnectionError("boom"))
    result = store.save_context("session_err", {"a": 1})
    assert result is False

def test_serialization_error_handling(redis_client):
    """What happens when we try to save an unserializable object?"""
    store = AgentMemoryStore(redis_client)
    bad_context = {"fn": lambda x: x}  # lambda not serializable by json
    result = store.save_context("session_bad", bad_context)
    # Should gracefully fail, not throw
    assert result is False

def test_ttl_expiry(redis_client):
    """Set a short TTL, wait past it, and check that Redis has dropped the key."""
    import time

    store = AgentMemoryStore(redis_client, ttl=2)
    assert store.save_context("short_lived", {"value": "ephemeral"})
    time.sleep(3)
    assert store.load_context("short_lived") is None

These tests run against a real Redis server on a pinned version instead of a fake client. Note that this file sets no memory limit, so it covers TTL expiry but not eviction under maxmemory. No more “it works on my fake client.”
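
Eviction under memory pressure is a different mechanism from TTL expiry, and it needs its own probe. The sketch below is one way to approach it, under stated assumptions: the function name, the probe key, and the sizes (5mb, 10,000 fillers of 1 KB) are guesses that would need tuning, and client is anything that behaves like redis.Redis for config_set, set, and exists. Because allkeys-lru is an approximate LRU, the probe key is expected, not guaranteed, to be evicted.

```python
def probe_survives_memory_pressure(client, probe_key="agent:session:probe",
                                   fillers=10_000, payload_bytes=1024,
                                   maxmemory="5mb"):
    """Return True if probe_key still exists after writing past maxmemory.

    `client` is assumed to expose redis.Redis-style config_set/set/exists.
    """
    client.config_set("maxmemory", maxmemory)
    client.config_set("maxmemory-policy", "allkeys-lru")
    try:
        # Write the key we care about first, so it is the least recently used.
        client.set(probe_key, "remember me")
        payload = "x" * payload_bytes
        for i in range(fillers):
            client.set(f"filler:{i}", payload)
        # Check only once at the end, so the probe is not touched during the fill.
        return bool(client.exists(probe_key))
    finally:
        # Restore the Redis defaults, because the container is shared by the session.
        client.config_set("maxmemory", "0")
        client.config_set("maxmemory-policy", "noeviction")
```

A test built on this would call the function with the redis_client fixture, expect False, and then assert that load_context returns None for the evicted session instead of raising.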

4. CI integration — one-line command + environment isolation

Running real Redis in CI often raises concerns about access to the Docker socket. This is the workflow excerpt we use:

# .github/workflows/agent_memory_tests.yml (excerpt)
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      # Optional fallback if testcontainers can't access Docker socket
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements-test.txt
      - name: Run memory integration tests
        run: pytest tests/ --real-redis
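
The workflow declares a Redis service as a fallback, but the fixture shown earlier always starts its own container. One way to wire the two together is an environment variable that the fixture checks first. This is a sketch under assumptions: TEST_REDIS_URL and external_redis_endpoint are hypothetical names, not something the workflow above sets.

```python
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import urlparse


def external_redis_endpoint(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, int]]:
    """Return (host, port) if TEST_REDIS_URL is set, else None.

    None means "no external server configured, start a container instead".
    """
    url = env.get("TEST_REDIS_URL")
    if not url:
        return None
    parsed = urlparse(url)
    if parsed.scheme != "redis" or not parsed.hostname:
        raise ValueError(f"unsupported Redis URL: {url!r}")
    # Fall back to the standard Redis port when the URL omits it.
    return parsed.hostname, parsed.port or 6379


print(external_redis_endpoint({}))
print(external_redis_endpoint({"TEST_REDIS_URL": "redis://localhost:6379/0"}))
```

With this in place, the job would export TEST_REDIS_URL=redis://localhost:6379/0 and the session fixture would skip testcontainers whenever the function returns an endpoint.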

We added a --real-redis command-line option and a matching real_redis marker so that unit tests relying on mocks and integration tests requiring the container can coexist. Tests carrying the marker are skipped unless the option is passed, so a local dev environment without Docker can still run the rest of the suite.

# conftest.py (continued)
def pytest_addoption(parser):
    parser.addoption("--real-redis", action="store_true", default=False,
                     help="run tests against real Redis")

def pytest_configure(config):
    config.addinivalue_line("markers", "real_redis: mark test as requiring real Redis")

def pytest_collection_modifyitems(config, items):
    if not config.getoption("--real-redis"):
        skip_real = pytest.mark.skip(reason="need --real-redis option to run")
        for item in items:
            if "real_redis" in item.keywords:
                item.add_marker(skip_real)

This gives us two layers of safety: CI always runs the real-Redis suite, while local developers can quickly iterate without Docker when they choose.

5. The payoff — measurable metrics

After rolling out real Redis tests, we tracked the agent memory module’s bug escape rate for three months.

Before: 30% of memory-related production incidents originated from behavior that passed the mock test suite. Typical causes: connection pool exhaustion, key eviction, serialization errors.
After: zero production incidents caused by the same class of problems. The team also gained confidence to refactor memory storage, including a migration from JSON to MessagePack, because the tests caught the edge cases immediately.

The stability improvement went beyond metrics: the on-call team stopped being woken up at 2 AM for memory loss bugs.

6. Lessons and pitfalls

Throughout the migration, we hit several pitfalls worth sharing:

Don’t share the same Redis DB across test functions — even with flushdb(), concurrent test runs (pytest-xdist) will collide. Use unique key prefixes or separate DBs.
Testcontainers startup time can be sneaky in large suites. We pinned the Redis image version and used scope="session" to keep total test time under 10 seconds.
Connection pool exhaustion is real. In one test we simulated 1000 rapid-fire saves, which exhausted the default connection pool. Setting max_connections higher and adding close logic in teardown fixed it — and caught a production leak we didn’t know we had.
Keep your test Redis version aligned with production. We originally used redis:latest and missed behavior changes when production was still on Redis 6. Now we explicitly pin redis:7-alpine.
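
For the first pitfall, one option is to give each pytest-xdist worker its own database number. pytest-xdist exposes the worker id (gw0, gw1, ...) in the PYTEST_XDIST_WORKER environment variable. The helper name below is hypothetical, and the limit of 16 assumes the stock databases setting. This only matters when the workers share one Redis server, such as a CI service container, because a session-scoped container fixture is created once per worker process anyway.

```python
import os
import re
from typing import Mapping

REDIS_DEFAULT_DATABASES = 16  # stock "databases" setting; adjust if yours differs


def db_for_worker(env: Mapping[str, str] = os.environ) -> int:
    """Map a pytest-xdist worker id to a Redis db number (hypothetical helper)."""
    worker = env.get("PYTEST_XDIST_WORKER", "")
    match = re.fullmatch(r"gw(\d+)", worker)
    if match is None:
        return 0  # not running under pytest-xdist
    index = int(match.group(1))
    if index >= REDIS_DEFAULT_DATABASES:
        raise RuntimeError(
            f"worker {worker} has no db: only {REDIS_DEFAULT_DATABASES} databases"
        )
    return index


print(db_for_worker({}))
print(db_for_worker({"PYTEST_XDIST_WORKER": "gw3"}))
```

The redis_client fixture would then pass db=db_for_worker() instead of db=0, and flushdb() would only clear that worker's own database.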

Swapping mocks for real Redis isn’t just about testing — it’s about trust. For stateful agents, the distance between “tests pass” and “it works in production” is measured by how closely your test environment mirrors reality. In our case, that mirror was a Docker container, and it saved our sleep.

DEV Community