My AI Chatbot Kept Forgetting Users — How I Used Pytest to Fix 3 Deadly LangChain Memory Bugs


At 1:23 am, my phone blew up. A customer was furious: “Your AI support bot forgot everything the user said in the last round. They’re calling us scammers.” I checked the monitoring dashboard and found that the Redis keys holding the conversation memory had simply vanished. LangChain’s ConversationBufferMemory didn’t raise a single exception — it just silently returned an empty context. After that night, I learned one thing the hard way: if you use LangChain for memory storage and don’t write automated tests before deploying, you are planting landmines in production.

Breaking down the problem: why the AI agent suddenly “forgot”

Our support bot is built on LangChain’s ConversationChain. Initially the memory layer used ConversationBufferMemory, but when we needed persistence, we switched to RedisChatMessageHistory. The flow is straightforward: user sends a message → load history from Redis → assemble the prompt → call the LLM → store the new turn back in Redis.
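The four-step flow above can be sketched without any LangChain or Redis dependency. Everything in this block is a hypothetical stand-in: `STORE` plays the role of Redis, `echo_llm` plays the role of the model, and `handle_turn` is an illustrative name, not the production code.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text)

# Hypothetical stand-in for Redis: session id -> ordered list of messages
STORE: Dict[str, List[Message]] = {}


def build_prompt(history: List[Message], user_text: str) -> str:
    """Flatten the stored turns plus the new user message into one prompt."""
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"human: {user_text}")
    return "\n".join(lines)


def handle_turn(session_id: str, user_text: str, llm: Callable[[str], str]) -> str:
    history = STORE.get(session_id, [])        # 1. load history
    prompt = build_prompt(history, user_text)  # 2. assemble the prompt
    reply = llm(prompt)                        # 3. call the LLM
    # 4. store the new turn back
    STORE[session_id] = history + [("human", user_text), ("ai", reply)]
    return reply


def echo_llm(prompt: str) -> str:
    """Fake LLM: reports how many lines of context it received."""
    return f"saw {len(prompt.splitlines())} lines"
```

If step 1 ever returns an empty list for a session that already has turns, step 2 still produces a valid prompt and nothing fails. That is why this kind of memory loss is invisible unless a test checks for it.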

Sounds simple, but we hit three fatal issues in production:

Silent loss – When Redis timed out or a failover kicked in, RedisChatMessageHistory failed to initialize. Our code never checked for that failure and quietly fell back to an empty ChatMessageHistory, wiping out the entire conversation context.
Dirty reads of memory – Concurrent requests for the same session landed on different pods, causing memories to overwrite each other. Users started getting replies that looked like the bot had drunk way too much.
Serialization poison – Internally, LangChain uses json.dumps to serialize messages. Certain characters (like emojis) caused it to crash, and ConversationChain swallowed the exception. The bot turned into a broken record.
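The first issue comes down to a fallback that hides the failure. Here is a minimal sketch of the anti-pattern next to a fail-loud alternative. The names `HistoryUnavailable`, `InMemoryHistory`, and the two loader functions are hypothetical and only illustrate the shape of the bug, not our actual production code.

```python
class HistoryUnavailable(RuntimeError):
    """Raised when the persistent history backend cannot be reached."""


class InMemoryHistory:
    """Hypothetical stand-in for an empty in-memory chat history."""

    def __init__(self):
        self.messages = []


def load_history_silently(connect):
    """Anti-pattern: a backend failure turns into an empty history."""
    try:
        return connect()
    except ConnectionError:
        # The conversation context is gone and nobody is told.
        return InMemoryHistory()


def load_history_loudly(connect):
    """Fail loud: surface the outage so the caller can retry or degrade visibly."""
    try:
        return connect()
    except ConnectionError as exc:
        raise HistoryUnavailable("chat history backend unreachable") from exc


def dead_backend():
    """Simulates a Redis connection attempt during an outage."""
    raise ConnectionError("Redis is dead")
```

With the loud version, a test can assert that an outage produces `HistoryUnavailable` instead of a history that merely looks empty.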

Manual testing could never catch these — because you can’t simulate 3am traffic, a wobbly Redis cluster, or that one user who types a flood of emojis. The only way out was to make memory storage a regular part of the automated test suite, and use Pytest to exercise every failure path.

Why I chose Pytest + layered mocking

To test the memory layer, you need to cover at least three dimensions:

Functional correctness – memory saves, loads, and clears correctly.
Fault tolerance – when the storage backend goes down, it doesn’t silently drop memory or corrupt other sessions.
Concurrency safety – reads and writes for the same session don’t step on each other.
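To make the concurrency dimension concrete, here is a small sketch of one way two requests can overwrite each other, and of a check that an append-style store does not lose writes. Both store classes are hypothetical; this shows a generic read-modify-write race, not necessarily the exact mechanism behind our production incident.

```python
import threading


class SnapshotStore:
    """Hypothetical store that saves the whole history as one value."""

    def __init__(self):
        self.value = []

    def read(self):
        return list(self.value)

    def write(self, messages):
        self.value = list(messages)


class AppendStore:
    """Hypothetical store with an atomic append, similar in spirit to a list push."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def append(self, message):
        with self._lock:
            self._items.append(message)

    def read(self):
        with self._lock:
            return list(self._items)


def lost_update_demo():
    """Two pods read the same snapshot, then each writes back its own copy."""
    store = SnapshotStore()
    seen_by_a = store.read()
    seen_by_b = store.read()
    store.write(seen_by_a + ["pod A: user said X"])
    store.write(seen_by_b + ["pod B: user said Y"])  # overwrites pod A's turn
    return store.read()


def concurrent_append_demo(workers=8, per_worker=50):
    """Many threads append to one session; every message must survive."""
    store = AppendStore()

    def work(worker_id):
        for n in range(per_worker):
            store.append(f"w{worker_id}-m{n}")

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.read()
```

The first demo interleaves the steps by hand so the lost update is deterministic. The second uses real threads and only asserts on the final count, which keeps the test stable.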

There are plenty of testing options. Here is how the candidates compared, and why I picked pytest over unittest:

unittest – comes with mocking, but it’s verbose, parametrization is weak, and fixtures aren’t flexible enough.
pytest – its fixture system makes context management a breeze. Combine it with pytest-asyncio for async memory, pytest-xdist to simulate concurrency, and pytest.mark.parametrize to sweep through all memory backends (in-memory, Redis, DynamoDB) in one go.
One-off scripts – no assertions, no reports, basically the same as not testing at all.
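As a rough illustration of the parametrize point, the sketch below runs one save/load/clear test against two backends. Both backend classes are hypothetical in-memory stand-ins (no Redis or DynamoDB is involved), and the `add`/`messages`/`clear` interface is an assumption made for this example.

```python
import pytest


class InMemoryHistory:
    """Hypothetical backend: keeps messages in a plain list."""

    def __init__(self):
        self._messages = []

    def add(self, role, text):
        self._messages.append((role, text))

    def messages(self):
        return list(self._messages)

    def clear(self):
        self._messages.clear()


class FakeRedisHistory(InMemoryHistory):
    """Hypothetical backend: stores newest-first, like a list filled by left pushes."""

    def add(self, role, text):
        self._messages.insert(0, (role, text))

    def messages(self):
        # Reverse on read so callers still see chronological order.
        return list(reversed(self._messages))


BACKENDS = [InMemoryHistory, FakeRedisHistory]


@pytest.mark.parametrize("backend_cls", BACKENDS, ids=lambda cls: cls.__name__)
def test_save_load_clear(backend_cls):
    history = backend_cls()
    history.add("human", "order 123")
    history.add("ai", "order 123 is shipping")
    # Order and content must match regardless of how the backend stores them.
    assert history.messages() == [
        ("human", "order 123"),
        ("ai", "order 123 is shipping"),
    ]
    history.clear()
    assert history.messages() == []
```

Adding a third backend is then a one-line change to `BACKENDS`, and pytest reports each backend as its own test case.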

Architecturally, we extracted the memory creation into a factory function get_memory(session_id). During tests, we inject a fake Redis client via monkeypatch, so we can test real logic while simulating every kind of failure. We also added lightweight instrumentation on key memory operations to count calls and exceptions, which makes it easy to assert that “the thing that shouldn’t happen truly didn’t happen.”
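A minimal sketch of that setup, assuming a much simpler memory class than LangChain's. `CountingFakeRedis`, `backend`, and `SessionMemory` are hypothetical names, and this `get_memory` is an illustrative stand-in for the real factory. The tests rely only on pytest's built-in `monkeypatch` fixture.

```python
import json


class CountingFakeRedis:
    """Hypothetical fake client: only lpush/lrange, with call and error counters."""

    def __init__(self, fail=False):
        self.fail = fail
        self.lists = {}
        self.calls = {"lpush": 0, "lrange": 0, "errors": 0}

    def _maybe_fail(self):
        if self.fail:
            self.calls["errors"] += 1
            raise ConnectionError("Redis is dead")

    def lpush(self, key, value):
        self.calls["lpush"] += 1
        self._maybe_fail()
        self.lists.setdefault(key, []).insert(0, value)

    def lrange(self, key, start, end):
        self.calls["lrange"] += 1
        self._maybe_fail()
        return list(self.lists.get(key, []))  # sketch: always the full list


class backend:
    """Stand-in for the module that owns the real connection (hypothetical)."""

    @staticmethod
    def connect():
        raise RuntimeError("this sketch never talks to a real Redis")


class SessionMemory:
    """Simplified memory: JSON messages in a per-session list."""

    def __init__(self, client, key):
        self.client, self.key = client, key

    def add(self, role, text):
        self.client.lpush(self.key, json.dumps({"role": role, "text": text}))

    def load(self):
        raw = self.client.lrange(self.key, 0, -1)
        return [json.loads(item) for item in reversed(raw)]


def get_memory(session_id):
    return SessionMemory(backend.connect(), f"chat:{session_id}")


def test_round_trip(monkeypatch):
    fake = CountingFakeRedis()
    monkeypatch.setattr(backend, "connect", lambda: fake)
    get_memory("user-42").add("human", "order 123")
    assert get_memory("user-42").load() == [{"role": "human", "text": "order 123"}]
    assert fake.calls == {"lpush": 1, "lrange": 1, "errors": 0}


def test_dead_redis_is_not_silent(monkeypatch):
    fake = CountingFakeRedis(fail=True)
    monkeypatch.setattr(backend, "connect", lambda: fake)
    try:
        get_memory("user-42").load()
    except ConnectionError:
        pass
    else:
        raise AssertionError("expected ConnectionError, got a silent empty history")
    assert fake.calls["errors"] == 1
```

The counters are what let a test say more than "it returned something". They show how many backend calls were made and how many failed, so a silent fallback shows up as a mismatch.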

Core implementation: hammering memory storage with Pytest

The snippet below tackles the question: “How do you verify both the happy path and the degraded behavior when Redis explodes, in the same test file?” We define two fixtures, one for a healthy memory and one that deliberately breaks, then test saving, loading, and exception handling.


python
# test_memory.py
import pytest
from unittest.mock import MagicMock, patch
from langchain.memory import ConversationBufferMemory
from langchain.schema import HumanMessage, AIMessage
from langchain_community.chat_message_histories import RedisChatMessageHistory

# Real factory used in production
def create_memory(session_id: str, redis_url: str = "redis://localhost:6379"):
    history = RedisChatMessageHistory(session_id=session_id, url=redis_url)
    return ConversationBufferMemory(
        chat_memory=history,
        return_messages=True,
        memory_key="chat_history"
    )

@pytest.fixture
def memory_factory():
    """Normal Redis memory factory"""
    return lambda sid: create_memory(sid, redis_url="redis://test-redis:6379")

@pytest.fixture
def failing_redis():
    """Simulate a completely dead Redis"""
    mock_redis = MagicMock()
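    # Make all read/write operations raise connection errors.
    # RedisChatMessageHistory in langchain_community reads with lrange and, at the
    # time of writing, writes with lpush, so lpush is mocked alongside rpush.
    mock_redis.lrange.side_effect = ConnectionError("Redis is dead")
    mock_redis.lpush.side_effect = ConnectionError("Redis is dead")
    mock_redis.rpush.side_effect = ConnectionError("Redis is dead")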
    return mock_redis

def test_memory_save_and_load(memory_factory):
    """Happy path: save a conversation turn, reload it, order and content must match"""
    memory = memory_factory("user-42")
    # Simulate a user-AI exchange
    memory.chat_memory.add_user_message("My name is Xiao Ming, order number 123")
    memory.chat_memory.add_ai_message("Got it, Xiao Ming, your order 123 is leaving the warehouse")

    # Re-create the memory instance for the same session (simulating next request)
    memory2 = memory_factory("user-42")
    messages = memory2.chat_memory.messages

    assert len(messages) == 2
    assert messages[0].content == "My name is Xiao Ming, order number 123"
    assert isinstance(message