Pytest + Redis: My 3‑Hour Battle with a Data Inconsistency Bug

#python #programming

At 2 AM, my phone blew up: the production user memory storage service was returning dirty data, and the customer support group chat was in chaos. I groggily pulled up the monitoring. Redis clearly had the key, but PostgreSQL held a completely different version. I traced it to a classic race condition in the cache update strategy, rolled a fix, and pushed it live just before dawn. Staring at the ceiling afterward, I kept thinking: if an automated test had caught this bug before deployment, would I have had to crawl out of bed at all?

That’s when I decided to build an automated consistency‑verification framework for the memory store module using Pytest + Redis. Here’s the whole hands‑on journey, complete with the pitfalls the official docs don’t mention.

Problem Analysis: Why Memory Stores Are Prone to “Misremembering”

A memory store in many projects is essentially a wrapper around multi‑level caching and persistence: hot data lives in Redis, the full dataset lives in a database, and we juggle cache invalidation, cache‑aside population, and dual‑write logic. Typical use cases: user preferences, session state, feature‑vector caches. The pain points:

Dual‑write ordering: Write DB then delete cache, or delete cache then write DB? Both have windows where a reader can pick up stale values.
Expiry‑driven refresh: When a cache entry expires, a flood of concurrent requests may race to rebuild; we need to guarantee a single consistent rebuild.
Serialization asymmetry: The same data might be stored as JSON in Redis and as a binary blob in the DB. If serialize/deserialize behaviour isn’t perfectly symmetric, you get those creepy “looks the same but isn’t” bugs.
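
To make the serialization point concrete, here is a minimal standalone sketch. The payload is made up, and using pickle for the DB blob is my assumption for illustration; the only point is that two storage paths can hand back different Python objects for the same input.

```python
import json
import pickle

# Hypothetical payload: a tuple and an integer dict key are the troublemakers.
payload = {"user_id": 7, "tags": ("a", "b"), "scores": {1: 0.5}}

# Path 1: JSON text, as it might be stored in Redis.
from_cache = json.loads(json.dumps(payload))

# Path 2: binary blob, as it might be stored in the DB (pickle is an assumption).
from_db = pickle.loads(pickle.dumps(payload))

print(from_db == payload)     # the pickle round trip preserves the structure
print(from_cache == payload)  # the JSON round trip does not
print(from_cache)             # tuple became a list, int key became a string
```

A reader that sometimes hits the cache and sometimes hits the DB would see both shapes for the same key, which is exactly the "looks the same but isn't" class of bug.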

Manual testing can’t cover these concurrency windows and edge behaviours. The usual approach is to mock Redis in unit tests, but mocks fail to expose real differences in serialization, expiration policies, and Lua script execution. That’s why we need an automated testing framework that runs against a real Redis instance.
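
To show what such a concurrency window looks like, here is a deterministic replay of one bad interleaving for the "delete cache, then write DB" ordering. Plain dicts stand in for Redis and the database, and the key and values are made up; this is a sketch of the race itself, not of any real store.

```python
def stale_read_after_delete_then_write():
    """Replay one unlucky interleaving of a writer and a reader, step by step."""
    db = {"user:1": "v1"}
    cache = {"user:1": "v1"}

    # Writer, step 1: invalidate the cache first.
    cache.pop("user:1", None)

    # Reader: cache miss, so it falls back to the DB, which still holds v1.
    reader_value = cache.get("user:1")
    if reader_value is None:
        reader_value = db["user:1"]

    # Writer, step 2: the DB write lands after the reader's DB read.
    db["user:1"] = "v2"

    # Reader: repopulates the cache with the value it read earlier.
    cache["user:1"] = reader_value

    return cache["user:1"], db["user:1"]


cached, stored = stale_read_after_delete_then_write()
print(cached, stored)  # v1 v2
```

The cache now serves v1 until the entry expires or is invalidated again, even though the DB holds v2. A manual tester would have to hit this timing by luck.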

Design Rationale: Why Pytest + Real Redis

The tech choices weren’t hard:

Pytest over unittest: fixture‑driven dependency injection, parametrization, and conftest plugins dramatically lower mental overhead when writing test cases.
Real Redis over fakeredis/mock: Memory stores lean heavily on TTLs, pipelines, Lua atomic operations, and even modules like RedisJSON/RediSearch. fakeredis doesn’t emulate all of them completely, and discovering that only when a green test suite leads to a production blow‑up is too expensive.
No need for a complicated docker‑compose setup: I already have Redis on my dev machine; in CI, a one‑line redis:alpine image suffices.

The architectural idea is simple: each test case receives, via fixture, an isolated Redis connection and a real in‑memory DB (SQLite). We then verify the memory store’s public interface (set, get, delete, refresh) through black‑box and grey‑box checks, focusing on three dimensions:

Single‑operation consistency: what you write is exactly what you read back.
Concurrent‑window consistency: after simulated concurrent updates, cache and DB eventually agree.
Error‑path consistency: behaviour on cache misses, DB outages, and serialization anomalies.
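
For orientation, here is a minimal sketch of the kind of interface these checks assume. It is not the real MemoryStore: the write‑DB‑then‑invalidate strategy, the JSON encoding, and the ttl_seconds default are all my assumptions, and DictCache exists only so the sketch runs without a server (the tests in this article use a real Redis).

```python
import json
import sqlite3


class MemoryStore:
    """Hypothetical cache-aside store: the DB is the source of truth, Redis the cache."""

    def __init__(self, redis, db, ttl_seconds=300):
        self.redis = redis
        self.db = db
        self.ttl_seconds = ttl_seconds  # assumed default, not from the article

    def set(self, key, value):
        # Write the DB first, then invalidate the cache entry.
        self.db.execute(
            "INSERT OR REPLACE INTO memory (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.db.commit()
        self.redis.delete(key)

    def get(self, key):
        cached = self.redis.get(key)
        if cached is not None:
            return json.loads(cached)
        return self.refresh(key)

    def refresh(self, key):
        # Rebuild the cache entry from the DB; returns None for unknown keys.
        row = self.db.execute(
            "SELECT value FROM memory WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        self.redis.set(key, row[0], ex=self.ttl_seconds)
        return json.loads(row[0])

    def delete(self, key):
        self.db.execute("DELETE FROM memory WHERE key = ?", (key,))
        self.db.commit()
        self.redis.delete(key)


class DictCache:
    """Stand-in exposing only the get/set/delete calls used above."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, ex=None):
        self.data[key] = value  # the TTL is ignored in this stand-in

    def delete(self, key):
        self.data.pop(key, None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (key TEXT PRIMARY KEY, value TEXT)")
store = MemoryStore(redis=DictCache(), db=conn)

store.set("user:1", {"name": "Alice"})
print(store.get("user:1"))  # {'name': 'Alice'}
store.delete("user:1")
print(store.get("user:1"))  # None
```

The three dimensions above map onto this shape directly: set/get round trips, interleaved set calls, and what get or refresh does when one of the two backends misbehaves.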

Core Implementation: Three Test Layers, Step by Step

1. Fixture Design for Isolation

The core problem this code solves: how to ensure each test case’s Redis and DB don’t interfere with each other, while still being easy to clean up.

# conftest.py
import pytest
import redis
import sqlite3
from memory_store import MemoryStore

@pytest.fixture
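def redis_client():
    """Use db=15 to isolate test data and make sure every test case starts clean."""
    r = redis.Redis(host='localhost', port=6379, db=15, decode_responses=True)
    r.flushdb()  # Flush before the test so leftovers from a previous failed run don't leak in
    yield r
    r.flushdb()  # Flush after the test so the next case doesn't inherit garbage
    r.close()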

@pytest.fixture
def db_conn():
    """In-memory SQLite database: naturally isolated and very fast."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT)")
    yield conn
    conn.close()

@pytest.fixture
def store(redis_client, db_conn):
    """MemoryStore instance built from the injected Redis and DB fixtures."""
    return MemoryStore(redis=redis_client, db=db_conn)

I deliberately avoided scope="session", because we want every test to start from exactly the same state. If you want to speed things up by reusing connections, you can use scope="module" but must be extremely careful about key‑naming conflicts – a pitfall I’ll discuss later.
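
If you do share a connection across tests, one way to reduce the key‑collision risk is to give every test its own key prefix. The helper below is hypothetical (KeyNamespace and its methods are names I made up for illustration) and uses only the standard library.

```python
import uuid


class KeyNamespace:
    """Hypothetical helper: prefixes every key with a per-test random namespace."""

    def __init__(self, label="test"):
        # Eight hex characters keep keys readable; collisions are unlikely in a test run.
        self.prefix = f"{label}:{uuid.uuid4().hex[:8]}"

    def key(self, name):
        return f"{self.prefix}:{name}"


ns_a = KeyNamespace("test_set_get")
ns_b = KeyNamespace("test_delete")

# Both tests use the logical key "user:1" but touch different Redis keys.
print(ns_a.key("user:1"))
print(ns_b.key("user:1"))
```

With a shared connection, cleanup could then delete only the keys under the test's own prefix (redis‑py's scan_iter(match=...) can iterate them) instead of calling flushdb() on the whole database.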

2. Basic Read/Write Consistency Validation

This code verifies the most fundamental contract: what you store is what you get back – data type and structure must not be silently altered.

import pytest
import json

@pytest.mark.parametrize("payload", [
    {"name": "Alice", "age": 30},
    {"nested": {"deep": [1,2,3]}},
    "simple_string",
    42,
    None
])
def test_set_get_consistency(store, payload):
    store.set("user:1", payload)
    result = store.get("user:1")
    # Make sure value and type match exactly; json.dumps with sort_keys=True
    # gives a normalized, order-independent comparison
    assert json.dumps(result, sort_keys=True) == json.dumps(payload, sort_keys=True)

def test_delete_then_get_returns_none(store):
    store.set("temp", "data")
    store.delete("temp")
    assert store.get("temp") is None

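Note that I compared the json.dumps output rather than using a direct ==. Strictly speaking, Python's dict equality already ignores key order, so == would not fail on ordering alone. What the normalized comparison adds is a check on the serialized form, which also tells apart values that == treats as equal, such as 1 and True, or 1 and 1.0. The flip side is that it can hide changes a JSON round trip introduces, such as integer dict keys turning into strings. This foreshadows the first real pitfall.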
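
To see exactly what each comparison checks, here is a small standalone sketch using only the standard library. The sample dicts are made up.

```python
import json

a = {"name": "Alice", "age": 30}
b = {"age": 30, "name": "Alice"}  # same content, different insertion order

print(a == b)                          # True: dict equality ignores order
print(json.dumps(a) == json.dumps(b))  # False: the text follows insertion order
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))  # True

# Values that == treats as equal but that serialize differently.
print(1 == True, json.dumps(1) == json.dumps(True))  # True False
print(1 == 1.0, json.dumps(1) == json.dumps(1.0))    # True False
```

So sort_keys=True matters only because the comparison is done on text, and the text comparison is what makes the assertion strict about bool/int/float drift.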

3. Consistency Validation Under Concurrent Writes

This code simulates a real‑world scenario: two requests update the same key almost simultaneously; in the end, DB and cache must converge on the same value, not hold conflicting versions.