From 800 Lines of Shell to 30 Lines of Pytest: 10x Redis Persistence Testing Efficiency

It was 2 a.m. when I got jolted awake by an alerting call—all user points data had rolled back by three hours. After digging for ages, I found that ops had tweaked the save parameter in redis.conf, changing the RDB snapshot interval from 5 minutes to 3 hours. When the node restarted, a massive amount of hot data simply evaporated. What made it worse: this configuration change had been “tested manually”. A colleague restarted Redis, saw that the keys were still there, and called it good. I cursed at the screen: “What’s the point of testing if you test like this?”

The next day, I tore down the entire persistence verification setup and rebuilt it with pytest + Docker as an automated test suite. What used to take 800 lines of Shell and 2 hours of environment tweaking now runs in a few minutes with 30 lines of pytest. Best of all, any reckless change to the persistence configuration now gets a definitive answer within 10 seconds: did we lose data or not?

Breaking down the problem: why manual Shell/Docker persistence tests are basically useless

Redis persistence comes in three flavors: RDB, AOF, and a hybrid of the two, plus a jungle of parameters such as save, appendfsync, and aof-use-rdb-preamble, so the configuration space explodes combinatorially. Most teams verify persistence in one of two ways:

  1. Manually starting and stopping Docker containers, writing a few items with redis-cli, doing docker restart, then running KEYS *—which only proves “it can start”, not “how many seconds of data disappeared”.
  2. Writing a pile of Shell scripts that drive redis-cli through docker exec and then diff the data. The scripts bloat, and they're brittle because the environment shifts every run: docker stop wait times, file cleanup policies; even a minor change makes the results unpredictable.

The root cause is clear: Redis persistence is the product of a time window, system signals, and filesystem flushing. Manual operation simply can’t control these precisely. For example, docker stop sends SIGTERM to the container by default; when Redis receives it, it tries to perform an RDB save. But how long does that save take? Will it be cut off by SIGKILL? A Shell script has no ability to simulate fault scenarios like “how much data is lost at the moment of a crash.” Even more importantly, consistency verification lacks repeatable assertions—manual testing only gives you a gut feeling that “probably nothing was lost.” That’s a landmine for production.
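
The signal difference is easy to make concrete. Here's a minimal docker-py sketch; the container name `redis-under-test` is illustrative, not part of the test suite:

```python
import docker

client = docker.from_env()
c = client.containers.get("redis-under-test")  # illustrative name

# docker stop: SIGTERM first, so Redis attempts a final RDB save on shutdown.
# If that save outlives `timeout` seconds, Docker escalates to SIGKILL and
# the snapshot can be cut off mid-write: exactly the ambiguity described above.
c.stop(timeout=10)

# Crash simulation (use instead of stop, while the container is running):
# SIGKILL arrives with no warning, so Redis persists nothing on the way down.
# c.kill(signal="SIGKILL")
```

Choosing the signal explicitly is what turns "how the process dies" from an accident of ops habits into a controllable test parameter.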

Solution design: why pytest + Docker, not Testcontainers or a K8s Job?

I wanted a programmable, assertable, reproducible test framework with these core requirements:

  • Precisely control Redis startup parameters and persistence configuration
  • Simulate real-world failures: kill -9, power-off-style shutdown, AOF file truncation, etc.
  • Automatically clean up the environment after a run—no leftover garbage
  • Run in CI/CD, but also instantly on a dev machine

Technology comparison:

| Solution | Pros | Why not chosen |
| --- | --- | --- |
| Shell + docker-compose | Team familiarity | Weak assertions; no precise control over restarts and signals; shell scripts are a maintenance nightmare |
| Testcontainers (Python) | Native pytest integration, good lifecycle management | Config changes after startup (e.g., dynamically toggling AOF) still need a wrapper around redis-cli, and the Python port is noticeably thinner than the Java flagship, so debugging costs run high |
| Kubernetes Job | Production-grade environment | Too heavy: it can't run locally and CI needs a K8s cluster; a sledgehammer to crack a nut |
| docker-py + pytest | Lightweight, programmable container lifecycle control, native Python assertions | **Chosen.** The docker SDK starts/stops containers and manages volumes, redis-py handles reads and writes, and pytest fixtures inject the environment. The whole thing is under 500 lines of Python, and CI only needs a Docker daemon |

Architecturally, I split the tests into three layers:

  1. Infrastructure layer: docker-py creates Redis containers, mounts temporary volumes for RDB/AOF files.
  2. Operation layer: redis-py writes, reads, issues CONFIG SET, BGSAVE, etc.
  3. Assertion layer: pytest asserts whether data exists, whether files were created, whether the AOF contains the last write.

This layering lets test cases focus only on “write data → how it dies → is the data correct after restart,” without caring about how the container starts or what mount paths are used.

Core implementation: ready-to-run test code

The following code addresses one problem: verify that after a Redis process is killed with kill -9, all data written after the last BGSAVE is lost as expected—and no extra loss occurs.

1. conftest.py: managing the Redis container lifecycle with a fixture


```python
# conftest.py
import pytest
import docker
import redis
import time
import os

REDIS_IMAGE = "redis:7.2"  # pin the version; pulling latest on CI causes inconsistent runs

@pytest.fixture(scope="function")
def rdb_container(tmp_path):
    """
    Start a Redis container configured for RDB persistence, with its data
    files written to a temporary directory. tmp_path is pytest's per-test
    temporary path, so each test function is fully isolated.
    """
    client = docker.from_env()
    data_dir = tmp_path / "data"
    data_dir.mkdir()

    container = client.containers.run(
        image=REDIS_IMAGE,
        name=f"redis-rdb-test-{os.getpid()}",  # avoid container name collisions
        command=[
            "redis-server",
            "--save 900 1",        # save after >=1 change in 900s; deliberately loose so BGSAVE is controlled manually
            "--save 300 10",
            "--save 60 10000",
            "--dir /data",
            "--dbfilename dump.rdb",
        ],
        volumes={str(data_dir): {"bind": "/data", "mode": "rw"}},
        ports={"6379/tcp": None},  # let Docker assign a random host port
        detach=True,
        # No remove=True here: auto-removal would delete the container the
        # moment a crash test kills it, so cleanup happens in teardown instead.
    )
    try:
        # The randomly assigned host port only appears in attrs once the
        # container is actually running, so poll briefly.
        host_port = None
        for _ in range(50):
            container.reload()
            mapping = container.attrs["NetworkSettings"]["Ports"].get("6379/tcp")
            if mapping:
                host_port = int(mapping[0]["HostPort"])
                break
            time.sleep(0.1)
        assert host_port, "container never published port 6379"

        # Wait until Redis actually answers PING before handing it to a test.
        conn = redis.Redis(host="localhost", port=host_port)
        for _ in range(50):
            try:
                if conn.ping():
                    break
            except redis.exceptions.ConnectionError:
                time.sleep(0.1)

        yield {"container": container, "port": host_port, "data_dir": data_dir}
    finally:
        container.remove(force=True)  # leave no garbage behind, pass or fail
```
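
2. test_rdb_kill.py: asserting exactly what a kill -9 loses

With the fixture in place, the test itself stays small. Below is a minimal sketch of the kill -9 scenario described above, assuming the fixture yields the container handle, host port, and data directory as shown; the file and test names are illustrative:

```python
# test_rdb_kill.py
import time

import redis


def _wait_for_ping(port, attempts=50):
    """Poll until Redis answers PING on the given host port."""
    conn = redis.Redis(host="localhost", port=port)
    for _ in range(attempts):
        try:
            if conn.ping():
                return conn
        except redis.exceptions.ConnectionError:
            time.sleep(0.1)
    raise TimeoutError("Redis never came back up")


def test_kill9_loses_only_unsaved_writes(rdb_container):
    r = redis.Redis(host="localhost", port=rdb_container["port"])

    # 1. Write data that must survive, then snapshot it explicitly.
    r.set("saved:points", "100")
    r.bgsave()
    while r.info("persistence")["rdb_bgsave_in_progress"]:
        time.sleep(0.1)  # BGSAVE forks and runs async; wait for it to finish

    # 2. Write data after the snapshot: this is what kill -9 should cost us.
    r.set("unsaved:points", "200")

    # 3. SIGKILL the container: no shutdown save, like a crash or power loss.
    container = rdb_container["container"]
    container.kill(signal="SIGKILL")

    # 4. Restart the same container: same volume, so the RDB file is reloaded.
    container.start()
    container.reload()  # the randomly mapped host port changes on restart
    port = int(container.attrs["NetworkSettings"]["Ports"]["6379/tcp"][0]["HostPort"])
    r2 = _wait_for_ping(port)

    # 5. Exactly the post-snapshot write is gone; nothing more, nothing less.
    assert r2.get("saved:points") == b"100"
    assert r2.get("unsaved:points") is None
```

The assertion pair at the end is the whole point: it pins down the loss window instead of eyeballing KEYS *, which is the difference between "it restarted" and "we know what a restart costs".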
