Mohamed Tahri

Snapshot Testing in Python with pytest-verify

💡 Why Snapshot Testing Matters

When you work with APIs, machine learning models, data pipelines, or configuration files, your Python tests often deal with large structured outputs — JSON, YAML, XML, DataFrames, etc.

Keeping track of every single field with traditional assertions quickly becomes a nightmare:

    assert data["status"] == "ok"
    assert data["count"] == 200
    assert data["users"][0]["id"] == 123
    assert data["users"][0]["active"] is True

Instead of chasing fragile asserts, what if you could just snapshot your API’s response and automatically detect meaningful changes?

That’s what pytest-verify does.

🔍 Introducing pytest-verify

pytest-verify is a lightweight extension to pytest that
automatically saves and compares your test outputs.

Instead of asserting field-by-field, you just return the object,
and @verify_snapshot does the rest.

It detects the data type (JSON, YAML, XML, etc.), serializes it,
creates a .expected snapshot, and compares future test runs to it.

If something changes — you get a clear unified diff.

⚙️ Installation

pip install pytest-verify

🧠 How It Works

The decorator @verify_snapshot automatically:

  1. Detects the data format based on your test’s return type.
  2. Serializes it to a stable format (JSON, YAML, XML, etc.).
  3. Saves a baseline .expected file on first run.
  4. Compares future runs against that baseline.
  5. Displays a unified diff when something changes.

On first run, it creates a snapshot file such as:

__snapshots__/test_weather_api_snapshot.expected.json

On subsequent runs, it compares and prints a diff if the result
has changed beyond your tolerances or ignored fields.
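
To make the flow concrete, here is a deliberately naive sketch of the save-then-compare idea — not pytest-verify's actual implementation, just the shape of what a snapshot decorator does:

import difflib
import functools
import json
from pathlib import Path

def naive_snapshot(func):
    """Toy version of the snapshot flow: save a baseline, then diff against it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        serialized = json.dumps(func(*args, **kwargs), indent=2, sort_keys=True)
        expected = Path("__snapshots__") / f"{func.__name__}.expected.json"
        if not expected.exists():
            # First run: create the baseline and pass.
            expected.parent.mkdir(exist_ok=True)
            expected.write_text(serialized)
            return
        baseline = expected.read_text()
        if baseline != serialized:
            diff = "\n".join(difflib.unified_diff(
                baseline.splitlines(), serialized.splitlines(),
                fromfile="expected", tofile="actual", lineterm=""))
            raise AssertionError(f"Snapshot mismatch:\n{diff}")
    return wrapper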

🌦 Example 1 — Snapshot Testing an API Response

Let’s say you’re testing a REST API endpoint:

import requests

def fetch_user_data():
    response = requests.get("https://api.example.com/users/42")
    return response.json()

When you print it out, you get something like this 👇:

{
  "user": {"id": 42, "name": "Ayoub", "role": "admin"},
  "meta": {"timestamp": "2025-10-24T12:00:00Z", "api_version": "v3.4"},
  "metrics": {"latency": 152.4, "success_rate": 99.9}
}

Perfect. Now let’s write a snapshot test for it.

1. Basic API Snapshot

from pytest_verify import verify_snapshot

@verify_snapshot()
def test_user_api_snapshot():
    from myapp.api import fetch_user_data
    return fetch_user_data()

👉 On the first run, this creates:

__snapshots__/test_user_api_snapshot.expected.json

with the formatted API response saved inside.
On future runs, it compares automatically — no asserts required.
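
Tip: if you don't want the snapshot to depend on a live endpoint, stub the HTTP layer before returning the payload. A sketch using unittest.mock (myapp.api is the hypothetical module from the example above):

from unittest import mock

from pytest_verify import verify_snapshot

@verify_snapshot()
def test_user_api_snapshot_offline():
    fake_payload = {
        "user": {"id": 42, "name": "Ayoub", "role": "admin"},
        "meta": {"timestamp": "2025-10-24T12:00:00Z", "api_version": "v3.4"},
        "metrics": {"latency": 152.4, "success_rate": 99.9},
    }
    # Patch requests.get so fetch_user_data() returns a deterministic payload.
    with mock.patch("requests.get") as fake_get:
        fake_get.return_value.json.return_value = fake_payload
        from myapp.api import fetch_user_data
        return fetch_user_data()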

2. Ignoring Dynamic Fields

A day later, the API changes the timestamp and ID.
Same structure, different values:

{
  "user": {"id": 1051, "name": "Ayoub", "role": "admin"},
  "meta": {"timestamp": "2025-10-25T10:05:00Z", "api_version": "v3.4"},
  "metrics": {"latency": 153.0, "success_rate": 99.9}
}

Your test breaks — but should it?

Let’s tell pytest-verify to ignore fields that are expected to change:

@verify_snapshot(ignore_fields=["$.user.id", "$.meta.timestamp"])
def test_user_api_snapshot_ignore_fields():
    from myapp.api import fetch_user_data
    return fetch_user_data()

✅ Now your snapshot ignores the dynamic fields while still catching real structure or data changes.
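
Conceptually, ignoring a field means stripping that path from the data before it is diffed. A simplified illustration of the idea (no wildcard support here; pytest-verify's path handling is richer):

def drop_path(data: dict, path: str) -> None:
    """Delete a "$.a.b"-style path in place."""
    *parents, leaf = path.lstrip("$.").split(".")
    node = data
    for key in parents:
        node = node[key]
    node.pop(leaf, None)

payload = {"user": {"id": 1051, "name": "Ayoub"}, "meta": {"timestamp": "2025-10-25T10:05:00Z"}}
for ignored in ("$.user.id", "$.meta.timestamp"):
    drop_path(payload, ignored)

assert payload == {"user": {"name": "Ayoub"}, "meta": {}}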

3. Handling Numeric Drift with Global Tolerances

Let’s say the backend metrics fluctuate a bit between runs.

New response:

{
  "user": {"id": 42, "name": "Ayoub", "role": "admin"},
  "meta": {"timestamp": "2025-10-24T12:10:00Z", "api_version": "v3.4"},
  "metrics": {"latency": 152.9, "success_rate": 99.89}
}

Tiny differences like these shouldn’t fail your test.
This is where global tolerances come in:

@verify_snapshot(
    ignore_fields=["$.meta.timestamp"],
    abs_tol=1.0,
    rel_tol=0.01
)
def test_user_api_snapshot_with_global_tolerance():
    from myapp.api import fetch_user_data
    return fetch_user_data()

✅ This allows:

  • Any numeric field to vary by ±1.0 (abs_tol)

  • Or by up to 1% difference (rel_tol)

You don’t need to list every field — the tolerance applies globally to all numeric values.
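
The tolerance rule behaves much like Python's math.isclose: a numeric field passes if it is within the absolute or the relative bound of the baseline value (treat this as an approximation of pytest-verify's exact semantics):

import math

baseline, actual = 152.4, 152.9   # latency from the two responses above

# Passes when |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol)
assert math.isclose(actual, baseline, abs_tol=1.0, rel_tol=0.01)      # drift of 0.5 -> OK
assert not math.isclose(103.0, 100.0, abs_tol=1.0, rel_tol=0.01)      # drift of 3.0 -> too big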

4. Field-Specific Tolerances

Now imagine you want finer control — maybe latency can fluctuate more than success rate.

You can define per-field tolerances using JSONPath-like syntax:

@verify_snapshot(
    ignore_fields=["$.meta.timestamp"],
    abs_tol_fields={"$.metrics.latency": 0.5},
    rel_tol_fields={"$.metrics.success_rate": 0.005}
)
def test_user_api_snapshot_field_tolerances():
    from myapp.api import fetch_user_data
    return fetch_user_data()

✅ Here:

  • Only metrics.latency allows ±0.5 difference

  • Only metrics.success_rate allows 0.5% relative variation

All other fields must match exactly.

5. Complex JSON with Wildcards

Now picture a microservice returning a full system report:

{
  "services": [
    {"name": "auth", "uptime": 99.98, "latency": 210.5, "debug": "ok"},
    {"name": "billing", "uptime": 99.92, "latency": 315.7, "debug": "ok"}
  ],
  "meta": {"timestamp": "2025-10-25T11:00:00Z"}
}

You can mix ignore fields, wildcards, and numeric tolerances easily:

@verify_snapshot(
    ignore_fields=["$.services[*].debug", "$.meta.timestamp"],
    abs_tol_fields={"$.services[*].latency": 1.0},
    rel_tol_fields={"$.services[*].uptime": 0.01}
)
def test_service_health_report():
    return {
        "services": [
            {"name": "auth", "uptime": 99.97, "latency": 211.3, "debug": "ok"},
            {"name": "billing", "uptime": 99.90, "latency": 314.9, "debug": "ok"},
        ],
        "meta": {"timestamp": "2025-10-25T11:30:00Z"},
    }

✅ Wildcards ([*]) apply tolerance rules to every item in the list.
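
In other words, a pattern like "$.services[*].latency" means "check this rule against every element of the services list" — roughly equivalent to looping over the items yourself:

import math

expected = [{"name": "auth", "latency": 210.5}, {"name": "billing", "latency": 315.7}]
actual   = [{"name": "auth", "latency": 211.3}, {"name": "billing", "latency": 314.9}]

# Apply the same ±1.0 absolute tolerance to each service's latency.
for exp, act in zip(expected, actual):
    assert math.isclose(act["latency"], exp["latency"], abs_tol=1.0)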

YAML Snapshot Testing

YAML files are everywhere — from CI pipelines and Helm charts to deployment manifests.

They’re also prone to drift: values change slightly, orders shift, and formatting differences cause false positives.

1. Simple Example — Kubernetes Deployment Snapshot

Here’s a basic test for a Kubernetes deployment YAML:

from pytest_verify import verify_snapshot

@verify_snapshot(ignore_order_yaml=True)
def test_kubernetes_deployment_yaml():
    return """
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: user-service
    spec:
      replicas: 3
      template:
        metadata:
          labels:
            app: user-service
        spec:
          containers:
            - name: user-service
              image: registry.local/user-service:v1.2
              ports:
                - containerPort: 8080
    """

✅ This saves the deployment structure as a .expected.yaml snapshot.
On future runs, it automatically detects:

  • if you changed the number of replicas,

  • switched the container image,

  • or modified any key fields.

✅ The flag ignore_order_yaml=True makes it order-insensitive,
so switching the order of YAML keys or list items won’t trigger false diffs.
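
For intuition: once YAML is parsed into Python mappings, key order no longer matters when comparing content (how the plugin normalizes list order is its own concern; this only shows the mapping case). Assumes PyYAML is installed:

import yaml

doc_a = yaml.safe_load("replicas: 3\nimage: registry.local/user-service:v1.2\n")
doc_b = yaml.safe_load("image: registry.local/user-service:v1.2\nreplicas: 3\n")

# Dicts compare by content, not by key order.
assert doc_a == doc_b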

2. CI/CD Pipeline Config Example with Tolerances and Ignores

Now let’s test something closer to a real DevOps setup, like a CI pipeline YAML.

Imagine your CI config (.gitlab-ci.yml) evolves frequently:

stages:
  - build
  - test
  - deploy

variables:
  TIMEOUT: 60
  RETRIES: 3

build:
  stage: build
  script:
    - docker build -t myapp:${CI_COMMIT_TAG:-latest} .
  tags: ["docker"]

test:
  stage: test
  script:
    - pytest --maxfail=1 --disable-warnings -q
  allow_failure: false

deploy:
  stage: deploy
  script:
    - ./scripts/deploy.sh
  environment: production
  when: manual


We can snapshot this configuration, allowing minor numeric drift (like timeouts or retry limits changing slightly) and ignoring volatile fields (like tags or environment metadata).

@verify_snapshot(
    ignore_order_yaml=True,
    ignore_fields=["$.variables.RETRIES", "$.deploy.environment"],
    abs_tol_fields={"$.variables.TIMEOUT": 5},
)
def test_cicd_pipeline_yaml_snapshot():
    return """
    stages:
      - build
      - test
      - deploy

    variables:
      TIMEOUT: 62
      RETRIES: 3

    build:
      stage: build
      script:
        - docker build -t myapp:${CI_COMMIT_TAG:-latest} .
      tags: ["docker"]

    test:
      stage: test
      script:
        - pytest --maxfail=1 --disable-warnings -q
      allow_failure: false

    deploy:
      stage: deploy
      script:
        - ./scripts/deploy.sh
      environment: staging
      when: manual
    """


✅ Here’s what happens:

  • ignore_order_yaml=True — key order won’t break the snapshot

  • ignore_fields=["$.variables.RETRIES", "$.deploy.environment"] — ignores fields that are expected to differ between environments

  • abs_tol_fields={"$.variables.TIMEOUT": 5} — allows ±5 seconds of difference for timeout settings

This is exactly what you want when managing evolving CI/CD configs or Helm charts — detect real changes, but ignore noise.

XML Snapshot Testing

1. Simple Example — Invoice Report

Here’s a basic XML test that verifies invoice data:

from pytest_verify import verify_snapshot

@verify_snapshot()
def test_invoice_xml_snapshot():
    return """
    <Invoices>
        <Invoice id="INV-001">
            <Customer>EDF</Customer>
            <Total>4590.25</Total>
            <Date>2025-10-25</Date>
        </Invoice>
        <Invoice id="INV-002">
            <Customer>Cegos</Customer>
            <Total>3120.10</Total>
            <Date>2025-10-25</Date>
        </Invoice>
    </Invoices>
    """

✅ On first run, this saves a .expected.xml snapshot under __snapshots__/.

On the next run, pytest-verify will:

  • Parse both XML documents structurally.

  • Compare tags, attributes, and values.

  • Show a clear diff if anything changes.

Now imagine the system recalculates taxes overnight:

<Invoices>
    <Invoice id="INV-001">
        <Customer>EDF</Customer>
        <Total>4590.75</Total>
        <Date>2025-10-26</Date>
    </Invoice>
    <Invoice id="INV-002">
        <Customer>Cegos</Customer>
        <Total>3120.15</Total>
        <Date>2025-10-26</Date>
    </Invoice>
    <GeneratedAt>2025-10-26T08:30:00Z</GeneratedAt>
</Invoices>


Different totals (by a few cents) and a new generation timestamp?
Let’s not fail the test for that.

@verify_snapshot(
    ignore_fields=["//GeneratedAt", "//Invoice/Date"],
    abs_tol_fields={"//Invoice/Total": 0.5}
)
def test_invoice_xml_with_tolerance():
    return """
    <Invoices>
        <Invoice id="INV-001">
            <Customer>EDF</Customer>
            <Total>4590.75</Total>
            <Date>2025-10-26</Date>
        </Invoice>
        <Invoice id="INV-002">
            <Customer>Cegos</Customer>
            <Total>3120.15</Total>
            <Date>2025-10-26</Date>
        </Invoice>
        <GeneratedAt>2025-10-26T08:30:00Z</GeneratedAt>
    </Invoices>
    """


✅ Here’s what this test does:

ignore_fields=["//GeneratedAt", "//Invoice/Date"]
→ Ignores date/time fields that change daily.

abs_tol_fields={"//Invoice/Total": 0.5}
→ Allows a small numeric drift (±0.5) on totals — perfect for rounding or currency conversions.

Even if you add new invoices or minor numeric updates, the test stays stable and shows a clean, colorized diff for real structure or data changes.
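
Structural XML comparison boils down to walking both trees and checking tags, attributes, and text. A rough standard-library sketch of that idea (not the plugin's actual diff engine, and without tolerances):

import xml.etree.ElementTree as ET

def same_structure(a: ET.Element, b: ET.Element) -> bool:
    """Compare tag, attributes, text, and children recursively."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(same_structure(x, y) for x, y in zip(a, b))

old = ET.fromstring("<Invoice id='INV-001'><Total>4590.25</Total></Invoice>")
new = ET.fromstring("<Invoice id='INV-001'><Total>4590.75</Total></Invoice>")

assert not same_structure(old, new)   # the Total text changed, so the trees differ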

2. Advanced Example — Mixed Tolerances & Wildcards

Here’s how it looks for something larger, like a shipment report:

@verify_snapshot(
    ignore_fields=[
        "//ReportGeneratedAt",
        "/Shipments/*/TrackingID"
    ],
    abs_tol_fields={
        "/Shipments/*/Weight": 0.1
    },
    rel_tol_fields={
        "/Shipments/*/Cost": 0.02
    }
)
def test_shipment_xml_report():
    return """
    <ShipmentsReport>
        <ReportGeneratedAt>2025-10-26T08:30:00Z</ReportGeneratedAt>
        <Shipments>
            <Shipment id="SHP-001">
                <TrackingID>XYZ123</TrackingID>
                <Weight>12.45</Weight>
                <Cost>52.00</Cost>
            </Shipment>
            <Shipment id="SHP-002">
                <TrackingID>ABC987</TrackingID>
                <Weight>8.10</Weight>
                <Cost>39.90</Cost>
            </Shipment>
        </Shipments>
    </ShipmentsReport>
    """

    """

Enter fullscreen mode Exit fullscreen mode

✅ Explanation:

  • //ReportGeneratedAt → recursive ignore for global timestamps

  • /Shipments/*/TrackingID → wildcard ignore for the TrackingID element under every shipment

  • /Shipments/*/Weight → absolute tolerance (±0.1) for weight variations

  • /Shipments/*/Cost → relative tolerance (±2%) for cost differences

💡 Perfect for ERP exports, financial feeds, or shipment data where minor numeric or date drifts are normal, but structure or logical changes must be caught.

DataFrame Snapshot Testing

When validating transformations or ETL jobs, comparing large datasets by hand is painful.

Snapshot testing lets you lock in expected data outputs — and automatically detect meaningful changes later.

With pytest-verify, you can snapshot entire pandas.DataFrames and compare them structurally and numerically, with support for:

  • Ignored columns,
  • Absolute and relative tolerances,
  • CSV-based diff storage for readability.

Simple Example — Aggregated Sales Report

Let’s say you have a pipeline that aggregates daily sales:

import pandas as pd
from pytest_verify import verify_snapshot

@verify_snapshot()
def test_sales_dataframe_snapshot():
    data = {
        "region": ["North", "South", "West"],
        "total_sales": [1025.0, 980.0, 1100.5],
        "transactions": [45, 40, 52],
    }
    return pd.DataFrame(data)

✅ On first run, it will create a baseline:

__snapshots__/test_sales_dataframe_snapshot.expected.csv


On the next run, it will:

  • Compare the same DataFrame’s numeric and textual columns,

  • Show a readable diff if anything changes.

Now Imagine Minor Numeric Drift

Your ETL job reruns with slightly different rounding:

data = {
    "region": ["North", "South", "West"],
    "total_sales": [1025.3, 979.8, 1100.7],
    "transactions": [45, 40, 52],
}

Without tolerance, this would fail — but those changes are meaningless.
Let’s fix that:

@verify_snapshot(
    ignore_columns=["last_updated"],
    abs_tol=0.5,
    rel_tol=0.02
)
def test_etl_dataframe_with_tolerance():
    # Imagine this is the output of a real ETL job
    data = {
        "region": ["North", "South", "West"],
        "total_sales": [1025.3, 979.8, 1100.7],
        "transactions": [45, 40, 52],
        "last_updated": ["2025-10-25T10:30:00Z"] * 3,
    }
    return pd.DataFrame(data)


✅ What’s happening here (a plain-pandas equivalent is sketched after this list):

  • ignore_columns=["last_updated"] → dynamic timestamps are ignored.

  • abs_tol=0.5 → numeric values can differ by ±0.5.

  • rel_tol=0.02 → also allows a 2% proportional drift (good for scaled data).
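
For comparison, the same check can be written by hand with pandas.testing.assert_frame_equal — a plain-pandas sketch of what the tolerances mean, not what pytest-verify does internally:

import pandas as pd
import pandas.testing as pdt

expected = pd.DataFrame({"region": ["North", "South", "West"],
                         "total_sales": [1025.0, 980.0, 1100.5],
                         "transactions": [45, 40, 52]})
actual = pd.DataFrame({"region": ["North", "South", "West"],
                       "total_sales": [1025.3, 979.8, 1100.7],
                       "transactions": [45, 40, 52],
                       "last_updated": ["2025-10-25T10:30:00Z"] * 3})

# Drop the ignored column, then compare with absolute/relative tolerances.
pdt.assert_frame_equal(actual.drop(columns=["last_updated"]), expected,
                       atol=0.5, rtol=0.02)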

NumPy Snapshot Testing

Machine learning and scientific computations rarely produce exactly the same floats across environments or library versions.

Snapshot testing with tolerance control lets you verify your numeric logic without being too strict about minor floating-point differences.

Let’s say your model predicts normalized probabilities:

import numpy as np
from pytest_verify import verify_snapshot

@verify_snapshot()
def test_numpy_array_snapshot():
    # Output of a model or a simulation
    return np.array([0.12345, 0.45678, 0.41977])

✅ This creates a .expected.json snapshot with the array serialized to a list:

[
  0.12345,
  0.45678,
  0.41977
]

Now imagine your model runs on another machine (different BLAS/LAPACK lib), and the new output is:

np.array([0.1235, 0.4567, 0.4198])

Numerically almost identical — but your tests fail.
Let's fix that:

@verify_snapshot(abs_tol=1e-3, rel_tol=1e-3)
def test_numpy_with_tolerance():
    # Example: predictions from a stochastic model
    return np.array([0.1235, 0.4567, 0.4198])

✅ Explanation:

  • abs_tol=1e-3 allows absolute drift of 0.001

  • rel_tol=1e-3 allows small relative variations (e.g., 0.1% change on large values)

This means any tiny numeric jitter is ignored,
while larger drifts (like 0.01 or 1%) still fail and trigger a diff.
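
The check is analogous to NumPy's own tolerance test (pytest-verify's exact rule may differ slightly, but the intuition is the same):

import numpy as np

expected = np.array([0.12345, 0.45678, 0.41977])
actual = np.array([0.1235, 0.4567, 0.4198])

# np.allclose passes element-wise when |actual - expected| <= atol + rtol * |expected|
assert np.allclose(actual, expected, atol=1e-3, rtol=1e-3)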

Pydantic & Dataclasses Snapshot Testing

When testing business logic, it’s common to work with structured models — like API responses defined with Pydantic or internal objects using dataclasses.

pytest-verify handles both natively:

  • Automatically detects BaseModel or @dataclass types
  • Serializes them to JSON
  • Compares snapshots with full support for ignored fields and tolerances.

1. Testing a Pydantic API Response

Let’s say you have a model describing a user profile:

from pydantic import BaseModel
from pytest_verify import verify_snapshot

class User(BaseModel):
    id: int
    name: str
    country: str
    last_login: str
    score: float

@verify_snapshot(ignore_fields=["id", "last_login"])
def test_pydantic_user_snapshot():
    """Ensure the API response remains stable except dynamic fields."""
    return User(
        id=101,
        name="Ayoub",
        country="France",
        last_login="2025-10-25T14:23:00Z",
        score=98.42
    )

✅ On the first run, you’ll get:

__snapshots__/test_pydantic_user_snapshot.expected.json


Then, if the API changes id or last_login, those differences are simply ignored.

2. Using Dataclasses for Business Logic

If you use dataclasses for domain models or DTOs:

from dataclasses import dataclass
from pytest_verify import verify_snapshot

@dataclass
class Order:
    order_id: int
    customer: str
    total: float
    updated_at: str

@verify_snapshot(ignore_fields=["updated_at"])
def test_dataclass_order_snapshot():
    """Validate order structure stays stable."""
    return Order(order_id=1234, customer="Mohamed", total=249.99, updated_at="2025-10-25T12:00:00Z")

✅ On first run → baseline created.
If you later change field names or structure → the diff will highlight the mismatch.
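
Under the hood, a dataclass flattens naturally to a dict, which is presumably what ends up serialized in the .expected.json file — something along these lines (reusing the Order dataclass from above):

import dataclasses
import json

order = Order(order_id=1234, customer="Mohamed", total=249.99,
              updated_at="2025-10-25T12:00:00Z")

print(json.dumps(dataclasses.asdict(order), indent=2))
# {
#   "order_id": 1234,
#   "customer": "Mohamed",
#   "total": 249.99,
#   "updated_at": "2025-10-25T12:00:00Z"
# }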

3. Adding Field-Level Tolerances

@verify_snapshot(
    abs_tol_fields={"$.total": 0.5},  # allow ±0.5 on total
    ignore_fields=["$.updated_at"]
)
def test_dataclass_order_tolerance():
    return Order(order_id=1234, customer="Mohamed", total=250.20, updated_at="2025-10-25T12:05:00Z")

Wrapping Up — Snapshot Testing, Evolved

  • Traditional tests assert values.

  • Snapshot tests assert intent — they capture what your output should look like, and let you evolve confidently.

With pytest-verify, you can snapshot everything that matters:

  • ✅ JSON & YAML — configs, APIs, and structured data
  • 🧩 XML — ERP feeds, reports, and system exports
  • 📊 DataFrames — ETL jobs and analytics pipelines
  • 🔢 NumPy arrays — ML results and scientific computations
  • 🧱 Pydantic & Dataclasses — stable schemas and domain models
  • ✍️ Text or Binary — templates, logs, or compiled assets

Every snapshot is reproducible, human-readable, and version-controlled.

When something changes, you see exactly what and where — no more blind “assert equality” blocks.

💡 Final Thoughts

If you’ve ever run into this question:

“Did this change actually break something or just shift a float?”

Then pytest-verify is your new best friend.

It brings clarity and precision — one snapshot at a time.

If you find pytest-verify useful, give it a ⭐ on GitHub and share feedback!
