There's a specific kind of frustration that every backend developer knows.
You're writing a test for a `POST /orders` endpoint. You open the code, look at the request schema, and start typing:
```python
def test_post_orders():
    resp = client.post("/orders", json={
        "user_id": "user_123",  # guessing
        "items": [...],         # guessing
        "coupon": None,         # guessing
    })
    assert resp.status_code == 201
```
You're not testing your API. You're testing your assumptions about your API.
And the worst part? You won't find out your assumptions were wrong until a user hits an edge case you never thought of.
## The real problem with manually written integration tests
I've been thinking about this for a while. The core issue isn't that writing tests is tedious — it's that we're working from the wrong source of truth.
When you write tests manually, your source of truth is:
- Your own reading of the code
- Your mental model of how the endpoint should behave
- Maybe some Postman requests you remember
But the actual source of truth is your production traffic. Real users, real payloads, real edge cases. That data already exists. You're just not using it.
A user sends `{"items": [], "coupon": "EXPIRED2022"}` and gets a 422. That's a test case. You didn't write it. They did.
## The insight: production traffic is the perfect test oracle
About six months ago I started thinking: what if I just... captured the traffic?
If I intercept every inbound HTTP request and its response — method, path, headers, body, status code — I have everything I need to generate a test. Not a guessed test. A test based on what actually happened.
The architecture turned out to be surprisingly simple:
```
Inbound request
      ↓
Middleware (samples X% of traffic)
      ↓
Sanitize PII (emails, tokens, card numbers)
      ↓
Upload to collector
      ↓
httrace generate → writes test files
```
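Each capture is a small structured record. Roughly like this (illustrative: the field names are my shorthand here, not necessarily the exact wire format):

```python
# One captured trace, after sanitization (illustrative field names)
trace = {
    "service": "my-api",
    "method": "POST",
    "path": "/orders",
    "request_headers": {"content-type": "application/json"},
    "request_body": {"user_id": "usr_7f2a", "items": [{"sku": "SHOE-42", "qty": 2}]},
    "status_code": 201,
    "response_body": {"order_id": "ord_abc123", "total_eur": 49.99},
    "latency_ms": 42.7,
}
```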
## What the middleware actually does
Here's the Python ASGI middleware (simplified):
```python
import asyncio
import random
import time

class HttraceCaptureMiddleware:
    def __init__(self, app, api_key, service, sample_rate=0.1):
        self.app = app
        self.api_key = api_key
        self.service = service
        self.sample_rate = sample_rate

    async def __call__(self, scope, receive, send):
        # Pass non-HTTP traffic and unsampled requests through untouched
        if scope["type"] != "http" or random.random() > self.sample_rate:
            await self.app(scope, receive, send)
            return

        # Capture request body
        req_body = await self._read_body(receive)

        # Capture response
        resp_chunks = []
        status_code = 200

        async def capture_send(message):
            nonlocal status_code
            if message["type"] == "http.response.start":
                status_code = message["status"]
            elif message["type"] == "http.response.body":
                resp_chunks.append(message.get("body", b""))
            await send(message)

        start = time.monotonic()
        await self.app(scope, receive_with_body(req_body), capture_send)
        latency_ms = (time.monotonic() - start) * 1000

        # Fire-and-forget upload
        asyncio.create_task(self._upload(
            scope, req_body, b"".join(resp_chunks), status_code, latency_ms
        ))
```
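The snippet leans on two helpers it doesn't show: `_read_body`, which drains the request body from the ASGI receive channel, and `receive_with_body`, which replays that body to the downstream app. A minimal sketch of both (my version, not the SDK's exact code; a production version would also handle streaming bodies and size limits):

```python
async def _read_body(self, receive):
    # Drain the full request body from the ASGI receive channel
    chunks = []
    while True:
        message = await receive()
        if message["type"] == "http.request":
            chunks.append(message.get("body", b""))
            if not message.get("more_body", False):
                break
        elif message["type"] == "http.disconnect":
            break
    return b"".join(chunks)

def receive_with_body(body: bytes):
    # Build a receive() that replays the body we already consumed
    sent = False

    async def receive():
        nonlocal sent
        if not sent:
            sent = True
            return {"type": "http.request", "body": body, "more_body": False}
        return {"type": "http.disconnect"}

    return receive
```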
The `sample_rate` parameter is important. You don't want to capture 100% of traffic — that's unnecessary overhead and cost. On a moderately busy service, a 10% sample covers every endpoint and the recurring edge cases within hours.
## PII sanitization before anything leaves your server
This was non-negotiable. The middleware sanitizes before the upload, not after:
```python
import json
import re

SENSITIVE_PATTERNS = [
    # Email addresses (matches the bare address inside the JSON string)
    (r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL]'),
    # Visa / Mastercard numbers
    (r'\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b', '[CARD]'),
    # Well-known secret field names: redact the whole value
    (r'"(?:password|token|secret|api_key)"\s*:\s*"[^"]*"',
     lambda m: m.group(0).split(':')[0] + ': "[REDACTED]"'),
]

def sanitize(body: dict) -> dict:
    text = json.dumps(body)
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = re.sub(pattern, replacement, text)
    return json.loads(text)
```
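To make the behavior concrete, here's what that looks like on a toy payload:

```python
>>> sanitize({"email": "jane@example.com",
...           "card": "4111111111111111",
...           "password": "hunter2"})
{'email': '[EMAIL]', 'card': '[CARD]', 'password': '[REDACTED]'}
```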
Nothing sensitive ever leaves your infrastructure. GDPR-compliant by design, not by policy.
## What the generated test actually looks like
After capturing a few hours of traffic on a `POST /orders` endpoint, running `httrace generate --format pytest` produces something like this:
```python
# tests/integration/test_post_orders.py
# Generated by httrace from 3 captured requests
import pytest
import httpx

BASE_URL = "http://localhost:8000"

@pytest.fixture
def auth_headers():
    return {"Authorization": "Bearer [REDACTED]"}

def test_post_orders_201_standard(auth_headers):
    """POST /orders → 201 (captured from production traffic)"""
    resp = httpx.post(
        f"{BASE_URL}/orders",
        json={
            "user_id": "usr_7f2a",
            "items": [{"sku": "SHOE-42", "qty": 2}],
            "shipping_address": {
                "city": "Berlin",
                "zip": "10115"
            }
        },
        headers=auth_headers,
    )
    assert resp.status_code == 201
    assert "order_id" in resp.json()
    assert isinstance(resp.json()["total_eur"], (int, float))

def test_post_orders_422_empty_items(auth_headers):
    """POST /orders → 422 (edge case from production traffic)"""
    resp = httpx.post(
        f"{BASE_URL}/orders",
        json={"user_id": "usr_3b1c", "items": []},
        headers=auth_headers,
    )
    assert resp.status_code == 422

def test_post_orders_422_expired_coupon(auth_headers):
    """POST /orders → 422 (edge case from production traffic)"""
    resp = httpx.post(
        f"{BASE_URL}/orders",
        json={
            "user_id": "usr_9d4e",
            "items": [{"sku": "SHIRT-M", "qty": 1}],
            "coupon": "SUMMER2022"
        },
        headers=auth_headers,
    )
    assert resp.status_code == 422
```
Notice the last two tests. Nobody wrote those. A real user sent `{"items": []}` and another sent an expired coupon. Both became tests.
## The part I didn't expect: API drift detection
Once you have a baseline of captured traffic, you can do something else useful: detect when your API schema changes.
```
$ httrace diff --service my-api --fail-on-breaking

✗ /v1/orders response.items[].price: number → string (BREAKING)
✗ /v1/users email: required → optional (BREAKING)
✓ /v1/products GET added (new endpoint, non-breaking)

2 breaking changes detected.
exit code: 1
```
This runs in CI. If you rename a field or change a type, the pipeline fails before anything reaches production. No Pact. No contract definitions to write upfront. Just your real traffic as the baseline.
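The core idea is simple, even if the real implementation has more nuance. Conceptually, it's a schema comparison between two sets of captures (a toy sketch, not httrace's actual code):

```python
# Illustrative only: the real diff also tracks required vs. optional
# fields and nested paths, not just top-level type changes.
def infer_types(samples: list[dict]) -> dict[str, set[str]]:
    """Map each top-level field to the set of JSON types observed for it."""
    types: dict[str, set[str]] = {}
    for body in samples:
        for field, value in body.items():
            types.setdefault(field, set()).add(type(value).__name__)
    return types

def find_breaking_changes(baseline: list[dict], current: list[dict]) -> list[str]:
    old, new = infer_types(baseline), infer_types(current)
    breaking = []
    for field, old_types in old.items():
        if field not in new:
            breaking.append(f"{field}: removed (BREAKING)")
        elif new[field] - old_types:
            breaking.append(
                f"{field}: {sorted(old_types)} -> {sorted(new[field])} (BREAKING)")
    return breaking
```

Run it with yesterday's captures as the baseline and today's as current; a non-empty result fails the build.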
## Setup (the actual one-liner part)
```python
# FastAPI
from httrace import HttraceCaptureMiddleware

app.add_middleware(
    HttraceCaptureMiddleware,
    api_key="ht_...",
    service="my-api",
    sample_rate=0.1,
)
```

```bash
# Generate tests from captured traffic
httrace generate --format pytest

# Or Jest, Vitest, Go testing, RSpec
httrace generate --format jest
```
The SDKs are open-source (MIT) for Python, Node.js, Go, and Ruby. The collector and test generator are the hosted part.
## What I learned building this
The hardest part wasn't the middleware or the test generator. It was the assertion logic.
When you capture a response body like `{"order_id": "ord_abc123", "total": 49.99, "items": [...], "created_at": "2024-01-15T..."}`, you can't just assert the entire body — `order_id` will be different every run, and `created_at` will change.
So the generator has to be smart about what to assert (a sketch of the heuristic follows the list):
- Stable string fields (short, non-UUID): exact match
- UUIDs and IDs: assert presence + format, not value
- Numbers: exact match for prices, range check for timestamps
- Arrays: assert length + element schema, not exact content
- Timestamps: assert ISO format, not value
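Here's a crude version of that decision logic (illustrative only; the real generator also has to handle nesting and nullability):

```python
import re

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I)
PREFIXED_ID_RE = re.compile(r"^[a-z]+_[0-9a-z]+$", re.I)   # e.g. "ord_abc123"
ISO_TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def assertion_for(field: str, value) -> str:
    """Pick an assertion strategy for one captured response field."""
    if isinstance(value, str):
        if UUID_RE.match(value) or PREFIXED_ID_RE.match(value):
            return "presence + format"   # value changes every run
        if ISO_TS_RE.match(value):
            return "ISO-8601 format"     # timestamp, never exact-match
        return "exact match"             # short stable string
    if isinstance(value, bool):          # bool before int: bool is an int subclass
        return "exact match"
    if isinstance(value, (int, float)):
        return "type check (or exact match for prices)"
    if isinstance(value, list):
        return "length + element schema"
    return "presence"
```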
Getting that heuristic right took a while. There are still edge cases I'm improving.
The tool is called httrace. There's a free plan (50K requests/month, pytest + Jest). If you're curious: httrace.com
The SDKs are on GitHub: github.com/httrace-io/httrace
What's your approach to integration tests? Do you write them manually, skip them entirely, or use something else? I'm especially curious whether anyone has tried capturing traffic before — and what made it not work for your use case.
