There's a specific kind of frustration that every backend developer knows.
You're writing a test for a `POST /orders` endpoint. You open the code, look at the request schema, and start typing:
```python
def test_post_orders():
    resp = client.post("/orders", json={
        "user_id": "user_123",  # guessing
        "items": [...],         # guessing
        "coupon": None,         # guessing
    })
    assert resp.status_code == 201
```
You're not testing your API. You're testing your assumptions about your API.
And the worst part? You won't find out your assumptions were wrong until a user hits an edge case you never thought of.
## The real problem with manually written integration tests
I've been thinking about this for a while. The core issue isn't that writing tests is tedious — it's that we're working from the wrong source of truth.
When you write tests manually, your source of truth is:
- Your own reading of the code
- Your mental model of how the endpoint should behave
- Maybe some Postman requests you remember
But the actual source of truth is your production traffic. Real users, real payloads, real edge cases. That data already exists. You're just not using it.
A user sends `{"items": [], "coupon": "EXPIRED2022"}` and gets a 422. That's a test case. You didn't write it. They did.
## The insight: production traffic is the perfect test oracle
About six months ago I started thinking: what if I just... captured the traffic?
If I intercept every inbound HTTP request and its response — method, path, headers, body, status code — I have everything I need to generate a test. Not a guessed test. A test based on what actually happened.
The architecture turned out to be surprisingly simple:
```
Inbound request
      ↓
Middleware (samples X% of traffic)
      ↓
Sanitize PII (emails, tokens, card numbers)
      ↓
Upload to collector
      ↓
httrace generate → writes test files
```
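Each capture is a small structured record. Roughly like this (illustrative: the field names are my shorthand here, not necessarily the exact wire format):

```python
# One captured trace, after sanitization (illustrative field names)
trace = {
    "service": "my-api",
    "method": "POST",
    "path": "/orders",
    "request_headers": {"content-type": "application/json"},
    "request_body": {"user_id": "usr_7f2a", "items": [{"sku": "SHOE-42", "qty": 2}]},
    "status_code": 201,
    "response_body": {"order_id": "ord_abc123", "total_eur": 49.99},
    "latency_ms": 42.7,
}
```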
## What the middleware actually does
Here's the Python ASGI middleware (simplified):
```python
import asyncio
import random
import time

class HttraceCaptureMiddleware:
    def __init__(self, app, api_key, service, sample_rate=0.1):
        self.app = app
        self.api_key = api_key
        self.service = service
        self.sample_rate = sample_rate

    async def __call__(self, scope, receive, send):
        # Pass non-HTTP traffic and unsampled requests through untouched
        if scope["type"] != "http" or random.random() > self.sample_rate:
            await self.app(scope, receive, send)
            return

        # Capture request body
        req_body = await self._read_body(receive)

        # Capture response
        resp_chunks = []
        status_code = 200

        async def capture_send(message):
            nonlocal status_code
            if message["type"] == "http.response.start":
                status_code = message["status"]
            elif message["type"] == "http.response.body":
                resp_chunks.append(message.get("body", b""))
            await send(message)

        start = time.monotonic()
        await self.app(scope, receive_with_body(req_body), capture_send)
        latency_ms = (time.monotonic() - start) * 1000

        # Fire-and-forget upload
        asyncio.create_task(self._upload(
            scope, req_body, b"".join(resp_chunks), status_code, latency_ms
        ))
```
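The snippet leans on two helpers it doesn't show: `_read_body`, which drains the request body from the ASGI receive channel, and `receive_with_body`, which replays that body to the downstream app. A minimal sketch of both (my version, not the SDK's exact code; a production version would also handle streaming bodies and size limits):

```python
async def _read_body(self, receive):
    # Drain the full request body from the ASGI receive channel
    chunks = []
    while True:
        message = await receive()
        if message["type"] == "http.request":
            chunks.append(message.get("body", b""))
            if not message.get("more_body", False):
                break
        elif message["type"] == "http.disconnect":
            break
    return b"".join(chunks)

def receive_with_body(body: bytes):
    # Build a receive() that replays the body we already consumed
    sent = False

    async def receive():
        nonlocal sent
        if not sent:
            sent = True
            return {"type": "http.request", "body": body, "more_body": False}
        return {"type": "http.disconnect"}

    return receive
```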
The `sample_rate` parameter is important. You don't want to capture 100% of traffic — that's unnecessary overhead and cost. On a moderately busy service, a 10% sample covers every endpoint and the recurring edge cases within hours.
## PII sanitization before anything leaves your server
This was non-negotiable. The middleware sanitizes before the upload, not after:
```python
import json
import re

SENSITIVE_PATTERNS = [
    # Email addresses (matches the bare address inside the JSON string)
    (r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL]'),
    # Visa / Mastercard numbers
    (r'\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b', '[CARD]'),
    # Well-known secret field names: redact the whole value
    (r'"(?:password|token|secret|api_key)"\s*:\s*"[^"]*"',
     lambda m: m.group(0).split(':')[0] + ': "[REDACTED]"'),
]

def sanitize(body: dict) -> dict:
    text = json.dumps(body)
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = re.sub(pattern, replacement, text)
    return json.loads(text)
```
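To make the behavior concrete, here's what that looks like on a toy payload:

```python
>>> sanitize({"email": "jane@example.com",
...           "card": "4111111111111111",
...           "password": "hunter2"})
{'email': '[EMAIL]', 'card': '[CARD]', 'password': '[REDACTED]'}
```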
Nothing sensitive ever leaves your infrastructure. GDPR-compliant by design, not by policy.
## What the generated test actually looks like
After capturing a few hours of traffic on a `POST /orders` endpoint, running `httrace generate --format pytest` produces something like this:
```python
# tests/integration/test_post_orders.py
# Generated by httrace from 3 captured requests
import pytest
import httpx

BASE_URL = "http://localhost:8000"

@pytest.fixture
def auth_headers():
    return {"Authorization": "Bearer [REDACTED]"}

def test_post_orders_201_standard(auth_headers):
    """POST /orders → 201 (captured from production traffic)"""
    resp = httpx.post(
        f"{BASE_URL}/orders",
        json={
            "user_id": "usr_7f2a",
            "items": [{"sku": "SHOE-42", "qty": 2}],
            "shipping_address": {
                "city": "Berlin",
                "zip": "10115"
            }
        },
        headers=auth_headers,
    )
    assert resp.status_code == 201
    assert "order_id" in resp.json()
    assert isinstance(resp.json()["total_eur"], (int, float))

def test_post_orders_422_empty_items(auth_headers):
    """POST /orders → 422 (edge case from production traffic)"""
    resp = httpx.post(
        f"{BASE_URL}/orders",
        json={"user_id": "usr_3b1c", "items": []},
        headers=auth_headers,
    )
    assert resp.status_code == 422

def test_post_orders_422_expired_coupon(auth_headers):
    """POST /orders → 422 (edge case from production traffic)"""
    resp = httpx.post(
        f"{BASE_URL}/orders",
        json={
            "user_id": "usr_9d4e",
            "items": [{"sku": "SHIRT-M", "qty": 1}],
            "coupon": "SUMMER2022"
        },
        headers=auth_headers,
    )
    assert resp.status_code == 422
```
Notice the last two tests. Nobody wrote those. A real user sent `{"items": []}` and another sent an expired coupon. Both became tests.
## The part I didn't expect: API drift detection
Once you have a baseline of captured traffic, you can do something else useful: detect when your API schema changes.
```
$ httrace diff --service my-api --fail-on-breaking

✗ /v1/orders response.items[].price: number → string (BREAKING)
✗ /v1/users email: required → optional (BREAKING)
✓ /v1/products GET added (new endpoint, non-breaking)

2 breaking changes detected.
exit code: 1
```
This runs in CI. If you rename a field or change a type, the pipeline fails before anything reaches production. No Pact. No contract definitions to write upfront. Just your real traffic as the baseline.
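The core idea is simple, even if the real implementation has more nuance. Conceptually, it's a schema comparison between two sets of captures (a toy sketch, not httrace's actual code):

```python
# Illustrative only: the real diff also tracks required vs. optional
# fields and nested paths, not just top-level type changes.
def infer_types(samples: list[dict]) -> dict[str, set[str]]:
    """Map each top-level field to the set of JSON types observed for it."""
    types: dict[str, set[str]] = {}
    for body in samples:
        for field, value in body.items():
            types.setdefault(field, set()).add(type(value).__name__)
    return types

def find_breaking_changes(baseline: list[dict], current: list[dict]) -> list[str]:
    old, new = infer_types(baseline), infer_types(current)
    breaking = []
    for field, old_types in old.items():
        if field not in new:
            breaking.append(f"{field}: removed (BREAKING)")
        elif new[field] - old_types:
            breaking.append(
                f"{field}: {sorted(old_types)} -> {sorted(new[field])} (BREAKING)")
    return breaking
```

Run it with yesterday's captures as the baseline and today's as current; a non-empty result fails the build.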
## Setup (the actual one-liner part)
```python
# FastAPI
from httrace import HttraceCaptureMiddleware

app.add_middleware(
    HttraceCaptureMiddleware,
    api_key="ht_...",
    service="my-api",
    sample_rate=0.1,
)
```

```bash
# Generate tests from captured traffic
httrace generate --format pytest

# Or Jest, Vitest, Go testing, RSpec
httrace generate --format jest
```
The SDKs are open-source (MIT) for Python, Node.js, Go, and Ruby. The collector and test generator are the hosted part.
## What I learned building this
The hardest part wasn't the middleware or the test generator. It was the assertion logic.
When you capture a response body like `{"order_id": "ord_abc123", "total": 49.99, "items": [...], "created_at": "2024-01-15T..."}`, you can't just assert the entire body — `order_id` will be different every run, and `created_at` will change.
So the generator has to be smart about what to assert (a sketch of the heuristic follows the list):
- Stable string fields (short, non-UUID): exact match
- UUIDs and IDs: assert presence + format, not value
- Numbers: exact match for prices, range check for timestamps
- Arrays: assert length + element schema, not exact content
- Timestamps: assert ISO format, not value
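Here's a crude version of that decision logic (illustrative only; the real generator also has to handle nesting and nullability):

```python
import re

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I)
PREFIXED_ID_RE = re.compile(r"^[a-z]+_[0-9a-z]+$", re.I)   # e.g. "ord_abc123"
ISO_TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")

def assertion_for(field: str, value) -> str:
    """Pick an assertion strategy for one captured response field."""
    if isinstance(value, str):
        if UUID_RE.match(value) or PREFIXED_ID_RE.match(value):
            return "presence + format"   # value changes every run
        if ISO_TS_RE.match(value):
            return "ISO-8601 format"     # timestamp, never exact-match
        return "exact match"             # short stable string
    if isinstance(value, bool):          # bool before int: bool is an int subclass
        return "exact match"
    if isinstance(value, (int, float)):
        return "type check (or exact match for prices)"
    if isinstance(value, list):
        return "length + element schema"
    return "presence"
```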
Getting that heuristic right took a while. There are still edge cases I'm improving.
The tool is called httrace. There's a free plan (50K requests/month, pytest + Jest). If you're curious: httrace.com
The SDKs are on GitHub: github.com/httrace-io/httrace
What's your approach to integration tests? Do you write them manually, skip them entirely, or use something else? I'm especially curious whether anyone has tried capturing traffic before — and what made it not work for your use case.
