If you're building with LLM APIs, you've probably hit this problem: your test suite makes real API calls.
That means:
- Every CI run costs money
- Tests are slow (1–3 seconds per call)
- Tests are flaky (LLM outputs aren't deterministic)
- Tests fail without an API key
The usual fix is monkeypatching — replacing the client with a fake. But that means maintaining fake responses by hand, and your tests stop reflecting what the real API actually returns.
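To make that maintenance cost concrete, here is a minimal sketch of the hand-written fake approach using only the standard library. It rests on two assumptions made purely for illustration: `summarize` takes the client as a parameter so the sketch is self-contained, and the canned text is invented.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def summarize(client, text: str) -> str:
    """Stand-in for production code that calls an LLM client."""
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=100,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return message.content[0].text


def make_fake_client(canned_text: str) -> MagicMock:
    """Build a fake whose response shape is maintained by hand."""
    fake = MagicMock()
    # This structure has to be kept in sync with the real SDK manually.
    fake.messages.create.return_value = SimpleNamespace(
        content=[SimpleNamespace(type="text", text=canned_text)]
    )
    return fake


fake_client = make_fake_client("An article about climate change.")
summary = summarize(fake_client, "Long article about climate change...")
```

The call succeeds, but only because the fake returns whatever shape was typed in. Nothing checks that shape against a real response.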
There's a better approach: record real responses once, replay them forever.
The record/replay pattern
Instead of faking the API, you intercept it:
- Record mode — runs against the real API once, saves the response to a JSON file
- Replay mode — returns the saved response instantly, no network call
The fixture file gets committed to git. CI runs offline. Tests are deterministic and free.
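The two modes above can be sketched in a few lines of plain Python. This is a generic illustration of the pattern, not llm-mock's implementation: `record_or_replay` and `fake_api` are hypothetical names, and the "API" is a local function so the sketch runs offline.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def record_or_replay(mode: str, fixture: Path, call_api: Callable[[], dict]) -> dict:
    """Return a response, either fresh from the API or from disk."""
    if mode == "record":
        response = call_api()  # the only branch that touches the network
        fixture.parent.mkdir(parents=True, exist_ok=True)
        fixture.write_text(json.dumps(response, indent=2))
        return response
    if mode == "replay":
        return json.loads(fixture.read_text())  # no network call
    raise ValueError(f"unknown mode: {mode!r}")


calls = []


def fake_api() -> dict:
    """Stands in for a real, billable API call."""
    calls.append(1)
    return {"content": [{"type": "text", "text": "The article covers..."}]}


fixture_path = Path(tempfile.mkdtemp()) / "fixtures" / "summarize.json"
recorded = record_or_replay("record", fixture_path, fake_api)
replayed = record_or_replay("replay", fixture_path, fake_api)
```

After the first call, `fake_api` is never invoked again: the replayed response comes entirely from the JSON file.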
llm-mock
llm-mock (https://github.com/autopost/llm-mock) is a pytest plugin that does exactly this for the Anthropic and OpenAI SDKs. It intercepts requests at the HTTP layer, so your production code is never touched.
pip install llm-mock
Example: testing an Anthropic pipeline
Say you have this production code:
# my_app/pipeline.py
import anthropic
client = anthropic.Anthropic()
def summarize(text: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=100,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return message.content[0].text
Step 1 — Record once
Run this locally with your API key:
# record_fixtures.py
from llm_mock import llm_mock
from my_app.pipeline import summarize
with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = summarize("Long article about climate change...")
    print(result)
ANTHROPIC_API_KEY=sk-... python record_fixtures.py
This creates tests/fixtures/summarize.json. Commit it to git.
Step 2 — Replay in tests
# tests/test_pipeline.py
import pytest
from my_app.pipeline import summarize
@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = summarize("Long article about climate change...")
    assert "climate" in result
pytest # no API key needed, instant, deterministic
That's it. The decorator intercepts the httpx call the Anthropic SDK makes internally and returns the saved response.
Auto mode — the easiest workflow
If you don't want to think about record vs replay at all, use mode="auto":
@pytest.mark.llm_replay(fixture="summarize", mode="auto")
def test_summarize():
    result = summarize("Long article about climate change...")
    assert "climate" in result
- Fixture exists → replays it
- Fixture missing → records it automatically
First run needs an API key. Every run after that is free and offline.
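The auto-mode rule above boils down to a single file-existence check. A minimal sketch, assuming a hypothetical helper named `resolve_mode` (llm-mock's internals may differ):

```python
import tempfile
from pathlib import Path


def resolve_mode(mode: str, fixture: Path) -> str:
    """Map 'auto' onto 'record' or 'replay' based on the fixture file."""
    if mode != "auto":
        return mode  # explicit modes are passed through unchanged
    return "replay" if fixture.exists() else "record"


fixture_path = Path(tempfile.mkdtemp()) / "summarize.json"
first_run = resolve_mode("auto", fixture_path)  # nothing on disk yet
fixture_path.write_text("{}")  # simulate a fixture saved by the first run
later_run = resolve_mode("auto", fixture_path)
```

One consequence of this rule: in CI, a missing fixture means a recording attempt, so a forgotten `git add` surfaces as a missing-credentials failure rather than a silent pass.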
Refreshing fixtures
When you change a prompt or update a model, refresh fixtures without touching test code:
LLM_MOCK_DISABLED=1 ANTHROPIC_API_KEY=sk-... pytest
LLM_MOCK_DISABLED=1 bypasses llm-mock entirely — all tests hit the real API and save fresh responses.
What the fixture looks like
Plain JSON, human-readable, diff-friendly in PRs:
{
  "version": "1.0",
  "provider": "anthropic",
  "interactions": [
    {
      "hash": "a3f2c1...",
      "request": {
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Summarize: Long article..."}],
        "max_tokens": 100
      },
      "response": {
        "content": [{"type": "text", "text": "The article covers..."}],
        "stop_reason": "end_turn"
      },
      "recorded_at": "2026-06-01T10:00:00+00:00"
    }
  ]
}
Requests are matched by SHA256 of (model, messages, temperature) — same request always hits the same fixture entry.
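A matching key of this kind could be computed roughly as follows. The exact serialization llm-mock uses is not stated here, so the canonical-JSON step and the `request_hash` name are assumptions; only the three hashed fields come from the description above.

```python
import hashlib
import json


def request_hash(request: dict) -> str:
    """SHA256 over model, messages, and temperature; other fields are ignored."""
    key = {
        "model": request.get("model"),
        "messages": request.get("messages"),
        "temperature": request.get("temperature"),
    }
    # sort_keys gives a stable serialization regardless of dict ordering.
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Summarize: Long article..."}],
    "max_tokens": 100,
}
same_key = request_hash({**base, "max_tokens": 500})
new_prompt = request_hash(
    {**base, "messages": [{"role": "user", "content": "Shorten: Long article..."}]}
)
```

Under this scheme, changing `max_tokens` still matches the recorded entry, while changing the prompt produces a different hash, which is why a prompt edit calls for a fixture refresh.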
Works with OpenAI too
@pytest.mark.llm_replay(fixture="gpt_summary", mode="auto")
def test_openai_summary():
    result = my_openai_pipeline("Summarize this...")
    assert len(result) > 0
Both providers can share one fixture file when you use provider="all" (the default).
How it compares with other approaches
| Approach | Problem |
|---|---|
| unittest.mock / monkeypatch | Fake responses drift from real API behavior |
| VCR.py | Records raw HTTP — doesn't understand LLM request semantics |
| Always hit real API | Expensive, slow, flaky, needs credentials in CI |
| llm-mock | Record once, replay forever, fixtures in git |
Install
pip install llm-mock
Repo and full docs: https://github.com/autopost/llm-mock
