DEV Community

autopost

How to mock OpenAI and Anthropic API calls in pytest (without monkeypatching)

If you're building with LLM APIs, you've probably hit this problem: your test suite makes real API calls.

That means:

  • Every CI run costs money
  • Tests are slow (1–3 seconds per call)
  • Tests are flaky (LLM outputs aren't deterministic)
  • Tests fail without an API key

The usual fix is monkeypatching — replacing the client with a fake. But that means maintaining fake responses by hand, and your tests stop reflecting what the real API actually returns.
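For contrast, here is roughly what the hand-maintained fake looks like. This is a minimal sketch using only unittest.mock; the summarize function takes the client as an argument purely so the example runs without the SDK installed, and the fake response text is made up.

```python
from unittest.mock import MagicMock


def summarize(client, text: str) -> str:
    # Same call shape as an Anthropic messages request. The client is a
    # parameter here (an assumption for this sketch) so a fake can be passed in.
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=100,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return message.content[0].text


# Hand-written stand-in for the real response object. Nothing verifies that
# this shape still matches what the API actually returns.
fake_block = MagicMock()
fake_block.text = "The article covers climate change."
fake_client = MagicMock()
fake_client.messages.create.return_value.content = [fake_block]

result = summarize(fake_client, "Long article about climate change...")
```

The test passes, but only because the fake was written to match the code. If the real response shape changes, the fake keeps passing.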

There's a better approach: record real responses once, replay them forever.

The record/replay pattern

Instead of faking the API, you intercept it:

  • Record mode — runs against the real API once, saves the response to a JSON file
  • Replay mode — returns the saved response instantly, no network call

The fixture file gets committed to git. CI runs offline. Tests are deterministic and free.
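To make the pattern concrete, here is a minimal sketch of the two modes in plain Python. It is not how llm-mock is implemented; record_or_replay and call_api are hypothetical names, and the real plugin works at the HTTP layer rather than wrapping a function.

```python
import json
from pathlib import Path
from typing import Callable


def record_or_replay(fixture: Path, mode: str, call_api: Callable[[], dict]) -> dict:
    """Return a saved response in replay mode, or call the API and save it."""
    if mode == "replay":
        # No network: the response comes straight from the committed file.
        return json.loads(fixture.read_text())
    if mode == "record":
        response = call_api()  # the only place a real request would happen
        fixture.parent.mkdir(parents=True, exist_ok=True)
        fixture.write_text(json.dumps(response, indent=2, sort_keys=True))
        return response
    raise ValueError(f"unknown mode: {mode!r}")
```

Writing the file with sorted keys and indentation keeps it stable between recordings, which is what makes the fixture readable in a diff.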

llm-mock

https://github.com/autopost/llm-mock is a pytest plugin that does exactly this for the Anthropic and OpenAI SDKs. It intercepts at the HTTP layer — your production code is never touched.

pip install llm-mock

Example: testing an Anthropic pipeline

Say you have this production code:

# my_app/pipeline.py
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarize(text: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=100,
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    # The response content is a list of blocks; the first one holds the text.
    return message.content[0].text

Step 1 — Record once

Run this locally with your API key:

# record_fixtures.py
from llm_mock import llm_mock
from my_app.pipeline import summarize

# Calls inside this block go to the real API and are saved to the fixture.
with llm_mock(mode="record", fixture="tests/fixtures/summarize"):
    result = summarize("Long article about climate change...")
    print(result)

ANTHROPIC_API_KEY=sk-... python record_fixtures.py

This creates tests/fixtures/summarize.json. Commit it to git.

Step 2 — Replay in tests

# tests/test_pipeline.py
import pytest
from my_app.pipeline import summarize

@pytest.mark.llm_replay(fixture="summarize")
def test_summarize():
    result = summarize("Long article about climate change...")
    assert "climate" in result

pytest # no API key needed, instant, deterministic

That's it. The llm_replay marker intercepts the httpx request that the Anthropic SDK makes internally and returns the saved response.

Auto mode — the easiest workflow

If you don't want to think about record vs replay at all, use mode="auto":

@pytest.mark.llm_replay(fixture="summarize", mode="auto")
def test_summarize():
    result = summarize("Long article about climate change...")
    assert "climate" in result

  • Fixture exists → replays it
  • Fixture missing → records it automatically

First run needs an API key. Every run after that is free and offline.
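The decision auto mode makes can be sketched in a few lines. This is an illustration of the rule described above, not llm-mock's code; resolve_mode is a hypothetical helper.

```python
from pathlib import Path


def resolve_mode(mode: str, fixture: Path) -> str:
    """Turn "auto" into a concrete mode based on whether the fixture exists."""
    if mode != "auto":
        return mode  # explicit "record" or "replay" is left alone
    return "replay" if fixture.exists() else "record"
```

One consequence of this rule: if a fixture file is never committed, auto mode would try to record in CI, so a missing API key there usually points to a missing fixture.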

Refreshing fixtures

When you change a prompt or update a model, refresh fixtures without touching test code:

LLM_MOCK_DISABLED=1 ANTHROPIC_API_KEY=sk-... pytest

LLM_MOCK_DISABLED=1 turns off replay for the whole run: every test hits the real API and saves fresh responses.

What the fixture looks like

Plain JSON, human-readable, diff-friendly in PRs:

{
  "version": "1.0",
  "provider": "anthropic",
  "interactions": [
    {
      "hash": "a3f2c1...",
      "request": {
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Summarize: Long article..."}],
        "max_tokens": 100
      },
      "response": {
        "content": [{"type": "text", "text": "The article covers..."}],
        "stop_reason": "end_turn"
      },
      "recorded_at": "2026-06-01T10:00:00+00:00"
    }
  ]
}

Requests are matched by a SHA-256 hash of (model, messages, temperature), so the same request always hits the same fixture entry.
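Here is one way such a key could be computed and looked up. The exact serialization llm-mock uses is not documented in this post, so the canonical-JSON step below is an assumption, and request_key and find_interaction are hypothetical names.

```python
import hashlib
import json


def request_key(model: str, messages: list, temperature=None) -> str:
    # Canonical JSON (sorted keys, fixed separators) so that dict ordering
    # in the request does not change the hash. Assumed format, not llm-mock's.
    payload = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def find_interaction(fixture: dict, key: str):
    # Linear scan over the "interactions" list from the fixture format above.
    for interaction in fixture["interactions"]:
        if interaction["hash"] == key:
            return interaction["response"]
    return None  # no match: the request was never recorded
```

Any change to the prompt or model produces a different key, which is why those changes call for a fixture refresh. Going by the tuple listed above, max_tokens is not part of the key, so changing it alone would presumably still match the old entry.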

Works with OpenAI too

import pytest

# my_openai_pipeline stands in for your own function that calls the OpenAI SDK.
@pytest.mark.llm_replay(fixture="gpt_summary", mode="auto")
def test_openai_summary():
    result = my_openai_pipeline("Summarize this...")
    assert len(result) > 0

Calls to both providers can share one fixture file when you use provider="all" (the default).

vs other approaches

Approach | Trade-off
unittest.mock / monkeypatch | Fake responses drift from real API behavior
VCR.py | Records raw HTTP; doesn't understand LLM request semantics
Always hit real API | Expensive, slow, flaky, needs credentials in CI
llm-mock | Record once, replay forever, fixtures in git

Install

pip install llm-mock

Repo and full docs: https://github.com/autopost/llm-mock
