Mukunda Rao Katta

Posted on May 25

agenttap: see exactly what your Python LLM client sends over the wire

#hermeschallenge #ai #python #agents

I had cache_control set in the code. The docs said it would work. The response usage block kept showing zero cache read tokens, every single call.

I checked the system prompt. I checked the message list. I checked the model parameter. Everything looked right. The SDK was not throwing. The provider was not returning an error. Cache hits just never appeared.

My first guess was that the prompt was too short to qualify. My second guess was a region issue. Both were wrong. The real answer took about two hours to find: the SDK version I was pinned to was silently stripping cache_control before serializing the request body. The field existed in my Python dict. It never made it to the wire.

The only way I confirmed that was by dropping into the SDK internals and reading the serialization code. What I wanted was a way to see the exact JSON that left the process, without reading SDK source code.

That is what agenttap does.

The shape of the fix

agenttap plugs in as an httpx transport. You pass it to the SDK client. Every outbound call flows through it. The raw request body is captured before it leaves the process. Credentials are redacted automatically. Nothing else changes.

import json

import anthropic
import httpx
from agenttap import Tap

# Route every SDK request through the tap's transport.
tap = Tap()
client = anthropic.Anthropic(
    http_client=httpx.Client(transport=tap.transport())
)

client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=200,
    system=[{
        "type": "text",
        "text": "You are a helpful assistant.",
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Hello"}],
)

# Parse the body that was actually serialized and check the field survived.
body = json.loads(tap.last.request_body)
system_block = body["system"][0]
print(system_block.get("cache_control"))  # None? Then the SDK dropped it.

If cache_control is missing from that output, you have your answer. You did not have to read the SDK source or scatter print statements through code you do not control. You looked at the wire.

To log every call to a file:

import anthropic
import httpx
from agenttap import Tap

# sink= appends every captured call to the given file.
tap = Tap(sink="calls.jsonl")
client = anthropic.Anthropic(
    http_client=httpx.Client(transport=tap.transport())
)

Each request and response gets appended to calls.jsonl as a JSON line. You can open it in any editor or pipe it through jq.
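Because each record is one JSON object per line, the log can also be read back with nothing but the standard library. This is a minimal sketch, not part of agenttap: the record layout is not described here, so the "url" and "status" keys below are made-up fields used only to have something to filter on.

```python
import json
import tempfile
from pathlib import Path


def load_jsonl(path):
    """Parse a JSON Lines file into a list of objects, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo with invented records; the fields agenttap really writes may differ.
sample = [
    {"url": "https://api.example.invalid/v1/messages", "status": 200},
    {"url": "https://api.example.invalid/v1/messages", "status": 400},
]
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "calls.jsonl"
    log_path.write_text(
        "\n".join(json.dumps(r) for r in sample) + "\n", encoding="utf-8"
    )
    records = load_jsonl(log_path)

# Keep only the calls that came back with an error status.
failed = [r for r in records if r["status"] >= 400]
print(len(records), "records,", len(failed), "failed")
```

Swap the temporary file for your own calls.jsonl and adjust the filter to whatever keys the file actually contains.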

To diff two calls:

from agenttap import diff

result = diff(tap.all[0], tap.all[1])
print(result)
# - "cache_control": {"type": "ephemeral"}
# + (missing)

That single diff output would have ended my two-hour session in five minutes.
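To make the idea concrete, here is a rough standard-library sketch of what a body diff can look like. It is not agenttap's implementation, and flatten and diff_bodies are names I made up for the illustration: each body is flattened into path/value pairs, then the two maps are compared.

```python
_MISSING = object()  # sentinel so a JSON null is not confused with "absent"


def flatten(value, prefix=""):
    """Map every leaf of a JSON-like structure to a path such as system[0].text."""
    if isinstance(value, dict) and value:
        items = {}
        for key, child in value.items():
            items.update(flatten(child, f"{prefix}.{key}" if prefix else key))
        return items
    if isinstance(value, list) and value:
        items = {}
        for index, child in enumerate(value):
            items.update(flatten(child, f"{prefix}[{index}]"))
        return items
    # Scalars and empty containers are treated as leaves.
    return {prefix: value}


def diff_bodies(before, after):
    """Return sorted (path, before_value, after_value) tuples for differing leaves."""
    left, right = flatten(before), flatten(after)
    changes = []
    for path in sorted(left.keys() | right.keys()):
        a = left.get(path, _MISSING)
        b = right.get(path, _MISSING)
        if a is _MISSING or b is _MISSING or a != b:
            changes.append((
                path,
                "(missing)" if a is _MISSING else a,
                "(missing)" if b is _MISSING else b,
            ))
    return changes


with_cache = {
    "system": [{"type": "text", "text": "hi", "cache_control": {"type": "ephemeral"}}]
}
without_cache = {"system": [{"type": "text", "text": "hi"}]}

changes = diff_bodies(with_cache, without_cache)
for path, before, after in changes:
    print(f"{path}: {before!r} -> {after!r}")
```

Reporting by path keeps the output short even when the bodies are large, because only the leaves that changed are printed.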

What it does NOT do

It does not modify or repair requests. It shows what was sent, nothing more.
It does not normalize across providers. The output is the provider-specific JSON, verbatim.
It does not persist across process restarts. tap.all is in-process memory. Use sink= to write to disk if you need a record.
It is not a proxy or sidecar. There is nothing to deploy or configure outside your Python process.

Inside the lib: transport, not monkey-patch

Most approaches to capturing LLM requests fall into two categories: monkey-patching SDK methods, or adding verbose debug logging.

Monkey-patching works until the SDK changes the method you patched. It also tends to break type checking and autocomplete. And it requires you to know which internal method to patch for each SDK, which means reading SDK source anyway.

Debug logging captures something, but usually not the exact serialized body. SDKs often log at a higher level, before final serialization. Some also log headers with credentials intact unless you configure them otherwise.

agenttap uses a third approach: the httpx transport layer. The Anthropic Python SDK and the OpenAI Python SDK both accept an http_client parameter. That client accepts a transport parameter. The transport sees the final, serialized httpx.Request object, which contains the exact bytes that would go over the network.

Plugging in at the transport layer means:

No SDK methods are patched. The SDK continues to work exactly as it did before.
The capture happens after all SDK serialization is complete. You see exactly what the provider receives.
It works for any SDK that uses httpx, not just Anthropic and OpenAI.
Removing agenttap is one line: stop passing the transport.

The credential redaction happens at the transport layer too. Headers are scrubbed by name. Request body string values are scanned against known credential patterns. By default, Authorization, x-api-key, anthropic-api-key, and similar headers come through as [REDACTED]. You can paste the output in a Slack thread without reviewing it first.

from agenttap import Tap, Redactor

# Default: scrub headers and known credential patterns in body
tap = Tap()

# Custom placeholder
tap = Tap(redactor=Redactor(placeholder="<removed>"))

# Disable redaction for local testing only
tap = Tap(redactor=Redactor.none())
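For a feel of what scrubbing by header name and by body pattern involves, here is a small standard-library sketch. It is an approximation under my own assumptions, not agenttap's Redactor: the header set is taken from the list above, and the "sk-" regular expression is a hypothetical pattern chosen for the example.

```python
import re

# Header names to scrub, compared case-insensitively.
SENSITIVE_HEADERS = {"authorization", "x-api-key", "anthropic-api-key"}

# Hypothetical pattern for key-like strings; real credential formats vary.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{16,}")


def redact_headers(headers, placeholder="[REDACTED]"):
    """Return a copy of headers with sensitive values replaced."""
    return {
        name: placeholder if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def redact_body(value, placeholder="[REDACTED]"):
    """Walk a JSON-like structure and mask key-like substrings in string values."""
    if isinstance(value, dict):
        return {k: redact_body(v, placeholder) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_body(v, placeholder) for v in value]
    if isinstance(value, str):
        return KEY_PATTERN.sub(placeholder, value)
    return value


headers = {
    "Authorization": "Bearer sk-test-not-a-real-key-000000",
    "Content-Type": "application/json",
}
body = {
    "messages": [
        {"role": "user", "content": "my key is sk-test-not-a-real-key-000000"}
    ]
}

clean_headers = redact_headers(headers)
clean_body = redact_body(body)
print(clean_headers)
print(clean_body)
```

Both functions return new objects instead of mutating their input, so the request that actually goes out is left untouched.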

When this is useful

You are getting a 400 from an LLM provider, the message looks fine on the application side, and you need to see what actually hit the API. agenttap answers that question in one extra line of setup code.

You are upgrading an SDK version and want to confirm the request shape stayed the same. Tap a call before and after, diff the outputs.

You are debugging a caching issue. cache_control is set in your code. Cache hits are not showing up in usage metadata. Tap the call and check whether cache_control made it through serialization.

You are writing a bug report or asking a teammate for help. You need to paste a real request body. Tap gives you clean, redacted JSON you can paste without scrubbing keys manually.

You are reviewing a PR that changes how requests are built. Tap the before and after, diff them, and you have a concrete artifact to attach to the review.
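For the caching case in particular, it helps to list every place a field appears in the captured body instead of checking one index by hand. A minimal sketch, assuming you already have the body parsed into Python objects; find_key is a helper name invented for this example.

```python
def find_key(value, key, path=""):
    """Yield the path of every occurrence of `key` in a JSON-like structure."""
    if isinstance(value, dict):
        for name, child in value.items():
            child_path = f"{path}.{name}" if path else name
            if name == key:
                yield child_path
            yield from find_key(child, key, child_path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_key(child, key, f"{path}[{index}]")


body = {
    "system": [
        {
            "type": "text",
            "text": "You are a helpful assistant.",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Hello"}],
}

paths = list(find_key(body, "cache_control"))
print(paths or "cache_control never reached the wire")
```

An empty result on a captured body means the field was dropped somewhere between your dict and serialization.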

When NOT to use this

agenttap is a debug-loop tool. It is not built for production observability.

If you want dashboards, latency histograms, or long-term trace storage, use Phoenix, Langfuse, Helicone, or an OpenTelemetry collector. Those tools are built for production. agenttap is built for the moment you are staring at a 400 and need to know what you sent.

agenttap also only works with SDKs that use httpx. If your SDK uses requests or a custom HTTP layer, the transport hook does not apply.

Install

pip install agenttap

Zero dependencies. Python 3.9 and up. Tests included.

Repo: MukundaKatta/agenttap

Siblings

These libraries are from the same agent-stack series. They work at different boundaries.

Lib	Boundary	Repo
agentsnap	High-level tool-call snapshots, different granularity from wire	MukundaKatta/agentsnap
agenttrace	Cost and latency, uses response metadata not raw wire	MukundaKatta/agenttrace
prompt-replay	Captures and replays sessions	MukundaKatta/prompt-replay
cachebench	Cache hit and miss observability	MukundaKatta/cachebench

agenttap sits at the lowest level. It shows you the wire. agentsnap works above the wire at the tool-call level. agenttrace works above the wire using response metadata. prompt-replay captures sessions for replay. cachebench measures cache performance using the usage fields that come back in the response.

They compose. Tap a call in development to confirm the shape. Snap it in CI to catch regressions. Trace it in production for cost and latency.

What's next

The next feature I want is a tap.assert_wire(path) helper: run a call, compare the captured body against a saved JSON fixture, and raise if they differ. That makes wire-shape regression testing a one-liner in a test file. No more "I thought I was sending X but the SDK changed Y" surprises slipping past CI.
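Until that lands, the same check can be approximated by hand. The sketch below is my guess at the shape, not the planned API: assert_wire_matches is a hypothetical function that compares a captured body against a JSON fixture on disk and raises if they differ.

```python
import json
import tempfile
from pathlib import Path


def assert_wire_matches(body, fixture_path):
    """Raise AssertionError if the captured JSON body differs from the fixture."""
    # Accept raw str/bytes as captured off the wire, or already-parsed objects.
    captured = json.loads(body) if isinstance(body, (str, bytes)) else body
    expected = json.loads(Path(fixture_path).read_text(encoding="utf-8"))
    if captured != expected:
        raise AssertionError(
            "wire body differs from fixture\n"
            f"expected: {json.dumps(expected, sort_keys=True)}\n"
            f"captured: {json.dumps(captured, sort_keys=True)}"
        )


expected_body = {"system": [{"type": "text", "cache_control": {"type": "ephemeral"}}]}
dropped_body = {"system": [{"type": "text"}]}

with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "messages_create.json"
    fixture.write_text(json.dumps(expected_body), encoding="utf-8")

    # Matching body: passes silently.
    assert_wire_matches(json.dumps(expected_body), fixture)

    # Body with the field dropped: raises.
    try:
        assert_wire_matches(dropped_body, fixture)
        mismatch_detected = False
    except AssertionError:
        mismatch_detected = True

print("mismatch detected:", mismatch_detected)
```

Comparing parsed objects instead of raw bytes keeps the check insensitive to key order and whitespace, which can change between SDK versions without changing meaning.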

If you have ever spent an afternoon chasing a bug that turned out to be an SDK serialization detail, you already know why this tool exists.
