I lost an entire afternoon last month to a bug that had a one-word explanation: the SDK swapped system for system_prompt between two minor versions, and my retry path was building the message dict the old way.
The agent looked fine. The traces looked fine. The error message from Anthropic was a generic 400. I added log lines. I added more log lines. I printed the message dict. The dict looked right. The dict was right. The SDK was reshaping it on the way out.
The only thing that would have caught this in five minutes is a look at what the provider actually received on the wire. That is what agenttap gives you.
The problem
Five years into the SDK era, "what was actually sent to the model?" is still a hard question.
SDK debug logging is verbose, leaks API keys into your terminal scrollback, and reformats payloads in transit. Callbacks scatter across vendor-specific abstractions and never agree on what a "request" is. The two-line solution everyone reaches for, httpx.Client(event_hooks=...), gives you a Request object but does not redact your Authorization header.
You end up reading the SDK source to figure out where the request body gets serialized. Then you patch a method. Then you forget you patched the method. Then a new SDK version changes the method name and your debug glue silently breaks.
The shape of the fix
agenttap installs as an httpx transport. You hand it to the client. Every call goes through it. Credentials get scrubbed on the way in. The exact request body sticks around in memory so you can inspect it after the call.
import httpx
import anthropic
from agenttap import Tap
tap = Tap()
client = anthropic.Anthropic(http_client=httpx.Client(transport=tap.transport()))
client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=200,
    messages=[{"role": "user", "content": "Hello"}],
)
print(tap.last.url) # https://api.anthropic.com/v1/messages
print(tap.last.pretty_request()) # exact JSON body sent
print(tap.last.response_status) # 200
OpenAI works the same way:
import httpx
import openai
from agenttap import Tap
tap = Tap()
client = openai.OpenAI(http_client=httpx.Client(transport=tap.transport()))
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
)
print(tap.last.request_body)
When you want to know why two calls produced different results, diff them:
from agenttap import diff
print(diff(tap.all[0], tap.all[1]))
# - "system": "v1: be helpful"
# + "system": "v2: be concise"
That diff call would have saved me my lost afternoon.
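For a sense of what a wire-level diff involves, here is a minimal sketch built on the standard library's difflib. It is not agenttap's diff implementation; body_diff is a hypothetical helper, and it assumes both request bodies have already been parsed into dicts.

```python
import difflib
import json


def body_diff(old: dict, new: dict) -> str:
    """Return only the changed lines between two JSON-serializable bodies."""
    # Sorting keys keeps ordering differences from showing up as noise.
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    changed = [
        line
        for line in difflib.ndiff(old_lines, new_lines)
        if line.startswith(("- ", "+ "))  # drop unchanged and "? " hint lines
    ]
    return "\n".join(changed)


first = {"model": "example-model", "system": "v1: be helpful", "max_tokens": 200}
second = {"model": "example-model", "system": "v2: be concise", "max_tokens": 200}
print(body_diff(first, second))
```

Pretty-printing with one key per line is what makes the output readable: a changed value shows up as a single removed line and a single added line.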
What it does NOT do
- It is not a proxy. It is not a server. There is nothing to deploy.
- It does not normalize across providers. The whole point is to show what each provider actually received.
- It does not persist. tap.all lives in memory. Write it to JSON yourself if you want a record.
- It is not full observability. For traces and dashboards, ship the recorded calls into Phoenix, Langfuse, or OpenTelemetry.
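Writing the recorded calls to JSON yourself takes only a few lines. The sketch below uses the attribute names shown earlier (url, request_body, response_status) but makes two assumptions: that tap.all can be iterated, and that request_body is either bytes or something json can serialize directly. dump_calls is a hypothetical helper, and SimpleNamespace objects stand in for real recorded calls so the example runs without a network.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def dump_calls(calls, path: Path) -> None:
    """Write the recorded calls to `path` as a JSON array."""
    records = [
        {
            "url": str(call.url),
            "status": call.response_status,
            # Assumption: bodies may arrive as bytes; decode them for JSON.
            "request_body": (
                call.request_body.decode("utf-8", errors="replace")
                if isinstance(call.request_body, bytes)
                else call.request_body
            ),
        }
        for call in calls
    ]
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")


# Stand-ins for tap.all so the sketch runs offline.
calls = [
    SimpleNamespace(
        url="https://api.anthropic.com/v1/messages",
        response_status=200,
        request_body=b'{"messages": []}',
    )
]
out_path = Path(tempfile.mkdtemp()) / "tapped_calls.json"
dump_calls(calls, out_path)
print(out_path.read_text(encoding="utf-8"))
```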
Inside the lib (one design choice worth showing)
agenttap redacts in two places, and both matter.
Headers get scrubbed by name. The list is small and explicit: Authorization, x-api-key, api-key, cookie, anthropic-api-key, openai-organization, x-amz-security-token, x-google-api-key. This is the boring layer. Most credential leaks happen here, and a fixed list closes the door cleanly.
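The header layer is simple enough to sketch in full. This is an illustration of the technique rather than agenttap's code: scrub_headers and SENSITIVE_HEADERS are hypothetical names, and the "[REDACTED]" default placeholder is an assumption. The one detail worth getting right is case-insensitive matching, since HTTP header names are not case-sensitive.

```python
SENSITIVE_HEADERS = frozenset(
    name.lower()
    for name in (
        "Authorization", "x-api-key", "api-key", "cookie",
        "anthropic-api-key", "openai-organization",
        "x-amz-security-token", "x-google-api-key",
    )
)


def scrub_headers(
    headers: dict[str, str], placeholder: str = "[REDACTED]"
) -> dict[str, str]:
    """Return a copy with sensitive values replaced; the input is untouched."""
    return {
        name: placeholder if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


raw = {
    "AUTHORIZATION": "Bearer not-a-real-token",
    "X-Api-Key": "sk-not-a-real-key",
    "Content-Type": "application/json",
}
print(scrub_headers(raw))
```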
Body string values get scrubbed by regex against known credential shapes: OpenAI and Anthropic sk-…, AWS AKIA…, Google AIza…, Slack xox[baprs]-…. This is the layer that catches the credentials that should not be in your request body but somehow are, because some helper put them there.
from agenttap import Tap, Redactor
# Default: scrub headers + known credential patterns in body
tap = Tap()
# Opt out for local testing
tap = Tap(redactor=Redactor.none())
# Custom placeholder
tap = Tap(redactor=Redactor(placeholder="<hidden>"))
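The body layer can be sketched the same way. The regular expressions below are approximations of the credential shapes listed above, written for illustration; they are not agenttap's actual patterns, and the minimum lengths are assumptions. scrub_body and CREDENTIAL_PATTERNS are hypothetical names.

```python
import re

# Approximate shapes only; the length bounds are assumptions.
CREDENTIAL_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),         # OpenAI / Anthropic style
    re.compile(r"AKIA[0-9A-Z]{16}"),              # AWS access key ID
    re.compile(r"AIza[0-9A-Za-z_-]{35}"),         # Google API key
    re.compile(r"xox[baprs]-[0-9A-Za-z-]{10,}"),  # Slack token
]


def scrub_body(value, placeholder="[REDACTED]"):
    """Recursively replace credential-shaped substrings in string values."""
    if isinstance(value, str):
        for pattern in CREDENTIAL_PATTERNS:
            # A function replacement keeps the placeholder literal.
            value = pattern.sub(lambda _match: placeholder, value)
        return value
    if isinstance(value, dict):
        return {key: scrub_body(item, placeholder) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub_body(item, placeholder) for item in value]
    return value  # numbers, booleans, and None pass through unchanged


body = {
    "max_tokens": 200,
    "messages": [
        # AKIAIOSFODNN7EXAMPLE is the placeholder key from AWS documentation.
        {"role": "user", "content": "my key is AKIAIOSFODNN7EXAMPLE"}
    ],
}
print(scrub_body(body))
```

Walking the parsed structure instead of running the regexes over the raw JSON text means a replacement can never break the JSON itself.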
The reason this design matters: the first time you want to copy a tapped request into a Slack thread to ask a teammate for help, you do not have to do mental math about whether your key is in the screenshot. The default already removed it.
When this is useful
- You are debugging a "looks right, fails with 400" error from an LLM provider and need to see the exact bytes that hit the wire.
- You are migrating between SDK versions and want to confirm the request shape did not silently change.
- You are writing a Slack post or bug report and want to paste a real request body without leaking your key.
- You are comparing two prompt variants and need a diff at the wire level, not at the application level.
- You are reviewing a teammate's PR that touches request building and want a quick "what does this actually send?" sanity check.
When this is NOT what you want
- You need long-term observability with dashboards. Use Phoenix, Langfuse, Helicone, or an OTel collector. agenttap is for the debug loop, not the production dashboard.
- You need a proxy that fronts every call from every process. agenttap is in-process per client.
- You are on an SDK that does not use httpx under the hood. The transport hook does not apply.
Install
pip install agenttap
Repo: https://github.com/MukundaKatta/agenttap
Sibling libraries
| Library | Role |
|---|---|
| agentsnap | Snapshot tests for agent runs |
| cachebench | Per-call prompt-cache hit ratio + cost saved |
| llmfleet | Pool requests from many coroutines into provider Batch APIs |
| agentguard | Egress allowlist for tool calls |
| agenttrace | Cost + latency per run |
agenttap is the debug-loop tool. The others are production tools. They compose: tap a call in development, snap it in CI, trace it in production.
What's next
I want to add a small TapServer adapter that exports captured calls as OpenTelemetry spans, so the same calls you tap in development can flow into the same dashboard you use in production. I also want a "compare against a saved JSON" assertion so you can pin the wire shape in a test.
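Until that assertion exists, a hand-rolled version is short. The sketch below is one possible shape for it, not a planned API: assert_matches_saved is a hypothetical helper, and it assumes the tapped body has already been parsed into a dict. The example reproduces the system versus system_prompt rename from the opening story.

```python
import json
import tempfile
from pathlib import Path


def assert_matches_saved(body: dict, saved_path: Path) -> None:
    """Raise AssertionError describing the drift if `body` differs from the saved JSON."""
    expected = json.loads(saved_path.read_text(encoding="utf-8"))
    if body == expected:
        return
    missing = sorted(set(expected) - set(body))
    unexpected = sorted(set(body) - set(expected))
    changed = sorted(
        key for key in set(body) & set(expected) if body[key] != expected[key]
    )
    raise AssertionError(
        f"wire shape changed: missing keys {missing}, "
        f"unexpected keys {unexpected}, changed values {changed}"
    )


# Save a known-good body once, then compare later requests against it.
snapshot = Path(tempfile.mkdtemp()) / "messages_request.json"
snapshot.write_text(
    json.dumps({"system": "be helpful", "messages": []}), encoding="utf-8"
)

assert_matches_saved({"system": "be helpful", "messages": []}, snapshot)

try:
    assert_matches_saved({"system_prompt": "be helpful", "messages": []}, snapshot)
except AssertionError as exc:
    print(exc)
```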
If you have ever lost an afternoon to a phantom SDK change, you already know why this exists.