Mukunda Rao Katta

Posted on May 21

The prompt your SDK sends is not the prompt you wrote

#hermesagent #ai #llm #python

A reply from Claude came back nonsense. The system prompt looked fine in my code. The messages looked fine in my logs. So I added a print(messages) right before client.messages.create(...). Still fine.

I was looking in the wrong place. The SDK was building the request body. What hit the wire was not what I was printing.

So I wrote an httpx transport that intercepts the outbound request, dumps the actual JSON, and lets me diff what I think I sent against what I actually sent. I called it agenttap.

The thing I missed

Here is the captured request for a call I thought was a clean two-turn conversation:

{
  "model": "claude-opus-4-7",
  "max_tokens": 1024,
  "system": "You are a careful code reviewer.",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Review this diff:\n\n```diff\n+ foo\n```"}
      ]
    },
    {
      "role": "assistant",
      "content": "Looks fine to me."
    },
    {
      "role": "user",
      "content": "What about edge cases?"
    }
  ],
  "metadata": {"user_id": "u_8821"}
}

Three things I did not write:

The first user message got wrapped in a [{"type": "text", "text": ...}] block. My code passed a plain string. The SDK normalized it.
The triple-backtick code block in the diff was preserved literally, including the language tag. I had a helper that was supposed to strip those.
The metadata.user_id was leaking from a default I set in the client constructor six commits ago and forgot.

None of these would have shown up in logs of my own variables. They only show up at the wire.

What agenttap does

Install it with pip install agenttap. Then:

import httpx
from agenttap import TapTransport
from anthropic import Anthropic

tap = TapTransport(wrap=httpx.HTTPTransport())
client = Anthropic(http_client=httpx.Client(transport=tap))

client.messages.create(
    model="claude-opus-4-7",
    max_tokens=256,
    messages=[{"role": "user", "content": "hi"}],
)

for record in tap.records:
    print(record.request_json)
    print(record.response_status, record.duration_ms, "ms")

The transport sits underneath the SDK. It does not know or care that this is Anthropic. It works the same way with OpenAI's Python SDK, with the Google client, with anything that ends up calling httpx.

It captures four things per call:

The full request URL and headers (with authorization and x-api-key redacted to ***).
The exact request body bytes (decoded as JSON if possible).
The response status, duration in milliseconds, and a copy of the response body.
A monotonic sequence number so you can sort across coroutines.
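The four items above map naturally onto a small record type. The sketch below is one possible shape, not agenttap's actual data model: CallRecord, redact_headers, and the module-level counter are hypothetical names, and only the two redacted header names and the "***" placeholder come from the list above.

```python
import itertools
import time
from dataclasses import dataclass, field

REDACTED_HEADERS = {"authorization", "x-api-key"}  # header names from the list above
_sequence = itertools.count(1)  # hypothetical process-wide counter


def redact_headers(headers: dict) -> dict:
    """Return a copy with credential headers replaced by '***'."""
    return {
        name: "***" if name.lower() in REDACTED_HEADERS else value
        for name, value in headers.items()
    }


@dataclass
class CallRecord:
    url: str
    headers: dict
    request_body: bytes
    response_status: int
    duration_ms: float
    response_body: bytes
    # Assigned at creation time, so sorting by seq restores creation order.
    seq: int = field(default_factory=lambda: next(_sequence))


start = time.perf_counter()
# ... the real HTTP call would happen here ...
elapsed_ms = (time.perf_counter() - start) * 1000

record = CallRecord(
    url="https://api.example.test/v1/messages",  # placeholder URL
    headers=redact_headers(
        {"X-Api-Key": "sk-secret", "Content-Type": "application/json"}
    ),
    request_body=b'{"messages": []}',
    response_status=200,
    duration_ms=elapsed_ms,
    response_body=b'{"ok": true}',
)
print(record.seq, record.headers)
```

Header names are compared case-insensitively because HTTP header names are case-insensitive, so X-Api-Key and x-api-key are redacted alike.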

Replay and diff

The reason I built this was not just to look. I wanted to replay.

import os

from agenttap import replay

# Pin a captured request as a fixture
tap.save("fixtures/review_call.json")

# Later, replay the exact bytes
resp = replay("fixtures/review_call.json", api_key=os.environ["ANTHROPIC_API_KEY"])

And diff two recordings:

from agenttap import diff_records

a = tap.records[0]
b = tap.records[1]
print(diff_records(a, b))
# - messages[0].content[0].text: "Review this diff:..."
# + messages[0].content[0].text: "Review this PR:..."
# - metadata.user_id: "u_8821"
# + metadata.user_id: "u_4410"

That diff caught a regression last week where a prompt template change added a stray newline. The output still looked plausible, but the deterministic eval drifted. The wire diff showed me the exact byte.
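For a sense of how a path-level diff like this can work, here is a minimal sketch that flattens two JSON bodies into leaf paths and compares them. It is an illustration under my own assumptions, not the diff_records implementation; flatten and diff_bodies are hypothetical helpers, and the example reproduces the stray-newline case on a toy payload.

```python
import json

_MISSING = object()  # marks a path that exists on only one side


def flatten(value, path=""):
    """Map every leaf of a JSON value to a path like messages[0].content."""
    if isinstance(value, dict) and value:
        children = [
            (f"{path}.{key}" if path else str(key), child)
            for key, child in value.items()
        ]
    elif isinstance(value, list) and value:
        children = [(f"{path}[{index}]", child) for index, child in enumerate(value)]
    else:
        return {path: value}  # scalar, or an empty dict/list kept as a leaf
    leaves = {}
    for child_path, child in children:
        leaves.update(flatten(child, child_path))
    return leaves


def diff_bodies(a, b):
    """Return '- path: old' and '+ path: new' lines for each differing leaf."""
    left, right = flatten(a), flatten(b)
    lines = []
    for path in sorted(left.keys() | right.keys()):
        old = left.get(path, _MISSING)
        new = right.get(path, _MISSING)
        if old == new:
            continue
        if old is not _MISSING:
            lines.append(f"- {path}: {json.dumps(old)}")
        if new is not _MISSING:
            lines.append(f"+ {path}: {json.dumps(new)}")
    return lines


before = {
    "messages": [{"role": "user", "content": "Review this diff:"}],
    "metadata": {"user_id": "u_8821"},
}
after = {
    "messages": [{"role": "user", "content": "Review this diff:\n"}],
    "metadata": {"user_id": "u_8821"},
}
changes = diff_bodies(before, after)
print("\n".join(changes))
```

Printing values through json.dumps keeps invisible characters visible: the trailing newline shows up as an escaped \n instead of an empty-looking line.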

Numbers

Overhead per call when capturing in memory: about 0.4 ms on my laptop for a 2 KB body. To keep a copy of the response, the transport buffers the response body, so streamed responses do not behave quite as they would without the tap. If you want to keep streaming responses streaming, pass capture_response_body=False and only the request side is recorded.

Default ring buffer holds the last 500 records. You can flush to disk with tap.save_all(dir="taps/") and rotate.
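If "ring buffer" is unfamiliar, the behavior is easy to see with the standard library. This sketch uses collections.deque with maxlen as a stand-in; whether agenttap uses a deque internally is an assumption on my part, and only the 500 figure comes from the text above.

```python
from collections import deque

# A bounded deque drops the oldest entry once it is full.
records = deque(maxlen=500)  # 500 matches the default quoted above

for seq in range(1, 621):  # simulate 620 captured calls
    records.append({"seq": seq})

print(len(records), records[0]["seq"], records[-1]["seq"])
```

After 620 appends the buffer still holds 500 entries and the first 120 are gone, which is why anything you want to keep long-term needs to be flushed to disk before it rotates out.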

What this does not solve

A few honest limits.

It only sees the wire. If your SDK retries internally, every attempt shows up as its own record, even though your code made a single call. That can be noisy.
It does not redact PII out of the message body. Only credentials in headers. If you are sending user data, you still need a redaction pass before persisting.
Streaming responses are captured as the joined output, not the SSE event stream. If you need event-level traces, use claude-stream-rs or wire a separate handler.
The redaction list is conservative. If your provider uses a custom auth header name, you need to add it: TapTransport(redact_headers=["x-my-auth"]).
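On the PII point, the redaction pass has to be your own. Below is a minimal sketch of one, assuming the captured body has already been decoded to JSON. The scrub helper and the email pattern are hypothetical and deliberately simple; a single regex is nowhere near a complete PII detector.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def scrub(value):
    """Return a copy of a JSON value with email addresses masked in every string."""
    if isinstance(value, str):
        return EMAIL.sub("[email]", value)
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged


body = {
    "messages": [
        {"role": "user", "content": "Mail jane.doe@example.com the report."}
    ],
    "max_tokens": 256,
}
clean = scrub(body)
print(clean["messages"][0]["content"])
```

The function builds a new structure instead of mutating the captured one, so the in-memory record stays exact for diffing while only the scrubbed copy is written to disk.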

If you are running agents in production and you have never looked at the literal JSON that left your process, do it once. You will find something.

Repo: https://github.com/MukundaKatta/agenttap
PyPI: pip install agenttap

This is one of a small set of focused libraries I publish for AI agent plumbing (snapshots, budgets, drift, repair). Built piece by piece from real incidents.
