You type one line into Claude Code and hit Enter. In the background, ~80 KB of JSON gets shipped to Anthropic. System prompts, tool definitions, your CLAUDE.md, the full conversation history — all of it, every single request.
The interesting part isn't the size. It's what's inside, why it's structured that way, and why the bill doesn't scale in proportion to it. I put a man-in-the-middle proxy between Claude Code and Anthropic's API to find out. This is the breakdown.
Setting Up the Interception
The trick is routing Claude Code's HTTPS traffic through mitmweb — a browser-based MITM proxy that decrypts requests in flight.
Start the proxy in one terminal:
```bash
mitmweb --listen-port 8080
```
Open http://localhost:8081 in a browser. That's the inspector UI.
In a second terminal, route Claude Code's traffic through it:
```bash
export HTTPS_PROXY=http://localhost:8080
export NODE_EXTRA_CA_CERTS=~/.mitmproxy/mitmproxy-ca-cert.pem
claude
```
Note: The NODE_EXTRA_CA_CERTS line is the part most people miss. Without it, Node's TLS layer rejects the proxy's self-signed cert and Claude Code silently fails.
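Before launching Claude Code, it's worth confirming the chain works at all. A minimal sanity check in Python, assuming the default mitmproxy cert path (any HTTPS URL would do):

```python
import os
import requests  # third-party: pip install requests

# Default location of mitmproxy's CA certificate.
cert = os.path.expanduser("~/.mitmproxy/mitmproxy-ca-cert.pem")

# Send any HTTPS request through the proxy, trusting mitmproxy's CA.
# If this succeeds and the request shows up in the mitmweb UI,
# the interception chain works end to end.
resp = requests.get(
    "https://api.anthropic.com/",
    proxies={"https": "http://localhost:8080"},
    verify=cert,
)
print(resp.status_code)
```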
Now ask something simple — "read package.json and tell me which test framework is used." The proxy lights up. Click into the POST /v1/messages request.
80 KB. For one sentence.
What's Inside the 80 KB
Raw JSON in a proxy inspector is unreadable. I built a small parser that decodes a Claude Code payload into clean sections — you can paste your own intercepted requests in here.
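You don't need my parser to get the gist, though. A rough sketch of the same idea, assuming you've saved an intercepted request body to a file (the filename here is made up):

```python
import json

# A request body saved out of the proxy inspector (hypothetical filename).
with open("request.json") as f:
    payload = json.load(f)

def kb(section):
    """Approximate serialized size of one payload section, in KB."""
    return len(json.dumps(section)) / 1024

print(f"model:    {payload.get('model')}")
print(f"system:   {kb(payload.get('system', '')):6.1f} KB")    # identity block
print(f"tools:    {kb(payload.get('tools', [])):6.1f} KB")     # tool schemas
print(f"messages: {kb(payload.get('messages', [])):6.1f} KB")  # history, incl. CLAUDE.md reminder
```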
The payload breaks into four parts.
1. System prompt (cached server-side)
"You are Claude Code, Anthropic's official CLI tool. You are an interactive agent..."
This identity block ships on every request. It's identical every time, which is exactly what makes it cacheable: the server keeps it warm for 5 minutes at a time.
2. Tool definitions (cached server-side)
Full JSON schemas for every tool — Bash, Edit, Grep, Glob, Read, Write, Agent, and more. Each schema defines parameters, types, when to use the tool, and — interestingly — when not to use it. The Bash tool spec, for example, explicitly tells the model what kinds of operations it should refuse.
This is heavy. It's the bulk of the static payload. It's also cached, so you pay for it once per 5-minute window.
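The caching isn't magic, either. Anthropic's prompt caching API works through explicit cache_control breakpoints on content blocks, and you can spot them in the intercepted payload. A sketch of what the static prefix looks like, values abbreviated (the exact breakpoint placement is Claude Code's decision, not something I'm asserting):

```python
# Sketch of the static prefix in an intercepted payload (values abbreviated).
static_prefix = {
    "model": "claude-sonnet-...",
    "tools": [
        # One full JSON schema per tool: Bash, Edit, Grep, Glob, Read, Write, ...
        {"name": "Bash", "description": "...", "input_schema": {"type": "object"}},
    ],
    "system": [
        {
            "type": "text",
            "text": "You are Claude Code, Anthropic's official CLI tool. ...",
            # A breakpoint here caches everything before it in the prompt
            # prefix (tool schemas included), with a ~5-minute TTL.
            "cache_control": {"type": "ephemeral"},
        },
    ],
}
```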
3. CLAUDE.md (injected via message)
This is the surprise. Whatever you write in CLAUDE.md at your project root gets injected — word for word — into every API request as a system reminder.
That means if your CLAUDE.md is bloated with stale rules, every request carries that weight. Optimize it.
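A quick way to gauge the weight, using the rough rule of thumb of ~4 characters per token for English prose:

```python
from pathlib import Path

text = Path("CLAUDE.md").read_text()
# ~4 characters per token is a rough heuristic for English prose.
print(f"{len(text)} chars, ~{len(text) / 4:.0f} tokens shipped with every request")
```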
4. Conversation history
The messages array — your prompts, Claude's responses, every tool call, every tool result. This is the part that grows.
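After a tool call, the history carries paired tool_use / tool_result blocks in the standard Anthropic message format. A sketch, contents abbreviated:

```python
messages = [
    {"role": "user",
     "content": "read package.json and tell me which test framework is used"},
    {"role": "assistant",
     "content": [
         {"type": "tool_use", "id": "toolu_01...", "name": "Read",
          "input": {"file_path": "package.json"}},
     ]},
    {"role": "user",
     "content": [
         # Tool results ride back in as user-role messages, and file
         # contents land here verbatim. This is why history grows so fast.
         {"type": "tool_result", "tool_use_id": "toolu_01...",
          "content": '{ "devDependencies": { "vitest": "..." } }'},
     ]},
]
```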
The Agent Loop
Here's the mental model that made everything click:
Claude Code is a while loop.
```python
while not done:
    payload = assemble(system_prompt, tools, claude_md, history)
    response = api_call(payload)
    if response.contains_tool_call:
        result = execute_tool_locally(response.tool_call)
        history.append(response)
        history.append(result)
    else:
        done = True
```
There's no separate planner. The loop is the planner. Every iteration, the model decides: do I have enough to answer, or do I need another tool call? If it calls a tool, the result gets appended to history and the loop runs again.
This is why payloads grow over multi-step tasks. I gave Claude a harder job — "run the tests and fix any failing test." It ran Bash, saw a failure, called Read on the failing file, called Edit to patch it, ran Bash again to verify. Four iterations. Four API calls. Each one carrying the cumulative messages array of everything before it.
The proxy showed it clearly: 80 KB → 95 KB → 110 KB → 130 KB.
Why the Bill Doesn't Explode
Naive math says: payload doubles → cost doubles. That's not what happens, and it's the most underrated piece of Claude Code's design.
Prompt caching is the trick. Running cclogviewer on a session log shows the actual token economics:
- Request 1: ~2,000 tokens fresh input + 8,023 tokens cache write
- Request 2: ~300 tokens fresh input + 12,203 tokens cache read
- Request 3: ~250 tokens fresh input + 12,500 tokens cache read
The numbers that matter:
- Cache write is ~25% more expensive than fresh input (one-time hit on the first request)
- Cache read is ~10x cheaper than fresh input
So even though every request resends the full system prompt + tool definitions + CLAUDE.md, after the first request those bytes are billed at a fraction of the rate. Only the new parts of the conversation get charged at full price.
Without caching, a 10-tool-call session would scale roughly quadratically in cost. With caching, it scales nearly linearly with the new content produced.
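The arithmetic is easy to sanity-check. A back-of-the-envelope model using the multipliers above, with a placeholder base rate (substitute current pricing for your model):

```python
# Placeholder base rate per million input tokens; substitute current pricing.
INPUT_PRICE_PER_MTOK = 3.00
CACHE_WRITE_MULT = 1.25  # cache writes: ~25% over fresh input
CACHE_READ_MULT = 0.10   # cache reads: ~10x cheaper than fresh input

def input_cost(fresh=0, cache_write=0, cache_read=0):
    """Input-side cost of one request, in dollars."""
    effective = (fresh
                 + cache_write * CACHE_WRITE_MULT
                 + cache_read * CACHE_READ_MULT)
    return effective * INPUT_PRICE_PER_MTOK / 1_000_000

# The three requests from the session log above:
print(f"request 1: ${input_cost(fresh=2_000, cache_write=8_023):.4f}")
print(f"request 2: ${input_cost(fresh=300, cache_read=12_203):.4f}")
print(f"request 3: ${input_cost(fresh=250, cache_read=12_500):.4f}")
# Uncached, requests 2 and 3 would each bill ~12.5k tokens at the full rate.
```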
Replaying Sessions Locally
One more thing I didn't expect: Claude Code logs every session locally as JSONL at ~/.claude/projects/<project>/<session-id>.jsonl. Each line is one request or response.
Raw, it's unreadable. But the community built tools for it:
- `claude-replay` — drag a JSONL file in, get a clean visual playback of the conversation, including every tool call and result.
- `cclogviewer` — terminal tool that gives you a token-by-token cost breakdown per request.
If you're debugging a session that went off the rails, or auditing what your team's Claude Code usage actually sent to Anthropic, these are the tools you want.
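And if you just want a quick look without installing anything, the log is plain JSONL. A minimal sketch that tallies entries by type (the "type" field is what I saw in my own logs; field names may shift between Claude Code versions):

```python
import json
from collections import Counter
from pathlib import Path

# Substitute your own project directory and session id.
log = Path.home() / ".claude/projects/<project>/<session-id>.jsonl"

counts = Counter()
for line in log.read_text().splitlines():
    entry = json.loads(line)
    # Each line is one event; a "type" field distinguishes them.
    counts[entry.get("type", "unknown")] += 1

print(counts.most_common())
```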
See It In Action
The proxy walkthrough, the parser breakdown, and the live agent loop land harder visually than in prose:
- 0:00 — The 80 KB hook
- 0:30 — Setting up the MITM proxy
- 2:30 — Decoding the payload with the request parser
- 4:00 — The agent loop in action
- 5:30 — JSONL session logs and replay tools
- 7:00 — Token economics and prompt caching
Quick Takeaways
- Claude Code ships ~80 KB per request — most of it is tool schemas and your CLAUDE.md, not your prompt
- The agent itself is a while loop — every iteration, the model decides whether to call a tool or stop
- Payloads grow with every tool call because `messages` accumulates
- Prompt caching makes that growth ~10x cheaper after the first request
- Sessions are recorded locally as JSONL — `claude-replay` and `cclogviewer` make them readable
If you're building AI coding tools, copying this architecture is a reasonable starting point. If you're just using Claude Code, optimizing your CLAUDE.md is the highest-leverage thing you can do — it ships on every request.
I'm a Senior Platform Engineer publishing unfiltered breakdowns of how AI coding tools actually work in production. Follow along on YouTube if that's your kind of thing.
🛠️ Tools mentioned:
- Request Parser — built for this video, free to use
- mitmproxy — the MITM proxy
- claude-replay — JSONL session replay
- cclogviewer — token-level cost breakdown