강래민

Posted on Mar 19

I Built a MITM Proxy to See What Claude Code Actually Sends to Anthropic

#ai #opensource #devtools #claude

Ever wondered what Claude Code is actually sending to the Anthropic API behind the scenes? I did — so I built Claude Inspector, a macOS desktop app that intercepts and visualizes Claude Code's HTTP traffic in real time via a local MITM proxy.

Claude Code CLI → Inspector (localhost:9090) → api.anthropic.com

All traffic stays local. Nothing is stored or shared.

How It Works

Open Claude Inspector and click Start Proxy
Run Claude Code with the proxy env var: ANTHROPIC_BASE_URL=http://localhost:9090 claude
Every API request and response is captured and visualized in real time

What I Found

1. Your CLAUDE.md is sent on every single request

Every request silently prepends your project CLAUDE.md, global rules, and memory files as a system-reminder block. The structure looks like this:

Available skills list (~2KB)
CLAUDE.md + rules + memory (~10KB)
Your actual message (whatever you typed)

That's ~12KB of overhead before you type a single word — and it repeats on every request.

2. MCP tools are lazy-loaded

Built-in tools (27 of them) ship with full JSON schemas on every request. MCP tools, however, start as name-only placeholders. Their schemas get injected dynamically only when the model actually requests them — so unused MCP tools don't cost you tokens.

3. Screenshots are expensive

Images are base64-encoded and embedded directly in the JSON body. A single screenshot can add hundreds of kilobytes to your request. That one "quick screenshot" costs way more than you'd expect.

4. Skills vs. Commands are handled very differently

Local Commands (/clear, /mcp)
→ Model only sees the result, never the command itself

Skills (/commit, Skill("finish"))
→ Full prompt text is injected into the conversation
→ That prompt persists for the rest of the session

Skills inject their entire prompt text into your conversation and it stays there for the rest of the session.

5. Full conversation history is resent every time

Every API request includes your entire conversation history from the beginning. The longer your session runs, the more you pay — linearly.

A 30-turn conversation generates 1MB+ of cumulative data sent per request. That's before any actual work is done.

6. Sub-agents are fully isolated

When Claude spawns a sub-agent via the Agent tool, it creates a completely independent API call with no access to the parent conversation history. Each agent starts fresh.

What You Should Do About It

These findings have real implications for your workflow and costs

Keep your CLAUDE.md lean. Every line gets sent on every request. Trim anything that isn't actively useful.
Use screenshots sparingly. One image can cost as much as several paragraphs of text in tokens.
Use /clear in long sessions. After ~20-30 turns, hit /clear to reset the context window. Your costs will thank you.
Don't over-install MCP tools. Unused MCP tools don't cost tokens (they're lazy-loaded), but the initial name list still adds up if you have dozens installed.

Try It Yourself

brew install --cask kangraemin/tap/claude-inspector && sleep 2 && open -a "Claude Inspector"

Or grab the DMG from GitHub Releases.

Source code: github.com/kangraemin/claude-inspector

DEV Community