Ever wondered what Claude Code is actually sending to the Anthropic API behind the scenes? I did — so I built Claude Inspector, a macOS desktop app that intercepts and visualizes Claude Code's HTTP traffic in real time via a local MITM proxy.
Claude Code CLI → Inspector (localhost:9090) → api.anthropic.com
All traffic stays local. Nothing is stored or shared.
How It Works
- Open Claude Inspector and click Start Proxy
- Run Claude Code with the proxy env var: ANTHROPIC_BASE_URL=http://localhost:9090 claude
- Every API request and response is captured and visualized in real time
What I Found
1. Your CLAUDE.md is sent on every single request
Every request silently prepends your project CLAUDE.md, global rules, and memory files as a system-reminder block. The structure looks like this:
Available skills list (~2KB)
CLAUDE.md + rules + memory (~10KB)
Your actual message (whatever you typed)
That's ~12KB of overhead before you type a single word — and it repeats on every request.
2. MCP tools are lazy-loaded
Built-in tools (27 of them) ship with full JSON schemas on every request. MCP tools, however, start as name-only placeholders. Their schemas get injected dynamically only when the model actually requests them — so unused MCP tools don't cost you tokens.
3. Screenshots are expensive
Images are base64-encoded and embedded directly in the JSON body. A single screenshot can add hundreds of kilobytes to your request. That one "quick screenshot" costs way more than you'd expect.
4. Skills vs. Commands are handled very differently
Local Commands (/clear, /mcp)
→ Model only sees the result, never the command itself
Skills (/commit, Skill("finish"))
→ Full prompt text is injected into the conversation
→ That prompt persists for the rest of the session
Skills inject their entire prompt text into your conversation and it stays there for the rest of the session.
5. Full conversation history is resent every time
Every API request includes your entire conversation history from the beginning. The longer your session runs, the more you pay — linearly.
A 30-turn conversation generates 1MB+ of cumulative data sent per request. That's before any actual work is done.
6. Sub-agents are fully isolated
When Claude spawns a sub-agent via the Agent tool, it creates a completely independent API call with no access to the parent conversation history. Each agent starts fresh.
What You Should Do About It
These findings have real implications for your workflow and costs
Keep your CLAUDE.md lean. Every line gets sent on every request. Trim anything that isn't actively useful.
Use screenshots sparingly. One image can cost as much as several paragraphs of text in tokens.
Use /clear in long sessions. After ~20-30 turns, hit /clear to reset the context window. Your costs will thank you.
Don't over-install MCP tools. Unused MCP tools don't cost tokens (they're lazy-loaded), but the initial name list still adds up if you have dozens installed.
Try It Yourself
brew install --cask kangraemin/tap/claude-inspector && sleep 2 && open -a "Claude Inspector"
Or grab the DMG from GitHub Releases.
Source code: github.com/kangraemin/claude-inspector

Top comments (0)