侯垒

I built a local reverse proxy to see what Claude Code actually sends to Anthropic

The problem I couldn't solve

I was spending ~$1,800/month on Claude Code.

I had no idea where the money was going. I had no idea which prompts were 4,000-token monstrosities, which ones were 200-token gems, or which ones I'd accidentally repeated 3 times this week.

I tried the obvious tools first:

  • mitmproxy: didn't work for me. In my setup, Claude Code (and Codex, DeepSeek, Kimi, GLM, etc.) bypassed HTTP_PROXY and opened HTTPS connections to the API directly.
  • Charles: same problem.
  • Langfuse / Helicone: these are SaaS. You have to send your data to them. Not what I wanted.
  • Custom hooks: limited to the events the CLI exposes. I wanted the raw HTTP.

I wanted a local, open-source, zero-account way to see what my coding agent was doing.

The solution: a local reverse proxy on the loopback

The insight: every coding agent CLI talks to api.anthropic.com (or a similar API host). If I make it talk to http://127.0.0.1:port instead, and have a tiny proxy on that port forward each request to the real API over HTTPS, the local hop is plain HTTP: easy to log, with no CA certificate to install and no TLS pinning to work around.

That's it. That's the whole trick.
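A minimal sketch of that trick in Python, using only the standard library. This is not ccglass's actual code: the port 8787, the log format, and the helper names (`forward_headers`, `summarize`, `serve`) are all hypothetical, and it only handles POST, which is enough for a messages-style API.

```python
import http.client
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAM_HOST = "api.anthropic.com"  # the real API host named in the post

# Headers that describe one hop only, plus Host and Content-Length,
# which http.client recomputes for the upstream request.
SKIP_HEADERS = {
    "connection", "keep-alive", "transfer-encoding", "te", "trailer",
    "upgrade", "proxy-authorization", "proxy-authenticate",
    "host", "content-length",
}


def forward_headers(headers):
    """Copy headers for the next hop, dropping the per-hop ones."""
    return {k: v for k, v in headers.items() if k.lower() not in SKIP_HEADERS}


def summarize(method, path, body):
    """Build one log record from a request seen on the plain-HTTP local hop."""
    try:
        payload = json.loads(body or b"{}")
    except ValueError:  # not JSON (or not UTF-8): log the size only
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return {
        "method": method,
        "path": path,
        "model": payload.get("model"),
        "messages": len(payload.get("messages", [])),
        "bytes": len(body),
    }


class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        print(json.dumps(summarize("POST", self.path, body)))  # the logging step

        # Second hop: ordinary TLS from the proxy to the real API.
        conn = http.client.HTTPSConnection(UPSTREAM_HOST, timeout=600)
        conn.request("POST", self.path, body=body,
                     headers=forward_headers(self.headers))
        resp = conn.getresponse()

        self.send_response(resp.status)
        for name, value in resp.getheaders():
            if name.lower() not in SKIP_HEADERS:
                self.send_header(name, value)
        self.end_headers()
        # read1() returns as soon as data arrives, so streamed responses
        # are relayed incrementally instead of being buffered.
        while chunk := resp.read1(4096):
            self.wfile.write(chunk)
            self.wfile.flush()
        conn.close()


def serve(port=8787):
    """Listen on loopback only, so nothing leaves the machine unproxied."""
    ThreadingHTTPServer(("127.0.0.1", port), ProxyHandler).serve_forever()
```

To try it, call `serve()` and start the agent with its API base URL pointed at the proxy, for example `ANTHROPIC_BASE_URL=http://127.0.0.1:8787 claude`. That environment variable is the one Claude Code documents for overriding the API endpoint; whether ccglass uses the same mechanism is an assumption here.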

npm i -g ccglass
ccglass claude
# → opens http://localhost:8123 in your browser
# → real-time dashboard of every request

What I learned in 30 days

After running every Claude Code session through it, I found:

1. I had a 38% cache hit rate I didn't know about

Only 38% of my prompt tokens were being served from the cache, so most of the context I kept resending was billed at full price. The dashboard made that visible. I rewrote my CLAUDE.md to front-load stable context: the cache hit rate jumped to 70% and my monthly bill dropped 35%.
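For reference, here is one way a cache hit rate can be computed from logged responses. It is a sketch under assumptions: it uses the `usage` fields the Anthropic Messages API returns (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`), and the definition of "hit rate" as cached reads over all prompt tokens is my own guess, not necessarily how ccglass defines it. The `read_multiplier` of 0.1 is an illustrative assumption about how much cheaper a cached read is; check current pricing before relying on it.

```python
def cache_hit_rate(usages):
    """Fraction of prompt tokens that were read from the prompt cache."""
    read = sum(u.get("cache_read_input_tokens", 0) for u in usages)
    written = sum(u.get("cache_creation_input_tokens", 0) for u in usages)
    uncached = sum(u.get("input_tokens", 0) for u in usages)
    total = read + written + uncached
    return read / total if total else 0.0


def relative_input_cost(hit_rate, read_multiplier=0.1):
    """Prompt cost relative to a no-cache baseline of 1.0.

    Ignores the premium charged for cache writes, so it is a lower bound.
    """
    return hit_rate * read_multiplier + (1.0 - hit_rate)


# Hypothetical usage records, shaped like the API's `usage` object.
logged = [
    {"input_tokens": 200, "cache_creation_input_tokens": 100,
     "cache_read_input_tokens": 700},
]
rate = cache_hit_rate(logged)
print(f"hit rate {rate:.0%}, relative prompt cost {relative_input_cost(rate):.2f}")
```

Under that simplified model, a higher hit rate lowers only the prompt side of the bill; output tokens are unaffected.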

2. Per-provider cost varies by about 10x

Same task:

  • Claude Sonnet 4.6: $0.42
  • GPT-4o: $0.31
  • DeepSeek: $0.04

I started picking a provider per task: Anthropic for quality, DeepSeek for bulk work.
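The per-task comparison comes down to token counts multiplied by per-token prices. A minimal sketch, with loud caveats: the provider names, prices, and token counts below are made-up placeholders, not real rates and not the measurements above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens. All values used below are hypothetical."""
    input_per_mtok: float
    output_per_mtok: float


def task_cost(price, input_tokens, output_tokens):
    """Cost in USD of one task, given its total token counts."""
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000


# Placeholder providers: "premium" is priced 10x higher than "budget".
PRICES = {
    "premium": Price(input_per_mtok=2.0, output_per_mtok=10.0),
    "budget": Price(input_per_mtok=0.2, output_per_mtok=1.0),
}

for name, price in PRICES.items():
    print(name, round(task_cost(price, input_tokens=50_000, output_tokens=4_000), 4))
```

In practice the token counts differ by provider too, since tokenizers and turn counts vary, so a real comparison needs the logged usage from each run rather than one shared pair of numbers.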

3. Turn counts were higher than I thought

My average was 4.2 turns per task. After seeing the data, I rewrote my CLAUDE.md and the average dropped to 2.8. Less back-and-forth means lower cost and faster delivery.
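An average like that is just logged requests grouped by task. A small sketch, assuming each log record carries a hypothetical `task_id` field (how ccglass actually delimits a "task" is not stated here):

```python
from collections import Counter


def average_turns(records):
    """Mean number of logged requests (turns) per distinct task."""
    turns = Counter(r["task_id"] for r in records)
    return sum(turns.values()) / len(turns) if turns else 0.0


# Hypothetical log: task "a" took three turns, task "b" took one.
log = [{"task_id": "a"}, {"task_id": "a"}, {"task_id": "a"}, {"task_id": "b"}]
print(average_turns(log))
```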

4. MCP self-inspection is wild

ccglass has an MCP server. When you run ccglass claude, the agent can query its own request history inside the chat. I asked Claude "what did I prompt you with 3 turns ago?" and it answered correctly.

What's supported

16+ providers out of the box, including:

  • Coding agents: Claude Code, Codex, OpenCode, CodeBuddy, Reasonix
  • Pure LLM APIs: Anthropic, OpenAI, DeepSeek, Kimi, GLM, OpenRouter
  • Cloud: AWS Bedrock, GCP Vertex AI
  • Local: Ollama, LM Studio

The limits (I want to be honest)

  • Cursor subscription models can't be intercepted (they use a server-side proxy).
  • VS Code Continue with built-in models: same.
  • It's local-only by design. No SaaS, no telemetry, no account. (If you want cloud, use Langfuse.)

Open source

GitHub: https://github.com/jianshuo/ccglass

460+ stars at time of writing. MIT licensed. PRs welcome.

If you ship with Claude Code / Codex / Kimi and have ever asked "where is my money going", try it once. The data is eye-opening.
