The problem I couldn't solve
I was spending ~$1,800/month on Claude Code.
I had no idea where the money was going. I had no idea which prompts were 4,000-token monstrosities, which ones were 200-token gems, or which ones I'd accidentally repeated 3 times this week.
I tried the obvious tools first:
- mitmproxy: didn't work. Claude Code (and Codex, DeepSeek, Kimi, GLM, etc.) all ignore HTTP_PROXY because they're native CLIs that open HTTPS sockets directly.
- Charles: same problem.
- Langfuse / Helicone: these are SaaS. You have to send your data to them. Not what I wanted.
- Custom hooks: limited to events the CLI exposes. I wanted the raw HTTP.
I wanted a local, open-source, zero-account way to see what my coding agent was doing.
The solution: a local reverse proxy on the loopback
The insight: every coding agent CLI talks to api.anthropic.com (or similar). If I make it talk to http://127.0.0.1:port instead, and have a tiny proxy on that port forward to the real API, the local hop is plain HTTP — easy to log, no CA cert, no TLS pinning pain.
That's it. That's the whole trick.
npm i -g ccglass
ccglass claude
# → opens http://localhost:8123 in your browser
# → real-time dashboard of every request
What I learned in 30 days
After running every Claude Code session through it, I found:
1. I had a 38% cache hit rate I didn't know about
My cache hit rate was only 38%, so most of the context I resent on every turn was billed at full price. The dashboard made that visible. I rewrote my CLAUDE.md to front-load context; the cache hit rate jumped to 70% and my monthly bill dropped 35%.
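For Anthropic models, a hit rate like this can be computed from the usage object of each logged response, which reports uncached input (input_tokens), cache writes (cache_creation_input_tokens), and cache reads (cache_read_input_tokens) separately. The sketch takes one reasonable definition, cached reads divided by all input tokens; how ccglass itself defines the metric is an assumption here. Because the prompt cache matches on the prefix of the prompt, putting stable context first is what lets more of each request be read from it.

```python
def cache_hit_rate(usages: list[dict]) -> float:
    """Fraction of all input tokens that were served from the prompt cache.

    Each dict is the "usage" object of an Anthropic Messages API response.
    input_tokens counts only uncached input, so the three fields together
    make up the request's total input.
    """
    read = total = 0
    for u in usages:
        cached = u.get("cache_read_input_tokens") or 0      # discounted cache reads
        written = u.get("cache_creation_input_tokens") or 0  # tokens written to the cache
        fresh = u.get("input_tokens") or 0                   # uncached input
        read += cached
        total += cached + written + fresh
    return read / total if total else 0.0
```

Since cache reads are billed at a fraction of the base input price, raising this ratio lowers the bill even when the total token count stays the same.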
2. Per-provider cost varies 10x
The same task, priced on each model:
- Claude Sonnet 4.6: $0.42
- GPT-4o: $0.31
- DeepSeek: $0.04
I started picking per-task. Anthropic for quality, DeepSeek for bulk.
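The arithmetic behind a comparison like this is: for every request a task makes, input tokens × input price plus output tokens × output price, summed. The gap between models comes from both the price table and how many tokens each model spends on the task. In this sketch the model names and prices (model-a, model-b, PRICES_PER_MTOK) are placeholders, not real rates, and cache-read and cache-write pricing is ignored for brevity.

```python
# Hypothetical per-million-token prices in USD, for illustration only.
# Replace them with each provider's current published rates.
PRICES_PER_MTOK = {
    "model-a": {"input": 3.00, "output": 15.00},
    "model-b": {"input": 0.30, "output": 1.20},
}


def task_cost(model: str, requests: list[tuple[int, int]]) -> float:
    """Sum the cost of every (input_tokens, output_tokens) request made for one task."""
    price = PRICES_PER_MTOK[model]
    return sum(
        inp * price["input"] + out * price["output"] for inp, out in requests
    ) / 1_000_000
```

Feeding the same per-request token counts from the proxy log into task_cost for each model gives a per-task comparison; with real logs, each model's own token counts should be used, since tokenizers and verbosity differ.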
3. Turn counts were higher than I thought
Average 4.2 turns per task. After seeing the data, I rewrote my CLAUDE.md. Turn count dropped to 2.8. Less back-and-forth = less cost = faster delivery.
4. MCP self-inspection is wild
ccglass has an MCP server. When you run ccglass claude, the agent can query its own request history inside the chat. I asked Claude "what did I prompt you with 3 turns ago?" and it answered correctly.
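The source doesn't describe ccglass's MCP tools, so the following is only a hypothetical sketch of the kind of lookup such a tool could run over the proxy's log. It assumes each log record has a "request" key holding the raw JSON body of an Anthropic Messages API request, and it counts "turns" as logged requests. The function names last_user_text and prompt_requests_ago are invented for illustration, and the MCP server wrapper itself is left out because it would need an MCP SDK.

```python
import json
from typing import Optional


def last_user_text(request_body: bytes) -> Optional[str]:
    """Return the text of the last user message that contains text, or None."""
    messages = json.loads(request_body).get("messages", [])
    for msg in reversed(messages):
        if msg.get("role") != "user":
            continue
        content = msg.get("content")
        if isinstance(content, str):
            return content
        # Otherwise content is a list of blocks; tool_result blocks carry no prompt text.
        texts = [b.get("text", "") for b in content or []
                 if isinstance(b, dict) and b.get("type") == "text"]
        if texts:
            return "\n".join(texts)
    return None


def prompt_requests_ago(log: list[dict], n: int) -> Optional[str]:
    """Hypothetical history tool: the user prompt in the request logged n requests before the latest."""
    if not 0 <= n < len(log):
        return None
    return last_user_text(log[-1 - n]["request"])
```

Skipping user messages that hold only tool results matters for coding agents, where many requests in a row are tool round-trips rather than new prompts.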
What's supported
16+ providers out of the box:
- Coding agents: Claude Code, Codex, OpenCode, CodeBuddy, Reasonix
- Pure LLM APIs: Anthropic, OpenAI, DeepSeek, Kimi, GLM, OpenRouter
- Cloud: AWS Bedrock, GCP Vertex AI
- Local: Ollama, LM Studio
The limits (I want to be honest)
- Cursor subscription models can't be intercepted (they use a server-side proxy).
- Continue (VS Code) with its built-in models: same limitation.
- It's local-only by design. No SaaS, no telemetry, no account. (If you want cloud, use Langfuse.)
Open source
GitHub: https://github.com/jianshuo/ccglass
460+ stars at time of writing. MIT licensed. PRs welcome.
If you ship with Claude Code / Codex / Kimi and have ever asked "where is my money going", try it once. The data is eye-opening.