The problem I couldn't solve
I was spending ~$1,800/month on Claude Code.
I had no idea where the money was going. I had no idea which prompts were 4,000-token monstrosities, which ones were 200-token gems, or which ones I'd accidentally repeated 3 times this week.
I tried the obvious tools first:
- mitmproxy: didn't work. Claude Code (and Codex, DeepSeek, Kimi, GLM, etc.) all ignore HTTP_PROXY because they're native CLIs that open HTTPS sockets directly.
- Charles: same problem.
- Langfuse / Helicone: these are SaaS. You have to send your data to them. Not what I wanted.
- Custom hooks: limited to events the CLI exposes. I wanted the raw HTTP.
I wanted a local, open-source, zero-account way to see what my coding agent was doing.
The solution: a local reverse proxy on the loopback
The insight: every coding agent CLI talks to api.anthropic.com (or a similar endpoint). If I make it talk to http://127.0.0.1:<port> instead, and run a tiny proxy on that port that forwards to the real API, the local hop is plain HTTP: easy to log, no CA certificate to install, no TLS-pinning pain. The hop from the proxy to the real API is still ordinary HTTPS.
That's it. That's the whole trick.
```bash
npm i -g ccglass
ccglass claude
# → opens http://localhost:8123 in your browser
# → real-time dashboard of every request
```
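To make the trick concrete, here is a minimal sketch of a logging reverse proxy built only on Python's standard library. It is not ccglass's implementation. Assumptions: the upstream is https://api.anthropic.com, the client only sends POST requests, and the port 8787 and the names make_handler and serve are made up for illustration. It also buffers each response, so streamed (SSE) replies arrive all at once; a real tool would relay them chunk by chunk.

```python
import json
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Per-hop headers that must not be copied between the two connections, plus date/server
# (BaseHTTPRequestHandler adds its own). accept-encoding is dropped so the upstream
# replies uncompressed and the log stays readable.
SKIP_HEADERS = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization", "te",
    "trailers", "transfer-encoding", "upgrade", "host", "content-length",
    "accept-encoding", "date", "server",
}

# Ignore any HTTP(S)_PROXY settings so this proxy never forwards through another proxy.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def make_handler(upstream, sink):
    """Return a handler class that forwards POSTs to `upstream` and passes each exchange to `sink`."""

    class LoggingProxy(BaseHTTPRequestHandler):
        def do_POST(self):
            body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
            headers = {k: v for k, v in self.headers.items() if k.lower() not in SKIP_HEADERS}
            req = urllib.request.Request(upstream + self.path, data=body, headers=headers, method="POST")
            try:
                with OPENER.open(req) as resp:
                    status, resp_headers, resp_body = resp.status, resp.headers, resp.read()
            except urllib.error.HTTPError as err:
                # 4xx/5xx answers (rate limits, auth errors) are relayed and logged too.
                status, resp_headers, resp_body = err.code, err.headers, err.read()

            # The local hop is plain HTTP, so both bodies are readable as-is.
            sink({
                "path": self.path,
                "status": status,
                "request": body.decode("utf-8", "replace"),
                "response": resp_body.decode("utf-8", "replace"),
            })

            self.send_response(status)
            for k, v in resp_headers.items():
                if k.lower() not in SKIP_HEADERS:
                    self.send_header(k, v)
            self.send_header("Content-Length", str(len(resp_body)))
            self.end_headers()
            self.wfile.write(resp_body)

        def log_message(self, format, *args):
            pass  # silence the default per-request stderr line; `sink` is the real log

    return LoggingProxy


def serve(port=8787, upstream="https://api.anthropic.com"):
    """Listen on the loopback only and print one JSON line per exchange."""
    handler = make_handler(upstream, lambda entry: print(json.dumps(entry), flush=True))
    ThreadingHTTPServer(("127.0.0.1", port), handler).serve_forever()
```

To try it, call serve() in one terminal and start Claude Code in another with ANTHROPIC_BASE_URL=http://127.0.0.1:8787 claude. Binding to 127.0.0.1 rather than 0.0.0.0 keeps the unencrypted hop off the network.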
What I learned in 30 days
After running every Claude Code session through it, I found:
1. I had a 38% cache hit rate I didn't know about
My cache hit rate was only 38%, so most of the context I resent on every request was billed at full price. The dashboard made it visible. I rewrote my CLAUDE.md to front-load context: the cache hit rate jumped to 70%, and my monthly bill dropped 35%.
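A rough model shows why the hit rate matters so much. This is a sketch, assuming "hit rate" means the fraction of input tokens served from the prompt cache and that a cache read costs 0.1x the base input price (Anthropic's published multiplier at the time of writing; check current pricing). The extra charge for writing to the cache is ignored, and relative_input_cost is a hypothetical helper.

```python
def relative_input_cost(hit_rate, read_multiplier=0.1):
    """Input-token spend relative to sending the same tokens with no caching at all.

    hit_rate: fraction of input tokens served from the prompt cache (0..1).
    read_multiplier: price of a cached token relative to an uncached one (assumed 0.1).
    Cache-write surcharges are ignored in this rough model.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * read_multiplier


before = relative_input_cost(0.38)  # 0.62 + 0.038 = 0.658
after = relative_input_cost(0.70)   # 0.30 + 0.070 = 0.370
print(f"input spend falls by {1 - after / before:.0%}")  # → input spend falls by 44%
```

Output tokens are not discounted by caching, so, all else being equal, the effect on a whole bill is smaller than on input spend alone.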
2. Per-provider cost varies 10x
Same task:
- Claude Sonnet 4.6: $0.42
- GPT-4o: $0.31
- DeepSeek: $0.04
I started picking a provider per task: Anthropic for quality, DeepSeek for bulk.
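Comparisons like this come down to token counts times per-token prices, summed over every request a task makes. A sketch, assuming Anthropic-style usage fields (input_tokens, output_tokens, cache_read_input_tokens, as returned by the Messages API); other providers name these fields differently, and the prices below are placeholders, not anyone's real rates. Price, request_cost and task_cost are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Price:
    """USD per million tokens for one model."""
    input: float
    output: float
    cache_read: float


def request_cost(usage: Dict[str, int], price: Price) -> float:
    """Cost of one request from an Anthropic-style usage object.
    In that format input_tokens excludes cache reads; cache writes are ignored here."""
    return (
        usage.get("input_tokens", 0) * price.input
        + usage.get("cache_read_input_tokens", 0) * price.cache_read
        + usage.get("output_tokens", 0) * price.output
    ) / 1_000_000


def task_cost(usages: List[Dict[str, int]], price: Price) -> float:
    """A task spans several requests (turns); its cost is the sum over all of them."""
    return sum(request_cost(u, price) for u in usages)


# Placeholder prices, not any provider's real rates: substitute current list prices.
example = Price(input=3.0, output=15.0, cache_read=0.3)
turns = [
    {"input_tokens": 20_000, "output_tokens": 1_000},
    {"input_tokens": 2_000, "cache_read_input_tokens": 21_000, "output_tokens": 1_500},
]
print(f"${task_cost(turns, example):.4f}")  # → $0.1098
```

Running the same task's usage records through a Price for each provider gives like-for-like per-task numbers.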
3. Turn counts were higher than I thought
I was averaging 4.2 turns per task. After seeing the data, I rewrote my CLAUDE.md, and the average dropped to 2.8. Less back-and-forth = less cost = faster delivery.
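Part of why extra turns are so expensive: the Messages API keeps no state between requests, so each turn resends the whole conversation so far. A back-of-the-envelope sketch, assuming a fixed-size base context and that each turn adds a fixed number of tokens to the history; caching discounts the resent prefix but does not make it free. The function name and token counts are illustrative.

```python
def total_input_tokens(base_tokens: int, tokens_per_turn: int, turns: int) -> int:
    """Input tokens sent over a whole task, ignoring caching.

    Request k (1-based) carries the base context (system prompt, CLAUDE.md, tools)
    plus the k turns of history accumulated so far, because the client resends
    the full conversation on every request.
    """
    return sum(base_tokens + k * tokens_per_turn for k in range(1, turns + 1))


# Illustrative sizes only: a 10k-token base context and 2k tokens added per turn.
for turns in (4, 3):
    print(turns, total_input_tokens(10_000, 2_000, turns))
# → 4 60000
# → 3 42000
```

Going from 4 turns to 3 (25% fewer) cuts input tokens by 30% here, because the turn you drop is the one carrying the most history. In closed form the total is turns * base + per_turn * turns * (turns + 1) / 2, which grows faster than linearly in the number of turns.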
4. MCP self-inspection is wild
ccglass has an MCP server. When you run ccglass claude, the agent can query its own request history inside the chat. I asked Claude "what did I prompt you with 3 turns ago?" and it answered correctly.
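The core of such a tool is a query over the logged request bodies. A sketch of the lookup behind a question like "what did I prompt you with 3 turns ago?", assuming Anthropic Messages API request bodies stored as JSON strings; prompt_n_turns_ago and its helpers are hypothetical names, not ccglass's actual MCP tools, and exposing them over MCP is left out.

```python
import json
from typing import List, Optional


def user_text(message: dict) -> str:
    """Plain text of a Messages API message whose content is a string or a list of blocks."""
    content = message.get("content", "")
    if isinstance(content, str):
        return content
    # Keep only text blocks; tool_result, image, etc. are not things the user typed.
    return "".join(block.get("text", "") for block in content if block.get("type") == "text")


def user_prompts(request_bodies: List[str]) -> List[str]:
    """Prompt text of each logged request whose final message is a user message with text.
    Requests that only hand tool results back to the model are skipped."""
    prompts = []
    for raw in request_bodies:
        messages = json.loads(raw).get("messages", [])
        if messages and messages[-1].get("role") == "user":
            text = user_text(messages[-1])
            if text:
                prompts.append(text)
    return prompts


def prompt_n_turns_ago(request_bodies: List[str], n: int) -> Optional[str]:
    """n=0 is the most recent prompt, n=1 the one before it, and so on; None if out of range."""
    prompts = user_prompts(request_bodies)
    return prompts[-1 - n] if 0 <= n < len(prompts) else None
```

Skipping tool-result-only requests matters for agents: one user prompt can trigger several API calls, and counting those as turns would shift every answer.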
What's supported
16+ providers out of the box, including:
- Coding agents: Claude Code, Codex, OpenCode, CodeBuddy, Reasonix
- Pure LLM APIs: Anthropic, OpenAI, DeepSeek, Kimi, GLM, OpenRouter
- Cloud: AWS Bedrock, GCP Vertex AI
- Local: Ollama, LM Studio
The limits (I want to be honest)
- Cursor subscription models can't be intercepted (they use a server-side proxy).
- VS Code Continue with built-in models: same.
- It's local-only by design. No SaaS, no telemetry, no account. (If you want cloud, use Langfuse.)
Open source
GitHub: https://github.com/jianshuo/ccglass
460+ stars at time of writing. MIT licensed. PRs welcome.
If you ship with Claude Code / Codex / Kimi and have ever asked "where is my money going", try it once. The data is eye-opening.
Top comments (4)
The loopback-as-plain-HTTP trick is the kind of thing that's obvious only after someone says it: native CLIs ignoring HTTP_PROXY is precisely why mitmproxy/Charles silently fail, and pointing the base URL at 127.0.0.1 sidesteps the whole TLS-pinning fight. Clean.
The finding I'd underline for everyone reading: your 38%→70% cache-hit jump from front-loading CLAUDE.md is the highest-leverage cost lever most people never touch. Prompt caching is prefix-based, so the rule that helped us most is "most-stable content first, most-volatile last": a single edit near the top invalidates the entire cached suffix, so one churny line high in the context can tank your hit rate even when 95% of the prompt is byte-identical.
Question: does it show where the cache broke within a request — which segment changed and invalidated the rest? Seeing hit vs miss is great, but the actionable bit is identifying the specific volatile block to push lower. That diff-against-previous-request view is what turned cache tuning from guesswork into a checklist for us.
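A diff-against-previous-request check like the one described above can be sketched in a few lines. This is an illustration, not a ccglass feature. It assumes Anthropic-style request bodies and relies on the documented cache prefix order (tools, then system, then messages); segments and first_divergence are hypothetical names.

```python
import json
from typing import List, Optional, Tuple


def segments(body: dict) -> List[Tuple[str, str]]:
    """Split a request into labelled segments in cache-prefix order: tools, system, then each message."""
    # Canonical JSON so key order alone never counts as a change.
    parts = [
        ("tools", json.dumps(body.get("tools", []), sort_keys=True)),
        ("system", json.dumps(body.get("system", ""), sort_keys=True)),
    ]
    for i, message in enumerate(body.get("messages", [])):
        parts.append((f"messages[{i}]", json.dumps(message, sort_keys=True)))
    return parts


def first_divergence(previous: dict, current: dict) -> Optional[str]:
    """Label of the first segment that differs between two consecutive requests.

    Everything from that segment onward cannot be served from the cache.
    Returns None when the shared segments are identical (e.g. the new request only appends messages).
    """
    for (label, old), (_, new) in zip(segments(previous), segments(current)):
        if old != new:
            return label
    return None
```

A result of "system" points at the volatile block to push lower. Matching segments are necessary but not sufficient for a hit, since cache entries also expire.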
This is exactly the kind of inspection layer agent workflows need. Once AI coding tools move from chat into terminals, the question becomes: what actually crossed the boundary, what was logged, and what can a developer audit later?
A local proxy turns trust into something closer to an artifact.
Armorer Labs perspective: local loopback inspection is a strong primitive for agent operations, especially because it captures what actually crossed the boundary rather than what the agent later summarizes.
One useful extension is to separate three streams in the proxy log: model prompts/responses, tool-call requests/results, and side-effect approvals. The first helps with debugging, but the second and third are what make post-run accountability possible: which MCP tool was called, what arguments were resolved, whether policy allowed it, and what evidence verified the outcome.
I also appreciate the explicit limits section here. For security-sensitive agent work, knowing what the proxy cannot see is as important as the pretty dashboard.
Cool, I'm going to try that too!