Three months ago I had four AI coding tools set up: Claude Code, Codex CLI, Gemini CLI, and a chat UI for quick questions. Every month I'd get a bill from Anthropic and a bill from OpenAI and vaguely wonder what I'd actually spent the money on.
I had no idea which model was being called when. I didn't know if Claude Code was routing to Sonnet or Opus. I didn't know how many tokens Gemini was burning in the background. I just paid the bill and moved on.
Then I looked at one month's invoice line by line.
The answer was uncomfortable.
## The Problem With Opaque AI Billing
When you use AI coding tools directly, the billing is aggregated. You see "claude-sonnet-4-6: 2.4M tokens" but you don't know:
- Which tasks generated those tokens (code review? refactors? quick completions?)
- Which tool was responsible (Claude Code? your chat UI?)
- Whether any of it could have been handled by a cheaper — or free — model
You're essentially flying blind. You optimize what you can measure, and the billing dashboards the providers give you aren't built for developers trying to understand usage at the tool level.
## What I Did About It
CliGate is a local proxy I built that sits between your AI coding tools and the upstream APIs. All four tools route through it: one endpoint on localhost:8081, one place to manage credentials and routing.
That position in the stack turned out to be the perfect place to add cost tracking.
Every request passes through the proxy. The proxy knows: which tool sent it, which model was requested, how many tokens were used (from the response stream), and what each model costs per token. The math is simple. The data is suddenly very visible.
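The per-request math really is simple. Here's a minimal sketch of it, with an illustrative pricing table; the model names and per-million-token prices are assumptions for the example, not CliGate's actual registry:

```python
# Illustrative per-1M-token prices (input, output) in USD -- not CliGate's
# real pricing registry, just plausible numbers for the sketch.
PRICING = {
    "claude-sonnet-4-6": (3.00, 15.00),
    "gpt-4o": (2.50, 10.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, given token counts read from the response stream."""
    price_in, price_out = PRICING[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# e.g. a Sonnet call with 12k input tokens and 2k output tokens
cost = request_cost("claude-sonnet-4-6", 12_000, 2_000)  # $0.066
```

Run that over every request that passes through the proxy, group by tool and model, and the dashboards below fall out for free.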
Here's what the usage dashboard looks like after a week of normal coding work:
```
Provider breakdown (this week)
──────────────────────────────────────────
Anthropic API      $4.82   68%
ChatGPT Account    $0.00    0%   ← account pool, no API cost
Free (Kilo AI)     $0.00    0%   ← routed to DeepSeek/Qwen
OpenAI API         $2.27   32%
──────────────────────────────────────────
Total              $7.09
```
Model breakdown told an even more interesting story:
```
claude-sonnet-4-6   $4.21   59%
claude-haiku-4-5    $0.00    0%   ← free routing active
gpt-4o              $1.89   27%
codex-mini          $0.38    5%
```
The haiku line at zero was the thing that made me stop and think.
## The Bit I Didn't Expect: Some Models Are Just Free
CliGate has a feature called free model routing. When a request comes in for claude-haiku-4-5, instead of forwarding it to Anthropic, the proxy routes it to a free model — DeepSeek R1, Qwen3, MiniMax, whatever you've configured — via Kilo AI. No API key needed.
I turned this on almost as an experiment. But when I looked at the usage stats a week later, every quick question, every short completion, every "what does this function do" had been handled for free. The expensive Sonnet calls were left for the work that actually needed it.
That split happened automatically. I didn't have to think about it.
You can change which free model handles haiku requests from the Settings tab. I've been rotating between DeepSeek R1 and Qwen3 depending on the task type — DeepSeek for reasoning-heavy work, Qwen3 for code generation.
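The routing decision itself is essentially a lookup. A minimal sketch, assuming a mapping from the requested model to a configured free replacement (the mapping and model names here are illustrative; the real config lives in CliGate's Settings tab):

```python
# Hypothetical free-routing table: requests for the cheap tier get rerouted
# to a free upstream (e.g. via Kilo AI); everything else stays on the paid path.
FREE_ROUTES = {
    "claude-haiku-4-5": "deepseek-r1",  # swap for "qwen3" per task type
}

def resolve_model(requested: str) -> tuple[str, bool]:
    """Return (upstream_model, is_free) for an incoming request."""
    if requested in FREE_ROUTES:
        return FREE_ROUTES[requested], True   # no API key, no API cost
    return requested, False                   # forward to the paid provider
```

The nice property is that the calling tool never knows the difference: it asked for haiku, it got a completion back, and the $0.00 line on the dashboard is the only trace.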
## The Details That Actually Changed My Behavior
**Per-account tracking.** I have multiple Claude accounts in the pool. The usage stats break down by account, so I can see if one account is hitting its quota faster than others and rebalance.
**Daily and monthly views.** You can toggle between a daily sparkline and a monthly total. The daily view is where you catch the outliers — that one afternoon you had three long Claude Code sessions refactoring a module shows up as a spike and explains why a particular week cost more.
**Pricing registry.** Every model's per-token price is configurable. When OpenAI changes pricing (which happens), you can update it in the dashboard without touching any config files. You can also add manual overrides for models that aren't in the default list.
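One way to picture the registry: defaults plus an overrides table, where dashboard edits land in the overrides. This is a sketch under that assumption; the model names and prices are made up, not CliGate's internals:

```python
# Shipped defaults, USD per 1M tokens: (input, output). Illustrative numbers.
DEFAULT_PRICES = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4-6": (3.00, 15.00),
}

# Edited from the dashboard -- no config files touched, no restart needed.
OVERRIDES = {
    "gpt-4o": (2.00, 8.00),          # provider repriced the model
    "my-local-model": (0.00, 0.00),  # model not in the default list
}

def price_for(model: str) -> tuple[float, float]:
    """Overrides win over defaults, so repriced or new models just work."""
    return {**DEFAULT_PRICES, **OVERRIDES}[model]
```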
**Cost per request in the logs.** The request log view shows cost alongside each request. If something seems expensive, you can pull up the exact prompt, response, token count, and cost in one place.
## What This Changed Practically
I now route claude-haiku tasks through free models by default, and I've set up app-level routing so my quick chat window (the thing I use for "hey what's this error") hits the free path while Claude Code gets the full Sonnet model.
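App-level routing can be pictured as one more lookup, keyed on which tool sent the request instead of which model it asked for. The app names and path labels below are assumptions for illustration, not CliGate's actual identifiers:

```python
# Hypothetical per-app routing table: which path each tool's traffic takes.
APP_ROUTES = {
    "quick-chat": "free",    # "hey what's this error" hits the free path
    "claude-code": "paid",   # real coding sessions get the full Sonnet model
}

def route_for(app: str) -> str:
    """Unknown apps fall back to the paid path rather than silently degrading."""
    return APP_ROUTES.get(app, "paid")
```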
My monthly AI tool spend dropped roughly 40% without changing how I actually work.
The bigger change is more subtle: I stopped treating AI API costs as a fixed overhead I couldn't influence. Once you can see the breakdown, you start making different decisions about which model to reach for.
If you're running multiple AI coding tools and paying per-token for all of them, it's worth spending 10 minutes to actually look at where the spend goes. There's probably more room to cut it than you'd expect.
CliGate is free and open source: github.com/codeking-ai/cligate
What does your current AI tool spend look like? Are you tracking it at all, or just paying the bill?