Last month I checked my combined AI tool spending and nearly choked. $847. Between Claude Max ($100), Cursor Pro ($20), ChatGPT Pro ($200), and raw API calls to OpenAI and Anthropic for my side projects, the costs had quietly ballooned without me noticing.
This month? $340. Same tools. Same output. Same hours coding.
Here's exactly what I changed.
The Core Problem: Invisible Costs
AI coding tools have a unique cost problem — you can't see what anything costs in the moment. With traditional SaaS, you pay $X/month and that's it. With AI tools:
- API calls vary wildly based on model, context length, and output length
- Subscription tools like Cursor and Claude Max have rate limits you hit without warning
- Different models have 10-50x cost differences for similar quality output
- Nobody shows you the price tag before you send a request
It's like grocery shopping where nothing has a price label. You just get a surprise bill at the end of the month.
Fix #1: Real-Time Cost Visibility (The Big One)
I installed TokenBar — a $5 Mac menu bar app that shows token usage and cost per request across all my AI providers. OpenAI, Anthropic, Google, Cursor, OpenRouter, Copilot — it tracks everything in one place.
What I discovered in the first week:
- My "quick questions" to Claude Opus were costing $0.15–0.40 each
- Cursor's autocomplete was burning through tokens on files I wasn't actively editing
- Some code review prompts were sending 50K+ tokens of context that wasn't relevant
- The same task on GPT-5.4-mini vs GPT-5.4 Pro was a 15x cost difference with 90% similar output
The visibility alone changed my behavior. Once you can see the meter running, you naturally start making better choices.
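You can get a rough version of that meter with a few lines of code. This is a minimal sketch: the per-million-token prices below are illustrative placeholders, not real rates, so substitute the current numbers from your provider's pricing page.

```python
# Rough per-request cost estimator. Prices are placeholder values,
# NOT real provider rates -- check your provider's pricing page.
PRICE_PER_M = {                     # (input, output) USD per 1M tokens
    "opus":   (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku":  (0.25, 1.25),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in dollars."""
    in_rate, out_rate = PRICE_PER_M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A "quick question" that drags 50K tokens of context along:
print(f"${request_cost('opus', 50_000, 1_000):.2f}")
```

Run it once with your typical context size and the number stops being abstract: a short answer on a big-context request is mostly paying for the input, which is exactly what the discoveries above show.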
Fix #2: Model Tiering
Once I could see per-request costs, I created a simple system:
🔴 Tier 1 — Heavy Lifting ($0.30–$1.00 per request)
- Architecture decisions
- Complex debugging
- System design
- Models: Claude Opus, GPT-5.4 Pro
🟡 Tier 2 — Standard Work ($0.05–$0.15 per request)
- Implementing features
- Writing tests
- Code reviews
- Models: Claude Sonnet, GPT-5.4
🟢 Tier 3 — Simple Tasks ($0.001–$0.01 per request)
- Formatting, renaming
- Boilerplate generation
- Documentation
- Models: Claude Haiku, GPT-5.4-mini
Before this, I was using Opus for everything because "I want the best output." That's like taking an Uber Black to pick up coffee. Haiku can write a for-loop just fine.
Impact: ~40% cost reduction just from model selection.
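The tiering above can be turned into a tiny routing table so the cheap choice is the default, not a decision you make under deadline pressure. The task categories here are illustrative, and the model names just mirror the tier list:

```python
# Minimal tier router: map task types to the cheapest model that
# handles them. Categories are illustrative examples, not a standard.
TIERS = {
    "architecture": "claude-opus",    # Tier 1: heavy lifting
    "debugging":    "claude-opus",
    "feature":      "claude-sonnet",  # Tier 2: standard work
    "tests":        "claude-sonnet",
    "review":       "claude-sonnet",
    "format":       "claude-haiku",   # Tier 3: simple tasks
    "boilerplate":  "claude-haiku",
    "docs":         "claude-haiku",
}

def pick_model(task_type: str) -> str:
    # Unknown task? Default to the cheap tier. Upgrading one request
    # later is easy; overpaying by default is how the bill balloons.
    return TIERS.get(task_type, "claude-haiku")
```

The design choice that matters is the fallback: defaulting to Haiku inverts the old "Opus for everything" habit, so you only pay Tier 1 prices when you explicitly ask for them.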
Fix #3: Context Window Discipline
The biggest cost driver in most AI requests is input tokens: the context you send. Output tokens are pricier per token, but the context you attach usually dwarfs the reply, so input dominates the bill. Most tools default to sending everything, and that's expensive.
In Cursor:
- Stopped letting it auto-include all open files
- Manually @-mention only the relevant files
- Close files I'm not actively working on
In Claude Code:
- Start new sessions for new tasks (don't carry 100K tokens of history)
- Use /compact to summarize the conversation before continuing
- Be explicit about what context the AI actually needs
Impact: ~30% cost reduction per request.
Fix #4: Focus = Fewer Tokens
This one sounds weird, but the math checks out.
I was losing 20-30 minutes between coding sessions to doomscrolling — Twitter, YouTube, Reddit. Then I'd come back to code with degraded focus, write vague prompts, and end up in 5-round back-and-forth sessions with the AI that should have been 1-2 rounds.
I installed Monk Mode — $15 Mac app that blocks the algorithmic feeds on distracting sites without blocking the domains entirely. YouTube tutorials still work, but the recommended sidebar is gone. Twitter search works, but the For You feed disappears.
Better focus → better prompts → fewer iterations → fewer tokens → lower cost.
My average tokens per completed task dropped ~25% after I stopped context-switching every 15 minutes.
Impact: ~25% reduction in wasted tokens.
Fix #5: Weekly Cost Reviews
Every Friday, 5 minutes:
- Check total spend per provider
- Identify highest-cost sessions (what was I doing?)
- Calculate rough cost per completed feature
- Note any tools I'm paying for but not using
This catches drift. Without reviews, you slowly revert to old habits.
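The Friday review is easy to semi-automate if your tracker can export a usage log. The CSV layout below (`provider,cost_usd` columns) is an assumption, so adapt it to whatever your tool actually emits:

```python
# Friday review sketch: roll a usage log up into per-provider totals.
# Assumes a CSV export with "provider" and "cost_usd" columns.
import csv
from collections import defaultdict

def spend_by_provider(log_path: str) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["provider"]] += float(row["cost_usd"])
    return dict(totals)
```

Five minutes of eyeballing the output is enough to spot the drift this step exists to catch: one provider quietly doubling week over week, or a subscription you stopped using.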
The Results
| Metric | Before | After |
|---|---|---|
| Monthly AI spend | $847 | $340 |
| Code output | ~15 features | ~15 features |
| Daily coding hours | 6-7h | 6-7h |
| Avg tokens per task | ~80K | ~45K |
| Doomscrolling time | 2-3h/day | ~30min/day |
The Takeaway
The savings didn't come from using AI less. They came from using it smarter.
The #1 change was simply making costs visible. Once I could see what each request cost in real time, everything else followed naturally. Model selection, context management, focus — they all improved because I had a feedback loop.
It's the same reason calorie labels on menus change what people order. You don't need willpower — you need information.
If you're a developer spending $200+/month on AI tools and you're not tracking per-request costs, you're almost certainly overspending by 40-60%. Not because the tools are bad, but because you're using premium models for tasks that don't need them.
Start with visibility. Everything else follows.
Tools I use:
- TokenBar ($5) — per-request cost tracking across providers
- Monk Mode ($15) — blocks algorithmic feeds, keeps the useful parts of the sites