Last month I looked at my Anthropic dashboard and almost choked. $340 in Claude Code usage. As a solo dev building two Mac apps, that's not sustainable.
But here's the thing — I didn't switch to a cheaper model. I didn't downgrade anything. I just changed how I worked, and my bill dropped to ~$130 this month. Same output, arguably better code.
Here's what actually moved the needle.
## 1. Real-Time Token Visibility Changed Everything
The single biggest change was seeing what each request cost me as it happened. Before, I'd check my usage dashboard once a week and have zero idea which sessions burned the most.
I started using TokenBar — a $5 Mac menu bar app that shows live token cost across all your LLM providers. Having that number visible at all times made me unconsciously write tighter prompts. I stopped dumping entire files into context "just in case." I noticed when a retry loop was burning tokens and killed it early.
The awareness alone saved me probably 30-40%. It's like stepping on a scale every morning — you don't need a diet plan, you just start making better choices when you see the number.
## 2. I Stopped Coding During My Worst Hours
This sounds unrelated to AI costs, but hear me out. My most expensive Claude Code sessions were always between 10pm and 1am. Tired brain → vague prompts → more retries → more tokens.
I started blocking my feed-heavy apps (Twitter, Reddit, YouTube) during work hours using Monk Mode — a $15 Mac app that blocks at the feed level, not the app level. So I could still use Twitter for DMs or YouTube for tutorials, but the infinite scroll was just gone.
The result: I stopped doomscrolling at 9pm, went to bed earlier, coded in the morning when my prompts were sharp, and my token burn dropped dramatically during what used to be my most wasteful hours.
## 3. Subagent Delegation for Research Tasks
Instead of letting Claude Code read 15 files to "investigate" something (eating 15K+ tokens each time), I added a rule to my CLAUDE.md: any task that needs 3+ file reads gets delegated to a subagent. The subagent does the research in isolation and returns only a summary to the main session.
Simple rule in your CLAUDE.md:
```markdown
## Context Management
- If a task requires reading 3+ files, spawn a subagent
- Subagents investigate and return summarized findings
- Main session stays clean and focused
```
This alone cut my per-session token usage by roughly 25%.
## 4. Model Routing by Task Complexity
Not every task needs Opus. I set up a simple routing pattern:
- Quick edits, formatting, simple questions → Haiku
- Code generation, refactoring → Sonnet
- Architecture decisions, complex debugging → Opus
Most of my daily tasks are Sonnet-tier. Opus is for the 20% that actually matters.
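The routing above doesn't need anything fancy; a keyword heuristic in front of the API call is enough. Here's a minimal sketch in Python — the tier names and keyword lists are my illustrative assumptions, not part of any SDK:

```python
# Minimal sketch of complexity-based model routing.
# Keyword lists and tier names are illustrative assumptions; tune them
# to your own workload and your provider's actual model identifiers.

def pick_model(task: str) -> str:
    """Route a task description to a model tier via keyword heuristics."""
    task_lower = task.lower()
    cheap = ("format", "rename", "typo", "quick question")
    heavy = ("architecture", "design decision", "race condition", "deadlock")
    if any(k in task_lower for k in heavy):
        return "opus"    # architecture decisions, complex debugging
    if any(k in task_lower for k in cheap):
        return "haiku"   # trivial edits and simple questions
    return "sonnet"      # default tier: code generation, refactoring

print(pick_model("fix a typo in README"))                 # haiku
print(pick_model("refactor the parser to async/await"))   # sonnet
print(pick_model("debug a race condition in sync code"))  # opus
```

Defaulting to Sonnet (rather than Opus) matches the observation that most daily tasks are Sonnet-tier; you only escalate when a heavy keyword fires.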
## 5. Prompt Templates for Repeated Patterns
I built a small library of prompt templates for things I do constantly:
- "Review this Swift file for memory leaks"
- "Write unit tests for this function"
- "Refactor this to use async/await"
Each template includes exactly the context needed — no more, no less. Templatized prompts are 40-60% shorter than my ad-hoc ones.
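A template library can literally be a dict of parameterized strings. A hedged sketch in Python — the template names and exact wording here are hypothetical, not my actual library:

```python
# Hypothetical prompt-template library. Each entry pins down exactly the
# context the task needs, so ad-hoc over-explaining never creeps in.
TEMPLATES = {
    "leaks": ("Review this Swift file for memory leaks. Focus on retain "
              "cycles in closures and delegate references.\n{code}"),
    "tests": ("Write unit tests for this function. Cover edge cases and "
              "error paths; skip redundant happy-path tests.\n{code}"),
    "async": ("Refactor this to use async/await. Preserve the public "
              "API.\n{code}"),
}

def build_prompt(name: str, code: str) -> str:
    """Fill a named template with the code snippet under review."""
    return TEMPLATES[name].format(code=code)

print(build_prompt("async", "func load() { fetch { self.render($0) } }"))
```

Because each template carries its own fixed instructions, the only variable part of the prompt is the code itself, which is where the 40-60% length reduction comes from.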
## The Results
| Metric | Before | After |
|---|---|---|
| Monthly spend | ~$340 | ~$130 |
| Avg tokens per session | ~45K | ~18K |
| Late-night sessions | 40% of total | ~10% |
| Retry rate | ~30% | ~12% |
## The Meta-Lesson
Most AI cost optimization advice is about which model to use. But in my experience, the biggest gains came from:
- Seeing the cost (TokenBar in menu bar)
- Fixing when I work (blocking feeds with Monk Mode, sleeping earlier)
- Fixing how I prompt (subagents, templates, model routing)
The first two are embarrassingly simple. But they account for about 70% of my savings.
If you're a solo dev or small team burning through AI credits, start with visibility. You can't optimize what you can't see.
I'm building both TokenBar ($5, Mac) and Monk Mode ($15, Mac) as a solo dev. Happy to answer questions about either in the comments.