If your AI spend feels random, this is for you.
I’m a solo builder and I kept saying: “models are too expensive now.”
Reality: my workflow was leaking money.
Same with focus. I kept saying: “I just need more discipline.”
Reality: my feeds were stealing deep-work windows.
Here are the 7 leaks I fixed.
1) The “open 20 files and pray” loop
I used to dump huge context into Claude/Cursor/Codex, then rerun prompts when output was vague.
That burned tokens fast.
Fix:
- scope each request to 1 small task
- send only files needed for that task
- kill reruns unless I changed context
2) No session-level budget
I had monthly spend in mind, but no per-session guardrail.
Fix:
- set a hard session budget before starting
- track token/cost in real-time (I use TokenBar in my Mac menu bar)
- stop when budget hits and rewrite prompt instead of brute forcing
3) Treating retries like progress
Retries felt productive. They weren’t.
Fix:
- if output is off twice, rewrite constraints and acceptance criteria
- “what should this code do?” beats “try again”
4) Hidden doomscroll tax between coding blocks
I’d check one feed “for 2 minutes.” 25 minutes gone, brain context reset.
Fix:
- block algorithmic feeds during work windows (I use Monk Mode)
- keep comms tools available, kill infinite feeds
5) Prompting before thinking
I prompted first, clarified later.
Fix:
- write a 5-line spec first
- include explicit done condition
- ask model for plan, then execution
6) Letting novelty drive model choice
I chased whatever model was trending that day.
Fix:
- pick one default model for the week
- switch only when benchmarked on my own task
7) No post-mortem after expensive days
I’d notice a big bill and move on.
Fix:
- 3-minute daily review:
- what consumed most tokens?
- which prompts needed retries?
- what can be templatized tomorrow?
My current stack (simple, not fancy)
- TokenBar: real-time token/cost visibility in the menu bar so spend is never invisible
- Monk Mode: feed-level blocking so deep work survives
Both are tiny tools, but together they removed the two biggest drains: invisible cost and invisible distraction.
If your AI bill is up and output is flat, don’t start with “new model.”
Start with leak hunting.
Top comments (0)