If you’re a solo builder using Claude/Codex/Cursor all day, you probably know this feeling:
- you shipped “a lot”
- your AI bill still hurt
- your screen-time report looked embarrassing
That was me.
I kept treating AI cost and focus as separate problems. They’re not.
They feed each other.
When I’m distracted, I write sloppier prompts.
Sloppier prompts create more retries.
More retries = token burn + time burn.
Here’s the exact system I now run.
1) Track spend at session level (not month-end panic)
I used to check billing dashboards too late.
Now I track spend while I work, session by session.
I built TokenBar (tokenbar.site) for this:
- live token + cost visibility in the Mac menu bar
- immediate feedback when a prompt pattern is wasteful
- no end-of-month surprise
The behavior change was instant:
If a loop starts burning, I catch it in minutes, not days.
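If you want a rough version of this without a menu-bar app, here's a minimal sketch. The per-1K token prices and the budget threshold are placeholder assumptions, not real provider rates; plug in your own.

```python
# Minimal session-level spend tracker (sketch).
# PRICE_PER_1K values are made-up example rates -- swap in your provider's real pricing.
from dataclasses import dataclass

PRICE_PER_1K = {"input": 0.003, "output": 0.015}  # assumed USD per 1K tokens

@dataclass
class SessionTracker:
    budget_usd: float = 2.00  # alert threshold for one work session
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def cost_usd(self) -> float:
        # Running cost for this session at the assumed rates.
        return (self.input_tokens / 1000 * PRICE_PER_1K["input"]
                + self.output_tokens / 1000 * PRICE_PER_1K["output"])

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Call this after every model response, not at month-end.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        if self.cost_usd > self.budget_usd:
            print(f"over budget: ${self.cost_usd:.2f} spent this session")

session = SessionTracker(budget_usd=1.00)
session.record(input_tokens=12_000, output_tokens=4_000)
print(f"${session.cost_usd:.3f}")
```

The point is the feedback loop, not the accounting: a warning mid-session is what changes behavior.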
2) Run short “prompt quality sprints”
Before opening an IDE, I do one 10-minute pass:
- define success output in 3 bullet points
- define constraints (latency, format, edge cases)
- define stop condition (what counts as done)
This cut retries more than any model switch I tested.
3) Block feeds, not the whole internet
I don’t need a prison laptop. I need fewer trapdoors.
I built Monk Mode (mac.monk-mode.lifestyle) for this:
- feed-level blocking on Mac (YouTube feed, X timeline, etc.)
- still lets me use tools I actually need
- no context-switch vortex when I hit a hard bug
My rule: if I’m stuck, I can research docs, but no algorithmic feeds.
4) Use a fallback protocol for outages
When Claude/Codex has issues, I used to doomscroll while “waiting.”
Now I switch to:
- spec cleanup
- test writing
- docs and commit hygiene
This keeps momentum even when model access is flaky.
5) Cap retries per task
Hard rule: max 3 retries before changing approach.
If retry #3 fails, I must do one of:
- simplify the task
- reduce context
- change model/tool
- park and return later
No more infinite looping because “one more try might work.”
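The cap is easy to enforce in code too. A sketch of the rule, where `attempt_task` and `is_done` are hypothetical stand-ins for one prompt-result cycle and your done-check:

```python
# Retry cap (sketch): stop after MAX_RETRIES instead of looping on
# "one more try might work".
MAX_RETRIES = 3

def run_with_cap(attempt_task, is_done):
    for attempt in range(1, MAX_RETRIES + 1):
        result = attempt_task(attempt)
        if is_done(result):
            return result
    # Out of retries: force an approach change rather than burn more tokens.
    raise RuntimeError("retry cap hit: simplify, shrink context, switch model, or park it")

# Usage: a fake task that only succeeds on attempt 2.
out = run_with_cap(lambda n: "ok" if n == 2 else "fail", lambda r: r == "ok")
print(out)  # -> ok
```

Raising instead of silently retrying is the design choice that matters: it makes the approach change mandatory, not optional.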
6) Measure cost per shipped outcome
Not “cost per day.”
Not “tokens consumed.”
The metric that matters:
Cost per useful shipped task.
This prevents over-optimizing raw token numbers while under-delivering product.
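The arithmetic is trivial, but it's worth seeing why it beats raw spend. The weekly numbers below are made-up examples:

```python
# Cost per shipped outcome (sketch). All numbers are made-up examples.
def cost_per_shipped(total_cost_usd: float, shipped_tasks: int) -> float:
    if shipped_tasks == 0:
        return float("inf")  # all spend, nothing shipped
    return total_cost_usd / shipped_tasks

# Week A spent more in total but shipped more per dollar.
print(cost_per_shipped(40.0, 10))  # -> 4.0  ($4.00 per shipped task)
print(cost_per_shipped(25.0, 3))   # ~8.33   (cheaper week, worse metric)
```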
7) Timebox deep work with no-feed windows
I run 90-minute blocks.
During each block:
- feed-level blocks ON
- cost visibility ON
- one task only
It’s boring. It works.
8) Weekly review (20 mins)
Every week I review:
- top 3 expensive sessions
- what caused waste (retry loops, vague prompts, giant context)
- one rule update for next week
Small rule changes compound quickly.
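The "top 3 expensive sessions" step is a one-liner if your tracking writes anything structured. A sketch with a hypothetical in-memory log; in practice you'd load whatever your tracker exports:

```python
# Weekly review helper (sketch): surface the top 3 expensive sessions.
# `sessions` is a hypothetical example log, not a real export format.
sessions = [
    {"id": "mon-am", "cost_usd": 1.20, "note": "vague prompt, 5 retries"},
    {"id": "tue-pm", "cost_usd": 0.40, "note": "clean spec"},
    {"id": "wed-am", "cost_usd": 2.10, "note": "giant context pasted in"},
    {"id": "thu-pm", "cost_usd": 0.90, "note": "retry loop on flaky test"},
]

top3 = sorted(sessions, key=lambda s: s["cost_usd"], reverse=True)[:3]
for s in top3:
    print(f'{s["id"]}: ${s["cost_usd"]:.2f} -- {s["note"]}')
```

The `note` field is where the review value lives: it's what turns "this was expensive" into next week's rule change.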
The key lesson
If your attention is leaking, your AI budget leaks too.
Treat focus and spend as one system.
That shift alone gave me cleaner shipping days, lower token burn, and way less late-night doomscroll regret.
If you’re building with AI all day, fix both together.