Last month I had two different leaks in my workflow:
- my AI spend kept creeping up from "just one more run"
- my focus kept getting wrecked by infinite feeds
I kept treating them as separate problems.
They weren’t.
Both were attention leaks.
Both were small defaults compounding.
Here are 9 tiny systems that helped me stabilize both.
1) Set a hard spend cap before every serious coding session
Before I start, I set a cap for the session (example: "$8 max today").
When I don’t do this, I always rationalize one extra call, one extra retry, one extra context blast.
A cap makes the decision easier: refine the prompt instead of brute-forcing more tokens.
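The cap rule above can be sketched as a tiny guard object. This is a hypothetical illustration, not a real billing API: the class, the token counts, and the per-token price are all mine.

```python
# Hypothetical session spend cap. Names and prices are illustrative,
# not taken from any real provider's billing API.
class SessionBudget:
    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, tokens: int, usd_per_1k_tokens: float) -> None:
        """Record a call's cost; refuse it once the session cap would be hit."""
        cost = tokens / 1000 * usd_per_1k_tokens
        if self.spent_usd + cost > self.cap_usd:
            raise RuntimeError(
                f"Cap hit: ${self.spent_usd:.2f} of ${self.cap_usd:.2f} spent "
                "-- refine the prompt instead of retrying."
            )
        self.spent_usd += cost

budget = SessionBudget(cap_usd=8.00)
budget.charge(tokens=12_000, usd_per_1k_tokens=0.01)  # adds $0.12
```

The point isn't precision accounting; it's that one extra call now has a visible cost and a hard ceiling.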
2) Track cost in real time (not after the damage)
End-of-day invoices are too late.
I keep a live token/cost meter visible in the menu bar so I can see spikes while I’m working.
I built this into TokenBar because I needed it myself: tokenbar.site ($5 one-time).
The biggest win is behavioral: when I see burn rate, I naturally tighten prompts.
3) Use a 3-line brief for every agent task
My brief format:
- Done means: exact output shape
- Constraints: what not to do
- Stop rule: when to halt instead of looping
This dramatically cut my "almost right, run it again" loop.
4) Add a retry budget
I use a simple rule:
- first miss: tighten instruction
- second miss: reduce scope
- third miss: stop and redesign
Without this, it's easy to keep paying for emotionally driven retries.
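The three-miss rule is simple enough to write down as a function. A minimal sketch; the action strings are my own labels for the steps above.

```python
# Minimal sketch of the retry budget: map how many times a task has
# missed to the next move. The labels mirror the rule above.
def retry_action(miss_count: int) -> str:
    if miss_count <= 0:
        return "run as planned"
    if miss_count == 1:
        return "tighten instruction"
    if miss_count == 2:
        return "reduce scope"
    return "stop and redesign"  # third miss or beyond: no more paid retries
```

Writing it down like this makes the third miss a hard stop, not a negotiation.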
5) Block feeds at the source during deep work
Timers alone weren’t enough for me.
If a feed opens, my brain takes the bait.
So I started blocking feed surfaces during build windows with Monk Mode (mac.monk-mode.lifestyle, $15 one-time).
Not all social apps, just the parts that trigger scroll spirals.
6) Pair block windows with shipping windows
I run 45–90 minute “ship blocks.”
During these blocks:
- feed surfaces blocked
- notifications minimized
- one repo, one outcome
The key is pairing restriction with a clear output target (PR, feature chunk, doc update).
7) Keep a “context payload” file ready
When tools crash, reset, or rate-limit, I used to rebuild context manually and waste time + tokens.
Now I keep one fresh context file with:
- current objective
- architecture notes
- known constraints
- open questions
Recovery gets much cheaper.
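For reference, here's the rough layout I mean. The filename and placeholder text are illustrative; the four sections are the ones listed above.

```markdown
# context-payload.md (filename is illustrative)

## Current objective
<one sentence: what ships this session>

## Architecture notes
<the 3-5 facts an agent needs to not break things>

## Known constraints
<what not to touch, budgets, style rules>

## Open questions
<anything unresolved, so the agent doesn't guess>
```

After a crash or rate-limit, pasting this one file replaces a dozen exploratory prompts.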
8) Track failure patterns, not just success output
I log expensive failures in one note:
- what prompted the detour
- what signal I ignored
- what I’ll do differently next run
Most of my cost improvements came from avoiding repeated mistakes, not better model settings.
9) Measure “focus debt” the same way as cloud debt
If I lose 40 minutes to doomscrolling, I count it like an actual bill.
That framing changed everything.
Time lost to feeds is often more expensive than token spend.
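The arithmetic behind that claim is back-of-envelope. The $50/hr rate below is my assumption for illustration, not a number from this post; swap in your own.

```python
# Price "focus debt" like a cloud bill. The hourly rate is an
# assumed figure for illustration only.
HOURLY_RATE_USD = 50.0

def focus_debt(minutes_lost: float) -> float:
    """Convert lost minutes into dollars at the assumed rate."""
    return minutes_lost / 60 * HOURLY_RATE_USD

print(f"${focus_debt(40):.2f}")  # 40 lost minutes ≈ $33.33 at $50/hr
```

At that rate, one doomscrolling lapse already exceeds the $8 daily token cap from tip 1.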
The simple stack I use now
- Real-time AI spend visibility: TokenBar
- Feed-level focus protection: Monk Mode
- Tiny operating rules: caps, retry budget, ship blocks
Nothing here is fancy.
But together, these defaults stopped both leaks enough that I can ship more with less stress.
If you’re solo-building with AI all day, I’d start with just two changes this week:
1) a hard daily spend cap
2) feed blocking during one 60-minute build block
Do that for 7 days and your numbers (and sanity) should look different.