If you use Claude Code, Codex, Cursor, or a local agent loop for real work, token usage is easy to ignore until it starts affecting your day.
I found it helps to treat AI usage like CPU, memory, or battery: not something to obsess over, but something that should stay visible while you work.
Here is the workflow I use on macOS.
1. Make AI usage ambient
The worst place for usage data is a billing page you only open after something feels wrong.
I want the rough answer visible while I am coding:
- Did this session get expensive fast?
- Am I burning context on a low-value task?
- Is a refactor still reasonable, or should I split it?
- Did an agent get stuck in a loop?
That is why I built TokenBar, a small Mac menu bar app that shows LLM token usage in real time. It is free to try, and Pro is $15 lifetime.
The point is not perfect accounting. The point is feedback before the waste happens.
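To make "did this session get expensive fast?" concrete, here is a minimal sketch of a rolling burn-rate check. It is not how TokenBar or any particular tool computes usage. The class name, the window size, and the idea of feeding it `(timestamp, tokens)` pairs from your own logs are all assumptions for illustration.

```python
from collections import deque


class BurnRateMeter:
    """Rolling tokens-per-minute over a recent time window (hypothetical helper)."""

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp_seconds, tokens) pairs, oldest first

    def record(self, timestamp, tokens):
        """Add one usage event and drop events older than the window."""
        self.events.append((timestamp, tokens))
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def tokens_per_minute(self):
        """Average rate across the window, counting only events still inside it."""
        total = sum(tokens for _, tokens in self.events)
        return total / (self.window_seconds / 60.0)


# Usage: a 60-second window, with timestamps given in seconds.
meter = BurnRateMeter(window_seconds=60.0)
meter.record(0.0, 1000)
meter.record(30.0, 2000)
meter.record(90.0, 500)   # the event at t=0 is now outside the window
print(meter.tokens_per_minute())  # 2500.0
```

A rough rate like this is enough to answer the question; it does not need to match the billing page.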
2. Split work before the context gets messy
When a task has too many goals, the context fills up with material that is irrelevant to the current step, and the model spends tokens re-reading it on every turn.
I now split AI coding sessions like this:
- one session to understand the bug
- one session to patch the smallest fix
- one session to test and clean up
- one session for docs or release notes
This feels slower at first, but it usually saves time because each prompt has a cleaner job.
3. Watch for token burn smells
A few patterns usually mean I should stop and reset:
- the model keeps restating the same plan
- it edits unrelated files
- it asks for broad context again
- tool calls are happening without a clear next step
- the answer gets longer while the code change gets smaller
When I see that, I either tighten the task or start a fresh session with a smaller scope.
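Two of these smells are easy to check mechanically if your tool exposes per-turn data. The sketch below is one possible heuristic, not a feature of any specific agent: the `Turn` fields are hypothetical, and you would have to fill them from whatever your tool actually logs.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One agent turn. All fields are hypothetical placeholders."""
    output_tokens: int   # tokens in the model's reply
    lines_changed: int   # lines of code actually edited this turn
    plan: str            # the plan or summary the model stated


def burn_smells(turns, window=3):
    """Return warnings based on the last `window` turns."""
    recent = turns[-window:]
    if len(recent) < window:
        return []  # not enough history to judge

    smells = []

    # Smell: the model keeps restating the same plan.
    # Compare plans after lowercasing and collapsing whitespace.
    plans = {" ".join(t.plan.lower().split()) for t in recent}
    if len(plans) == 1:
        smells.append("same plan restated")

    # Smell: the answer gets longer while the code change gets smaller.
    pairs = list(zip(recent, recent[1:]))
    longer = all(b.output_tokens > a.output_tokens for a, b in pairs)
    smaller = all(b.lines_changed < a.lines_changed for a, b in pairs)
    if longer and smaller:
        smells.append("longer answers, smaller diffs")

    return smells


turns = [
    Turn(400, 30, "Fix the null check in parser"),
    Turn(650, 12, "fix the null check in parser"),
    Turn(900, 3, "Fix the  null check in parser"),
]
print(burn_smells(turns))  # ['same plan restated', 'longer answers, smaller diffs']
```

The other smells (unrelated edits, vague tool calls) need judgment, so I would treat a check like this as a nudge to look, not as a verdict.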
4. Keep a tiny session note
For longer AI coding work, I keep a short scratch note:
- goal
- files touched
- current blocker
- what not to change
- next command to run
This prevents the classic "here is everything again" prompt that burns a ton of context just to rebuild state.
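If you prefer the note as something you can paste at the top of a fresh session, the five fields above map onto a small structure. This is only a sketch: the class name, field names, and output format are my own choices, and the example values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SessionNote:
    """Scratch note for one AI coding session (hypothetical structure)."""
    goal: str
    files_touched: list = field(default_factory=list)
    blocker: str = ""
    do_not_change: list = field(default_factory=list)
    next_command: str = ""

    def render(self):
        """Render the note as a short block to paste into a new prompt."""
        lines = [
            f"Goal: {self.goal}",
            f"Files touched: {', '.join(self.files_touched) or 'none'}",
            f"Current blocker: {self.blocker or 'none'}",
            f"Do not change: {', '.join(self.do_not_change) or 'nothing noted'}",
            f"Next command: {self.next_command or 'none'}",
        ]
        return "\n".join(lines)


# Example values are invented for illustration.
note = SessionNote(
    goal="Fix crash when config file is empty",
    files_touched=["config.py", "test_config.py"],
    blocker="test fixture still loads the old default",
    do_not_change=["public load_config signature"],
    next_command="pytest test_config.py",
)
print(note.render())
```

Five short lines like these usually restore enough state that the new session does not need the whole history again.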
5. Do not optimize every token
The goal is not to make AI coding feel cheap and anxious.
The goal is to notice obvious waste early, especially with agentic tools that can chew through context quickly. A visible meter plus a few habits is usually enough.
My rough rule: if I would want CPU, memory, or battery visible for a long-running process, I probably want AI token usage visible too.