A simple Mac workflow for keeping AI coding token usage under control

#ai #webdev

If you use Claude Code, Codex, Cursor, or a local agent loop for real work, token usage is easy to ignore until it starts affecting your day.

I found it helps to treat AI usage like CPU, memory, or battery: not something to obsess over, but something that should stay visible while you work.

Here is the workflow I use on macOS.

1. Make AI usage ambient

The worst place for usage data is a billing page you only open after something feels wrong.

I want the rough answer visible while I am coding:

Did this session get expensive fast?
Am I burning context on a low-value task?
Is a refactor still reasonable, or should I split it?
Did an agent get stuck in a loop?

That is why I built TokenBar, a small Mac menu bar app that shows LLM token usage in real time. It is free to try, and Pro is $15 lifetime.

The point is not perfect accounting. The point is feedback before the waste happens.
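The kind of "did this get expensive fast?" signal I mean can be sketched in a few lines. This is a minimal sketch, not how TokenBar works internally. It assumes your tool or API reports input and output token counts per request, and the `UsageEvent` record, the `BurnRateMeter` class, the five-minute window, and the 20,000 tokens-per-minute alert threshold are hypothetical placeholders you would tune.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class UsageEvent:
    """Token counts reported for one model request (hypothetical record)."""
    timestamp: float  # seconds, e.g. from time.time()
    input_tokens: int
    output_tokens: int


class BurnRateMeter:
    """Rolling tokens-per-minute over a fixed time window (hypothetical helper)."""

    def __init__(self, window_seconds: float = 300.0, alert_tokens_per_min: float = 20_000.0):
        self.window_seconds = window_seconds
        self.alert_tokens_per_min = alert_tokens_per_min
        self._events = deque()  # assumes events arrive in timestamp order

    def record(self, event: UsageEvent) -> None:
        """Add an event and drop events older than the window, relative to this one."""
        self._events.append(event)
        cutoff = event.timestamp - self.window_seconds
        while self._events and self._events[0].timestamp < cutoff:
            self._events.popleft()

    def tokens_per_minute(self) -> float:
        # Divides by the full window, so a brand-new session reads low rather than spiky.
        total = sum(e.input_tokens + e.output_tokens for e in self._events)
        return total / (self.window_seconds / 60.0)

    def is_burning_fast(self) -> bool:
        return self.tokens_per_minute() > self.alert_tokens_per_min


meter = BurnRateMeter()
meter.record(UsageEvent(timestamp=0, input_tokens=50_000, output_tokens=2_000))
meter.record(UsageEvent(timestamp=60, input_tokens=60_000, output_tokens=3_000))
print(meter.tokens_per_minute(), meter.is_burning_fast())  # 23000.0 True
```

A rolling window answers "is this session hot right now?" better than a running total, which only ever goes up.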

2. Split work before the context gets messy

When a task has too many goals, the model spends tokens re-reading context that is unclear or only partly relevant to the step it is actually working on.

I now split AI coding sessions like this:

one session to understand the bug
one session to patch the smallest fix
one session to test and clean up
one session for docs or release notes

This feels slower at first, but it usually saves time because each prompt has a cleaner job.
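A rough back-of-envelope model shows why this pays off. Assuming a stateless chat API where every turn re-sends the whole conversation so far, and ignoring prompt caching and automatic context compaction (both of which can change real numbers a lot), input tokens grow roughly quadratically with session length. The `cumulative_input_tokens` function and the turn counts below are hypothetical illustrations, not measurements.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int, base_context: int = 0) -> int:
    """Total input tokens sent over one session (hypothetical model).

    Assumes turn i re-sends base_context plus all tokens_per_turn * i tokens
    of history accumulated so far: no prompt caching, no compaction.
    """
    return sum(base_context + tokens_per_turn * i for i in range(1, turns + 1))


one_long_session = cumulative_input_tokens(turns=40, tokens_per_turn=2_000)
four_short_sessions = 4 * cumulative_input_tokens(turns=10, tokens_per_turn=2_000)
print(one_long_session, four_short_sessions)  # 1640000 440000
```

Under these made-up assumptions, one 40-turn session sends about 3.7 times as many input tokens as four 10-turn sessions doing the same total work. That only holds if each smaller session really starts with less context, which is what a short handoff note is for.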

3. Watch for token burn smells

A few patterns usually mean I should stop and reset:

the model keeps restating the same plan
it edits unrelated files
it asks for broad context again
tool calls are happening without a clear next step
the answer gets longer while the code change gets smaller

When I see that, I either tighten the task or start a fresh session with a smaller scope.
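Some of these smells can be approximated mechanically. The sketch below is a hypothetical heuristic, assuming you can log each agent turn's text, how many lines it changed, how many tool calls it made, and which files it touched. The `Turn` record, the `burn_smells` function, and the 0.8 similarity and three-turn thresholds are illustrative assumptions. It uses `difflib.SequenceMatcher` from the Python standard library as a crude stand-in for "restating the same plan".

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Turn:
    """One agent turn (hypothetical log record)."""
    assistant_text: str
    lines_changed: int  # lines added + removed during this turn
    tool_calls: int
    files_touched: set[str] = field(default_factory=set)


def burn_smells(
    turns: list[Turn],
    scope: Optional[set[str]] = None,
    similarity: float = 0.8,
    idle_tool_streak: int = 3,
) -> list[str]:
    """Return the burn smells present in the latest turns of a session."""
    smells = []
    if len(turns) >= 2:
        prev, last = turns[-2], turns[-1]
        # Near-identical consecutive messages suggest the plan is being restated.
        if SequenceMatcher(None, prev.assistant_text, last.assistant_text).ratio() >= similarity:
            smells.append("restating the same plan")
        # The answer grew while the code change shrank.
        if (len(last.assistant_text) > len(prev.assistant_text)
                and last.lines_changed < prev.lines_changed):
            smells.append("longer answer, smaller change")
    # The latest turn touched files outside the agreed scope.
    if turns and scope is not None and turns[-1].files_touched - scope:
        smells.append("edited files outside scope")
    # Count trailing turns that called tools but changed nothing.
    streak = 0
    for turn in reversed(turns):
        if turn.tool_calls > 0 and turn.lines_changed == 0:
            streak += 1
        else:
            break
    if streak >= idle_tool_streak:
        smells.append("tool calls without progress")
    return smells
```

A heuristic like this will misfire sometimes, so treat a hit as a reason to look at the session, not as proof it is stuck.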

4. Keep a tiny session note

For longer AI coding work, I keep a short scratch note:

goal
files touched
current blocker
what not to change
next command to run

This prevents the classic "here is everything again" prompt that burns a ton of context just to rebuild state.
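If you like the note structured, a tiny template keeps it consistent and short enough to paste at the top of a fresh session. The `SessionNote` class, its field names (mirroring the list above), and the example values are hypothetical; a plain text file works just as well.

```python
from dataclasses import dataclass, field


@dataclass
class SessionNote:
    """Scratch note for a long AI coding session (hypothetical template)."""
    goal: str
    files_touched: list[str] = field(default_factory=list)
    current_blocker: str = ""
    do_not_change: list[str] = field(default_factory=list)
    next_command: str = ""

    def render(self) -> str:
        """Plain-text note, one line per field, with placeholders for empty fields."""
        return "\n".join([
            f"Goal: {self.goal}",
            f"Files touched: {', '.join(self.files_touched) or 'none yet'}",
            f"Current blocker: {self.current_blocker or 'none'}",
            f"Do not change: {', '.join(self.do_not_change) or 'nothing specified'}",
            f"Next command: {self.next_command or 'TBD'}",
        ])


note = SessionNote(
    goal="Fix crash on empty config file",
    files_touched=["config/loader.py", "tests/test_loader.py"],
    current_blocker="test_empty_config still raises KeyError",
    do_not_change=["public signature of load_config()"],
    next_command="pytest tests/test_loader.py -k empty",
)
print(note.render())
```

Five short lines are usually enough to restart cleanly without re-pasting whole files.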

5. Do not optimize every token

The goal is not to make AI coding feel stingy and anxious.

The goal is to notice obvious waste early, especially with agentic tools that can chew through context quickly. A visible meter plus a few habits is usually enough.

My rough rule: if I would want CPU, memory, or battery visible for a long-running process, I probably want AI token usage visible too.

DEV Community