How I tell when Claude Code or Codex is burning tokens on the wrong work

#ai #productivity #tooling #programming

AI coding agents make it easy to confuse activity with progress.

Claude Code, Codex, Cursor, and similar tools can be genuinely useful, but a long session can also hide a lot of wasted work. The agent is editing files, reading context, retrying a plan, and asking the model for more help. From the outside, it looks productive.

The problem is that you usually notice the waste too late.

Here are the signs I watch for during an AI coding session.

1. The agent keeps rereading the same files

If the same few files appear again and again, I assume the context is not sharp enough.

That usually means I need to stop and give the agent a smaller task, like:

inspect this one bug
edit only this file
write the failing test first
explain the next patch before applying it

A vague task burns context. A smaller task gives the model less room to wander.

2. The token curve jumps but the diff stays tiny

This is the easiest warning sign.

If tokens climb quickly but the actual code diff is still small, something is off. Maybe the agent is exploring. Maybe it is stuck. Maybe it is repeatedly asking for large chunks of context.

Sometimes that is worth it. Most of the time, it means I should interrupt and reset the plan.

3. The agent starts solving adjacent problems

This happens a lot in real codebases.

You ask for one fix. The agent notices a nearby cleanup. Then a helper function. Then a naming issue. Then a test refactor.

That can be useful if you asked for a cleanup pass. It is expensive if you only needed one bug fixed.

I try to separate these modes:

fix mode: smallest safe change
cleanup mode: improve surrounding code
exploration mode: read and report only

Mixing them is where sessions get messy.

4. The session feels busy but I cannot summarize what changed

This is my favorite human check.

If I cannot explain the session in one sentence, I pause it.

Good: "Fixed the settings window crash by guarding the nil value."

Bad: "It changed a bunch of stuff around state and maybe tests."

The second one is usually where hidden cost lives.

The tiny workflow that helped

I started treating token usage like a live system metric, not a bill I check later.

During a coding session, I want to know:

which provider is being hit
roughly how much context is being used
whether usage is climbing faster than the actual diff
when it is time to stop and give the agent a smaller task

That is why I built TokenBar, a small Mac menu bar app for seeing AI token usage while I work.

It is here if you want to try it: https://tokenbar.site/

Even without a tool, the habit is useful: watch for the moment when the agent is doing work, but the product is not getting meaningfully closer to done.

That is the point where I stop the run and make the next instruction smaller.