John

Posted on Jun 5

What I watch before a Claude Code session hits the wall

#ai #webdev #productivity

Claude Code and Codex sessions rarely fail all at once.

They usually degrade in a few small steps:

The prompt gets a little broader.
The context gets a little stale.
The agent starts re-reading the same files.
The next answer takes longer than the last one.
The usage limit warning shows up after the damage is already done.

That last part is the annoying one. The built-in usage signal is useful, but it is often too late to change behavior.

I do not want to learn that a session was expensive after I am already committed to it. I want to know while I can still reset, trim context, or split the task.

The signal I care about is burn rate

A dashboard is fine for accounting. It is bad for behavior change.

When I am deep in an AI coding session, I am not opening a billing page every ten minutes. I am watching the task, the code, the tests, and the next prompt.

The useful question is not just:

How much did this cost?

It is:

Is this session still worth continuing in its current shape?

That is a burn-rate question.

If a session is burning through usage faster than expected, I treat it like a signal that something is wrong. Not always, but often enough to matter.

What usually causes the spike

The biggest spikes in my own workflow usually come from one of these:

1. The task is too wide

"Fix this feature" turns into reading the whole app.

A better prompt is usually something like:

Inspect these three files, explain the likely cause, then propose the smallest patch.

That cuts down wandering.

2. The agent keeps rediscovering context

If the agent keeps asking the same question in different forms, the session is probably carrying too much history.

At that point, starting a clean session with a tight summary is cheaper than forcing the current one forward.

3. The model is too strong for the step

Not every step needs the biggest model.

I like using stronger models for architecture, debugging, and ambiguous failures. I do not need them for repetitive edits, small refactors, or formatting cleanup.

4. I am using AI as a search engine

This is the sneakiest one.

If the agent is reading docs, searching examples, and explaining basics for too long, I usually stop and do the research directly. Then I bring back the exact constraint.

My practical rule

I try to watch for two things during a session:

usage rising faster than the work is getting clearer
repeated context reads without a concrete patch or decision

When both are true, I stop the session.

Then I do one of three things:

restart with a smaller prompt
split the task into a diagnosis step and an implementation step
switch models for the boring part

That habit has saved me more than any clever prompt template.

Why I put this in the menu bar

I built TokenBar because I wanted this signal visible while coding, not buried in a provider dashboard.

The menu bar is boring in a good way. It is always there, but it does not interrupt me. If usage looks normal, I ignore it. If it starts climbing weirdly, I catch it before the session gets sloppy.

That is the whole point: make AI coding usage visible while the work is still fixable.

If you are using Claude Code, Codex, Cursor, or API-based AI coding tools on a Mac, this kind of small feedback loop helps a lot.

TokenBar is here if you want to try it: https://tokenbar.site/

DEV Community