Last week I was debugging an AI agent chain. The agent was supposed to make 3-4 tool calls per request. Instead, it was looping — retrying the same failed call over and over.
I didn't notice for about 90 seconds. In that time, it burned through roughly $30 in Claude API tokens.
The problem? I had no real-time visibility into what was happening. My provider dashboard updates with a delay. My logging was async and I wasn't watching the terminal output closely enough.
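Observability aside, the bug itself was an unbounded retry. A minimal sketch of the kind of guard that would have stopped it (`call_tool` and `ToolError` here are hypothetical stand-ins, not a real agent framework API):

```python
MAX_RETRIES = 3

class ToolError(Exception):
    """Stand-in for whatever your tool layer raises on failure."""
    pass

def call_tool(tool_name, args):
    # Stand-in for the real tool call; here it always fails,
    # simulating the broken tool that caused the loop.
    raise ToolError("upstream timeout")

def call_tool_with_cap(tool_name, args):
    """Retry a failing tool call at most MAX_RETRIES times,
    then raise instead of looping (and billing) forever."""
    last_error = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return call_tool(tool_name, args)
        except ToolError as e:
            last_error = e
    raise RuntimeError(
        f"{tool_name} failed after {MAX_RETRIES} attempts"
    ) from last_error
```

A cap like this bounds the worst case, but it doesn't replace visibility: you still want to see the spend as it happens.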
What I changed
I started using TokenBar — a native macOS menu bar app I built that shows live token usage as API calls happen.
Now my workflow looks like this:
- Start a dev session
- Glance at the menu bar — token counter is ticking
- If usage spikes unexpectedly, I see it immediately
- I stop the process, fix the bug, and save money
That $30 bug? With TokenBar running, I would have caught it in under 10 seconds — the counter would have spiked visibly in the menu bar.
Why this matters more with agents
Traditional LLM usage (single prompt → single response) is relatively predictable. But AI agents introduce:
- Retry loops — failed tool calls that trigger repeated attempts
- Context accumulation — each step adds to the conversation, inflating token counts
- Recursive chains — agents calling sub-agents, each with their own token budget
- Unpredictable branching — the agent decides what to do next, and sometimes it decides wrong
All of these can cause token usage to spike in ways you don't expect. Having real-time visibility isn't a nice-to-have anymore — it's essential.
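Alongside watching the counter, a hard per-session budget is cheap to add in the agent loop itself. A sketch, assuming your client exposes input/output token counts per response (the `TokenBudget` class and its method names are illustrative, not from any library):

```python
class TokenBudget:
    """Cumulative token cap for one agent session: every retry,
    accumulated context turn, and sub-agent call records into it."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, input_tokens, output_tokens):
        """Add one API call's usage; raise once the cap is blown."""
        self.used += input_tokens + output_tokens
        if self.used > self.max_tokens:
            raise RuntimeError(
                f"token budget exceeded: {self.used} > {self.max_tokens}"
            )

budget = TokenBudget(max_tokens=50_000)
# After each agent step, something like:
# budget.record(resp.usage.input_tokens, resp.usage.output_tokens)
```

Because retries, context growth, and sub-agent chains all funnel through the same counter, any of the failure modes above trips the cap instead of running open-ended.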
How TokenBar works
- Sits in your macOS menu bar
- Shows a live token counter that updates as API calls run
- Supports OpenAI, Claude, Gemini, Cursor, OpenRouter, Copilot, and more
- Runs locally — no cloud data collection
- $5 one-time purchase, no subscription
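The raw data behind a counter like this is available on every response: most LLM APIs return a usage block alongside the completion. A bare-bones DIY version of the same idea (field names below follow the Anthropic Messages API; other providers differ, e.g. OpenAI uses `prompt_tokens` / `completion_tokens`):

```python
# Running tally of token usage across a dev session.
running_total = 0

def track(usage):
    """Add one response's usage dict to the running total and return it."""
    global running_total
    running_total += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
    return running_total

# Calling track(response["usage"]) after each API call keeps the counter live;
# a menu bar app just renders that number somewhere you'll actually look.
```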
Try it
If you're building AI agents or working with LLM APIs on Mac, this is the cheapest insurance against runaway token costs.