I've been building solo for about eight months now, and for most of that time I had zero visibility into what my AI usage was actually costing me day to day.
I knew roughly what I was spending monthly — Claude's billing page tells you that. But the real-time picture? Completely blind. I'd open my editor, start iterating on a feature, spend two hours going back and forth with the model, and have no idea whether that session cost me $0.40 or $4.
That changed about a week ago when I finally set up TokenBar properly — a little menu bar app that sits in your macOS status bar and shows you token counts and estimated costs as you work.
Here's what actually surprised me.
Day 1: I was burning tokens on garbage
Within the first hour of having real-time visibility, I caught myself doing something dumb. I was asking the model to regenerate entire code blocks when I only needed one function tweaked. Every time I sent the full context unnecessarily, I was pumping in 8-12k tokens for no reason.
I knew this was inefficient in theory. But seeing the number tick up in real time — literally watching it happen — made me actually change the behavior. I started trimming my prompts. I started being more precise about what I needed.
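That waste is easy to ballpark. A minimal sketch, assuming the common rule of thumb of roughly 4 characters per token (real tokenizers vary by model, and the prompt sizes here are hypothetical):

```python
# Rough estimate of the tokens wasted by resending full context,
# using the ~4 characters-per-token heuristic (actual tokenizers differ).

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

# Hypothetical prompts: a whole pasted file vs. just the one function.
full_context = "x" * 40_000   # ~40k chars of pasted code -> ~10k tokens
trimmed = "x" * 2_000         # the single function that needs the tweak

wasted = estimate_tokens(full_context) - estimate_tokens(trimmed)
print(f"full: ~{estimate_tokens(full_context)} tokens, "
      f"trimmed: ~{estimate_tokens(trimmed)} tokens, "
      f"avoidable: ~{wasted} tokens per message")
```

It's crude, but it matches what the menu bar number was telling me: most of each oversized prompt was context the model didn't need.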
Day 1 cost: noticeably lower than my usual daily average.
Day 3: I realized I had two very different work modes
After a few days of watching my usage, a pattern emerged. I have two modes:
- Exploration mode — I'm figuring something out, trying different approaches, asking follow-up questions. This is expensive. 30-50k tokens per session, sometimes more.
- Execution mode — I know exactly what I want, I'm just asking the model to write it. Much cheaper. 5-15k tokens per session.
I'd never separated these in my head before. But tracking them in real time made the distinction obvious. Now I try to front-load the exploration work before I start a focused build session, which has cut down on the expensive back-and-forth.
Day 5: I stopped treating AI like a free resource
This is the big one.
When you're on a monthly subscription and you don't track anything, AI usage feels free. It's a sunk cost. You already paid, might as well use it however you want.
But when you see token counts in your menu bar, that psychology shifts. Not because the money is dramatic — we're talking cents per session, not dollars — but because the visibility creates a feedback loop. You start making intentional tradeoffs.
"Do I actually need to ask this, or am I just being lazy?"
"Can I figure this out myself first, then use the model to verify?"
I became a more deliberate developer. That's not something I expected from a token counter.
Day 7: The number that actually matters
By the end of the week I had a real picture of my usage. My average day runs about 180-220k tokens across all my AI work. On heavy architecture days, it can hit 400k+.
This matters because I can now:
- Budget accurately before the bill arrives
- See immediately when something goes sideways (like the day I accidentally fed a 40k token context to a simple debug question)
- Decide intelligently which model to use for which task
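The budgeting part is just arithmetic once you have a daily token number. A quick sketch using my 180-220k range, with an illustrative blended price per million tokens that is an assumption, not any specific provider's rate:

```python
# Back-of-envelope monthly budget from a daily token count.
# The $3-per-million price is an ILLUSTRATIVE assumption for this sketch.

def monthly_cost(tokens_per_day: int, price_per_million: float,
                 days: int = 30) -> float:
    """Estimated monthly spend in dollars."""
    return tokens_per_day * days * price_per_million / 1_000_000

low = monthly_cost(180_000, 3.00)
high = monthly_cost(220_000, 3.00)
print(f"estimated monthly spend: ${low:.2f} - ${high:.2f}")
```

Plug in your own model's real input/output rates and the estimate gets much tighter, but even this rough version tells you before the bill arrives whether a heavy week is going to show up on it.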
What changed in practice
A week in, I code differently. Not dramatically — I'm not obsessing over every token. But I'm more intentional. I think about context window usage the way I used to think about function complexity or database query count. It's just another resource to manage.
If you use Claude, GPT-4, or any API-billed model regularly as a solo dev, having real-time visibility is genuinely useful. TokenBar does exactly this for macOS — it's $5, sits in your menu bar, and just works.
The payoff isn't the $5 savings in tokens. The payoff is the habit change.