DEV Community

Henry Godnick


How I Track Every Token I Spend on LLM APIs (And Why You Should Too)

If you're building with LLM APIs, you already know the pain: you ship a feature, usage spikes, and suddenly your OpenAI bill looks like a phone number.

The problem isn't that APIs are expensive — it's that most of us have zero visibility into where our tokens actually go.

The Invisible Cost Problem

Every prompt you send has a token count. Every completion returned has a token count. Most devs just... hope for the best. Maybe you check the dashboard once a month. Maybe you set a billing alert. But that's reactive, not proactive.

I wanted something that shows me token usage in real time, right where I'm already working — not buried in some dashboard I have to context-switch to.
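Getting that visibility is cheaper than it sounds: OpenAI-style chat completion responses already include a `usage` object with `prompt_tokens` and `completion_tokens`, so a tiny accumulator gives you running totals per model. The `TokenTracker` below is a hypothetical helper I'm sketching for illustration (it's not part of TokenBar or any library):

```python
# Minimal sketch: accumulate per-request token usage from API responses.
# The `usage` dict shape matches OpenAI-style chat completion responses;
# TokenTracker itself is a made-up helper for illustration.
from collections import defaultdict

class TokenTracker:
    def __init__(self):
        self.totals = defaultdict(lambda: {"prompt": 0, "completion": 0})

    def record(self, model, usage):
        # usage: e.g. {"prompt_tokens": 812, "completion_tokens": 244}
        self.totals[model]["prompt"] += usage.get("prompt_tokens", 0)
        self.totals[model]["completion"] += usage.get("completion_tokens", 0)

    def total_tokens(self, model):
        t = self.totals[model]
        return t["prompt"] + t["completion"]

tracker = TokenTracker()
tracker.record("gpt-4o-mini", {"prompt_tokens": 812, "completion_tokens": 244})
tracker.record("gpt-4o-mini", {"prompt_tokens": 790, "completion_tokens": 198})
print(tracker.total_tokens("gpt-4o-mini"))  # 2044
```

Pipe those totals to a log file, a metric, or a menu bar widget — the point is that the numbers are already in every response you get back.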

What I Actually Do Now

I keep a token counter running in my Mac menu bar using TokenBar. It's dead simple: it sits up top, shows me running totals across providers, and I can glance at it the same way I check the clock.

No browser tab. No separate app to open. Just a persistent, quiet number that keeps me honest.

Why This Changed My Workflow

Once you can actually see your spend in real time, you start making better decisions:

  • Prompt engineering gets serious. When you watch tokens tick up live, you trim the fat fast. That 2000-token system prompt? Maybe 800 tokens does the same job.
  • Model selection becomes intentional. GPT-4 for everything? Not anymore. You start routing simple tasks to cheaper models when you see the difference in real time.
  • Budgeting becomes possible. Hard to budget what you can't measure. With live numbers, you can set actual daily/weekly targets.
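A rough sketch of what "intentional model selection" plus a daily budget might look like in code — the model names, per-token prices, and length heuristic below are all illustrative placeholders, not current rates or a real routing policy:

```python
# Illustrative only: prices and routing heuristic are placeholders,
# not real OpenAI rates. Check current pricing before using numbers like these.
PRICE_PER_1K_INPUT = {"gpt-4o": 0.0025, "gpt-4o-mini": 0.00015}

def pick_model(prompt: str) -> str:
    # Naive heuristic: short prompts go to the cheaper model.
    return "gpt-4o-mini" if len(prompt) < 2000 else "gpt-4o"

def estimate_cost(model: str, input_tokens: int) -> float:
    return PRICE_PER_1K_INPUT[model] * input_tokens / 1000

daily_budget = 5.00  # hard daily target, in dollars
spent = 0.0

prompt = "Summarize this paragraph."
model = pick_model(prompt)
spent += estimate_cost(model, input_tokens=600)
print(model, f"{spent:.5f}")  # gpt-4o-mini 0.00009
assert spent < daily_budget  # refuse (or queue) requests once over budget
```

Even a heuristic this crude makes the GPT-4-for-everything habit visible as a number rather than a vibe.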

The Bigger Picture

Token tracking is the new performance monitoring. Just like you wouldn't ship a web app without watching response times, you shouldn't ship AI features without watching token consumption.

The tools don't have to be complex. Sometimes a $5 menu bar app does more for your workflow than an enterprise observability platform.

If you're spending more than $50/month on LLM APIs and you don't have real-time visibility — fix that first. Everything else (caching, prompt optimization, model routing) gets easier once you can actually see what's happening.


What's your approach to tracking LLM costs? Drop your setup in the comments — curious what others are doing.

Top comments (1)

Apex Stack


The "invisible cost problem" framing is exactly right — and it gets worse at batch scale. I run a pipeline that generates AI content for 8,000+ stock and ETF pages using a local Ollama model, so I don't pay per-token to an API, but I learned the same lesson the hard way: without tracking tokens-per-run, you have no idea why some batches take 3 hours and others take 45 minutes.

The culprit in my case was prompt length variance by page type. Stock pages with earnings history and analyst data in the context were 3-4× more tokens than ETF pages. Once I started logging input/output tokens per job, I could see the pattern and split the batches by page type to normalize run time.
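For anyone wanting to replicate that per-job logging: Ollama's generate responses report `prompt_eval_count` (input tokens) and `eval_count` (output tokens), so a logger can be very small. The `log_job` helper and its field names below are my own sketch, not the commenter's actual pipeline:

```python
# Sketch of per-job token logging for a batch pipeline, assuming Ollama's
# /api/generate response fields prompt_eval_count and eval_count.
import json
import os
import tempfile
import time

def log_job(page_id, page_type, response: dict, log_path: str):
    entry = {
        "ts": time.time(),
        "page_id": page_id,
        "page_type": page_type,  # e.g. "stock" vs "etf", to spot variance
        "input_tokens": response.get("prompt_eval_count", 0),
        "output_tokens": response.get("eval_count", 0),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")  # append-only JSONL log
    return entry

log_path = os.path.join(tempfile.gettempdir(), "token_log.jsonl")
entry = log_job("AAPL", "stock", {"prompt_eval_count": 1450, "eval_count": 380}, log_path)
print(entry["input_tokens"] + entry["output_tokens"])  # 1830
```

Grouping the resulting JSONL by `page_type` is enough to surface the 3-4× variance described above.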

For folks on the API side, the real value of this kind of tracking isn't just cost — it's that token spikes are often the first signal of a prompt regression before you even notice degraded output quality. Good reason to treat token count as a metric worth monitoring, not just a bill to be shocked by at the end of the month.
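That "spike as first signal" idea can be sketched as a rolling-average check; the window size and threshold factor here are arbitrary placeholders you'd tune for your own traffic:

```python
# Illustrative spike check: flag a request whose token count exceeds the
# recent rolling average by a threshold factor (prompt-regression early warning).
from collections import deque

class SpikeDetector:
    def __init__(self, window=20, factor=2.0):
        self.history = deque(maxlen=window)
        self.factor = factor

    def check(self, tokens: int) -> bool:
        # Only alert once we have a few samples to average over.
        spiked = (
            len(self.history) >= 5
            and tokens > self.factor * (sum(self.history) / len(self.history))
        )
        self.history.append(tokens)
        return spiked

d = SpikeDetector()
for t in [900, 950, 880, 920, 910]:
    d.check(t)          # warm-up: no alerts yet
print(d.check(2400))    # True — well over 2x the ~912-token average
```

Wire the `True` branch to a log line or notification and you get the regression signal before the bill (or the output quality) tells you something broke.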