If you're building with LLMs, you already know the pain: you fire off a bunch of API calls during development, then check your dashboard the next morning and wonder how you burned through $40 overnight.
The problem isn't that API pricing is complicated — it's that there's zero visibility while you're working. You're flying blind until the bill shows up.
## The Hidden Cost of Context Windows
Every time you send a prompt to GPT-4, Claude, or Gemini, you're paying for both input and output tokens. But here's what catches most developers off guard:
- System prompts count every single time. That 2,000-token system prompt? It's billed on every request.
- Conversation history adds up fast. A 10-message back-and-forth can easily hit 8,000+ tokens before you even type your next question.
- Retries are silent killers. Automatic retries can double your spend for a single result — for example, when a request times out client-side but actually completed (and was billed) server-side before the retry fires.
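The history effect above compounds quickly because the system prompt and the entire transcript are re-sent on every turn. Here's a minimal sketch of that billing model — the price is an illustrative assumption, not any provider's current rate, and `conversation_input_tokens` is a hypothetical helper:

```python
# Illustrative assumption: $0.01 per 1K input tokens (not a real price list).
PRICE_PER_1K_INPUT = 0.01

def conversation_input_tokens(system_tokens, turns, tokens_per_message):
    """Total input tokens billed across a multi-turn chat, assuming the
    system prompt and full history are re-sent on every request."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message      # user message joins the context
        total += system_tokens + history   # everything is re-sent and billed
        history += tokens_per_message      # assistant reply joins the history
    return total

# A 2,000-token system prompt over a 10-turn chat with ~200-token messages:
tokens = conversation_input_tokens(2000, 10, 200)  # 40,000 input tokens
cost = tokens / 1000 * PRICE_PER_1K_INPUT
```

Even with short messages, input tokens grow quadratically with turn count, and the fixed system prompt alone accounts for 20,000 of those 40,000 tokens.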
Most devs don't realize how much they're spending until they've already spent it.
## What Actually Helps
What I wanted was dead simple: a running count of tokens used across providers, visible at all times, without opening a browser tab.
I found TokenBar — it sits in your Mac menu bar and tracks token usage across OpenAI, Anthropic, and other providers in real time. $5 one-time purchase. No subscription.
The key insight is that real-time visibility changes behavior. When you can see tokens ticking up as you work, you naturally start optimizing: shorter system prompts, smarter conversation pruning, batching requests instead of firing them one at a time.
## Quick Wins for Cutting Token Costs
- Cache your system prompts — if your provider supports it, prompt caching can cut the cost of a repeated prefix by up to 90% on some pricing tiers.
- Truncate conversation history — keep only the last N messages instead of the full thread.
- Use cheaper models for simple tasks — not everything needs GPT-4. Route classification and extraction to smaller models.
- Monitor in real time — you can't optimize what you can't see.
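The history-truncation tip is the easiest to implement. A minimal sketch, assuming the common chat-message format of `{"role": ..., "content": ...}` dicts (`truncate_history` and `keep_last` are hypothetical names, not any SDK's API):

```python
def truncate_history(messages, keep_last=6):
    """Keep the system prompt(s) plus only the last `keep_last`
    non-system messages, dropping older turns to save input tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

# Example: a long thread shrinks to the system prompt + recent turns.
thread = [{"role": "system", "content": "You are a helpful assistant."}]
thread += [{"role": "user", "content": f"question {i}"} for i in range(20)]
trimmed = truncate_history(thread, keep_last=4)  # 5 messages total
```

Keeping an even `keep_last` preserves complete user/assistant pairs; for threads where early context matters, a summarization step over the dropped messages is a common refinement.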
The LLM cost problem isn't going away. Models are getting more capable, which means bigger context windows, which means more tokens. Getting a handle on usage now saves real money as you scale.
What tools are you using to track API costs? Would love to hear what's working for others.