If you're building with LLMs, you already know the pain: you fire off a bunch of API calls during development, then check your dashboard the next morning and wonder how you burned through $40 overnight.
The problem isn't that API pricing is complicated — it's that there's zero visibility while you're working. You're flying blind until the bill shows up.
## The Hidden Cost of Context Windows
Every time you send a prompt to GPT-4, Claude, or Gemini, you're paying for both input and output tokens. But here's what catches most developers off guard:
- System prompts count every single time. That 2,000-token system prompt? It's billed on every request.
- Conversation history adds up fast. A 10-message back-and-forth can easily hit 8,000+ tokens before you even type your next question.
- Retries are silent killers. Automatic retries can double your spend for a single result — for example, when a request times out client-side but actually completed (and was billed) server-side before the retry fires.
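The history effect above compounds quickly because the system prompt and the entire transcript are re-sent on every turn. Here's a minimal sketch of that billing model — the price is an illustrative assumption, not any provider's current rate, and `conversation_input_tokens` is a hypothetical helper:

```python
# Illustrative assumption: $0.01 per 1K input tokens (not a real price list).
PRICE_PER_1K_INPUT = 0.01

def conversation_input_tokens(system_tokens, turns, tokens_per_message):
    """Total input tokens billed across a multi-turn chat, assuming the
    system prompt and full history are re-sent on every request."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message      # user message joins the context
        total += system_tokens + history   # everything is re-sent and billed
        history += tokens_per_message      # assistant reply joins the history
    return total

# A 2,000-token system prompt over a 10-turn chat with ~200-token messages:
tokens = conversation_input_tokens(2000, 10, 200)  # 40,000 input tokens
cost = tokens / 1000 * PRICE_PER_1K_INPUT
```

Even with short messages, input tokens grow quadratically with turn count, and the fixed system prompt alone accounts for 20,000 of those 40,000 tokens.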
Most devs don't realize how much they're spending until they've already spent it.
## What Actually Helps
What I wanted was dead simple: a running count of tokens used across providers, visible at all times, without opening a browser tab.
I found TokenBar — it sits in your Mac menu bar and tracks token usage across OpenAI, Anthropic, and other providers in real time. $5 one-time purchase. No subscription.
The key insight is that real-time visibility changes behavior. When you can see tokens ticking up as you work, you naturally start optimizing: shorter system prompts, smarter conversation pruning, batching requests instead of firing them one at a time.
## Quick Wins for Cutting Token Costs
- Cache your system prompts — if your provider supports it, prompt caching can cut the cost of a repeated prefix by up to 90% on some pricing tiers.
- Truncate conversation history — keep only the last N messages instead of the full thread.
- Use cheaper models for simple tasks — not everything needs GPT-4. Route classification and extraction to smaller models.
- Monitor in real time — you can't optimize what you can't see.
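The history-truncation tip is the easiest to implement. A minimal sketch, assuming the common chat-message format of `{"role": ..., "content": ...}` dicts (`truncate_history` and `keep_last` are hypothetical names, not any SDK's API):

```python
def truncate_history(messages, keep_last=6):
    """Keep the system prompt(s) plus only the last `keep_last`
    non-system messages, dropping older turns to save input tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

# Example: a long thread shrinks to the system prompt + recent turns.
thread = [{"role": "system", "content": "You are a helpful assistant."}]
thread += [{"role": "user", "content": f"question {i}"} for i in range(20)]
trimmed = truncate_history(thread, keep_last=4)  # 5 messages total
```

Keeping an even `keep_last` preserves complete user/assistant pairs; for threads where early context matters, a summarization step over the dropped messages is a common refinement.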
The LLM cost problem isn't going away. Models are getting more capable, which means bigger context windows, which means more tokens. Getting a handle on usage now saves real money as you scale.
What tools are you using to track API costs? Would love to hear what's working for others.