You're Probably Wasting Money on AI Coding Tools (And You Have No Idea How Much)
AI coding assistants have officially crossed the threshold from "cool experiment" to "monthly line item on the credit card." Claude Code, Cursor, GitHub Copilot, Codex — most devs I talk to are running at least two of these simultaneously, and the costs are adding up in ways that aren't obvious until something like codeburn hits 2,700 GitHub stars in under a week and you realize: oh, this is a whole category of problem now.
The problem isn't that AI tools are expensive. It's that they're opaque. You get a monthly bill that says "$40" or "$120" or, if you're a team lead with budget authority and a particularly aggressive engineering culture, "please explain this $3,400 charge." But you have almost no visibility into where those tokens went.
That's changing. And understanding the new generation of token cost observability tooling will make you a smarter buyer — and a more efficient AI user.
Why Token Costs Are Harder to Track Than You Think
Unlike traditional SaaS pricing (flat seat fees, predictable per-request charges), LLM APIs price on a dimension most devs aren't trained to think about: context window consumption.
Every message you send to Claude or GPT includes not just your current prompt, but potentially:
- Your entire conversation history
- System prompts (sometimes thousands of tokens of boilerplate)
- File contents you've attached or that the tool auto-includes
- Tool call outputs and responses
- Re-injected context from previous turns
When you type a two-word response in a long Cursor session, you might be sending 20,000 tokens of context along with it. Multiply that by however many times you hit "Apply" on a suggestion, and the math gets ugly fast.
Here's a rough back-of-napkin example:
Session: 3-hour debugging session in Cursor
  Average context per request: ~18,000 tokens (input)
  Average output per request:  ~800 tokens
  Number of requests:          ~40

  Total input tokens:  720,000
  Total output tokens: 32,000

Claude Sonnet pricing (hypothetical):
  Input:  $3.00 / 1M tokens → $2.16
  Output: $15.00 / 1M tokens → $0.48

Session cost: ~$2.64
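The arithmetic above is simple enough to script. A minimal sketch, assuming the hypothetical per-million rates from the example (check your provider's current pricing before trusting any number it prints):

```python
# Hypothetical rates from the example above: $3/M input, $15/M output
INPUT_RATE = 3.00 / 1_000_000
OUTPUT_RATE = 15.00 / 1_000_000

def session_cost(requests: int, avg_input: int, avg_output: int) -> float:
    """Estimate session cost from request count and average token usage."""
    total_in = requests * avg_input
    total_out = requests * avg_output
    return total_in * INPUT_RATE + total_out * OUTPUT_RATE

# The 3-hour Cursor session from the example
print(round(session_cost(40, 18_000, 800), 2))  # → 2.64
```

Worth noticing: the session sends 22x more input tokens than output tokens, but input is only ~4.5x the cost, because output tokens are priced 5x higher per token.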
That's not outrageous. But scale it: 5 devs, 8 working hours, multiple AI tools, and you're looking at a meaningful monthly burn that nobody's tracking in any structured way.
The Token Sinks Nobody Talks About
After digging into observability data from tools like codeburn and similar projects, a few patterns show up repeatedly as the biggest cost drivers:
1. The "Re-explain the whole project" problem
AI coding tools often re-inject your entire codebase context on every request. Some tools are smarter about chunking and retrieval, but many — especially agentic workflows — err on the side of "more context = better results." Which is often true! But it's expensive.
The fix: be explicit about what files you want the AI to have access to. Don't let tools auto-index your entire monorepo when you're just trying to fix a CSS bug.
2. Failed attempts that cost as much as successful ones
This is the brutal one. If you ask the AI to refactor a complex function and it gets it wrong three times before nailing it on the fourth try — you paid for all four attempts. The failed ones aren't discounted.
# Rough cost comparison: one complex refactoring task
# Attempt 1 (failed): 12,000 tokens in + 3,000 out
# Attempt 2 (failed): 14,000 tokens in + 2,800 out
# Attempt 3 (failed): 16,000 tokens in + 4,200 out
# Attempt 4 (success): 18,000 tokens in + 5,100 out
# Total: 60,000 tokens in + 15,100 out
# vs. getting it right first try: 12,000 + 5,100
Clear, specific prompts are not just a workflow optimization — they're a cost optimization. Ambiguity is literally billable.
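Run the same hypothetical rates over the retry example above and the penalty is stark (a sketch, not real billing data):

```python
# Hypothetical rates, as before: $3/M input, $15/M output
IN_RATE, OUT_RATE = 3.00 / 1e6, 15.00 / 1e6

# (input_tokens, output_tokens) per attempt, from the example above
attempts = [(12_000, 3_000), (14_000, 2_800), (16_000, 4_200), (18_000, 5_100)]

actual = sum(i * IN_RATE + o * OUT_RATE for i, o in attempts)
ideal = 12_000 * IN_RATE + 5_100 * OUT_RATE  # right on the first try

print(f"${actual:.2f} vs ${ideal:.2f}")  # roughly 3.6x the cost
```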
3. System prompt bloat
Many AI tools (and especially the frameworks you build on top of them) use massive system prompts. I've seen system prompts over 8,000 tokens that get prepended to every single request. If you're making 100 requests a day, that's 800,000 tokens of system prompt input that you're paying for regardless of whether that context is relevant.
Audit your system prompts. Trim the fat. The difference between a 2,000-token and 8,000-token system prompt adds up to real money across a team.
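A quick way to put numbers on that audit, assuming a hypothetical $3/M input rate and a 22-workday month:

```python
RATE = 3.00 / 1_000_000  # hypothetical $ per input token

def monthly_prompt_cost(prompt_tokens: int, reqs_per_day: int,
                        workdays: int = 22, devs: int = 1) -> float:
    """Cost of prepending a fixed system prompt to every request."""
    return prompt_tokens * reqs_per_day * workdays * devs * RATE

bloated = monthly_prompt_cost(8_000, 100, devs=5)
trimmed = monthly_prompt_cost(2_000, 100, devs=5)
print(f"${bloated:.0f}/mo vs ${trimmed:.0f}/mo for a 5-dev team")
```

At these assumed rates, the gap is about $200/month for a five-person team, paid for tokens the model reads before anyone has asked it anything.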
What Token Cost Dashboards Actually Show You
Tools like codeburn operate by intercepting or parsing usage data from AI tool APIs and presenting it in a usable format. The better ones break down costs by:
- Tool (which AI assistant generated the cost)
- Session/project (which codebase or conversation)
- Time (hourly/daily/weekly burn rates)
- Model (cost per model if you're mixing Sonnet vs. Opus vs. Haiku)
- Token type (input vs. output, because output tokens cost dramatically more)
Conceptually, a typical TUI looks something like this:
┌─────────────────────────────────────────────────────┐
│ AI Token Cost Dashboard                [April 2026] │
├─────────────────────────────────────────────────────┤
│ Today's spend:         $4.23                        │
│ This week:            $31.87                        │
│ Monthly projection:  $127.48                        │
├──────────────────┬──────────────────────────────────┤
│ By Tool          │ By Project                       │
│ Cursor     $2.11 │ api-refactor         $1.84       │
│ Claude     $1.47 │ frontend-cleanup     $1.22       │
│ Copilot    $0.65 │ misc/scratch         $1.17       │
└──────────────────┴──────────────────────────────────┘
The insight isn't just "here's what you spent." It's "here's the pattern." When you can see that your api-refactor project costs 3x more per hour than other projects, you start asking why. Usually the answer involves large context windows, lots of failed attempts, or a very chatty system prompt.
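The aggregation behind a breakdown like that is not magic. Here's a sketch of the core idea, using a hypothetical flat record format (real tools parse provider logs or usage APIs, and their record shapes vary):

```python
from collections import defaultdict

# Hypothetical records: (tool, project, cost_usd)
records = [
    ("cursor",  "api-refactor",     1.10),
    ("cursor",  "frontend-cleanup", 1.01),
    ("claude",  "api-refactor",     0.74),
    ("claude",  "misc/scratch",     0.73),
    ("copilot", "misc/scratch",     0.44),
    ("copilot", "frontend-cleanup", 0.21),
]

def breakdown(rows, key_index):
    """Sum cost by the chosen dimension, highest spend first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))

print(breakdown(records, 0))  # by tool
print(breakdown(records, 1))  # by project
```

The same `breakdown` call works for any dimension you record: model, time bucket, token type. That's the whole trick; the hard part is getting clean usage records in the first place.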
Setting Budgets Without Crippling Your Workflow
Here's my hot take: hard token limits are usually a mistake. If you hit your budget at 3pm and the AI shuts off, you lose the last hour of productivity before a deadline. That's a bad trade.
Better approach: soft budgets with visibility and friction.
What actually works:
Daily awareness notifications — get a Slack/Telegram ping when you hit 50% and 80% of your daily budget. Not a hard stop, just awareness. You'll naturally become more deliberate.
Project-level budgets — track cost per project, not just total spend. It reframes the question from "am I spending too much on AI?" to "is this project's AI spend proportionate to its business value?"
Model routing by task complexity — use cheap models (Haiku, GPT-4o-mini) for simple completions, expensive models (Opus, o3) only when complexity justifies it. This is the single highest-leverage optimization most devs aren't doing.
# Rough model routing logic
def pick_model(task_complexity: str) -> str:
    match task_complexity:
        case "simple":
            return "claude-haiku-3"
        case "moderate":
            return "claude-sonnet-4"
        case "complex":
            return "claude-opus-4"
        case _:
            return "claude-sonnet-4"  # sensible default
Weekly cost reviews — 10 minutes on Friday looking at what drove your AI spend that week. You'll find patterns: certain task types, certain times of day, certain projects that are systematically expensive.
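The notification piece of the soft-budget approach is the easy part. A stateless sketch of the threshold check (the Slack/Telegram wiring is left out, since webhook details vary by workspace):

```python
def crossed_thresholds(prev_spent: float, spent: float, daily_budget: float,
                       thresholds=(0.5, 0.8)) -> list[float]:
    """Return budget thresholds crossed since the last check."""
    return [t for t in thresholds if prev_spent < t * daily_budget <= spent]

# Spend ticked from $2.40 to $2.60 against a $5.00 daily budget
print(crossed_thresholds(2.40, 2.60, 5.00))  # → [0.5]
```

Call it on each usage poll and fire one ping per returned threshold. Crucially, nothing here stops the spending; the AI keeps working either way.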
For Team Leads: This Is Now a Budget Category
If you're managing a team of devs with access to AI tools, you need to treat this like cloud compute — not like a subscription service you pay for and forget.
Key things to track:
- Per-dev monthly spend — outliers indicate either very heavy users (good or bad, depends on output) or tooling misconfiguration
- Cost per shipped PR/feature — the actual ROI signal. High AI spend per output is a yellow flag
- Model mix — are your devs defaulting to the most expensive model even for trivial tasks?
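For the per-dev check, even something this crude is enough to start the right conversations (the spend figures and the 3x-median threshold are illustrative, not a recommendation):

```python
import statistics

# Illustrative monthly AI spend per dev, in dollars
monthly_spend = {"alice": 38.0, "bob": 41.0, "carol": 204.0, "dave": 35.0}

median = statistics.median(monthly_spend.values())
outliers = {dev: cost for dev, cost in monthly_spend.items()
            if cost > 3 * median}
print(outliers)  # heavy usage or misconfiguration; either way, worth a look
```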
Most enterprise AI tool vendors offer usage exports or API access to cost data. If they don't, that's a vendor evaluation criterion worth taking seriously.
The Uncomfortable Truth About AI Coding ROI
Here's where I'll be blunt: most devs don't actually know if their AI tools are paying off. They feel more productive. They ship faster some days. But without cost visibility, you can't calculate ROI — you're just guessing.
Token cost observability is the first step toward treating AI spend like any other engineering cost: something you measure, optimize, and make deliberate decisions about.
The devs who get good at this in the next 12 months will have a real competitive advantage — not just in their own productivity, but in being able to make credible arguments about AI tool budgets, justify new tool adoption, and spot when an AI-heavy approach is actually the wrong call for a given task.
That's not a small thing. The ability to say "here's what we spent on AI for this feature, here's the output, here's whether it was worth it" is a superpower in a world where everyone's AI spend is going up and almost nobody can answer that question.
Getting Started
If you want to start tracking your AI costs today:
- Check if your tools expose usage data — most do, either in the UI or via API
- Try codeburn or a similar TUI — even 3 days of data will surprise you
- Start a simple tracking habit — even a weekly note of your total spend builds awareness
- Audit one expensive session — pick a long session and try to reconstruct why it cost what it did
The goal isn't to spend less on AI. The goal is to spend right on AI — knowing where every dollar goes and making sure it's earning its keep.
Token cost observability is early-stage but growing fast. If you're building in this space or have tooling recommendations, drop them in the comments.