Key Takeaways
- 82% of developers default to OpenAI GPT models (Stack Overflow Developer Survey, 2025), but 60-70% of production API calls don't need a frontier model.
- Switching classification calls from GPT-4o to DeepSeek V3 cuts input-token cost roughly 18x ($2.50 → $0.14 per million).
- Combining model routing with prompt caching cuts total LLM spend by 80-95%.
- Average monthly AI spend hit $85,500 per company in 2025 — a 36% jump YoY (CloudZero, 2025).
Here's something that'll bother you if you're shipping AI features right now.
We looked at the first million API calls that came through Tokonomics — across 47 tenants, 9 providers, dozens of models. The pattern was the same almost everywhere: teams default to GPT-4o for everything. Customer support chatbots? GPT-4o. JSON extraction? GPT-4o. Classification into 5 categories? GPT-4o.
The waste isn't theoretical. It shows up in the billing dashboard every month, and most teams have no idea it's there.
Why Do 82% of Developers Default to GPT-4o?
Stack Overflow's 2025 Developer Survey found that 82% of developers use OpenAI GPT models. That makes the GPT family, and GPT-4o in particular, the de facto standard.
It makes sense. OpenAI has the best docs. Every tutorial uses GPT-4o. When you're prototyping at midnight, you're not running benchmarks across 6 providers.
But prototyping habits become production costs. That model you picked in February is still running in June, processing 50,000 calls a day, and nobody's asked whether a $0.14/M model would give the same result as a $2.50/M model.
Our finding: Our own internal chatbot ran on GPT-4o for three months before anyone checked. Switching the FAQ portion to GPT-4o-mini cut that component's cost by 94% with no quality difference.
What Does Model Selection Actually Cost?
Here's what 1 million requests a month cost (500 input + 200 output tokens per call):
| Model | Monthly Cost |
|---|---|
| GPT-4o | $3,250 |
| Claude Sonnet 4 | $4,500 |
| Claude Haiku 3.5 | $1,200 |
| GPT-4o-mini | $195 |
| DeepSeek V3 | $126 |
| GPT-4.1 Nano | $130 |
That's a 25x cost difference between GPT-4o and GPT-4.1 Nano, and DeepSeek V3 comes in slightly cheaper still. For the same million requests.
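If you want to rerun this table with your own token counts, here's a minimal sketch. The per-million prices are an assumption: the article doesn't list output prices, so these are the input/output rates that reproduce the figures above. `cost_usd` is just an illustrative helper name.

```python
# Per-million-token prices in USD as (input, output).
# Assumption: not taken from the article; these are the rates that
# reproduce the table's figures. Check current provider pricing.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Haiku 3.5": (0.80, 4.00),
    "GPT-4o-mini": (0.15, 0.60),
    "DeepSeek V3": (0.14, 0.28),
    "GPT-4.1 Nano": (0.10, 0.40),
}


def cost_usd(model, requests=1_000_000, input_tokens=500, output_tokens=200):
    """Total cost of `requests` calls at the given per-call token counts."""
    in_price, out_price = PRICES[model]
    per_call = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return round(requests * per_call, 2)


if __name__ == "__main__":
    # Print the models from most to least expensive.
    for model in sorted(PRICES, key=cost_usd, reverse=True):
        print(f"{model:<18} ${cost_usd(model):>9,.2f}")
```

Change `input_tokens` and `output_tokens` to match your real traffic. Output tokens cost several times more than input tokens on every model here, so long responses shift the ranking less than they shift the totals.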
Which Calls Don't Need a Frontier Model?
60-70% of API calls in typical SaaS apps are simple enough for budget models (Prem AI, 2026):
Send to a budget model ($0.10-$0.80/M input):
- Intent classification
- JSON/structured data extraction
- Short summaries (under 200 words)
- Sentiment analysis
- Content moderation
Keep on a frontier model ($2.50-$3.00/M input):
- Multi-step reasoning chains
- Complex code generation
- Long-form content where quality is critical
- Vision and multimodal tasks
How Much Are Companies Spending?
Average monthly AI spend jumped from $63,000 to $85,500 — a 36% increase YoY (CloudZero, 2025). And 45% of organizations plan to spend over $100,000/month. Only 51% can confidently evaluate their AI ROI.
Our finding: The teams spending the most aren't the ones with the most sophisticated AI. They're the ones who shipped early, never revisited model selection, and let usage scale on autopilot. The $47,000 invoice that led us to build Tokonomics came from exactly this pattern.
The Fix: Route, Cache, Cap
1. Route calls to the right model
Tag every API call by task type, then route:
- Classification → GPT-4o-mini or DeepSeek V3
- Conversational support → Claude Haiku 3.5
- Complex reasoning → GPT-4o or Claude Sonnet 4
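If 60% of calls shift to GPT-4o-mini, the bill drops from $3,250 to about $1,417/month. That's roughly $1,830 saved: 60% of the bill is $1,950, but the moved calls still cost about $117 on the budget model.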
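Here's a rough sketch of tag-based routing plus the savings math. The task tags, `route`, and `blended_bill` are hypothetical names for illustration, not a Tokonomics API. The bill figures come from the cost table earlier in the post.

```python
# Hypothetical routing table: task tag -> model, following the list above.
ROUTES = {
    "classification": "GPT-4o-mini",
    "support_chat": "Claude Haiku 3.5",
    "complex_reasoning": "GPT-4o",
}
DEFAULT_MODEL = "GPT-4o"  # untagged or unknown tasks stay on the frontier model


def route(task_type):
    """Pick a model for a call based on its task tag."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


def blended_bill(frontier_bill, budget_bill, budget_share):
    """Monthly bill when `budget_share` of calls move to the budget model.

    Both bills are what the full workload would cost on that single model.
    """
    if not 0.0 <= budget_share <= 1.0:
        raise ValueError("budget_share must be between 0 and 1")
    return (1 - budget_share) * frontier_bill + budget_share * budget_bill


if __name__ == "__main__":
    new_bill = blended_bill(frontier_bill=3250, budget_bill=195, budget_share=0.60)
    print(f"new bill: ${new_bill:,.0f}, saved: ${3250 - new_bill:,.0f}")
```

Note that the saving is net: the calls you move still cost something on the cheaper model, so the result is a little under 60% of the original bill.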
2. Enable prompt caching
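Anthropic's prompt caching bills cache reads at 10% of the normal input price, a 90% saving on cached tokens, though writing to the cache costs 25% more than a normal input token. OpenAI's automatic caching takes 50% off cached input tokens on GPT-4o with zero code changes, but it only kicks in for prompts of 1,024 tokens or more.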
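To see what those discounts do to a real bill, here's a back-of-the-envelope sketch. It assumes a cache hit rate you'd have to measure yourself (`cached_fraction` is hypothetical) and it ignores any cache-write fees. Caching only discounts input tokens, so output cost is untouched.

```python
def input_cost_with_cache(input_cost, cached_fraction, discount):
    """Input-token cost when part of the input is served from cache.

    input_cost:      what the input tokens would cost with no caching
    cached_fraction: share of input tokens that are cache hits (0 to 1)
    discount:        provider discount on cached tokens (0.5 or 0.9 above)
    Assumption: cache-write surcharges are ignored.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    uncached = 1 - cached_fraction
    cached = cached_fraction * (1 - discount)
    return input_cost * (uncached + cached)


if __name__ == "__main__":
    # GPT-4o input share from the cost table: 500M tokens at $2.50/M = $1,250.
    for discount in (0.5, 0.9):
        cost = input_cost_with_cache(1250, cached_fraction=0.8, discount=discount)
        print(f"{discount:.0%} discount, 80% hits: ${cost:,.0f}")
```

The takeaway: the headline discount only applies to the slice of tokens that actually hit the cache, so a long shared system prompt matters more than the discount rate itself.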
3. Set hard spending caps
Set a monthly budget cap that blocks API calls once it's hit. Not an alert you'll read at 9 AM, but a hard block that stops the bleeding at 3 AM.
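A hard cap can be as simple as refusing the call before it goes out. This is a minimal in-memory sketch, not how any particular gateway implements it: `BudgetCap` and `BudgetExceeded` are hypothetical names, and a real version would need shared storage, a monthly reset, and a cost estimate per call.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push spend past the monthly cap."""


class BudgetCap:
    def __init__(self, monthly_cap_usd):
        self.cap = monthly_cap_usd
        self.spent = 0.0

    def charge(self, cost_usd):
        """Record a call's cost, or refuse it if it would exceed the cap."""
        if self.spent + cost_usd > self.cap:
            raise BudgetExceeded(f"monthly cap of ${self.cap:,.2f} reached")
        self.spent += cost_usd


if __name__ == "__main__":
    cap = BudgetCap(monthly_cap_usd=100.0)
    cap.charge(60.0)
    cap.charge(30.0)
    try:
        cap.charge(20.0)  # would bring spend to $110, so it is refused
    except BudgetExceeded as err:
        print(f"blocked: {err}")
```

Call `charge` before sending the request, so a blocked call never reaches the provider.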
The compounding effect
- Model routing alone: 50-70% savings
- Add prompt caching: another 30-50%
- Add budget caps: stops runaway overruns before they reach the invoice
A team at $3,250/month can land at $300-$650/month with the same output quality.
Try It Yourself
```bash
curl https://tokonomics.ca/proxy/openai/chat/completions \
  -H "Authorization: Bearer mk_your_metering_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'
```