Zouhair Ait Oukhrib

Posted on Jun 10

We Tracked 1M LLM API Calls — 60% Were Wasting Money on the Wrong Model

#ai #llm #saas #devops

Key Takeaways

82% of developers use OpenAI GPT models (Stack Overflow Developer Survey, 2025), but 60-70% of production API calls don't need a frontier model.

Switching classification calls from GPT-4o to DeepSeek V3 cuts the input-token price roughly 18x ($2.50 → $0.14 per million).

Combining model routing with prompt caching cuts total LLM spend by 80-95%.

Average monthly AI spend hit $85,500 per company in 2025 — a 36% jump YoY (CloudZero, 2025).

Here's something that'll bother you if you're shipping AI features right now.

We looked at the first million API calls that came through Tokonomics — across 47 tenants, 9 providers, dozens of models. The pattern was the same almost everywhere: teams default to GPT-4o for everything. Customer support chatbots? GPT-4o. JSON extraction? GPT-4o. Classification into 5 categories? GPT-4o.

The waste isn't theoretical. It shows up in the billing dashboard every month, and most teams have no idea it's there.

Why Do 82% of Developers Default to GPT-4o?

Stack Overflow's 2025 Developer Survey found that 82% of developers use OpenAI GPT models. In practice that makes OpenAI's flagship, GPT-4o, the de facto default.

It makes sense. OpenAI has the best docs. Every tutorial uses GPT-4o. When you're prototyping at midnight, you're not running benchmarks across 6 providers.

But prototyping habits become production costs. That model you picked in February is still running in June, processing 50,000 calls a day, and nobody's asked whether a $0.14/M model would give the same result as a $2.50/M model.

Our finding: Our own internal chatbot ran on GPT-4o for three months before anyone checked. Switching the FAQ portion to GPT-4o-mini cut that component's cost by 94% with no quality difference.

What Does Model Selection Actually Cost?

Here's what 1 million requests a month cost on each model (500 input + 200 output tokens per call):

Model	Monthly Cost
GPT-4o	$3,250
Claude Sonnet 4	$4,500
Claude Haiku 3.5	$1,200
GPT-4o-mini	$195
DeepSeek V3	$126
GPT-4.1 Nano	$130

That's a 25x cost difference between GPT-4o and GPT-4.1 Nano. For the same million requests.
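The table can be reproduced with a few lines of arithmetic. This is a minimal sketch, not Tokonomics code: the per-million-token prices are assumptions (only the $2.50 and $0.14 input prices are quoted in this post), picked because they reproduce the table exactly. Check each provider's current price list before relying on them.

```python
# Assumed list prices in USD per 1M tokens: (input, output).
# Only the GPT-4o and DeepSeek V3 input prices are quoted in this post;
# the rest are assumptions that happen to reproduce the table above.
PRICES_PER_MILLION = {
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Haiku 3.5": (0.80, 4.00),
    "GPT-4o-mini": (0.15, 0.60),
    "DeepSeek V3": (0.14, 0.28),
    "GPT-4.1 Nano": (0.10, 0.40),
}


def monthly_cost(model, requests=1_000_000, input_tokens=500, output_tokens=200):
    """Return the USD cost of `requests` calls with the given token profile."""
    input_price, output_price = PRICES_PER_MILLION[model]
    # Convert raw token counts to millions of tokens, the unit prices use.
    input_millions = requests * input_tokens / 1_000_000
    output_millions = requests * output_tokens / 1_000_000
    return round(input_millions * input_price + output_millions * output_price, 2)


if __name__ == "__main__":
    # Print the models from most to least expensive.
    for model in sorted(PRICES_PER_MILLION, key=monthly_cost, reverse=True):
        print(f"{model:<18} ${monthly_cost(model):>9,.2f}")
```

Swap in your own average `input_tokens` and `output_tokens` before comparing models. Under these assumed prices, output tokens cost 2-5x the input rate, so output-heavy workloads shift the ranking.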

Which Calls Don't Need a Frontier Model?

60-70% of API calls in typical SaaS apps are simple enough for budget models (Prem AI, 2026):

Send to a budget model ($0.10-$0.80/M input):

Intent classification
JSON/structured data extraction
Short summaries (under 200 words)
Sentiment analysis
Content moderation

Keep on a frontier model ($2.50-$3.00/M input):

Multi-step reasoning chains
Complex code generation
Long-form content where quality is critical
Vision and multimodal tasks

How Much Are Companies Spending?

Average monthly AI spend jumped from $63,000 to $85,500 — a 36% increase YoY (CloudZero, 2025). And 45% of organizations plan to spend over $100,000/month. Only 51% can confidently evaluate their AI ROI.

Our finding: The teams spending the most aren't the ones with the most sophisticated AI. They're the ones who shipped early, never revisited model selection, and let usage scale on autopilot. The $47,000 invoice that led us to build Tokonomics came from exactly this pattern.

The Fix: Route, Cache, Cap

1. Route calls to the right model

Tag every API call by task type, then route:

Classification → GPT-4o-mini or DeepSeek V3
Conversational support → Claude Haiku 3.5
Complex reasoning → GPT-4o or Claude Sonnet 4
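A routing table like this can start as a plain dictionary lookup. The sketch below is illustrative only: the task tags are hypothetical names for the task types listed in this post, the model identifiers are placeholders to check against your provider's docs, and sending unknown tasks to the frontier model is our assumption about a safe default.

```python
# Hypothetical model identifiers; substitute the exact names your
# provider SDKs expect.
BUDGET = "gpt-4o-mini"
SUPPORT = "claude-3-5-haiku-latest"
FRONTIER = "gpt-4o"

# Hypothetical task tags, one per task type named in the post.
ROUTES = {
    "intent_classification": BUDGET,
    "json_extraction": BUDGET,
    "short_summary": BUDGET,
    "sentiment_analysis": BUDGET,
    "content_moderation": BUDGET,
    "conversational_support": SUPPORT,
    "multi_step_reasoning": FRONTIER,
    "complex_code_generation": FRONTIER,
    "long_form_content": FRONTIER,
    "vision": FRONTIER,
}


def pick_model(task_type):
    """Map a task tag to a model, falling back to the frontier model."""
    # Unknown or untagged work goes to the frontier model: overpaying for an
    # unrecognized task is safer than silently degrading its quality.
    return ROUTES.get(task_type, FRONTIER)
```

The tagging itself is the hard part. If call sites already know what they are doing (a classifier endpoint, a support widget), passing a constant tag from each call site is usually enough to get started.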

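If 60% of calls shift to a budget model, $1,950 of a $3,250 GPT-4o bill moves to a model that costs a fraction as much. On GPT-4o-mini those calls cost about $117, so the saving is roughly $1,830/month.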
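One caveat when estimating the saving: the shifted calls still cost something on the budget model, so the saving is the difference between the two models on those calls. A quick sketch of that arithmetic, using the GPT-4o and GPT-4o-mini figures from the cost table and assuming the shifted calls have the same 500/200 token profile as the rest:

```python
FRONTIER_COST = 3250.0  # GPT-4o, 1M requests, from the cost table
BUDGET_COST = 195.0     # GPT-4o-mini, 1M requests, from the cost table


def blended_cost(shift_fraction, frontier=FRONTIER_COST, budget=BUDGET_COST):
    """Monthly cost when `shift_fraction` of calls move to the budget model."""
    if not 0.0 <= shift_fraction <= 1.0:
        raise ValueError("shift_fraction must be between 0 and 1")
    return round((1 - shift_fraction) * frontier + shift_fraction * budget, 2)


new_bill = blended_cost(0.60)
saved = round(FRONTIER_COST - new_bill, 2)
print(f"new bill: ${new_bill:,.2f}, saved: ${saved:,.2f}")
```

With a 60% shift the bill comes to $1,417, a saving of $1,833, or about 56%.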

2. Enable prompt caching

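Anthropic's prompt caching discounts cached input tokens by 90% on cache reads. OpenAI's automatic caching discounts cached input tokens by 50% with zero code changes. Both apply only to the repeated prefix of a prompt, so the win scales with how much of each request is a shared system prompt or shared context.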
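With Anthropic, caching is opt-in per content block. The sketch below builds a request body in the shape we understand the Messages API to expect, with `cache_control` set to `{"type": "ephemeral"}` on the system block; the model alias and token counts are placeholders. The second function estimates input cost for a given share of cached tokens and deliberately ignores any surcharge for writing to the cache, so treat its output as an upper bound on the saving.

```python
def build_cached_request(system_prompt, user_message,
                         model="claude-3-5-haiku-latest", max_tokens=512):
    """Build a Messages API request body whose system prompt is cacheable."""
    return {
        "model": model,  # placeholder alias; verify against current docs
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks the prompt prefix ending at this block as cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }


def input_cost(total_tokens, cached_tokens, price_per_million, cached_discount=0.90):
    """Input cost in USD when `cached_tokens` are billed at a discount."""
    uncached = total_tokens - cached_tokens
    billable = uncached + cached_tokens * (1 - cached_discount)
    return round(billable / 1_000_000 * price_per_million, 4)
```

For example, if 400M of 500M monthly input tokens are a shared system prompt, `input_cost(500_000_000, 400_000_000, 0.80)` prices them at the 90% discount, while passing `cached_discount=0.50` models the OpenAI case.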

3. Set hard spending caps

Set a monthly budget cap that blocks API calls when it's hit: not an alert you'll read at 9 AM, but a hard block that stops the bleeding at 3 AM.
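The difference between an alert and a hard cap is that the cap sits in the request path and refuses the call. A minimal in-memory sketch, with hypothetical class and method names (this is not how Tokonomics implements it):

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the monthly cap."""


class BudgetCap:
    """In-memory monthly spend guard (hypothetical illustration)."""

    def __init__(self, monthly_cap_usd):
        self.monthly_cap_usd = monthly_cap_usd
        self.spent_usd = 0.0

    def charge(self, estimated_cost_usd):
        """Record a call's cost, or refuse it if the cap would be exceeded."""
        if self.spent_usd + estimated_cost_usd > self.monthly_cap_usd:
            raise BudgetExceeded(
                f"cap ${self.monthly_cap_usd:.2f} reached; "
                f"spent ${self.spent_usd:.2f}"
            )
        self.spent_usd += estimated_cost_usd

    def reset(self):
        """Call at the start of each billing month."""
        self.spent_usd = 0.0
```

Call `charge()` before forwarding each request and let `BudgetExceeded` propagate as a failed call. A per-process counter like this only works for a single worker; with several workers the running total would need to live in a shared store with atomic increments.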

The compounding effect

Model routing alone: 50-70% savings
Add prompt caching: another 30-50%
Add budget caps: no savings on their own, but they stop runaway overruns at the limit you set

A team at $3,250/month can land at $300-$650/month with the same output quality.

Try It Yourself


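```bash
# Send a chat completion through the Tokonomics proxy.
# Replace the placeholder below with your own metering key.
curl https://tokonomics.ca/proxy/openai/chat/completions \
  -H "Authorization: Bearer mk_your_metering_key_here" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'
```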

DEV Community