💡 This is a cross-post from my AI Cost Calc blog. The original has the same content with linked tools — feedback welcome on either platform.
"Cheapest AI API" is a misleading question. The model that costs the least per token might be useless for your task — and the one that looks expensive might be 10× cheaper for what you actually use it for. So before we hand you the list, two caveats:
- Cost is meaningless without capability matching. A $0.20/1M model that gets 60% of your queries wrong is more expensive than a $5/1M model that nails them on the first try.
- Headline rates lie in 2026. Caching can cut bills by 90%. Batch API drops them 50%. The "cheapest" model on the price page might be the most expensive in production.
With those out of the way: here's the honest ranking by single-call cost (1,000 input + 500 output tokens) across 10 frontier and small models.
Methodology
Each cost figure is calculated as:
cost = (1,000 / 1,000,000) × input_price + (500 / 1,000,000) × output_price
Where input_price and output_price are the official 2026 published rates per 1M tokens. The numbers don't include caching or batch discounts — those are footnoted because they change the order substantially.
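The formula is trivial to script. A minimal sketch in Python (prices per 1M tokens, no discounts, matching the formula above):

```python
def per_call_cost(input_price, output_price, input_tokens=1_000, output_tokens=500):
    """Single-call cost in dollars, given per-1M-token prices.

    No caching or batch discounts applied -- this is the naive headline rate.
    """
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# GPT-5 mini at $0.20 input / $0.80 output per 1M tokens:
print(round(per_call_cost(0.20, 0.80), 6))  # 0.0006
```

Every per-call figure in the table below comes from this same calculation.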
The Ranking
| Rank | Model | Provider | Per-call cost | Best for |
|---|---|---|---|---|
| 1 | GPT-5 mini | OpenAI | $0.0006 | Default everyday small |
| 2 | DeepSeek V4 | DeepSeek | $0.0009 | Coding, math, reasoning value |
| 3 | Gemini 3.0 Flash | Google | $0.0013 | Multimodal at scale |
| 4 | o4-mini | OpenAI | $0.0027 | STEM reasoning |
| 5 | Claude Haiku 4.5 | Anthropic | $0.0035 | Anthropic ecosystem, caching-heavy |
| 6 | Mistral Large 3 | Mistral | $0.0058 | EU hosting, multilingual |
| 7 | Gemini 3.0 Pro | Google | $0.0075 | Long context (2M tokens) |
| 8 | Grok 4 | xAI | $0.0140 | Real-time X integration |
| 9 | GPT-5.5 | OpenAI | $0.0150 | Frontier multimodal |
| 10 | Claude Opus 4.7 | Anthropic | $0.0525 | Hard reasoning, 1M context |
#1: GPT-5 mini ($0.0006/call)
OpenAI's small model is the new default for high-volume production. At $0.20 input / $0.80 output per 1M tokens, it's:
- 25× cheaper than GPT-5.5
- roughly 6× cheaper than Claude Haiku 4.5 per call
- 60% cheaper than Gemini 3.0 Flash on output ($0.80 vs $2.00 per 1M)
Where it wins: chatbots, classification, function calling, vision tasks at moderate complexity. With prompt caching (cached input at $0.05/1M), volume workloads get even cheaper.
Where it loses: hard reasoning (use o4-mini instead), long context (use Gemini 3.0 Pro).
#2: DeepSeek V4 ($0.0009/call)
The most aggressive cost/quality story of 2026. DeepSeek V4 is an open-weight 1T-parameter MoE that punches at the level of US frontier models on coding and reasoning, at roughly 6% of GPT-5.5's per-call price.
Trade-offs:
- China-based; some enterprises have data residency concerns
- Slightly weaker on creative writing and English nuance
- No vision (yet)
If you're cost-sensitive and your workload is coding, math, or reasoning-heavy, DeepSeek V4 is the rational pick.
#3: Gemini 3.0 Flash ($0.0013/call)
Google's high-throughput multimodal model:
- Native audio + vision (no separate model needed)
- 1M token context window
- Fast inference (multi-thousand tokens/sec)
- Caching support
For multimodal pipelines (image classification, audio summarization, document QA), Gemini 3.0 Flash is the sweet spot.
#4: o4-mini ($0.0027/call)
OpenAI's reasoning model. At $0.90 input / $3.60 output per 1M tokens, it's 4.5× more expensive than GPT-5 mini but punches several weight classes above it on:
- STEM problems (math, physics, chemistry)
- Multi-step coding refactors
- Logic puzzles requiring chain of thought
#5: Claude Haiku 4.5 ($0.0035/call)
Anthropic's small model is nearly 6× more expensive than GPT-5 mini at face value, but caching closes much of the gap.
Haiku's cached input price is $0.10/1M (vs GPT-5 mini's $0.05). Both are cheap in absolute terms, but Haiku's discount against its standard input price ($1.00/1M) is 90% off, so for cache-heavy, prompt-dominated workloads the input side of the bill nearly disappears and Haiku lands among the cheapest options in the lineup. Note that caching only applies to input tokens; output is billed at the standard rate either way.
Classic example: a chatbot with a 2,000-token system prompt called millions of times. Counting prompt tokens only, at a 95% cache hit rate:
- Standard cost: $2.00 per 1,000 calls
- With caching: ~$0.29 per 1,000 calls
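Using the Haiku 4.5 input prices stated above ($1.00/1M standard, $0.10/1M cached) and counting prompt tokens only (output cost is identical with or without caching), the arithmetic works out to:

```python
PROMPT_TOKENS = 2_000   # system prompt, resent on every call
CALLS = 1_000
STANDARD = 1.00         # Haiku 4.5 standard input, $/1M tokens
CACHED = 0.10           # Haiku 4.5 cached input, $/1M tokens
HIT_RATE = 0.95

tokens = PROMPT_TOKENS * CALLS  # 2M prompt tokens per 1,000 calls

# Every token billed at the standard rate:
no_cache = tokens / 1_000_000 * STANDARD
# 95% of tokens billed at the cached rate, 5% at the standard rate:
with_cache = tokens / 1_000_000 * ((1 - HIT_RATE) * STANDARD + HIT_RATE * CACHED)

print(f"${no_cache:.2f} vs ${with_cache:.2f} per {CALLS:,} calls")
```

This ignores any cache-write premium the provider may charge on the first request; at millions of calls it's negligible.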
#6-#7: Mid-tier flagships
Mistral Large 3 ($0.0058) and Gemini 3.0 Pro ($0.0075) sit in an awkward middle: more expensive than the small models but considerably cheaper than the absolute frontier.
Mistral Large 3: Best for EU customers. Multilingual is its strongest pitch — 30+ European languages natively.
Gemini 3.0 Pro: The 2M token context is unmatched. For book-length analysis or whole-codebase review, it's the only practical option.
#8-#9: Premium flagships
Grok 4 ($0.0140) is the wildcard with real-time X integration. Premium price reflects this niche feature.
GPT-5.5 ($0.0150) is the all-rounder frontier. Best ecosystem support, best tooling, best documentation.
#10: Claude Opus 4.7 ($0.0525/call)
The most expensive model on this list — by a significant margin. 3.5× more expensive per call than GPT-5.5.
So why use it?
- Hard reasoning: Claude Opus consistently leads on multi-step coding, agentic workflows, complex analysis.
- 1M token context with cleaner long-context attention than alternatives.
- Caching changes everything: Opus 4.7's cached read price is $1.50/1M — the same as GPT-5.5's standard input. With heavy caching, Opus's effective cost drops dramatically.
What changes the order?
The ranking above is naive single-call cost. Three things substantially change which model is actually cheapest for your use case:
1. Caching ratio
If 80% of your input tokens are cached (typical for RAG applications), and remembering that caching discounts input only:
| Model | Naive cost | With 80% input caching | Effect |
|---|---|---|---|
| GPT-5 mini | $0.0006 | $0.00048 | still #1 |
| Claude Haiku 4.5 | $0.0035 | ~$0.0028 | pulls roughly level with o4-mini (#4) |
| Claude Opus 4.7 | $0.0525 | ~$0.0417 | still #10 at this token mix |
(The Opus row assumes standard rates of $15 input / $75 output per 1M, consistent with its $1.50/1M cached read at a 90% discount. At this 2:1 input:output mix the saving is capped by the output cost; real RAG calls with tens of thousands of cached input tokens shift the order much further.)
2. Output ratio
If you're generating long content (output >> input), output prices dominate. Models with cheap output (Gemini 3.0 Flash $2/1M, GPT-5 mini $0.80/1M) become disproportionately cheaper.
3. Batch eligibility
If your workload tolerates 24-hour async processing, Batch API discounts cut all OpenAI / Anthropic / Google rates by 50%.
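All three adjustments compose into one effective-cost function. A sketch, assuming input-only caching and a flat 50% batch discount as described above (cache-write premiums ignored):

```python
def effective_cost(input_tokens, output_tokens, input_price, output_price,
                   cached_price=None, cache_fraction=0.0, batch=False):
    """Per-call cost in dollars; prices are per 1M tokens.

    Caching applies only to input tokens; the batch discount halves everything.
    """
    cached_price = input_price if cached_price is None else cached_price
    input_cost = (input_tokens / 1e6) * ((1 - cache_fraction) * input_price
                                         + cache_fraction * cached_price)
    output_cost = (output_tokens / 1e6) * output_price
    total = input_cost + output_cost
    return total * 0.5 if batch else total

# GPT-5 mini ($0.20/$0.80, cached input $0.05) with 80% of input cached:
print(round(effective_cost(1_000, 500, 0.20, 0.80,
                           cached_price=0.05, cache_fraction=0.8), 6))  # 0.00048
```

Run your own token mix through this before trusting any headline ranking.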
How to actually pick a model
Practical decision tree:
- Complex reasoning? → o4-mini for cost, Opus 4.7 for quality
- Context > 200K tokens? → Gemini 3.0 Pro
- Cache-heavy with stable prompts? → Haiku 4.5 (best cache discount)
- Batchable (non-realtime)? → Anything with batch + 50% off
- Default high-volume simple? → GPT-5 mini or Gemini 3.0 Flash
- EU hosting? → Mistral Large 3
- Cost is only concern? → DeepSeek V4
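The tree above can be sketched as a first-pass routing function. Everything here is illustrative, taken straight from the bullets; the flags and their ordering are my assumptions, not an official router:

```python
def pick_model(task="simple", context_tokens=0, cache_heavy=False,
               eu_hosting=False, cost_only=False):
    """First-pass model choice following the decision tree above."""
    if task == "hard_reasoning":
        return "Claude Opus 4.7"    # quality over cost
    if task == "reasoning":
        return "o4-mini"            # reasoning on a budget
    if context_tokens > 200_000:
        return "Gemini 3.0 Pro"
    if cache_heavy:
        return "Claude Haiku 4.5"   # best relative cache discount
    if eu_hosting:
        return "Mistral Large 3"
    if cost_only:
        return "DeepSeek V4"
    return "GPT-5 mini"             # high-volume default; batch adds 50% off on top

print(pick_model(context_tokens=500_000))  # Gemini 3.0 Pro
```

In a real router you'd also check batch eligibility and fall through to the 50%-off rate wherever latency allows.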
Calculate your real cost
The ranking above assumes 1,000 input + 500 output tokens. Your workload is different.
I built a free calculator at aicostcalc.net that handles all 10 models with caching/batch toggles. Plug in your token counts and the cheapest pick for your case appears at the top.
If you're spending more than $500/month on AI APIs and haven't run this exercise, you're almost certainly leaving 30-60% on the table.
More reading on this topic:
- OpenAI API Pricing Explained: Complete Guide for 2026
- Claude API Pricing in 2026
- How to Calculate Token Cost: A Beginner's Guide
Source code on GitHub (MIT). Feedback / pricing corrections welcome via issues.