A prompt that costs $30 on GPT-5.5 costs $0.28 on DeepSeek V4 Flash. That's a 100x difference — and it's real.
If you're building on AI APIs, the pricing landscape in 2026 is more fragmented than ever. Four major providers, twenty-plus models, and pricing tiers that include cache reads, cache writes, batch discounts, promotional pricing, and hidden thresholds. I built a token cost calculator to make sense of it. This is the pricing data behind it.
All prices are per million tokens (MTok) in USD, sourced from official provider docs as of May 2026.
Every Model, Ranked by Price
Here's the full picture — all 20 models from cheapest to most expensive on input:
| # | Model | Provider | Input | Output | Ratio |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 4x | |
| 2 | DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 | 2x |
| 3 | GPT-5.4 Nano | OpenAI | $0.20 | $1.25 | 6.3x |
| 4 | Gemini 2.5 Flash | $0.30 | $2.50 | 8.3x | |
| 5 | DeepSeek V4 Pro* | DeepSeek | $0.435 | $0.87 | 2x |
| 6 | Gemini 3 Flash | $0.50 | $3.00 | 6x | |
| 7 | GPT-5.4 Mini | OpenAI | $0.75 | $4.50 | 6x |
| 8 | Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 5x |
| 9 | Gemini 2.5 Pro | $1.25 | $10.00 | 8x | |
| 10 | Gemini 3.1 Pro | $2.00 | $12.00 | 6x | |
| 11 | GPT-5.4 | OpenAI | $2.50 | $15.00 | 6x |
| 12 | Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 5x |
| 13 | Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | 5x |
| 14 | Gemini 3 Deep Think | $4.00 | $24.00 | 6x | |
| 15 | GPT-5.5 | OpenAI | $5.00 | $30.00 | 6x |
| 16 | Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 5x |
| 17 | GPT-5.4 Pro | OpenAI | $21.00 | $168.00 | 8x |
| 18 | GPT-5.5 Pro | OpenAI | $30.00 | $180.00 | 6x |
* DeepSeek V4 Pro: 75% promotional discount until May 31, 2026.
The Ratio column is output-to-input price. DeepSeek's 2x ratio means output tokens are proportionally much cheaper — important if your app generates long responses.
Head-to-Head: Same Tier, Different Price
Frontier models (best capability)
| Model | Input | Output | Monthly (10K req/day)* |
|---|---|---|---|
| Gemini 3.1 Pro | $2.00 | $12.00 | $3,900 |
| Claude Opus 4.7 | $5.00 | $25.00 | $6,375 |
| GPT-5.5 | $5.00 | $30.00 | $7,500 |
*5K input + 500 output tokens per request
Gemini 3.1 Pro is 2.5x cheaper than GPT-5.5 on input. But it doubles pricing for prompts over 200K tokens — a hidden cost that catches people off guard.
Mid-tier (best balance)
| Model | Input | Output | Monthly (10K req/day) |
|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10.00 | $3,375 |
| GPT-5.4 | $2.50 | $15.00 | $6,000 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $6,375 |
Budget (cheapest)
| Model | Input | Output | Monthly (10K req/day) |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $30 |
| DeepSeek V4 Flash | $0.14 | $0.28 | $71 |
| GPT-5.4 Nano | $0.20 | $1.25 | $75 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $300 |
Caching: The Biggest Cost Lever
If your app sends the same system prompt or tool definitions repeatedly, caching matters more than base pricing. All providers offer ~90% savings on cached tokens, except DeepSeek which offers 98-99%.
The catch: Anthropic charges a 25% premium on cache writes. You pay $6.25/M instead of $5.00 the first time Opus processes a prefix. This means caching only saves money if you send the same prefix 3+ times within the cache TTL window. OpenAI and Google don't charge this premium — they just give you the discount.
For a detailed breakdown, see How to Save 90% on AI API Costs with Prompt Caching.
When to Go Cheap (and When Not To)
Use a budget model when:
- The task is well-defined (extract, classify, summarize)
- You need high throughput
- Output quality has a clear "good enough" threshold
- You're building a pipeline where a cheap model handles 90% of cases
Stick with a frontier model when:
- The task requires multi-step reasoning
- Accuracy is critical and errors are costly
- You need production-quality code generation
- The model is your product, not a utility
The smartest architecture routes 90% of traffic to a $0.10/M model and reserves the $5.00/M model for the 10% that actually needs it.
Bottom Line
AI API pricing has collapsed. The gap between the cheapest and most expensive models is 300x on input and 450x on output. The key is matching the model to the task. Don't pay GPT-5.5 prices to classify emails. Don't use Flash-Lite to write complex code. Use caching aggressively, pick the right tier, and your API bill drops from a line item to a rounding error.
Full pricing tables for all 20+ models, including cache write/read tiers, batch pricing, and provider-specific notes: Complete API Pricing Comparison
I built tokencostcalc.com — a free token cost calculator. No ads, no affiliate links, no tracking. Just pick a model, enter your token usage, and see the actual cost.
Top comments (0)