AI model pricing in 2026 is chaos. GPT-5 Nano costs $0.05 per million input tokens. Claude Opus 4.6 costs $25 per million output tokens. Even comparing like for like, the spread is enormous: 100x on input prices, 62.5x on output.
Which one should you use? It depends on what you're doing — and most developers are massively overpaying because they don't know the options.
This is the guide I wish I had when I started building with AI APIs.
The Full Pricing Table (February 2026)
Sorted cheapest to most expensive by output price:
| Model | Provider | Input/1M | Output/1M | Context | Best For |
|---|---|---|---|---|---|
| GPT-5 Nano | OpenAI | $0.05 | $0.40 | 400K | Classification, extraction, simple Q&A |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M | Fast tasks, agentic workflows |
| GPT-4o-mini | OpenAI | $0.15 | $0.60 | 128K | Budget all-rounder |
| Gemini 2.5 Flash | Google | $0.15 | $0.60 | 1M | Fast reasoning, thinking mode available |
| Mistral Small 3.1 | Mistral | $0.20 | $0.60 | 128K | Latency-sensitive tasks, open-source |
| Llama 4 Maverick | Meta | ~$0.15 | ~$0.60 | 1M | Open-source, self-hostable |
| DeepSeek V3 | DeepSeek | $0.56 | $1.68 | 128K | Coding, general purpose, incredible value |
| DeepSeek R1 | DeepSeek | $0.56 | $1.68 | 128K | Chain-of-thought reasoning, math |
| GPT-5 Mini | OpenAI | $0.25 | $2.00 | 400K | Budget GPT-5 tier |
| Mistral Medium 3 | Mistral | $0.40 | $2.00 | 128K | Balanced cost/quality |
| o3-mini | OpenAI | $1.10 | $4.40 | 200K | Reasoning on a budget |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 200K | Fast classification, support bots |
| Mistral Large 3 | Mistral | $2.00 | $6.00 | 256K | Open-source (Apache 2.0), multilingual |
| o3 | OpenAI | $2.00 | $8.00 | 200K | Full reasoning |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | 1M | Coding, instruction following |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K | Multimodal all-rounder |
| GPT-5 | OpenAI | $1.25 | $10.00 | 400K | Broad intelligence |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 1M | Deep reasoning, coding, math |
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 400K | Latest OpenAI flagship |
| Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | 200K | Balanced intelligence/cost |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 200K | Most capable, deep analysis |
Prices as of February 2026. All prices per 1 million tokens.
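The unit prices above translate into per-request costs with simple arithmetic. Here's a minimal sketch — prices are hardcoded from the table, and the token counts are illustrative, not measured:

```python
# Per-million-token prices (input, output) in USD, from the table above
PRICES = {
    "gpt-5-nano": (0.05, 0.40),
    "gemini-2.5-flash": (0.15, 0.60),
    "claude-opus-4.6": (5.00, 25.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the table's list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A short Q&A: roughly 20 input tokens, 50 output tokens
cheap = request_cost("gpt-5-nano", 20, 50)         # ~$0.000021
premium = request_cost("claude-opus-4.6", 20, 50)  # ~$0.00135
print(f"premium is {premium / cheap:.0f}x the cheap model")  # → 64x
```

Individually these numbers look like rounding errors. Multiply by millions of requests per month and the multiplier is your bill.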
The Insight Most Developers Miss
Look at that table again. The cheapest model (GPT-5 Nano at $0.40/M output) and the most expensive (Claude Opus 4.6 at $25/M output) differ by 62.5x.
Now here's the kicker: for "What's the capital of France?", they give the same answer.
For "Translate 'hello' to French", same answer.
For "Summarize this email", functionally equivalent.
Roughly 70% of typical AI API traffic is simple tasks like these. Route all of it to a frontier model and the bulk of your budget is buying a 62x premium those queries don't need.
Cost Breakdown by Use Case
Let's get specific. Here's what common tasks actually cost across different model tiers:
Simple Tasks (70% of typical traffic)
| Task | Cheap Model | Price | Expensive Model | Price | Overpay Factor |
|---|---|---|---|---|---|
| "Translate to French: Hello" | GPT-5 Nano | $0.0001 | Claude Opus 4.6 | $0.006 | 60x |
| "What's the capital of France?" | Gemini 2.0 Flash | $0.0001 | GPT-4o | $0.003 | 30x |
| "Summarize: [200-word email]" | Gemini 2.5 Flash | $0.0002 | Claude Sonnet 4.5 | $0.005 | 25x |
| Sentiment analysis | GPT-5 Nano | $0.0001 | GPT-5.2 | $0.004 | 40x |
| Spell check / grammar | Mistral Small | $0.0002 | Claude Haiku 4.5 | $0.002 | 10x |
Medium Tasks (20% of typical traffic)
| Task | Mid-tier Model | Price | Frontier Model | Price | Overpay Factor |
|---|---|---|---|---|---|
| Code generation (function) | DeepSeek V3 | $0.005 | Claude Opus 4.6 | $0.08 | 16x |
| Content writing (blog intro) | Gemini 2.5 Pro | $0.03 | GPT-5.2 | $0.04 | 1.3x |
| Data analysis (CSV) | GPT-5 Mini | $0.006 | Claude Sonnet 4.5 | $0.05 | 8x |
Complex Tasks (10% of typical traffic)
| Task | Frontier Model | Price | Notes |
|---|---|---|---|
| Research paper analysis | Claude Opus 4.6 | $0.10 | Worth the premium |
| Architecture design | GPT-5.2 | $0.05 | Complex reasoning needed |
| Multi-step debugging | Gemini 2.5 Pro | $0.04 | Thinking mode helps |
Monthly Cost Scenarios
Scenario 1: Customer Support Bot (10K conversations/month)
| Approach | Monthly Cost |
|---|---|
| Everything on Claude Opus 4.6 | $250 |
| Everything on GPT-4o | $100 |
| Everything on Gemini 2.5 Flash | $6 |
| Smart routing (70/20/10 split) | $8 |
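The "smart routing" rows above are just a weighted average: each tier's share of traffic times that tier's per-conversation cost. A sketch of the calculation — the split and the per-conversation costs below are assumed figures for illustration, not measurements:

```python
# Blended monthly cost under a traffic split across model tiers.
# The split and per-conversation costs are illustrative assumptions.
def blended_monthly_cost(volume: int, split: dict[str, float],
                         cost_per_convo: dict[str, float]) -> float:
    """Weighted monthly cost: share of traffic x per-conversation cost."""
    return sum(volume * share * cost_per_convo[tier]
               for tier, share in split.items())

split = {"flash": 0.70, "mid": 0.20, "frontier": 0.10}
cost_per_convo = {"flash": 0.0003, "mid": 0.0012, "frontier": 0.0032}

print(f"${blended_monthly_cost(10_000, split, cost_per_convo):.2f}/month")
```

Run your own logs through this shape of calculation before committing to a routing strategy — the split matters far more than the exact model choice within each tier.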
Scenario 2: Content Generation App (50K requests/month)
| Approach | Monthly Cost |
|---|---|
| Everything on GPT-5.2 | $700 |
| Everything on Claude Sonnet 4.5 | $750 |
| Smart routing (60/30/10 split) | $95 |
Scenario 3: Code Assistant (25K requests/month)
| Approach | Monthly Cost |
|---|---|
| Everything on Claude Opus 4.6 | $625 |
| Everything on DeepSeek V3 | $42 |
| Smart routing (50/30/20 split) | $55 |
The New Players You Should Know
GPT-5 Nano — The Budget King
Released August 2025. $0.05 input, $0.40 output per million tokens. 400K context window. This model handles classification, extraction, simple Q&A, and formatting at essentially zero cost. If you're not using this for simple tasks, you're leaving money on the table.
Gemini 2.5 Pro — The Thinking Value Play
$1.25 input, $10 output per million tokens. But here's the key: thinking tokens are billed at the same rate as regular output. No hidden surcharge for reasoning. With a 1M context window, this is arguably the best value for complex tasks.
DeepSeek V3/R1 — The Open-Source Disruptor
$0.56 input, $1.68 output. But with cache hits: $0.07 input. That's insanely cheap for a model that competes with GPT-4o on benchmarks. Open-source, too.
Llama 4 Scout — The Context Monster
10 million token context window. Open-source. Self-hostable. If you're processing massive documents, this changes the economics completely.
Claude Opus 4.6 — The Premium Standard
Released February 5, 2026. $5/$25 per million tokens. Expensive, but Anthropic cut prices 67% from Opus 4.1 ($15/$75). Best for financial analysis, agentic coding, and genuinely complex reasoning where quality matters more than cost.
How to Actually Optimize
You have three options:
Option 1: Manual Model Selection
Pick the cheapest model that works for your use case. Test it. If quality is good enough, ship it. Simple, free, and works for single-use-case apps.
Best for: Apps with one dominant task type.
Option 2: Build Your Own Router
Write a classifier that routes by task complexity. I published a 50-line Python version that captures roughly 60% of the potential savings. You maintain it yourself.
Best for: Teams with engineering time who want full control.
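To make Option 2 concrete, here's a sketch of the kind of heuristic router it describes. The keyword lists, word-count thresholds, and model names are assumptions you'd tune against your own traffic, not a production classifier:

```python
import re

# Heuristic complexity router: cheap signal checks, no ML.
# Keyword patterns and thresholds are illustrative; tune them on your logs.
COMPLEX_HINTS = re.compile(
    r"\b(architect|design|debug|prove|analyze|refactor|optimi[sz]e)\b", re.I)
MEDIUM_HINTS = re.compile(
    r"\b(write|generate|implement|explain|compare)\b", re.I)

def pick_model(prompt: str) -> str:
    words = len(prompt.split())
    if COMPLEX_HINTS.search(prompt) or words > 300:
        return "claude-opus-4.6"   # frontier tier
    if MEDIUM_HINTS.search(prompt) or words > 80:
        return "deepseek-v3"       # mid tier
    return "gpt-5-nano"            # flash tier

print(pick_model("What's the capital of France?"))           # gpt-5-nano
print(pick_model("Implement a rate limiter in Go"))          # deepseek-v3
print(pick_model("Architect a CQRS event-sourcing system"))  # claude-opus-4.6
```

The misroutes are the cost of simplicity: a keyword router will occasionally send a hard question to a cheap model. Budget for a retry-on-frontier fallback when the cheap answer fails validation.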
Option 3: Use a Routing Service
Change your base URL and let a routing layer handle model selection automatically. Services like Komilion (which I built — disclosure) route across nearly 400 models based on task complexity.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://www.komilion.com/api/v1",
    api_key="ck_your_key",
)

# This query costs ~$0.0002 — the router picks a Flash-tier model
response = client.chat.completions.create(
    model="neo-mode/balanced",
    messages=[{"role": "user", "content": "Translate to French: Hello world"}],
)

# This query costs ~$0.05 — the router escalates to an Opus-tier model
response = client.chat.completions.create(
    model="neo-mode/balanced",
    messages=[{"role": "user", "content": "Architect a CQRS event-sourcing system for a payment platform"}],
)
```
Best for: Teams that want savings without building infrastructure.
The API Gateway Landscape
If you're considering a routing/gateway service, here's the current landscape:
| Service | Models | Pricing | Smart Routing | Self-Host |
|---|---|---|---|---|
| OpenRouter | 400+ | 5.5% on credits | Rule-based fallbacks | No |
| Portkey | 1,600+ | $49/mo platform fee | Rule-based + guardrails | Yes |
| LiteLLM | 100+ providers | Free (open source) | Rule-based algorithms | Yes |
| Unify | All major | $40/seat/mo | Benchmark-driven + custom | No |
| Martian | 200+ | Opaque pricing | ML-based (most sophisticated) | No |
| Komilion | 394 | Pay-as-you-go | Benchmark + LLM classifier | No |
OpenRouter is the largest catalog with pass-through pricing. No routing intelligence though — you still pick the model.
LiteLLM is the open-source champion. Free, self-hosted, maximum control. But requires DevOps expertise.
Portkey is the enterprise play. 1,600+ models, SOC2/HIPAA, full observability. $49/month minimum.
Unify does benchmark-driven routing with trainable custom models. Best for evaluation workflows.
Martian has the most sophisticated ML-based routing, but pricing is opaque and adoption is limited.
Komilion (mine) sits in between — automatic per-query routing without ML complexity, transparent pricing, OpenAI SDK compatible.
Key Takeaways
The price floor has collapsed. GPT-5 Nano and Gemini 2.0 Flash deliver usable quality at $0.40/M output tokens. Use them for simple tasks.
Stop using one model for everything. The difference between the cheapest and most expensive model is 62x. Match the model to the task.
Context windows have exploded. Llama 4 Scout: 10M tokens. GPT-4.1 and Gemini: 1M. Factor this into your architecture.
Open-source is competitive. DeepSeek, Llama 4, and Mistral Large 3 compete with proprietary models at a fraction of the cost.
Batch API saves 50%. Both OpenAI and Anthropic offer 50% discounts for non-real-time processing. Use it for background tasks.
Cache aggressively. Anthropic's prompt caching gives 90% discount on cache reads. OpenAI gives 50-90%. If you're sending similar prompts, cache them.
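The batch discount is the easiest of these wins to automate. A sketch of the workflow, following the shape of OpenAI's Batch API (the model name and prompts here are placeholders — adapt to your own jobs):

```python
import json

# Each line of a batch input file is one self-contained request.
def batch_line(custom_id: str, model: str, prompt: str) -> str:
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

prompts = ["Summarize: ...", "Classify sentiment: ..."]
lines = [batch_line(f"req-{i}", "gpt-5-nano", p)
         for i, p in enumerate(prompts)]

# Write the lines to requests.jsonl, upload it with purpose="batch", then:
#   batch = client.batches.create(
#       input_file_id=file.id,
#       endpoint="/v1/chat/completions",
#       completion_window="24h",
#   )
# Results arrive within 24 hours at half the synchronous price.
```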
Start Saving
The simplest thing you can do today: look at your API logs. Identify the simple queries. Switch them to a Flash model.
That alone could cut your bill by 50-70%.
If you want to automate it: komilion.com — free credits, no credit card. Or build your own router. Either way, stop sending "what's the capital of France?" to Claude Opus 4.6.
Hossein Shahrokni builds AI infrastructure tools. Follow him on Twitter @haboroshan for more AI cost optimization content.