The State of LLM API Pricing: July 2026

Originally published on the TierUp blog.

If you last looked at a model price sheet a year ago, the single most important thing that changed isn't any one number. It's the spread. As of this month, published per-token prices run from about $0.075 per million input tokens at the bottom (Gemini 2.5 Flash-Lite, per APIpulse's June 2026 survey) to $30 input / $180 output at the top (OpenAI's GPT-5.5 Pro tier, confirmed across APIpulse, CloudZero, and CostGoat).

That's a 400x spread on input ($30 vs. $0.075) and a 600x spread on output ($180 vs. Flash-Lite's $0.30). Two API calls that look identical in your code can differ in cost by more than two orders of magnitude depending on one string: the model name.
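
To make that concrete, here is a minimal Python sketch that prices one identical request at both ends of the range. The rates are the two extremes quoted above; the request size (2,000 input tokens, 500 output tokens) and the function name are illustrative assumptions, not measurements.

```python
# USD per million tokens (input, output), taken from the two extremes quoted above.
CHEAPEST = (0.075, 0.30)    # Gemini 2.5 Flash-Lite
PRICIEST = (30.00, 180.00)  # GPT-5.5 Pro


def request_cost(rates, input_tokens, output_tokens):
    """Return the USD cost of one request at the given per-million rates."""
    in_rate, out_rate = rates
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical request size, chosen only for illustration.
low = request_cost(CHEAPEST, 2_000, 500)
high = request_cost(PRICIEST, 2_000, 500)
print(f"cheapest: ${low:.4f}  priciest: ${high:.4f}  ratio: {high / low:.0f}x")
```

For this particular mix the ratio comes out at about 500x, between the input and output spreads, because the bill blends both columns.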

The landscape in one table

Prices below are per million tokens, cross-checked against three trackers updated between May 11 and July 5, 2026. Prices move; verify against the provider's page before committing budget.

Model	Input $/M	Output $/M
GPT-5.5 Pro	$30.00	$180.00
Claude Opus 4.7	$5.00	$25.00
GPT-5.5	$5.00	$30.00
Claude Sonnet 4.6	$3.00	$15.00
Gemini 3.1 Pro (≤200K context)	$2.00	$12.00
Claude Haiku 4.5	$1.00	$5.00
Gemini 3 Flash	$0.50	$3.00
Gemini 2.5 Flash-Lite	$0.075	$0.30

A few footnotes that matter more than they look:

Long context costs extra. Gemini 3.1 Pro doubles its input rate (to $4/M) and raises output to $18/M once you cross 200K tokens of context, per CloudZero's data.
Naming churn is real. CloudZero's May snapshot listed the $30/$180 OpenAI tier as "GPT-5.4 Pro"; APIpulse and CostGoat now list "GPT-5.5 Pro" at the identical price. The tier is stable even when the model name isn't — plan around tiers, not names.
Hosted open-weight models anchor the floor. DeepSeek's models are listed at $0.27/$1.10 (V3.2, per CloudZero) down to $0.14/$0.28 for newer flash variants (per APIpulse). The budget floor is crowded and keeps dropping.
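
The long-context footnote is easy to miss in a budget spreadsheet, so here is a rough sketch of how it changes a cost estimate. It assumes that "context" means the input token count and that the higher rates apply to the whole request once the threshold is crossed; both are assumptions to check against the provider's pricing page. The function and constant names are made up for this example.

```python
# Gemini 3.1 Pro rates in USD per million tokens, as quoted above.
BASE_RATES = (2.00, 12.00)          # context of 200K tokens or fewer
LONG_CONTEXT_RATES = (4.00, 18.00)  # context above 200K tokens
LONG_CONTEXT_THRESHOLD = 200_000


def gemini_pro_cost(input_tokens, output_tokens):
    """Estimate USD cost, assuming the higher rates cover the whole request."""
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = LONG_CONTEXT_RATES
    else:
        in_rate, out_rate = BASE_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical requests on either side of the threshold.
print(f"${gemini_pro_cost(200_000, 1_000):.3f}")
print(f"${gemini_pro_cost(200_001, 1_000):.3f}")
```

Under these assumptions, a single extra input token past the threshold roughly doubles the bill for the request, which is why prompt size belongs in the estimate alongside the model name.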

What the spread actually means for you

The middle tier is where most production work belongs. Claude Sonnet 4.6 ($3/$15) and GPT-5.4 ($2.50/$15, not shown in the table) are the consensus workhorses in every tracker we checked: frontier-adjacent quality at roughly a tenth to a twelfth of the Pro tier's price. The $30/$180 tier buys measurably better performance on hard reasoning, but at 10 to 12x the price of models that handle the large majority of real workloads fine.

Output pricing is the quiet killer. Every model in the table charges 4 to 6 times as much for output as for input. If your workload is generation-heavy (long answers, code, reports), the output column is the one to optimize. That topic is big enough that we wrote a separate post on hidden cost multipliers.
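
A quick way to see which column dominates your own bill is to compute the share of a request's cost that comes from output. The sketch below uses Claude Sonnet 4.6's rates from the table; the token mixes and the function name are hypothetical.

```python
def output_share(in_rate, out_rate, input_tokens, output_tokens):
    """Fraction of a request's cost that comes from output tokens."""
    input_cost = input_tokens * in_rate
    output_cost = output_tokens * out_rate
    return output_cost / (input_cost + output_cost)


# Claude Sonnet 4.6 rates from the table: $3 input, $15 output per million.
# Both token mixes are hypothetical.
balanced = output_share(3.00, 15.00, 1_000, 1_000)     # equal input and output
prompt_heavy = output_share(3.00, 15.00, 10_000, 500)  # long prompt, short answer
print(f"balanced: {balanced:.0%}  prompt-heavy: {prompt_heavy:.0%}")
```

With equal token counts, output accounts for about 83% of the cost; with a long prompt and a short answer it drops to 20%. Which column to optimize depends on the mix, so it is worth measuring before tuning.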

Discounts are large and underused. Batch APIs run 50% off and prompt caching discounts cached input by up to 90% at the major providers, per CloudZero. If you're paying rack rate on repetitive prefixes, you're overpaying by design.
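
As a back-of-the-envelope check, the sketch below applies the two headline discounts quoted above. It assumes the maximum 90% cache discount, ignores any cache-write surcharge or cache expiry, and does not assume the two discounts stack; actual terms vary by provider. The function names are invented for this example.

```python
def input_cost_with_cache(input_tokens, cached_fraction, in_rate, cache_discount=0.90):
    """USD input cost when a share of input tokens is billed at the cached rate."""
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    return (uncached * in_rate + cached * in_rate * (1 - cache_discount)) / 1_000_000


def batch_cost(list_price_usd, batch_discount=0.50):
    """USD cost of a job sent through a batch API at the quoted 50% discount."""
    return list_price_usd * (1 - batch_discount)


# Hypothetical workload: 10M input tokens at $3/M, 80% of them a repeated prefix.
rack_rate = input_cost_with_cache(10_000_000, 0.0, 3.00)
with_cache = input_cost_with_cache(10_000_000, 0.8, 3.00)
print(f"rack rate: ${rack_rate:.2f}  with caching: ${with_cache:.2f}")
print(f"batched at rack rate: ${batch_cost(rack_rate):.2f}")
```

In this example the input bill falls from $30 to about $8.40 with caching alone, or to $15 through a batch API.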

The uncomfortable implication

A 400–600x price spread means model selection is now a bigger cost lever than any infrastructure decision most teams will make this year. Hardcoding a flagship model name into every call path was defensible when the spread was 10x. At 600x, it's a budget decision being made by a config file nobody has reviewed since March.

The practical move: classify your workloads by the quality they actually need, route each class to the cheapest tier that clears the bar, and re-check quarterly — because as the naming churn above shows, the map gets redrawn every few months. That's the exact problem TierUp's tier-based routing exists to automate — disclosure: I'm the founder, and the tier-1 free playground at tierup.ai/try needs no signup if you want to see it.
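
The classify-and-route idea can be sketched in a few lines of Python. This is an illustration only, not TierUp's implementation: the prices come from the table above, but the tier numbers, the workload classes, and the 25% output ratio are hypothetical placeholders you would replace with your own evaluations.

```python
# (model, input $/M, output $/M, tier). Prices are from the table above;
# the tier numbers are hypothetical labels for this sketch, not benchmark results.
CATALOG = [
    ("Gemini 2.5 Flash-Lite", 0.075, 0.30, 1),
    ("Gemini 3 Flash", 0.50, 3.00, 1),
    ("Claude Haiku 4.5", 1.00, 5.00, 2),
    ("Claude Sonnet 4.6", 3.00, 15.00, 3),
    ("GPT-5.5", 5.00, 30.00, 4),
    ("GPT-5.5 Pro", 30.00, 180.00, 5),
]

# Hypothetical workload classes and the minimum tier each one needs.
REQUIRED_TIER = {
    "classification": 1,
    "summarization": 2,
    "code_generation": 3,
    "hard_reasoning": 5,
}


def blended_rate(entry, output_ratio=0.25):
    """Price per million tokens for an assumed input/output mix."""
    _, in_rate, out_rate, _ = entry
    return in_rate * (1 - output_ratio) + out_rate * output_ratio


def route(workload):
    """Return the cheapest model whose tier meets the workload's minimum."""
    needed = REQUIRED_TIER[workload]
    candidates = [entry for entry in CATALOG if entry[3] >= needed]
    return min(candidates, key=blended_rate)[0]


for name in REQUIRED_TIER:
    print(f"{name:<16} -> {route(name)}")
```

Keeping the catalog and the tier requirements as data rather than hardcoded model names means the quarterly re-check becomes a table update instead of a code change.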
