ww-w.ai

Posted on May 20

Google I/O Review (1/5) — Gemini 3.5 'Flash' Costs 15x More Than Flash 2.0. It's Pro in Disguise

#ai #google #llm #pricing

Google I/O 2026 Review — Part 1 of 5

The keynote crowd cheered. Sundar Pichai announced that Gemini 3.5 Flash outperforms Gemini 3.1 Pro on multiple benchmarks. The narrative was clean: the lightweight, cheap model just beat the flagship. The start of "the agentic Gemini era."

Then I opened the pricing page.

Flash and Pro Are Neighbors Now

Model	Input (per 1M tokens)	Output (per 1M tokens)
Gemini 3.5 Flash	$1.50	$9.00
Gemini 3.1 Pro	$2.00	$12.00

Source: Google AI pricing, accessed 2026-05-19.

Flash at $1.50/$9.00. Pro at $2.00/$12.00. Flash is 25% cheaper on input and 25% cheaper on output. These are not different tiers. They are neighbors. Two years ago, Flash cost a fraction of Pro. Now they share the same block.
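The size of the gap follows directly from the two price points. A minimal sketch in Python, using only the prices in the table above; the helper name `discount_vs` is my own, not anything from Google's tooling:

```python
def discount_vs(cheaper: float, pricier: float) -> float:
    """Return how much cheaper `cheaper` is than `pricier`, as a fraction."""
    return (pricier - cheaper) / pricier

# Prices in USD per 1M tokens, copied from the table above.
flash_35 = {"input": 1.50, "output": 9.00}
pro_31 = {"input": 2.00, "output": 12.00}

gaps = {kind: discount_vs(flash_35[kind], pro_31[kind]) for kind in flash_35}
for kind, gap in gaps.items():
    print(f"{kind}: Flash 3.5 is {gap:.0%} cheaper than Pro 3.1")
```

Measured the other way around, Pro is about 33% more expensive than Flash, so it helps to state which price is the baseline when quoting the gap.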

If someone showed you these two price points without labels, you would guess they are variants of the same model class. You would be right.

How Flash Got Here: Three Generations of Price Creep

Model	Input (per 1M tokens)	Output (per 1M tokens)	vs 2.0 Flash (Input)	vs 2.0 Flash (Output)
1.5 Flash	$0.075	$0.30	0.75x	0.75x
2.0 Flash	$0.10	$0.40	1x (baseline)	1x (baseline)
2.5 Flash	$0.30	$2.50	3x	6.25x
3.0 Flash	$0.50	$3.00	5x	7.5x
3.5 Flash	$1.50	$9.00	15x	22.5x

Source: Google AI pricing. All prices are standard (non-batch) per 1M tokens.

From 2.0 Flash to 3.5 Flash: input price rose 15x ($0.10 to $1.50). Output price rose 22.5x ($0.40 to $9.00). A model called "Flash" now costs fifteen times what Flash cost three generations ago.
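The multipliers in the table can be reproduced from the listed prices. This is a small sketch, assuming the table's prices are accurate and treating 2.0 Flash as the baseline; the names `FLASH_PRICES` and `multiples` are illustrative:

```python
# Standard (non-batch) prices in USD per 1M tokens, copied from the table above.
FLASH_PRICES = {
    "1.5 Flash": (0.075, 0.30),
    "2.0 Flash": (0.10, 0.40),
    "2.5 Flash": (0.30, 2.50),
    "3.0 Flash": (0.50, 3.00),
    "3.5 Flash": (1.50, 9.00),
}

def multiples(prices, baseline="2.0 Flash"):
    """Return {model: (input multiple, output multiple)} relative to baseline."""
    base_in, base_out = prices[baseline]
    return {
        name: (round(p_in / base_in, 2), round(p_out / base_out, 2))
        for name, (p_in, p_out) in prices.items()
    }

ratios = multiples(FLASH_PRICES)
for name, (m_in, m_out) in ratios.items():
    print(f"{name}: {m_in}x input, {m_out}x output")
```

Swapping the `baseline` argument for another generation shows the step between any two releases rather than the cumulative change.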

The trajectory is clear. Flash did not stay in the lightweight lane. It grew into the price range that Pro used to occupy.

The Name Didn't Change. The Economics Did.

Here is what I think actually happened: Google shipped Pro-level performance and put the Flash label on it.

The benchmarks are real. Flash 3.5 does outperform Pro 3.1 on the metrics Google showed. But outperforming Pro while costing nearly the same as Pro is not "the cheap model won." It is "the expensive model got a new name."

Think about it from Google's side. If they had called it Pro 3.5 at $1.50/$9.00, the story would be: "Google cut Pro pricing by 25%." Accurate, useful, but not a keynote moment. By calling it Flash, the story becomes: "Flash beat Pro!" That is a keynote moment. Same product economics, different narrative.

Pichai himself leaned into the framing. During the keynote he used the word "tokenmaxxing" for the push toward more tokens, more context, and more throughput: some out there might call this tokenmaxxing, he said. The naming is part of that narrative. Flash sounds lightweight and affordable. The pricing page tells a different story.

So Is This Bad? Not Exactly.

I want to be fair. The absolute price matters more than the brand name.

Pro-level performance at $1.50/$9.00 is genuinely useful. Consider an agent workload: a customer support bot handling 50,000 conversations per day. Assuming one 500-token response per conversation, the daily output token cost at Pro 3.1 pricing ($2.00/$12.00) versus Flash 3.5 pricing ($1.50/$9.00) is:

50,000 conversations x 500 output tokens = 25M output tokens/day
At Pro 3.1: 25M tokens x $12.00 per 1M = $300/day
At Flash 3.5: 25M tokens x $9.00 per 1M = $225/day

That is $75/day saved on output tokens alone, or roughly $2,250 over a 30-day month, with the same or better benchmark performance. For agent-heavy workloads running at scale, this price point opens real economic headroom.
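The same arithmetic as a reusable function, in case you want to plug in your own volumes. This sketch counts output tokens only, as the example above does, and assumes one fixed-size response per conversation and a 30-day month; the function name is hypothetical:

```python
def daily_output_cost(conversations, output_tokens_each, price_per_million):
    """Daily output-token cost in USD, one fixed-size response per conversation."""
    total_tokens = conversations * output_tokens_each
    return total_tokens / 1_000_000 * price_per_million

pro_31 = daily_output_cost(50_000, 500, 12.00)    # Pro 3.1 output price
flash_35 = daily_output_cost(50_000, 500, 9.00)   # Flash 3.5 output price

saved_per_day = pro_31 - flash_35
saved_per_month = saved_per_day * 30  # assumes a 30-day month

print(f"Pro 3.1: ${pro_31:.2f}/day, Flash 3.5: ${flash_35:.2f}/day")
print(f"Saved: ${saved_per_day:.2f}/day, ${saved_per_month:.2f}/month")
```

Input tokens are left out here, so a real bill would be higher on both sides; the 25% difference stays the same because the input gap is also 25%.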

The win is not that "Flash beat Pro." The win is that Pro-grade inference got 25% cheaper. That is a quieter story, but a more honest one.

Benchmarks vs. Production: The Usual Caveat

One thing the keynote did not cover: benchmark performance and production performance are different conversations. Benchmarks test isolated capabilities — reasoning, coding, knowledge retrieval — under controlled conditions. Production workloads add latency variance, context window pressure, tool-call chains, and failure modes that benchmarks do not measure.

I have not tested Flash 3.5 in production yet. Nobody outside Google has had enough time to. If you are making infrastructure decisions based on the keynote benchmarks alone, you are making them on incomplete data. Wait for the community benchmarks. Wait for your own evals.
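"Your own evals" does not have to mean a large framework. Below is a minimal, model-agnostic sketch: it scores any callable that maps a prompt to a string against expected substrings and records latency. The `fake_model` stub and the two test cases are placeholders I made up; swap in a real client call for the model you are evaluating.

```python
import time
from typing import Callable

def run_eval(model: Callable[[str], str], cases: list[tuple[str, str]]) -> dict:
    """Score a model callable on (prompt, expected substring) pairs, timing each call."""
    passed = 0
    latencies = []
    for prompt, expected in cases:
        start = time.perf_counter()
        answer = model(prompt)
        latencies.append(time.perf_counter() - start)
        # Case-insensitive substring match; crude, but enough for a smoke test.
        if expected.lower() in answer.lower():
            passed += 1
    return {
        "accuracy": passed / len(cases),
        "max_latency_s": max(latencies),
    }

# Placeholder standing in for a real API client.
def fake_model(prompt: str) -> str:
    return "Paris" if "France" in prompt else "I do not know"

cases = [("Capital of France?", "paris"), ("Capital of Peru?", "lima")]
report = run_eval(fake_model, cases)
print(report)
```

Substring matching is a deliberately simple scoring rule. For agent workloads you would likely replace it with task-specific checks, such as whether the right tool was called or whether the output parses.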

Gemma 4: A Quick Note from Local Testing

On a related note: I have been running Gemma 4 (2.3B) locally for on-device-llm-wiki, a zero-cost, fully offline knowledge engine. In our internal reasoning benchmark across on-device and cloud models, Gemma 4 scored 66/85, outperforming Granite 3.4B (52), Qwen3 4B (28), and SmolLM2 1.7B (35). For reference, Claude Haiku 4.5 scored 76. A free, local model of roughly 2B parameters reaching 87% of a commercial cloud model's reasoning score (66 vs. 76), while beating a 4B competitor by more than 2x (66 vs. 28), is not incremental. It is a generational leap.
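For anyone checking the ratios, they come straight from the scores quoted above. A short sketch; the dictionary simply restates those numbers and the benchmark itself is not reproduced here:

```python
# Scores out of 85 from the internal reasoning benchmark described above.
scores = {
    "Gemma 4 (2.3B)": 66,
    "Granite 3.4B": 52,
    "Qwen3 4B": 28,
    "SmolLM2 1.7B": 35,
    "Claude Haiku 4.5": 76,
}

gemma = scores["Gemma 4 (2.3B)"]
share_of_haiku = gemma / scores["Claude Haiku 4.5"]
multiple_of_qwen = gemma / scores["Qwen3 4B"]

print(f"Gemma 4 reaches {share_of_haiku:.0%} of the Haiku 4.5 score")
print(f"Gemma 4 scores {multiple_of_qwen:.2f}x the Qwen3 4B score")
```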

If Flash 3.5 carries the same generational improvement at cloud scale, the performance claims are plausible. Gemma is the open-weight sibling of the Gemini family, and quality gains in one tend to reflect in the other. But plausible is not confirmed — that requires production testing, not keynote slides.

What I Think You Should Do

Read the pricing page, not the keynote. The pricing page is the source of truth. Marketing narratives are not.
Run your own evals. If you are considering Flash 3.5 for production, test it on your workloads. Benchmark suites test what benchmark suites test.
Compare to the actual competition. Flash 3.5 at $1.50/$9.00 competes with Claude Sonnet 4 ($3/$15), GPT-4.1 ($2/$8), and other mid-to-high tier models. Compare apples to apples at the price point, not at the brand name.
Track the trajectory. Flash went from $0.10/$0.40 to $1.50/$9.00 in three generations. If the pattern holds, Flash 4.0 will cost what Pro costs today. Plan accordingly.
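To compare at the price point rather than the brand name, it helps to cost the same workload across models. The sketch below uses the list prices quoted above and the 50,000-conversation support bot from earlier. The 1,000 input tokens per conversation is my assumption for illustration, not a measured figure:

```python
# (input, output) list prices in USD per 1M tokens, as quoted in this article.
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "GPT-4.1": (2.00, 8.00),
}

def daily_cost(prices, conversations, input_tokens, output_tokens):
    """Daily cost in USD for a workload with fixed tokens per conversation."""
    price_in, price_out = prices
    millions_in = conversations * input_tokens / 1_000_000
    millions_out = conversations * output_tokens / 1_000_000
    return millions_in * price_in + millions_out * price_out

# Assumed workload: 50,000 conversations/day, 1,000 input and 500 output tokens each.
costs = {name: daily_cost(p, 50_000, 1_000, 500) for name, p in PRICES.items()}

for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:,.2f}/day")
```

The ordering is sensitive to the input-to-output ratio, since some models are cheaper on input and others on output, so run it with token counts from your own logs before drawing conclusions.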

The Bottom Line

Google told a story about the cheap model beating the expensive one. The pricing page tells a story about the expensive model getting a cheaper name. Both stories have truth in them. The benchmarks are real. The price convergence is real. Which story matters more depends on what you are building.

For me, the useful takeaway is simpler: Pro-level performance is now available at $1.50/$9.00. That is good for anyone running agents at scale. Just do not call it cheap: it costs 15x what 2.0 Flash did on input and 22.5x on output.

This is Part 1 of a 5-part Google I/O 2026 review series. Next up: Managed Agents API — serverless agents arrive, but so does GCP lock-in.

If you have tested Flash 3.5 against Pro on your own workloads, I would like to hear the numbers. Drop a comment or find me on GitHub.

DEV Community