Evan-dong

Posted on May 20

Gemini 3.5 Flash Just Shipped — Here's When to Use It (and When Not To)

#ai #claude #api #programming

Google launched Gemini 3.5 Flash at I/O 2026 today. By Google's own numbers, the "budget" model now beats Gemini 3.1 Pro on several agent and coding benchmarks. Here is what you need to know to decide whether to switch.

Quick specs

Model ID: gemini-3.5-flash
Context: 1M tokens input / 65K tokens output
Input types: text, image, audio, video, PDF
Pricing (per 1M tokens): $1.50 input, $9.00 output, $0.15 cached input
Knowledge cutoff: January 2026
Dynamic Thinking: on by default (medium), low mode available
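
To make the pricing concrete, here is a minimal sketch of a per-request cost estimate using the three rates listed above. The function name and the assumption that cached tokens are simply billed at the cached rate instead of the normal input rate are mine, not something Google's pricing page is quoted as saying here.

```python
# Prices from the spec list above, in USD per million tokens.
PRICE_INPUT = 1.50
PRICE_OUTPUT = 9.00
PRICE_CACHED_INPUT = 0.15


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of a single request.

    cached_tokens is the portion of input_tokens served from cache
    (assumption: billed at the cached rate instead of the input rate).
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


# 100K input tokens and 2K output tokens, nothing cached.
print(f"{request_cost(100_000, 2_000):.3f}")          # 0.168
# Same request with 80K of the input served from cache.
print(f"{request_cost(100_000, 2_000, 80_000):.3f}")  # 0.060
```

Output tokens cost six times as much as input tokens, so for short prompts with long answers the output side tends to dominate the bill.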

Where Flash wins

Benchmark	Gemini 3.5 Flash	Gemini 3.1 Pro	Gap (Flash minus Pro)
Terminal-Bench 2.1 (coding)	76.2%	70.3%	+5.9 pts
MCP Atlas (tool calling)	83.6%	78.2%	+5.4 pts
Finance Agent v2	57.9%	43.0%	+14.9 pts
GDPval-AA (decision-making)	1,656 Elo	1,314 Elo	+342 Elo
CharXiv Reasoning	84.2%	n/a	n/a

Where Flash does NOT win

Humanity's Last Exam: 3.1 Pro leads (44.4% vs 40.2%)
ARC-AGI-2: 3.1 Pro leads (77.1% vs 72.1%)
SWE-Bench Pro: Claude Opus 4.7 still leads
Computer Use: GPT-5.5 is the only model with production screen control. Flash does not support this.

Tool capabilities

What ships:

Function Calling
Structured Output
Search-as-a-tool
Code Execution

What is missing:

Computer Use — no screen control, no clicking, no form filling. If your agent operates a browser or desktop app, you still need GPT-5.5 for that part.

When to pick which model

Agent needs to call multiple tools in sequence?
  → Gemini 3.5 Flash

Agent needs to refactor code across a large repo?
  → Claude Opus 4.7

Agent needs to control a browser or desktop?
  → GPT-5.5

Need all three depending on the task?
  → Route through a unified gateway
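
The decision list above can be sketched as a small lookup table. This is only an illustration: `Task`, `MODEL_FOR_TASK`, and `pick_model` are hypothetical names, and the Claude and GPT model IDs are placeholders, since only `gemini-3.5-flash` is given in this post. Check your provider's docs for the real IDs.

```python
from enum import Enum


class Task(Enum):
    TOOL_ORCHESTRATION = "tool_orchestration"    # multiple tool calls in sequence
    LARGE_REPO_REFACTOR = "large_repo_refactor"  # code changes across a big repo
    SCREEN_CONTROL = "screen_control"            # browser or desktop automation


# Only "gemini-3.5-flash" comes from the article.
# The other two IDs are placeholders, not confirmed model IDs.
MODEL_FOR_TASK = {
    Task.TOOL_ORCHESTRATION: "gemini-3.5-flash",
    Task.LARGE_REPO_REFACTOR: "claude-opus-4.7",  # placeholder ID
    Task.SCREEN_CONTROL: "gpt-5.5",               # placeholder ID
}


def pick_model(task: Task) -> str:
    """Return the model ID to use for the given kind of task."""
    return MODEL_FOR_TASK[task]


print(pick_model(Task.TOOL_ORCHESTRATION))  # gemini-3.5-flash
```

Keeping the mapping in one dictionary means a new benchmark result changes one line rather than every call site.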

API call example

Through EvoLink (single endpoint for Gemini, Claude, and GPT):

curl -X POST https://direct.evolink.ai/v1beta/models/gemini-3.5-flash:generateContent \
  -H "Authorization: Bearer YOUR_EVOLINK_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [{"text": "List the top 3 files changed in the last commit and explain what each change does"}]
    }]
  }'

If you were using any previous Gemini model through EvoLink, swap the model ID. No other change needed.

The benefit: same API key and endpoint for Flash, Opus 4.7, and GPT-5.5. Switch model IDs depending on the task. One bill, automatic failover.
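
As a sketch of what "swap the model ID" looks like in code, the helper below builds the same request as the curl example using only the Python standard library. The function name `build_request` is hypothetical, and it only covers Gemini models on the `generateContent` path shown above. Whether the Claude and GPT models use the same path is not stated here, so that is left out.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = "https://direct.evolink.ai/v1beta/models"


def build_request(model_id: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a Gemini model."""
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model_id}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Changing models is a one-argument change.
req = build_request("gemini-3.5-flash", "Summarize the last commit", "YOUR_EVOLINK_KEY")
print(req.full_url)
```

To actually send it, pass `req` to `urllib.request.urlopen` with a real key. The response format is not shown in this post, so parse it according to the EvoLink docs.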

Full docs: EvoLink Gemini 3.5 Flash API Guide

Things to keep in mind

These are Google's benchmarks. Independent community testing will confirm or adjust them, so treat the exact numbers with normal benchmark skepticism.
Agent cost is not unit cost. A 20-step agent loop means 20 API calls, and each call typically resends the conversation so far. Fast and cheap per token is not the same as cheap per workflow run.
Dynamic Thinking adds reasoning tokens. The model thinks before answering, which improves output quality but also raises the output token count. Watch your bills.
Gemini 3.5 Pro is expected next month. If you need peak general reasoning from Google, it might be worth waiting.
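
To put rough numbers on the 20-step loop from the list above, here is a small estimate of a whole agent run. It assumes each step resends the full history (previous outputs plus tool results), that nothing is cached, and that thinking tokens are counted inside `output_per_step`. The token counts are made up for illustration.

```python
# Prices from the spec list, in USD per million tokens.
PRICE_INPUT = 1.50
PRICE_OUTPUT = 9.00


def agent_run_cost(steps: int, base_context: int,
                   output_per_step: int, tool_result_per_step: int) -> float:
    """Estimate the USD cost of an agent loop with a growing context.

    Assumptions: every step resends the whole history, nothing is cached,
    and output_per_step already includes any thinking tokens.
    """
    total_input = 0
    total_output = 0
    context = base_context
    for _ in range(steps):
        total_input += context            # the full history is billed as input
        total_output += output_per_step
        # The model's output and the tool result join the history.
        context += output_per_step + tool_result_per_step
    return (total_input * PRICE_INPUT + total_output * PRICE_OUTPUT) / 1_000_000


single_call = agent_run_cost(1, 5_000, 800, 1_200)
full_run = agent_run_cost(20, 5_000, 800, 1_200)
print(f"{single_call:.4f}")  # 0.0147
print(f"{full_run:.4f}")     # 0.8640
```

Under these assumptions the 20-step run costs roughly 59 times a single call, not 20 times, because the input grows at every step. Caching or trimming the history would change that figure.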

Pick models by task, not by tier. Flash for tool orchestration. Opus for code rewrites. GPT for screen control. That is the state of play.

tags: gemini, ai-agents, llm, api, google-io