Google shipped Gemini 3.5 Flash on May 19, 2026. It is the fast, low-cost model in the Gemini 3.5 family and the only 3.5 variant available today. Gemini 3.5 Pro is announced for June 2026, but Flash is the model you can actually build with right now.
Gemini 3.5 Flash targets production workloads that need speed, long context, and reliable tool use: agent loops, terminal automation, multi-file coding, multimodal document analysis, and streaming chat. Google positions it as roughly 4× faster in output tokens per second than other frontier models, at less than half the cost per agentic task.
This guide covers what changed, where Flash performs well, how to access it, and how to test Gemini API endpoints with Apidog.
Quick facts about Gemini 3.5 Flash
| Item | Details |
|---|---|
| Release date | May 19, 2026 |
| Variant | Gemini 3.5 Flash |
| Pro availability | Gemini 3.5 Pro announced for June 2026 |
| Context window | 1M input tokens, 64K output tokens |
| Modalities | Text, images, code, graphics generation |
| API model name | gemini-3.5-flash |
| Speed | ~4× faster output tokens/second than other frontier models |
| Cost | Less than half the cost of comparable frontier models for agentic tasks |
| Access | Gemini app, AI Mode in Search, Google Antigravity, Gemini API, AI Studio, Android Studio, Gemini Enterprise |
Headline benchmark numbers from Google:
- Terminal-Bench 2.1: 76.2%
- CharXiv Reasoning: 84.2%
- MCP Atlas: 83.6%
- GDPval-AA: 1656 Elo
For cost details, free-tier limits, and usage scenarios, see the Gemini 3.5 Flash pricing guide.
What changed from Gemini 3 and 3.1
Gemini 3.5 Flash builds on Gemini 3 Flash and Gemini 3.1 Pro with five practical upgrades.
1. Better long-running agent execution
Flash is designed for longer chains of work. In practice, that matters when your agent needs to:
- Call tools in the correct order
- Recover from tool errors
- Dispatch work to subagents
- Keep state across a long task
2. Stronger coding output
The upgrade goes beyond code completion. Flash is better suited to workflows such as:
- Multi-file refactors
- CLI-based development tasks
- Long-horizon debugging
- Repository-level code changes
3. Built-in graphics generation
Flash can generate inline diagrams, SVG, and interactive web UI output directly. That reduces the need to route every visual task through a separate image model.
4. Faster streaming output
Google claims roughly 4× the output-token speed of other frontier models. For a streaming UI, that changes a few implementation details:
- Render partial output incrementally
- Add backpressure if your frontend cannot render fast enough
- Measure time-to-first-token and full completion latency separately
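A minimal sketch of the last two points, assuming your client exposes the stream as a plain iterable of text chunks (the chunk source is a stand-in, not a Gemini SDK object). It records time-to-first-token and total latency separately, and coalesces chunks into batches so the frontend is not asked to repaint on every token. The 50 ms flush interval is an arbitrary example value; the injectable clock only exists to make the function testable.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class StreamTiming:
    ttft_ms: Optional[float] = None  # time to the first non-empty chunk
    total_ms: float = 0.0            # time until the stream is exhausted
    flushes: int = 0                 # how many times render() was called
    text: List[str] = field(default_factory=list)


def consume_stream(
    chunks: Iterable[str],
    render: Callable[[str], None],
    flush_interval_s: float = 0.05,
    clock: Callable[[], float] = time.perf_counter,
) -> StreamTiming:
    """Consume text chunks, time them, and batch them for rendering."""
    timing = StreamTiming()
    start = clock()
    last_flush = start
    pending: List[str] = []

    for chunk in chunks:
        now = clock()
        if chunk and timing.ttft_ms is None:
            timing.ttft_ms = (now - start) * 1000
        pending.append(chunk)
        timing.text.append(chunk)
        # Backpressure: only repaint once enough time has passed since the last flush.
        if now - last_flush >= flush_interval_s:
            render("".join(pending))
            timing.flushes += 1
            pending.clear()
            last_flush = now

    if pending:  # flush whatever is left when the stream ends
        render("".join(pending))
        timing.flushes += 1
    timing.total_ms = (clock() - start) * 1000
    return timing
```

Logging ttft_ms and total_ms as two separate metrics lets you see whether a slow response is a slow start or a long generation.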
5. Wider safety guardrails
Google describes stronger safeguards for cyber and CBRN misuse, plus interpretability tools that explain why the model refused or rerouted a request.
The overall direction is clear: Flash is optimized for production agent workloads, similar to how OpenAI and Anthropic position GPT-5.5 and Claude Opus 4.7.
Gemini 3.5 Flash benchmarks
Google’s published benchmark table shows Flash performing especially well on long-context, tool-heavy, and multimodal reasoning tasks.
| Benchmark | What it tests | Gemini 3.5 Flash |
|---|---|---|
| Terminal-Bench 2.1 | Long-horizon CLI workflows | 76.2% |
| MCP Atlas | Multi-tool coordination | 83.6% |
| CharXiv Reasoning | Chart and diagram interpretation | 84.2% |
| GDPval-AA | General agentic value | 1656 Elo |
| MRCR v2, 1M context | Long-context retrieval | Top of Google’s table |
Flash is strongest in:
- Chart reasoning
- Agentic multi-tool workflows
- Long-context retrieval
- Cost-sensitive long-running tasks
It does not dominate every category. For pure SWE-Bench Verified-style bug fixing, Opus 4.7 and GPT-5.5 remain competitive. If your workload is single-shot bug fixing, benchmark before switching. If your workload is long agent execution at lower cost, Flash is more compelling.
For a workload-by-workload comparison, see Gemini 3.5 Flash vs GPT-5.5 vs Opus 4.7.
The Gemini 3.5 model family
Gemini 3.5 Flash
Flash is available now through:
- AI Studio
- Gemini API
- Gemini app
- AI Mode in Search
- Google Antigravity
- Android Studio
- Gemini Enterprise
Reported launch-day pricing is around:
- $1.50 per 1M input tokens
- $9.00 per 1M output tokens
That is higher than last year’s 3.1 Flash-Lite, but still cheaper than Pro-tier competitors. For batch mode, cached input, and Vertex rates, see the full pricing guide.
Use Flash when you need:
- High-throughput agent loops
- Vision-heavy chart or document understanding
- Low-latency calls from Apidog test scripts
- Streaming chat responses
- 1M-token document analysis without chunking
Gemini 3.5 Pro
Gemini 3.5 Pro is announced but not yet shipping. Google positions it as the flagship model for deeper agentic work, multi-hour autonomous tasks, and top leaderboard performance.
Until Pro becomes available, Flash is the practical default for developers who want to start building on the 3.5 family now.
Gemini Nano
Google did not ship a Gemini 3.5 Nano variant. On-device inference still uses the 3.1 Flash-Lite line. A 3.5 Nano announcement may arrive closer to the next Pixel cycle.
Where you can use Gemini 3.5 Flash
Six launch surfaces are available:
- Gemini app: global rollout for free and paid tiers
- AI Mode in Google Search: answers and follow-ups
- Google Antigravity: Google’s agent platform for end-user automation
- Gemini API: developer access through AI Studio
- Android Studio: IDE-level coding assistance for Android developers
- Gemini Enterprise + Agent Platform: managed agent runtime for organizations
The newest surface is Gemini Spark, a personal agent that runs continuously on your account. Spark uses Flash and connects to Gmail, Calendar, and Drive context.
Search also adds information agents: small autonomous helpers that track topics and pull updates without requiring you to re-query.
How to start using Gemini 3.5 Flash
You have four main paths.
1. Use the Gemini app for manual testing
Go to gemini.google.com, select 3.5 Flash from the model selector, and test prompts manually.
This is useful for:
- Prompt exploration
- Research
- Writing tasks
- Code sketching
- Image analysis
It is not enough for production validation because you cannot reliably test API payload shape, streaming behavior, or tool-call schemas from the chat UI alone.
2. Use Google AI Studio for free API access
Go to ai.google.dev, sign in, and create an API key. Flash is available on the free tier at roughly 1,500 requests per day at launch.
The basic implementation flow is:
```bash
export GEMINI_API_KEY="YOUR_API_KEY"
export GEMINI_MODEL="gemini-3.5-flash"
```
Then call the model from your SDK or REST client.
If you have used the Google Gemini API before, the pattern is the same:
- Create an API key
- Set GEMINI_API_KEY
- Use the model name gemini-3.5-flash
- Send your request
- Capture latency, token usage, and response shape
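A minimal Python sketch of that flow, standard library only. It assumes the public REST endpoint https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent and the x-goog-api-key header, which is how earlier Gemini models are called; confirm both against the current API reference. On this endpoint the model name goes in the URL path. build_request and send are illustrative helper names, not SDK functions.

```python
import json
import os
import time
import urllib.request

# Assumed REST root, as used by earlier Gemini models.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request from env settings."""
    api_key = os.environ["GEMINI_API_KEY"]
    model = os.environ.get("GEMINI_MODEL", "gemini-3.5-flash")
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{API_ROOT}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def send(req: urllib.request.Request) -> tuple:
    """Send the request; return (parsed JSON body, latency in ms)."""
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload, (time.perf_counter() - start) * 1000
```

Usage: `payload, latency_ms = send(build_request("Summarize this log"))`. Keeping the request builder separate from the network call makes it easy to inspect or replay the exact payload in a test tool before sending it.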
For setup details, see the free Gemini API key guide or the Flash-specific free guide.
3. Use the Gemini API in production
For production, use the same model name with a billed account:
```text
gemini-3.5-flash
```
A minimal request shape looks like this:
```json
{
  "model": "gemini-3.5-flash",
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Summarize this API error log and suggest the next debugging step."
        }
      ]
    }
  ]
}
```
When moving from prototype to production, validate:
- Request payload shape
- Response schema
- Streaming chunks
- Tool-call arguments
- Error responses
- Token usage
- Latency under load
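Several of these checks fit in one small parser. The sketch below assumes the generateContent response layout documented for earlier Gemini models (candidates[].content.parts[] holding text or functionCall parts, a finishReason, usageMetadata with promptTokenCount and candidatesTokenCount, and an error object on failures). Confirm that 3.5 Flash returns the same fields; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


class ResponseShapeError(ValueError):
    """Raised when a response lacks the fields the application depends on."""


@dataclass
class ParsedResponse:
    text: str
    finish_reason: str
    function_calls: List[Dict[str, Any]] = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0


def parse_generate_content(payload: Dict[str, Any]) -> ParsedResponse:
    """Check the fields we rely on and flatten them into one record."""
    if "error" in payload:  # error responses carry code and message
        err = payload["error"]
        raise ResponseShapeError(f"API error {err.get('code')}: {err.get('message')}")
    candidates = payload.get("candidates")
    if not candidates:
        raise ResponseShapeError("no candidates in response")
    first = candidates[0]
    parts = first.get("content", {}).get("parts", [])
    usage = payload.get("usageMetadata", {})
    return ParsedResponse(
        text="".join(p["text"] for p in parts if "text" in p),
        finish_reason=first.get("finishReason", "UNKNOWN"),
        function_calls=[p["functionCall"] for p in parts if "functionCall" in p],
        input_tokens=usage.get("promptTokenCount", 0),
        output_tokens=usage.get("candidatesTokenCount", 0),
    )
```

Raising on a missing field, rather than silently returning an empty string, turns a schema change into a visible test failure.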
For complete Python, Node, curl, streaming, tool-use, and multimodal examples, see How to Use the Gemini 3.5 Flash API.
When you wire Flash into your stack, test the API like any other dependency. Apidog can help you inspect the full request/response cycle for Flash REST and streaming endpoints in one workspace, including tool calls and multimodal payloads.
4. Use Gemini Enterprise for managed deployment
For organizations, Gemini Enterprise Agent Platform packages Flash with enterprise controls such as audit logs, data residency, and managed runtime features.
A practical rollout path is:
- Prototype in AI Studio
- Validate API behavior in a test workspace
- Build an evaluation suite
- Run Flash against production-like workloads
- Move governed workloads to Gemini Enterprise if required
What Gemini 3.5 Flash is good at
Long agent loops at lower cost
Flash is designed for multi-step tasks with tool calls. The MCP Atlas score of 83.6% is the strongest signal here.
Use it for workflows like:
```text
User request
  -> plan task
  -> call search/tool
  -> inspect result
  -> call second tool
  -> update plan
  -> produce final answer
```
When testing this class of workload, measure:
- Tool selection accuracy
- Tool-call order
- Recovery from failed tool calls
- Repetition or loop behavior
- Total cost per completed task
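A minimal sketch of such a loop, instrumented to capture the first four metrics (cost is handled separately). model_step is a hypothetical stand-in for one model call that returns either a tool request or a final answer; in a real integration you would map functionCall parts from the response into this shape and send tool results back to the model, as the Gemini function-calling docs describe for earlier models. The step cap is the simplest guard against runaway loops.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class AgentRun:
    final: Optional[str] = None
    tool_calls: List[str] = field(default_factory=list)  # tool names in call order
    failed_calls: int = 0
    repeated_calls: int = 0  # identical (name, args) pairs seen earlier in the run
    steps: int = 0


def run_agent(
    model_step: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
    user_request: str,
    max_steps: int = 8,
) -> AgentRun:
    """Drive a plan -> tool -> inspect loop and record what happened."""
    run = AgentRun()
    history: List[Dict[str, Any]] = [{"role": "user", "content": user_request}]
    seen = set()
    while run.steps < max_steps:
        run.steps += 1
        action = model_step(history)  # hypothetical: {"tool", "args"} or {"final"}
        if "final" in action:
            run.final = action["final"]
            break
        name, args = action["tool"], action.get("args", {})
        key = (name, json.dumps(args, sort_keys=True))
        if key in seen:
            run.repeated_calls += 1
        seen.add(key)
        run.tool_calls.append(name)
        try:
            if name not in tools:
                raise KeyError(f"unknown tool {name!r}")
            history.append({"role": "tool", "name": name, "content": tools[name](**args)})
        except Exception as exc:  # feed the error back so the model can recover
            run.failed_calls += 1
            history.append({"role": "tool", "name": name, "error": str(exc)})
    return run
```

A run that ends with final set to None hit the step cap, which is usually the clearest signal of loop behavior.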
Chart and document reasoning
CharXiv Reasoning at 84.2% makes Flash a strong fit for reports, PDFs, charts, and diagrams.
Example workloads:
- Extract values from charts
- Summarize financial reports
- Compare tables across documents
- Explain diagrams
- Turn visual data into structured JSON
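For the last workload, one approach is to send the chart image inline and ask for JSON that follows a schema. The sketch assumes the request fields documented for earlier Gemini models (inlineData with mimeType and base64 data, and generationConfig with responseMimeType and responseSchema); verify them for 3.5 Flash. CHART_SCHEMA and both helper names are hypothetical, and the parser re-checks the output because model-extracted numbers should never be trusted blindly.

```python
import base64
import json
from typing import Any, Dict, List

# Hypothetical schema: a list of (label, value) points read off a chart.
CHART_SCHEMA = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {"label": {"type": "STRING"}, "value": {"type": "NUMBER"}},
        "required": ["label", "value"],
    },
}


def chart_request_body(image_bytes: bytes, mime_type: str = "image/png") -> Dict[str, Any]:
    """Request body asking for the chart's data points as schema-shaped JSON."""
    image_part = {"inlineData": {"mimeType": mime_type,
                                 "data": base64.b64encode(image_bytes).decode("ascii")}}
    return {
        "contents": [{"role": "user", "parts": [
            {"text": "Extract every data point in this chart."},
            image_part,
        ]}],
        "generationConfig": {
            "responseMimeType": "application/json",
            "responseSchema": CHART_SCHEMA,
        },
    }


def parse_chart_values(model_text: str) -> List[Dict[str, Any]]:
    """Parse the model's JSON text and re-validate every row."""
    rows = json.loads(model_text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array")
    for row in rows:
        if (not isinstance(row, dict)
                or not isinstance(row.get("label"), str)
                or not isinstance(row.get("value"), (int, float))):
            raise ValueError(f"bad row: {row!r}")
    return rows
```

Spot-check a sample of extracted values against the source documents before feeding them into anything financial.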
Interactive UI generation
Flash can generate HTML, SVG, diagrams, and interactive UI output directly. For developers, the useful workflow is:
- Ask for a component or dashboard
- Request HTML/CSS/JS output
- Run it locally
- Feed errors back into the model
- Repeat until usable
The graphics quality is a visible upgrade over 3.1 Flash-Lite.
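The "feed errors back" step in that loop needs two small pieces of glue: pulling the HTML out of a Markdown reply and wrapping local errors into a follow-up prompt. A minimal sketch with hypothetical helper names; it assumes the model returns the file in a fenced html block, and how you run the page and capture errors (for example with a headless browser) is left to your own tooling.

```python
import re
from typing import List, Optional

FENCE = "`" * 3  # a Markdown code fence, built this way so it cannot end this snippet
HTML_BLOCK = re.compile(FENCE + r"html\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_html(model_text: str) -> Optional[str]:
    """Return the first fenced html block in a model reply, if any."""
    match = HTML_BLOCK.search(model_text)
    return match.group(1).strip() if match else None


def repair_prompt(html: str, errors: List[str]) -> str:
    """Build the follow-up prompt that feeds local errors back to the model."""
    error_list = "\n".join(f"- {e}" for e in errors)
    return (
        "The component below fails when run locally.\n"
        f"Errors:\n{error_list}\n\n"
        "Return the full corrected file as a single html code block.\n\n"
        f"{FENCE}html\n{html}\n{FENCE}"
    )
```

Asking for the full corrected file, rather than a patch, keeps extraction trivial on each iteration.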
Cost-sensitive production workloads
Google frames Flash as less than half the cost of other frontier models for agentic tasks. For long-running agents, per-task cost matters more than simple per-token cost.
Compare models using:
```text
total_task_cost =
    input_tokens_cost
  + output_tokens_cost
  + retry_cost
  + tool_call_cost
  + failed_task_cost
```
Flash may be attractive compared with Opus 4.7 or GPT-5.5 when the workload is agent-heavy. See the pricing breakdown before shifting traffic.
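The cost formula above translates directly into code. This sketch uses the reported launch prices of $1.50 and $9.00 per 1M input and output tokens as placeholders to update from the pricing guide. Retries appear as extra attempts, cost_per_tool_call is whatever your own tools cost, and failed tasks are charged by spreading their spend over the tasks that did complete. The Attempt record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Reported launch prices in USD per 1M tokens; update from the pricing guide.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


@dataclass
class Attempt:
    input_tokens: int
    output_tokens: int
    tool_calls: int = 0
    succeeded: bool = True


def token_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def total_task_cost(attempts: List[Attempt], cost_per_tool_call: float = 0.0) -> float:
    """Every attempt is paid for, so retries are included automatically."""
    return sum(
        token_cost(a.input_tokens, a.output_tokens) + a.tool_calls * cost_per_tool_call
        for a in attempts
    )


def cost_per_completed_task(tasks: List[List[Attempt]], cost_per_tool_call: float = 0.0) -> float:
    """Spread the spend of all tasks, including failed ones, over the successes."""
    spend = sum(total_task_cost(t, cost_per_tool_call) for t in tasks)
    completed = sum(1 for t in tasks if any(a.succeeded for a in t))
    return spend / completed if completed else float("inf")
```

For example, one attempt with 100,000 input and 10,000 output tokens costs 0.15 + 0.09 = $0.24 at these prices; a model with a cheaper token price but more retries can still lose on cost_per_completed_task.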
What Flash is still not great at
No model is a universal default. Watch these areas.
- Pure SWE-Bench Verified-style work: Opus 4.7’s 87.6% still leads on isolated bug-fix benchmarks. If your KPI is single-issue resolution, test carefully.
- Voice: Gemini’s voice stack is separate. For voice workloads, see the Grok Voice vs GPT-Realtime comparison.
- Tool ecosystem maturity: OpenAI and Anthropic have a head start in third-party adapters. Google is catching up with Antigravity, but the ecosystem is younger.
How to test Gemini 3.5 Flash properly
Before putting Flash into production, test two things:
- Response shape stability
- Tool-call correctness
A small evaluation harness is enough to start.
Step 1: Pin representative prompts
Create a fixed prompt set that reflects real production traffic.
Example categories:
- Short user questions
- Long-context document tasks
- Tool-calling tasks
- Multimodal tasks
- Coding tasks
- Refusal/safety edge cases
- Streaming responses
Step 2: Run Flash and your current model side by side
For every prompt, store:
```json
{
  "prompt_id": "tool_call_001",
  "model": "gemini-3.5-flash",
  "latency_ms": 0,
  "input_tokens": 0,
  "output_tokens": 0,
  "status": "pass",
  "notes": ""
}
```
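A minimal harness that produces one such record per prompt and model. Each client is a hypothetical wrapper you write around a provider call, returning the response text plus input and output token counts; check is your own per-prompt success test. Errors are recorded rather than aborting the run, and the JSONL output is easy to diff across model versions.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class EvalRecord:
    prompt_id: str
    model: str
    latency_ms: float
    input_tokens: int
    output_tokens: int
    status: str  # "pass", "fail", or "error"
    notes: str = ""


# Hypothetical client signature: prompt -> (response text, input tokens, output tokens).
Client = Callable[[str], Tuple[str, int, int]]


def run_side_by_side(
    prompts: Dict[str, str],
    clients: Dict[str, Client],
    check: Callable[[str, str], bool],
) -> List[EvalRecord]:
    """Run every pinned prompt against every model and record one row each."""
    records: List[EvalRecord] = []
    for prompt_id, prompt in prompts.items():
        for model, client in clients.items():
            start = time.perf_counter()
            tokens_in = tokens_out = 0
            notes = ""
            try:
                text, tokens_in, tokens_out = client(prompt)
                status = "pass" if check(prompt_id, text) else "fail"
            except Exception as exc:  # record the failure instead of aborting the run
                status, notes = "error", str(exc)
            latency_ms = (time.perf_counter() - start) * 1000
            records.append(EvalRecord(prompt_id, model, latency_ms,
                                      tokens_in, tokens_out, status, notes))
    return records


def to_jsonl(records: Iterable[EvalRecord]) -> str:
    """One JSON object per line, ready to store and diff."""
    return "".join(json.dumps(asdict(r)) + "\n" for r in records)
```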
Step 3: Score task success
Do not score only on “good-looking output.” Score on downstream success.
Useful dimensions:
- Did the model call the right tool?
- Were arguments valid?
- Did the response match your schema?
- Did the task complete?
- How many retries were needed?
- What was the total cost?
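One way to roll these dimensions into comparable numbers per model, as a sketch with hypothetical field names: a task only counts as a success when every check passes, and cost is divided by successes rather than by attempts, so a model that looks cheap but fails often is penalized.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class TaskScore:
    right_tool: bool
    args_valid: bool
    schema_ok: bool
    completed: bool
    retries: int
    cost_usd: float

    @property
    def success(self) -> bool:
        # Downstream success needs every check, not just plausible-looking output.
        return self.right_tool and self.args_valid and self.schema_ok and self.completed


def summarize(scores: List[TaskScore]) -> Dict[str, float]:
    if not scores:
        raise ValueError("no scores to summarize")
    successes = sum(s.success for s in scores)
    total_cost = sum(s.cost_usd for s in scores)
    return {
        "success_rate": successes / len(scores),
        "mean_retries": mean(s.retries for s in scores),
        "cost_per_success": total_cost / successes if successes else float("inf"),
    }
```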
Step 4: Watch for schema drift
If you use function calling, validate tool arguments strictly.
Example checklist:
- Required fields present
- Field types correct
- Enum values valid
- No unexpected nested structures
- No missing tool call when one is required
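The checklist maps onto a small validator. To stay dependency-free, this sketch uses a deliberately tiny, hypothetical schema format rather than the Gemini function-declaration format; in practice you might validate against your existing JSON Schema definitions with a dedicated library. Unexpected nested structures are caught by the type check (an object where a string is expected fails). The function_calls input mirrors functionCall parts with name and args.

```python
from typing import Any, Dict, List

# Hypothetical schema for a hypothetical create_ticket tool.
CREATE_TICKET = {
    "required": {"title": str, "priority": str},
    "optional": {"labels": list},
    "enums": {"priority": {"low", "medium", "high"}},
}


def validate_tool_args(args: Any, schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments pass."""
    if not isinstance(args, dict):
        return ["arguments are not an object"]
    problems = []
    allowed = {**schema["required"], **schema.get("optional", {})}
    for name in schema["required"]:
        if name not in args:
            problems.append(f"missing required field {name!r}")
    for name, value in args.items():
        if name not in allowed:
            problems.append(f"unexpected field {name!r}")
        elif not isinstance(value, allowed[name]):
            problems.append(f"{name!r} should be {allowed[name].__name__}, "
                            f"got {type(value).__name__}")
        elif name in schema.get("enums", {}) and value not in schema["enums"][name]:
            problems.append(f"{name!r} has invalid value {value!r}")
    return problems


def check_call(function_calls: List[Dict[str, Any]], tool_name: str,
               schema: Dict[str, Any]) -> List[str]:
    """Fail if the required tool was not called, else validate its arguments."""
    matching = [c for c in function_calls if c.get("name") == tool_name]
    if not matching:
        return [f"expected a call to {tool_name!r}, got none"]
    return validate_tool_args(matching[0].get("args", {}), schema)
```

Returning a list of problems instead of raising on the first one makes drift reports more useful when you compare model versions.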
For steps 1 and 3, Apidog can provide a recorded test suite for Flash API endpoints, including streaming. You can replay the same prompts across model versions and diff outputs. Download Apidog if you want to set this up locally.
Migration tips from Gemini 3.1 to 3.5 Flash
If you already use Gemini 3.1, migration is usually a one-line model string change:
```diff
- gemini-3.1-flash
+ gemini-3.5-flash
```
Before routing production traffic, check the following.
| Area | What to verify |
|---|---|
| Token budgets | 1M input / 64K output stays the same |
| Tool schemas | Existing function definitions should carry over |
| Streaming UI | Faster output may require frontend throttling |
| Pricing | Re-baseline using the Flash pricing guide |
| Safety behavior | Rerun red-team and refusal-pattern tests |
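Once these checks pass, shifting traffic gradually limits the blast radius of the one-line change. A minimal sketch, assuming you can route on a stable user ID: hashing the ID into 100 buckets keeps each user on one model, so side-by-side metrics are not muddied by users flipping between versions. The rollout percentage and function name are illustrative.

```python
import hashlib

OLD_MODEL = "gemini-3.1-flash"
NEW_MODEL = "gemini-3.5-flash"


def pick_model(user_id: str, rollout_percent: int) -> str:
    """Send a stable slice of users to the new model (same user, same model)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # bucket in 0..99
    return NEW_MODEL if bucket < rollout_percent else OLD_MODEL
```

Raising rollout_percent from, say, 5 to 25 to 100 only moves users forward, because a user in bucket b switches once the percentage exceeds b and never switches back.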
For deeper SDK notes, see the Google Gemini 3 API guide.
FAQ
When is Gemini 3.5 Pro available?
On May 19, 2026, Google said Gemini 3.5 Pro is “rolling out next month.” Expect general availability in June 2026 across AI Studio, Gemini API, and Gemini Enterprise. Until then, Flash is the only Gemini 3.5 variant you can call.
Is Gemini 3.5 Flash free to use?
Yes, with daily quotas. The Gemini app standard tier and AI Studio API keys both provide Flash access without payment. See the Flash free guide and Get Free Unlimited Gemini API for available free paths.
Does Gemini 3.5 Flash support function calling?
Yes. Tool calling and subagent dispatch are first-class features. Google’s MCP Atlas score of 83.6% is the benchmark signal.
How does Flash compare to Opus 4.7 and GPT-5.5?
Flash leads on cost, output speed, and chart reasoning. Opus 4.7 still edges ahead on SWE-Bench Pro and long-form writing. GPT-5.5 wins on token efficiency. See the three-way comparison for details.
Can I run Gemini 3.5 Flash locally?
No. There is no open-weights release. For local inference, see the best local LLMs of 2026.
Does Gemini 3.5 Flash work with Cursor?
Yes, through the standard Gemini API. The setup pattern is the same as Gemini 3.0 Pro with Cursor.
What is the API model name for Flash?
Use:
```text
gemini-3.5-flash
```
What this means for your stack
Use this decision path:
- Already on Gemini 3.1 Flash? Run a side-by-side test this week. Streaming UIs may benefit immediately from faster output.
- Already on Opus 4.7 or GPT-5.5? Run a cost-and-quality evaluation. Agent-heavy workloads may justify routing some traffic to Flash.
- Building a new agent loop? Start with Flash if cost and speed matter.
- Handling multimodal documents or charts? Test Flash early. CharXiv Reasoning at 84.2% is meaningful.
Treat Gemini 3.5 Flash as one component in a larger production pipeline. You still need prompt design, tool wiring, retries, eval scripts, and endpoint testing. Apidog covers the Gemini API testing side; the rest of the loop is your application logic.


