Build with Gemini 3 Flash, frontier intelligence that scales with you

#gemini #ai #antigravity #cli

Today we’re introducing Gemini 3 Flash, our latest model with frontier intelligence built for speed at a fraction of the cost. Building on 3 Pro’s strong multimodal, coding and agentic features, 3 Flash offers powerful performance at less than a quarter the cost of 3 Pro, along with higher rate limits. The new 3 Flash model surpasses 2.5 Pro across many benchmarks while delivering faster speeds. It also features our most advanced visual and spatial reasoning and now offers code execution to zoom, count and edit visual inputs.

Flash remains our most popular version, with 2 and 2.5 Flash processing trillions of tokens across hundreds of thousands of apps built by millions of developers. Our Flash models are truly built for developers, and with 3 Flash, you no longer need to compromise between speed and intelligence.

Gemini 3 Flash is rolling out to developers in the Gemini API via Google AI Studio, Google Antigravity, Gemini CLI, Android Studio and to enterprises via Vertex AI.

Smarter, faster and ready for production at scale

Gemini 3 Flash delivers frontier-class performance on PhD-level reasoning and knowledge benchmarks like GPQA Diamond (90.4%) and Humanity’s Last Exam (33.7% without tools), rivaling much larger frontier models.

Gemini 3 Flash delivers frontier intelligence across top benchmarks.

Gemini 3 Flash is highly efficient without sacrificing intelligence, pushing the Pareto Frontier of performance and efficiency. It outperforms 2.5 Pro while being 3x faster (based on Artificial Analysis benchmarking) at a fraction of the cost. Even with the lowest thinking level, 3 Flash often outperforms previous versions with “high” thinking levels.

Gemini 3 Flash pushes the Pareto frontier on performance vs. cost and speed.

In the Gemini API and Vertex AI, Gemini 3 Flash is priced at $0.50/1M input tokens and $3/1M output tokens (audio input remains at $1/1M input tokens). It comes standard with context caching, allowing for 90% cost reductions in cases with repeated token use over certain thresholds. Similarly, 3 Flash is also available today with the Batch API, allowing for 50% cost savings and much higher rate limits for asynchronous processing. For synchronous and near real-time use cases, Paid API customers also have access to production-ready rate limits.

Gemini 3 Flash in action

Gemini 3 Flash is now integrated into many of our products and early customers are enthusiastic about it. With each new Flash model, it’s exciting to see what new use cases are created.

For coding

Gemini 3 Flash has even better coding and agent capabilities over previous versions, enabling rapid, iterative development, outperforming 3 Pro's agentic coding skill (78% on SWE-bench Verified) but operating faster for quick iterations. Today, 3 Flash is rolling out to users in Google Antigravity, our new agentic development platform, to provide intelligent coding assistance that keeps pace with your train of thought.

For gaming

Gemini 3 Flash introduces powerful performance for game developers, offering superior video analysis and near real-time reasoning that outperforms previous 2.5 versions.

Astrocade is using 3 Flash for its agentic game creation engine, generating full game plans and executable code from a single prompt, quickly turning concepts into playable games. (Sequences shortened.)

Beyond development, Gemini 3 Flash transforms the player experience. It allows Latitude's game creation engine to generate smarter characters and more realistic worlds, directly elevating gameplay.

Gemini 3 Flash has allowed Latitude to deliver high-quality outputs at low costs for many complex tasks in our next generation AI game engine that was previously only possible from pro level models like Sonnet 4.5.

Nick Walton
CEO, Latitude

For deepfake detection

Resemble AI is using Gemini 3 Flash to provide near real-time deepfake intelligence by instantly transforming complex forensic data into simple explanations. They discovered that 3 Flash offered 4x faster multimodal analysis compared to 2.5 Pro, processing raw technical outputs without hindering crucial workflows. You can learn more about this in their case study.

Resemble AI analyzes a viral deepfake, offering near real-time analysis on why the content is fabricated. (Sequences shortened.)

For document analysis

Performance gains often come with a latency tradeoff, but Gemini 3 Flash proves that fast models can still handle the rigorous accuracy demands of the legal industry. With strong reasoning capabilities without sacrificing speed, it enables new levels of efficiency for complex document analysis for Harvey, an AI company for law firms and professional service providers.

Gemini 3 Flash has achieved a meaningful step up in reasoning, improving over 7% on Harvey's BigLaw Bench from its predecessor, Gemini 2.5 Flash. These quality improvements, combined with Flash's low latency, are impactful for high-volume legal tasks such as extracting defined terms and cross-references from contracts.

Niko Grupen
Head of Applied Research, Harvey

Get started with Gemini 3 Flash

Gemini 3 Flash is available across many of our products, APIs and throughout the ecosystem. As you explore the Gemini 3 family, you have the option to use our new built-in API logs visualization dashboard and send us model feedback directly through Google AI Studio. Additionally, since 3 Flash is a reasoning model, you will also need to make sure to circulate thoughts in the API or use the new Interactions API.

Here’s where you can access Gemini 3 Flash now:

We are excited to put this model in your hands and see what you create with Gemini 3 Flash.