The Gemini 3.5 Flash API went live with the model launch on May 19, 2026. Flash is currently the only Gemini 3.5 variant available; Pro lands in June. This guide shows how to get an API key, make your first request, send multimodal input, stream responses, use tools, return structured JSON, and test the integration with Apidog.
If you’ve used the Gemini API before, the request pattern is the same. The main change is the model name:
gemini-3.5-flash
If you’re new to Gemini, you can make a working Flash request in about ten minutes.
What you get with the Gemini 3.5 Flash API
Start with the three practical details:
- Model name: gemini-3.5-flash
- Endpoint pattern: same Gemini API style used by Gemini 3 and Gemini 3.1
- Free tier: available through Google AI Studio, with about 1,500 requests per day and no credit card required
Capabilities exposed through the Flash API include:
- 1M token input context
- 64K output tokens
- Text and image input
- Text and structured output
- Native function calling and tool use
- Streaming responses
- Chart and document reasoning
For pricing details, including per-token rates and batch mode discounts, see the Gemini 3.5 Flash pricing guide.
Step 1: Get your Gemini 3.5 Flash API key
You have two common setup paths:
- Google AI Studio for free-tier development
- Vertex AI for production workloads
Path A: Google AI Studio free tier
- Go to aistudio.google.com
- Sign in with a Google account
- Click Get API key in the left navigation
- Select an existing project or create a new one
- Click Create API key
- Copy the key and store it as an environment variable
Example:
export GEMINI_API_KEY="your-api-key"
This flow is also covered in the free Gemini API key guide. The key works with gemini-3.5-flash immediately within the free daily quota.
Path B: Vertex AI production setup
Use Vertex AI when you need billing, IAM, audit logs, and organization-level controls.
- Open Google Cloud Console
- Enable the Vertex AI API
- Create a service account
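- Grant it the Vertex AI User role (roles/aiplatform.user)
- Download the JSON credentials
- Authenticate with either:
gcloud auth application-default login
Or point your application at the JSON credentials by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the key file's path.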
Most teams start with AI Studio, then migrate to Vertex AI when they need production controls.
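Both paths use the same google-genai client, so one way to switch between them is to read the choice from the environment. This is a minimal sketch. USE_VERTEX is a hypothetical variable name chosen for illustration. The "global" default location is also an assumption, so confirm which locations serve gemini-3.5-flash for your project. The SDK is imported inside make_client so the configuration logic can be tested without it.

```python
import os


def client_kwargs(env=None):
    """Build keyword arguments for genai.Client from environment variables.

    USE_VERTEX is a hypothetical switch for this sketch; the default
    location "global" is an assumption, not a documented requirement.
    """
    env = os.environ if env is None else env
    if env.get("USE_VERTEX", "").lower() in ("1", "true", "yes"):
        # Vertex AI path: auth comes from Application Default Credentials
        # (gcloud auth application-default login or GOOGLE_APPLICATION_CREDENTIALS).
        return {
            "vertexai": True,
            "project": env["GOOGLE_CLOUD_PROJECT"],
            "location": env.get("GOOGLE_CLOUD_LOCATION", "global"),
        }
    # AI Studio path: a plain API key.
    return {"api_key": env["GEMINI_API_KEY"]}


def make_client(env=None):
    # Imported lazily so client_kwargs can be tested without the SDK installed.
    from google import genai

    return genai.Client(**client_kwargs(env))
```

With USE_VERTEX unset, make_client() behaves like the AI Studio example in Step 3. With USE_VERTEX=true and GOOGLE_CLOUD_PROJECT set, it relies on the Application Default Credentials configured above.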
Step 2: Install the SDK
The official Google GenAI SDK is available for Python, Node.js, Go, and Java.
Python
pip install -U google-genai
Node.js
npm install @google/genai
Go
go get google.golang.org/genai
You can also skip the SDK and call the REST endpoint directly with curl.
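In Python, the standard library is enough to make the same request without the SDK. The sketch below mirrors the curl example in Step 3 using urllib. The names build_generate_request and generate are illustrative helpers, not part of any Google library.

```python
import json
import os
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_generate_request(prompt, model="gemini-3.5-flash", api_key=None):
    """Build (but do not send) a generateContent POST request."""
    api_key = api_key if api_key is not None else os.environ["GEMINI_API_KEY"]
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-goog-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the decoded JSON response body."""
    req = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

generate("Explain PKCE in one sentence.") returns the raw JSON response as a dict, so you read the text from candidates[0].content.parts yourself.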
Step 3: Make your first Flash call
Python
import os
from google import genai
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
response = client.models.generate_content(
model="gemini-3.5-flash",
contents="Explain how OAuth 2.0 PKCE flow works in 3 short paragraphs."
)
print(response.text)
Node.js
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({
apiKey: process.env.GEMINI_API_KEY,
});
const response = await ai.models.generateContent({
model: "gemini-3.5-flash",
contents: "Explain how OAuth 2.0 PKCE flow works in 3 short paragraphs.",
});
console.log(response.text);
REST with curl
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [
{
"parts": [
{
"text": "Explain how OAuth 2.0 PKCE flow works in 3 short paragraphs."
}
]
}
]
}'
That is the minimum working request. From here, add streaming, multimodal input, tools, and structured output as needed.
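The SDKs expose the generated text as response.text. With raw REST, you pull it out of the JSON yourself. A minimal helper, assuming the documented response shape (candidates[].content.parts[]), might look like the sketch below. extract_text is a hypothetical name. It skips parts marked "thought": true, which is intended to approximate what the SDK's text property returns.

```python
def extract_text(response_json):
    """Concatenate the text parts of the first candidate in a REST response.

    Returns "" when there is no candidate (for example, a blocked prompt)
    or when the candidate holds only non-text parts such as function calls.
    """
    candidates = response_json.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    # Keep plain text parts; skip thought-summary parts if any are present.
    return "".join(p["text"] for p in parts if "text" in p and not p.get("thought"))
```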
Stream responses
Use streaming when you want users to see output as it is generated.
Python
stream = client.models.generate_content_stream(
model="gemini-3.5-flash",
contents="Write a 5-step tutorial on writing a REST API client in Go."
)
for chunk in stream:
    # Some chunks carry no text (chunk.text is None), so skip those
    if chunk.text:
        print(chunk.text, end="", flush=True)
Node.js
const stream = await ai.models.generateContentStream({
model: "gemini-3.5-flash",
contents: "Write a 5-step tutorial on writing a REST API client in Go.",
});
for await (const chunk of stream) {
  // chunk.text can be undefined for chunks without text parts
  process.stdout.write(chunk.text ?? "");
}
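For raw REST calls, change the endpoint suffix from:
:generateContent
To:
:streamGenerateContent?alt=sse
The alt=sse query parameter makes the endpoint return server-sent events (one data: line per chunk). Without it, the response is streamed as a single JSON array.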
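With ?alt=sse on the streamGenerateContent URL, each chunk arrives as a server-sent event: a data: line holding a partial JSON response, followed by a blank line. The sketch below parses those lines using only the standard library. It assumes each event's JSON fits on one data: line. iter_sse_text is an illustrative name.

```python
import json


def iter_sse_text(lines):
    """Yield text fragments from a streamGenerateContent?alt=sse response.

    Accepts str or bytes lines; each 'data: {...}' line is assumed to hold
    one complete partial GenerateContentResponse as JSON.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data fields carry no text
        chunk = json.loads(line[len("data:"):].strip())
        for cand in chunk.get("candidates", [])[:1]:
            for part in cand.get("content", {}).get("parts", []):
                if "text" in part and not part.get("thought"):
                    yield part["text"]
```

You can pass it any iterable of lines, such as the response object from urllib.request.urlopen, which yields raw byte lines: for text in iter_sse_text(resp): print(text, end="", flush=True).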
Send multimodal input
Gemini 3.5 Flash accepts images alongside text. This is useful for dashboards, screenshots, diagrams, scanned documents, and chart extraction.
Python: image from disk
import os
from google import genai
from google.genai import types
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
with open("dashboard.png", "rb") as f:
    image_bytes = f.read()
response = client.models.generate_content(
model="gemini-3.5-flash",
contents=[
types.Part.from_bytes(
data=image_bytes,
mime_type="image/png"
),
"Extract every metric in this dashboard as a JSON object."
]
)
print(response.text)
Supported image MIME types include:
- image/png
- image/jpeg
- image/webp
- image/heic
- image/heif
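PDFs and video can also be passed. You can send them inline with types.Part.from_bytes(), as in the PDF example later in this guide. You can also pass them by reference with types.Part.from_uri(), for example for a file uploaded through the Files API.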
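from_bytes needs an explicit mime_type, so it can help to derive it from the file extension and fail early on anything unsupported. This sketch hard-codes the image types listed above plus application/pdf, which the PDF example later in this guide uses. mime_type_for is a hypothetical helper. The map is built by hand because Python's mimetypes module may not recognize .heic or .heif on every platform.

```python
from pathlib import Path

# Extension-to-MIME map for the image types listed above, plus PDF.
SUPPORTED_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".heic": "image/heic",
    ".heif": "image/heif",
    ".pdf": "application/pdf",
}


def mime_type_for(path):
    """Return the MIME type to pass to types.Part.from_bytes, or raise ValueError."""
    ext = Path(path).suffix.lower()
    try:
        return SUPPORTED_MIME[ext]
    except KeyError:
        raise ValueError(f"unsupported file type: {ext or path!r}") from None
```

For example: types.Part.from_bytes(data=Path(path).read_bytes(), mime_type=mime_type_for(path)).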
Use function calling and tools
Tool calling lets the model request a function call instead of answering directly. Your application executes the function, sends the result back, and continues the conversation.
Python
from google.genai import types
weather_tool = types.Tool(
function_declarations=[
{
"name": "get_current_weather",
"description": "Get the current weather for a city.",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["city"]
}
}
]
)
response = client.models.generate_content(
model="gemini-3.5-flash",
contents="What's the weather in Singapore right now?",
config=types.GenerateContentConfig(
tools=[weather_tool]
)
)
for part in response.candidates[0].content.parts:
    if part.function_call:
        print(f"Call: {part.function_call.name}")
        print(f"Args: {dict(part.function_call.args)}")
Flash returns a function_call object with the function name and arguments. The same tool-calling pattern applies to the Gemini 3 Flash API.
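The model chooses the arguments, so they are worth checking before your code acts on them. Here is a lightweight validator sketch for the declaration format used above. validate_args is an illustrative name. It covers only required keys, unknown keys, basic types, and enums rather than full JSON Schema.

```python
# JSON Schema type names mapped to the Python types they parse into
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_args(declaration, args):
    """Return a list of problems with model-supplied args (empty means OK)."""
    schema = declaration.get("parameters", {})
    props = schema.get("properties", {})
    problems = []
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
            continue
        expected = TYPE_MAP.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for numeric types
        wrong_bool = isinstance(value, bool) and spec.get("type") in ("number", "integer")
        if wrong_bool or (expected is not None and not isinstance(value, expected)):
            problems.append(f"{key}: expected {spec.get('type')}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key}: {value!r} not in {spec['enum']}")
    return problems
```

One option is to keep the declaration dict in its own variable and call validate_args(declaration, dict(part.function_call.args)). If the list is non-empty, send the problems back to the model as the function result instead of executing the call.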
Return structured JSON
Use JSON mode when downstream code needs predictable output.
import json
from google.genai import types
response = client.models.generate_content(
model="gemini-3.5-flash",
contents="List 3 popular API testing tools with their pricing.",
config=types.GenerateContentConfig(
response_mime_type="application/json",
response_schema={
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"price_per_month": {"type": "number"},
"free_tier": {"type": "boolean"}
},
"required": ["name", "free_tier"]
}
}
)
)
data = json.loads(response.text)
print(data)
This avoids regex parsing and reduces retry logic caused by malformed free-form output.
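A response schema constrains the shape of the output. A response that gets cut off (for example, by the output token limit) can still arrive incomplete, so a cheap check before the data reaches a database is reasonable. This sketch checks the parsed result of the example above using only the standard library. check_tools_payload is an illustrative name. If you already use Pydantic, the google-genai SDK also accepts a Pydantic model as response_schema and exposes the parsed object as response.parsed.

```python
def check_tools_payload(data):
    """Return a list of problems in the parsed JSON (empty means it looks valid)."""
    if not isinstance(data, list):
        return ["top level is not an array"]
    problems = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            problems.append(f"[{i}] is not an object")
            continue
        if not isinstance(item.get("name"), str):
            problems.append(f"[{i}].name missing or not a string")
        if not isinstance(item.get("free_tier"), bool):
            problems.append(f"[{i}].free_tier missing or not a boolean")
        # price_per_month is optional in the schema, but must be numeric if present
        price = item.get("price_per_month")
        if price is not None and (isinstance(price, bool) or not isinstance(price, (int, float))):
            problems.append(f"[{i}].price_per_month is not a number")
    return problems
```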
Pricing as of May 2026
Pay-as-you-go rates for gemini-3.5-flash:
| Tier | Input | Output |
|---|---|---|
| Standard | ~$1.50 / 1M tokens | ~$9.00 / 1M tokens |
| Cached input | reduced rate | n/a |
| Batch mode | ~50% off | ~50% off |
For batch workloads, Gemini API batch mode provides the 50% discount for jobs that do not require real-time latency.
For more pricing examples, see the Flash pricing breakdown. For the official Google reference, see Gemini Developer API pricing.
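For budgeting, the table's approximate rates turn into a simple per-request estimate. This sketch hard-codes the rounded figures above: about $1.50 per 1M input tokens, about $9.00 per 1M output tokens, and about 50% off in batch mode. It ignores cached-input pricing. Treat its output as a rough estimate, and check the official pricing page before relying on it.

```python
# Approximate rates from the table above, in USD per 1M tokens.
INPUT_RATE = 1.50
OUTPUT_RATE = 9.00
BATCH_DISCOUNT = 0.5  # "~50% off" in batch mode


def estimate_cost(input_tokens, output_tokens, batch=False):
    """Rough USD cost of one request; cached-input pricing is not modeled."""
    cost = (input_tokens / 1_000_000) * INPUT_RATE + (output_tokens / 1_000_000) * OUTPUT_RATE
    return cost * (1 - BATCH_DISCOUNT) if batch else cost
```

For example, 200,000 input tokens and 20,000 output tokens come to about 0.2 × $1.50 + 0.02 × $9.00 = $0.30 + $0.18 = $0.48 at standard rates, or about $0.24 in batch mode.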
Test your Gemini 3.5 Flash integration with Apidog
A working SDK call is only the first step. Production integrations also need to handle:
- Streaming chunks
- Tool-call validation
- Multimodal payloads
- Error retries
- Rate limits
- JSON schema drift
Apidog lets you test the Gemini Flash API surface in one workspace.
Practical setup:
- Create a new request in Apidog
- Paste the Gemini Flash endpoint:
https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent
- Add the x-goog-api-key header
- Set Content-Type to application/json
- Paste the request body from the curl example
- Send the request
- Save it for reuse
You can also use Apidog to:
- Replay the same request across model versions
- Swap gemini-3.5-flash for gemini-3-flash and compare outputs
- Render streamed chunks inline
- Validate JSON schema output with assertions
- Mock the Flash endpoint for downstream tests
- Build test scenarios for agent loops
To get started, download Apidog, create a new request pointing at the Flash endpoint, and import the earlier curl snippet.
Handle errors and rate limits
Common HTTP status codes:
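| Status | Meaning | Recommended action |
|---|---|---|
| 400 | Bad request (malformed body or invalid argument) | Check the contents array, MIME type, and JSON body |
| 401 | Missing or invalid credentials | Verify GEMINI_API_KEY (or your Vertex AI credentials) |
| 403 | Permission denied: the key or project lacks access to the model | Check API enablement and model access |
| 429 | Rate limit or quota exhausted | Retry with exponential backoff; check your quota |
| 500 / 503 | Server-side issue or temporary overload | Retry with exponential backoff |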
Python retry wrapper
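import random
import time

from google.genai import errors

# Status codes worth retrying; 400/401/403 fail the same way every time
RETRYABLE_CODES = {429, 500, 503}

def call_with_retry(client, model, prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.models.generate_content(
                model=model,
                contents=prompt
            )
        except errors.APIError as e:
            # Give up on non-transient errors or after the last attempt
            if e.code not in RETRYABLE_CODES or attempt == max_retries - 1:
                raise
            # Exponential backoff (1s, 2s, 4s, ...) plus jitter
            time.sleep(2 ** attempt + random.random())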
Free tier quotas reset daily. Production tier quotas reset per minute and per day.
For high-throughput jobs, consider batch mode or a fallback to Gemini 3 Flash when you hit limits.
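The fallback idea can be sketched as a loop over model names that moves on only after a transient error. This is a minimal sketch. The fallback ID gemini-3-flash is taken from the model comparison in the Apidog section, so confirm the exact ID available to your key. The function also assumes the exception carries an integer .code, which the google-genai SDK's errors.APIError does.

```python
# Codes treated as transient; anything else is re-raised immediately.
RETRYABLE_CODES = {429, 500, 503}


def generate_with_fallback(client, prompt, models=("gemini-3.5-flash", "gemini-3-flash")):
    """Try each model in order, moving on only after a transient error."""
    if not models:
        raise ValueError("at least one model is required")
    last_error = None
    for model in models:
        try:
            return client.models.generate_content(model=model, contents=prompt)
        except Exception as exc:
            if getattr(exc, "code", None) not in RETRYABLE_CODES:
                raise
            last_error = exc  # remember it and try the next model
    raise last_error
```

generate_with_fallback(client, "Summarize this ticket") returns the first successful response. A 400 is re-raised immediately because switching models will not fix a malformed request.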
Migrate from Gemini 3.1 to 3.5 Flash
For most projects, migration is a one-line model name change.
# Before
model = "gemini-3.1-pro" # or "gemini-3.1-flash"
# After
model = "gemini-3.5-flash"
After switching, verify:
- Tool schemas: rerun your tool-call tests
- Streaming UI: faster chunks may need throttling
- Token budgets: confirm actual input and output usage
- Refusal behavior: safety behavior may differ on edge cases
- Structured output: rerun schema validation tests
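To confirm token budgets after switching, you can compare the usage metadata each model reports for the same prompts. Here is a small sketch that assumes the google-genai SDK's usage_metadata field names: prompt_token_count, candidates_token_count, thoughts_token_count, and total_token_count. usage_summary is an illustrative name. Any count the API omits is reported as 0.

```python
def usage_summary(response):
    """Collect token counts from a generate_content response's usage_metadata."""
    meta = getattr(response, "usage_metadata", None)

    def count(field):
        # getattr(None, field, None) returns None, so a missing meta also yields 0
        return getattr(meta, field, None) or 0

    return {
        "input": count("prompt_token_count"),
        "output": count("candidates_token_count"),
        "thinking": count("thoughts_token_count"),
        "total": count("total_token_count"),
    }
```

Run your regression prompts once with the old model name and once with gemini-3.5-flash, then compare the two summaries side by side.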
For a deeper SDK walkthrough, see the Gemini 3.1 Pro API guide.
Common implementation patterns
Long-context document analysis
from google.genai import types
with open("large_report.pdf", "rb") as f:
    pdf_bytes = f.read()
response = client.models.generate_content(
model="gemini-3.5-flash",
contents=[
types.Part.from_bytes(
data=pdf_bytes,
mime_type="application/pdf"
),
"Summarize the financial outlook from this report in 5 bullet points."
]
)
print(response.text)
Flash’s 1M-token input context can take a full PDF in a single request without manual chunking, as long as the document's token count stays within that limit.
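A very long PDF can still exceed the window, so you can count tokens before sending the full request. This sketch uses the SDK's client.models.count_tokens call. The 2% headroom is an arbitrary margin chosen for illustration, and both helper names are hypothetical.

```python
CONTEXT_LIMIT = 1_000_000  # 1M-token input context from the capability list


def count_prompt_tokens(client, contents, model="gemini-3.5-flash"):
    """Ask the API how many tokens `contents` will use, without generating output."""
    return client.models.count_tokens(model=model, contents=contents).total_tokens


def fits_in_context(prompt_tokens, limit=CONTEXT_LIMIT, headroom=0.02):
    """True if the prompt leaves at least `headroom` (a fraction) of the window free.

    The 2% default is an arbitrary safety margin, not an API rule.
    """
    return prompt_tokens <= limit * (1 - headroom)
```

For example, check fits_in_context(count_prompt_tokens(client, [pdf_part, prompt])) before calling generate_content, where pdf_part is the Part built from pdf_bytes. Fall back to chunking when it returns False.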
Agent loop with tool calls
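from google.genai import types

# flight_search_tool and booking_tool are your own types.Tool declarations;
# execute_tool is your own dispatcher returning a dict. None of them are SDK names.
MAX_TURNS = 10  # safety cap so a confused model cannot loop forever

conversation = [
    types.Content(role="user", parts=[types.Part(text="Book me a flight to Tokyo")])
]

for _ in range(MAX_TURNS):
    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=conversation,
        config=types.GenerateContentConfig(
            tools=[flight_search_tool, booking_tool]
        )
    )
    model_turn = response.candidates[0].content
    # The model may request several calls in one turn, so collect all of them
    calls = [p.function_call for p in (model_turn.parts or []) if p.function_call]
    if not calls:
        print(response.text)
        break
    # Append the model's turn unchanged so any call metadata is preserved
    conversation.append(model_turn)
    # Run each call and send all results back together in one user turn
    conversation.append(types.Content(
        role="user",
        parts=[
            types.Part.from_function_response(
                name=call.name,
                response=execute_tool(call)
            )
            for call in calls
        ]
    ))
else:
    raise RuntimeError(f"No final answer after {MAX_TURNS} turns")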
Use this pattern when Flash needs to call APIs, validate tool responses, and continue until it has enough information to produce a final answer.
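The loop relies on an execute_tool helper that the SDK does not provide. Here is one possible shape, sketched under assumptions. search_flights and book_flight are hypothetical stand-ins for real APIs, and the tool names must match the names in your function declarations. Errors are returned as data so the model can react instead of the loop crashing.

```python
# Hypothetical local implementations; replace them with real API calls.
def search_flights(destination, date=None):
    return {"flights": [{"id": "XX123", "destination": destination, "date": date}]}


def book_flight(flight_id):
    return {"status": "booked", "flight_id": flight_id}


# Map the function names the model may call to local Python functions.
TOOL_REGISTRY = {
    "search_flights": search_flights,
    "book_flight": book_flight,
}


def execute_tool(function_call):
    """Run the local function named by a model function_call and return a dict."""
    fn = TOOL_REGISTRY.get(function_call.name)
    if fn is None:
        return {"error": f"unknown tool: {function_call.name}"}
    try:
        return fn(**dict(function_call.args or {}))
    except TypeError as exc:  # wrong or missing arguments supplied by the model
        return {"error": str(exc)}
```

The returned dict is what goes into the function response part, for example types.Part.from_function_response(name=call.name, response=execute_tool(call)).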
FAQ
Is there a free tier for the Gemini 3.5 Flash API?
Yes. Google AI Studio provides daily quotas of about 1,500 requests per day with no credit card required. See the Google AI Studio API key guide.
Does Flash support OpenAI-compatible endpoints?
Yes. Google exposes an OpenAI-compatible shim at:
https://generativelanguage.googleapis.com/v1beta/openai/
You can point an OpenAI SDK at it by setting base_url and using your Gemini API key. The model name remains:
gemini-3.5-flash
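Under the hood, the shim accepts OpenAI-style chat completion requests. Below is a standard-library sketch of one such request. It assumes the chat/completions path under /v1beta/openai/ and Bearer authentication with your Gemini API key. build_chat_request is an illustrative name.

```python
import json
import os
import urllib.request

OPENAI_COMPAT_BASE = "https://generativelanguage.googleapis.com/v1beta/openai/"


def build_chat_request(messages, model="gemini-3.5-flash", api_key=None):
    """Build (but do not send) an OpenAI-style chat completions request."""
    api_key = api_key if api_key is not None else os.environ["GEMINI_API_KEY"]
    return urllib.request.Request(
        url=OPENAI_COMPAT_BASE + "chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With the official OpenAI Python SDK, the equivalent is OpenAI(api_key=..., base_url=OPENAI_COMPAT_BASE) followed by client.chat.completions.create(model="gemini-3.5-flash", messages=[...]). The reply text is in choices[0].message.content.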
Can I use Flash with LangChain or LlamaIndex?
Yes. Both have native Gemini integrations. Pass the model name in the relevant wrapper:
model = "gemini-3.5-flash"
When does Gemini 3.5 Pro ship?
June 2026, according to Google’s launch announcement. Until then, Flash is the only Gemini 3.5 variant available.
What is the max image size for Flash?
The recommended image size is 3072×3072. Larger images are resampled.
For OCR-heavy workflows, see the Gemini 2.0 Flash OCR workflow. The same general patterns apply.
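Oversized images are resampled anyway, so downscaling before upload mainly saves bandwidth and makes the resolution the model sees predictable. The arithmetic is a single scale factor that preserves the aspect ratio. fit_within is an illustrative helper, and 3072 is the recommended size from the answer above.

```python
MAX_SIDE = 3072  # recommended maximum dimension from the answer above


def fit_within(width, height, max_side=MAX_SIDE):
    """Return (w, h) scaled to fit max_side x max_side, keeping aspect ratio.

    Images already within the limit are returned unchanged (never upscaled).
    """
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow, img.resize(fit_within(*img.size)) applies the result, and Image.thumbnail((3072, 3072)) does the same in place.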
How do I test streaming endpoints in Apidog?
Use the streaming endpoint suffix with the alt=sse parameter:
:streamGenerateContent?alt=sse
Apidog renders the SSE chunks as they arrive, which helps debug partial or incomplete responses.
Where can I see API logs?
For AI Studio, check Activity.
For Vertex AI production deployments, use Logs Explorer in Google Cloud.
What to build first
Good starter projects for Gemini 3.5 Flash:
- PDF Q&A bot: load a PDF into the 1M context window and answer questions
- Chart-to-JSON pipeline: extract structured data from dashboard screenshots
- Customer support agent: use function calling against your CRM
- Code review assistant: analyze multi-file diffs and return severity-rated findings
- Internal search agent: combine long context with calls to internal APIs
For each project, use the same workflow:
- Build the prompt
- Wrap it in an SDK or REST call
- Validate the response shape with Apidog
- Add retries and rate-limit handling
- Ship the integration


