Hassann

Posted on May 20 • Originally published at apidog.com

How to Use the Gemini 3.5 Flash API?

The Gemini 3.5 Flash API went live with the model launch on May 19, 2026. Flash is currently the only Gemini 3.5 variant available; Pro lands in June. This guide shows how to get an API key, make your first request, send multimodal input, stream responses, use tools, return structured JSON, and test the integration with Apidog.

If you’ve used the Gemini API before, the request pattern is the same. The main change is the model name:

gemini-3.5-flash

If you’re new to Gemini, you can make a working Flash request in about ten minutes.

What you get with the Gemini 3.5 Flash API

Start with the three practical details:

Model name: gemini-3.5-flash
Endpoint pattern: same Gemini API style used by Gemini 3 and Gemini 3.1
Free tier: available through Google AI Studio, with about 1,500 requests per day and no credit card

Capabilities exposed through the Flash API include:

1M token input context
64K output tokens
Text and image input
Text and structured output
Native function calling and tool use
Streaming responses
Chart and document reasoning

For pricing details, including per-token rates and batch mode discounts, see the Gemini 3.5 Flash pricing guide.

Step 1: Get your Gemini 3.5 Flash API key

You have two common setup paths:

Google AI Studio for free-tier development
Vertex AI for production workloads

Path A: Google AI Studio free tier

Go to aistudio.google.com
Sign in with a Google account
Click Get API key in the left navigation
Select an existing project or create a new one
Click Create API key
Copy the key and store it as an environment variable

Example:

export GEMINI_API_KEY="your-api-key"

This flow is also covered in the free Gemini API key guide. The key works with gemini-3.5-flash immediately within the free daily quota.
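
A minimal sketch of reading the key at startup, so a missing variable fails with a clear message instead of a bare KeyError. The helper name load_api_key is illustrative and not part of the SDK.

```python
import os


def load_api_key(env=None, var_name="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    # Accept a custom mapping so the helper is easy to exercise in tests.
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the examples."
        )
    return key
```

You could then pass load_api_key() to genai.Client(api_key=...) in the examples that follow.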

Path B: Vertex AI production setup

Use Vertex AI when you need billing, IAM, audit logs, and organization-level controls.

Open Google Cloud Console
Enable the Vertex AI API
Create a service account
Grant it the aiplatform.user role
Download the JSON credentials
Authenticate with either:

gcloud auth application-default login

Or point your application to the JSON credentials by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the file path.

Most teams start with AI Studio, then migrate to Vertex AI when they need production controls.

Step 2: Install the SDK

The official Google GenAI SDK is available for Python, Node.js, Go, and Java.

Python

pip install -U google-genai

Node.js

npm install @google/genai

Go

go get google.golang.org/genai

You can also skip the SDK and call the REST endpoint directly with curl.

Step 3: Make your first Flash call

Python

import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain how OAuth 2.0 PKCE flow works in 3 short paragraphs."
)

print(response.text)

Node.js

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({
  apiKey: process.env.GEMINI_API_KEY,
});

const response = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "Explain how OAuth 2.0 PKCE flow works in 3 short paragraphs.",
});

console.log(response.text);

REST with curl

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "Explain how OAuth 2.0 PKCE flow works in 3 short paragraphs."
          }
        ]
      }
    ]
  }'

That is the minimum working request. From here, add streaming, multimodal input, tools, and structured output as needed.
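
If you would rather not install an SDK, the curl request maps directly onto Python's standard library. The sketch below only builds the request and parses a response body; the helper names build_generate_request and extract_text are illustrative, and the response shape (candidates, content, parts) is assumed to match what the REST endpoint returns for a plain text answer.

```python
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_generate_request(prompt, api_key, model="gemini-3.5-flash"):
    """Build (but do not send) a generateContent request."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-goog-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response_json):
    """Join the text parts of the first candidate in a parsed response."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)
```

To send it, pass the request to urllib.request.urlopen() and feed the decoded JSON body to extract_text().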

Stream responses

Use streaming when you want users to see output as it is generated.

Python

stream = client.models.generate_content_stream(
    model="gemini-3.5-flash",
    contents="Write a 5-step tutorial on writing a REST API client in Go."
)

for chunk in stream:
    print(chunk.text or "", end="", flush=True)  # some chunks carry no text

Node.js

const stream = await ai.models.generateContentStream({
  model: "gemini-3.5-flash",
  contents: "Write a 5-step tutorial on writing a REST API client in Go.",
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? ""); // some chunks carry no text
}

For raw REST calls, change the endpoint suffix from:

:generateContent

To:

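:streamGenerateContent

Add the alt=sse query parameter to receive the stream as server-sent events rather than as one JSON array.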
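
When you call the streaming endpoint without an SDK, you have to pull the text out of each event yourself. The sketch below assumes the response arrives as server-sent events, with each event on a line that starts with "data:" followed by a JSON object shaped like a regular response. The function name iter_sse_text is illustrative.

```python
import json


def iter_sse_text(lines):
    """Yield text fragments from an iterable of SSE lines (str)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        # Some events (for example the final one) may carry no content.
        for candidate in event.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                text = part.get("text")
                if text:
                    yield text
```

Feed it the decoded lines of the HTTP response and print each fragment as it arrives.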

Send multimodal input

Gemini 3.5 Flash accepts images alongside text. This is useful for dashboards, screenshots, diagrams, scanned documents, and chart extraction.

Python: image from disk

import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

with open("dashboard.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=[
        types.Part.from_bytes(
            data=image_bytes,
            mime_type="image/png"
        ),
        "Extract every metric in this dashboard as a JSON object."
    ]
)

print(response.text)

Supported image MIME types include:

image/png
image/jpeg
image/webp
image/heic
image/heif

PDFs and video can also be passed through types.Part.from_uri().
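
For raw REST calls, the image travels inside the JSON body as base64. A rough sketch of building that body, assuming the inline_data part shape (mime_type plus base64 data) used by the Gemini REST API; the helper name build_image_body is illustrative.

```python
import base64

SUPPORTED_IMAGE_TYPES = {
    "image/png", "image/jpeg", "image/webp", "image/heic", "image/heif",
}


def build_image_body(image_bytes, mime_type, prompt):
    """Return a generateContent body with one inline image and one prompt."""
    if mime_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"unsupported image MIME type: {mime_type}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                    {"text": prompt},
                ]
            }
        ]
    }
```

Serialize the result with json.dumps() and send it to the same generateContent endpoint as a text-only request.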

Use function calling and tools

Tool calling lets the model request a function call instead of answering directly. Your application executes the function, sends the result back, and continues the conversation.

Python

from google.genai import types

weather_tool = types.Tool(
    function_declarations=[
        {
            "name": "get_current_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["city"]
            }
        }
    ]
)

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What's the weather in Singapore right now?",
    config=types.GenerateContentConfig(
        tools=[weather_tool]
    )
)

for part in response.candidates[0].content.parts:
    if part.function_call:
        print(f"Call: {part.function_call.name}")
        print(f"Args: {dict(part.function_call.args)}")

Flash returns a function_call object with the function name and arguments. The same tool-calling pattern applies to the Gemini 3 Flash API.
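
Once you have the function name and arguments, your application still has to run the function. One possible way to do that is a small registry that maps names to local callables. Everything here is a stand-in: get_current_weather returns placeholder data, and dispatch_function_call is a hypothetical helper, not an SDK function.

```python
def get_current_weather(city, unit="celsius"):
    """Stand-in implementation; a real one would call a weather service."""
    return {"city": city, "unit": unit, "temperature": None}


# Map the names declared in the tool schema to local callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def dispatch_function_call(name, args, registry=TOOL_REGISTRY):
    """Run the local function the model asked for and wrap the outcome."""
    handler = registry.get(name)
    if handler is None:
        return {"error": f"unknown function: {name}"}
    try:
        return {"result": handler(**dict(args))}
    except TypeError as exc:
        # Typically raised when arguments are missing or unexpected.
        return {"error": str(exc)}
```

Returning an error dictionary instead of raising lets you send the failure back to the model as the function result, so it can correct its arguments on the next turn.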

Return structured JSON

Use JSON mode when downstream code needs predictable output.

import json
from google.genai import types

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="List 3 popular API testing tools with their pricing.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price_per_month": {"type": "number"},
                    "free_tier": {"type": "boolean"}
                },
                "required": ["name", "free_tier"]
            }
        }
    )
)

data = json.loads(response.text)
print(data)

This avoids regex parsing and reduces retry logic caused by malformed free-form output.
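
Even with a schema in place, a cheap defensive check before the data reaches downstream code can catch surprises early. The sketch below re-checks the same three fields using only the standard library; validate_tools is an illustrative name, and a schema validation library would be a reasonable alternative.

```python
import json

EXPECTED_TYPES = {
    "name": str,
    "price_per_month": (int, float),
    "free_tier": bool,
}
REQUIRED = ("name", "free_tier")


def validate_tools(raw_text):
    """Parse the model output and return a list of problems (empty if OK)."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, list):
        return ["top-level value is not an array"]
    problems = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            problems.append(f"item {i} is not an object")
            continue
        for key in REQUIRED:
            if key not in item:
                problems.append(f"item {i} is missing {key!r}")
        for key, expected in EXPECTED_TYPES.items():
            if key not in item:
                continue
            value = item[key]
            # bool is a subclass of int in Python, so reject it for numbers.
            is_stray_bool = isinstance(value, bool) and expected is not bool
            if is_stray_bool or not isinstance(value, expected):
                problems.append(f"item {i} has wrong type for {key!r}")
    return problems
```

An empty list means the payload matches the expected shape; anything else can be logged or used to trigger a retry.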

Pricing as of May 2026

Pay-as-you-go rates for gemini-3.5-flash:

Tier	Input	Output
Standard	~$1.50 / 1M tokens	~$9.00 / 1M tokens
Cached input	reduced rate	n/a
Batch mode	~50% off	~50% off

For jobs that do not require real-time latency, Gemini API batch mode applies the roughly 50% discount shown in the table.

For more pricing examples, see the Flash pricing breakdown. For the official Google reference, see Gemini Developer API pricing.
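
To turn the table into a budget estimate, multiply token counts by the per-million rates. The sketch below hard-codes the approximate figures from the table above, so treat its output as a rough estimate and check the official pricing page before relying on it. The function name estimate_cost_usd is illustrative.

```python
INPUT_USD_PER_MILLION = 1.50   # approximate standard input rate from the table
OUTPUT_USD_PER_MILLION = 9.00  # approximate standard output rate from the table
BATCH_DISCOUNT = 0.50          # approximate batch discount from the table


def estimate_cost_usd(input_tokens, output_tokens, batch=False):
    """Estimate the cost of one request in US dollars."""
    cost = (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return cost
```

For example, a request with 200,000 input tokens and 5,000 output tokens comes to roughly $0.345 at these rates, or about $0.17 in batch mode.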

Test your Gemini 3.5 Flash integration with Apidog

A working SDK call is only the first step. Production integrations also need to handle:

Streaming chunks
Tool-call validation
Multimodal payloads
Error retries
Rate limits
JSON schema drift

Apidog lets you test the Gemini Flash API surface in one workspace.

Practical setup:

Create a new request in Apidog
Paste the Gemini Flash endpoint:

https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent

Add the x-goog-api-key header
Set Content-Type to application/json
Paste the request body from the curl example
Send the request
Save it for reuse

You can also use Apidog to:

Replay the same request across model versions
Swap gemini-3.5-flash for gemini-3-flash and compare outputs
Render streamed chunks inline
Validate JSON schema output with assertions
Mock the Flash endpoint for downstream tests
Build test scenarios for agent loops

To get started, download Apidog, create a new request pointing at the Flash endpoint, and import the earlier curl snippet.

Handle errors and rate limits

Common HTTP status codes:

Status	Meaning	Recommended action
400	Bad request	Check the contents array, MIME type, and JSON body
401	Invalid API key	Verify GEMINI_API_KEY
403	Permission denied or model unavailable	Check the key's permissions and model access
429	Rate limited or quota exhausted	Retry with backoff and check your quota
500 / 503	Server-side issue	Retry with exponential backoff

Python retry wrapper

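import time
from google.genai import errors

# Only transient failures are worth retrying.
RETRYABLE_CODES = {429, 500, 503}

def call_with_retry(client, model, prompt, max_retries=3):
    """Retry transient API errors with exponential backoff (1s, 2s, ...)."""
    for attempt in range(max_retries):
        try:
            return client.models.generate_content(
                model=model,
                contents=prompt
            )
        except errors.APIError as exc:
            # A 400 or 403 will not succeed on retry, so re-raise it at once.
            if exc.code not in RETRYABLE_CODES or attempt == max_retries - 1:
                raise

            time.sleep(2 ** attempt)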

Free tier quotas reset daily. Production tier quotas reset per minute and per day.

For high-throughput jobs, consider batch mode or a fallback to Gemini 3 Flash when you hit limits.
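
The fallback idea can be sketched independently of any SDK. In the example below, generate is any callable you supply that takes a model name and a prompt, and RateLimited is a placeholder for whatever error your client raises on HTTP 429; both are assumptions made for illustration.

```python
class RateLimited(Exception):
    """Placeholder for whatever error your client raises on HTTP 429."""


def generate_with_fallback(generate, prompt,
                           models=("gemini-3.5-flash", "gemini-3-flash")):
    """Try each model in order, moving on only when one is rate limited."""
    last_error = None
    for model in models:
        try:
            # Return the model name too, so callers can log which one answered.
            return model, generate(model, prompt)
        except RateLimited as exc:
            last_error = exc
    if last_error is None:
        raise ValueError("models must not be empty")
    raise last_error
```

Other errors propagate immediately, which keeps a bad request from being silently retried against a second model.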

Migrate from Gemini 3.1 to 3.5 Flash

For most projects, migration is a one-line model name change.

# Before
model = "gemini-3.1-pro"  # or "gemini-3.1-flash"

# After
model = "gemini-3.5-flash"

After switching, verify:

Tool schemas: rerun your tool-call tests
Streaming UI: faster chunks may need throttling
Token budgets: confirm actual input and output usage
Refusal behavior: safety behavior may differ on edge cases
Structured output: rerun schema validation tests
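
If faster chunks make your streaming UI flicker, one simple option is to group small chunks before rendering. The sketch below batches by character count rather than by time, which is an assumption about what your UI needs; coalesce_chunks is an illustrative name.

```python
def coalesce_chunks(chunks, min_chars=40):
    """Group small text chunks so the UI updates less often."""
    buffer = []
    size = 0
    for chunk in chunks:
        if not chunk:
            continue  # skip empty or missing text
        buffer.append(chunk)
        size += len(chunk)
        if size >= min_chars:
            yield "".join(buffer)
            buffer, size = [], 0
    if buffer:
        yield "".join(buffer)  # flush whatever is left at the end
```

Wrap the text of each streamed chunk with this generator and render each yielded string as one update.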

For a deeper SDK walkthrough, see the Gemini 3.1 Pro API guide.

Common implementation patterns

Long-context document analysis

from google.genai import types

with open("large_report.pdf", "rb") as f:
    pdf_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=[
        types.Part.from_bytes(
            data=pdf_bytes,
            mime_type="application/pdf"
        ),
        "Summarize the financial outlook from this report in 5 bullet points."
    ]
)

print(response.text)

Documents that fit within Flash's 1M token context can be sent whole, without manual chunking.

Agent loop with tool calls

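from google.genai import types

# flight_search_tool, booking_tool, and execute_tool are placeholders that
# your application defines; execute_tool is assumed to return JSON-friendly data.
conversation = [
    types.Content(
        role="user",
        parts=[types.Part.from_text(text="Book me a flight to Tokyo")]
    )
]

for _ in range(10):  # cap the number of turns so the loop always ends
    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=conversation,
        config=types.GenerateContentConfig(
            tools=[flight_search_tool, booking_tool]
        )
    )

    model_content = response.candidates[0].content
    calls = [p.function_call for p in model_content.parts if p.function_call]

    if not calls:
        print(response.text)
        break

    # Keep the model turn in the history, then answer every requested call.
    conversation.append(model_content)

    conversation.append(
        types.Content(
            role="user",
            parts=[
                types.Part.from_function_response(
                    name=call.name,
                    response={"result": execute_tool(call)}
                )
                for call in calls
            ]
        )
    )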

Use this pattern when Flash needs to call APIs, validate tool responses, and continue until it has enough information to produce a final answer.

FAQ

Is there a free tier for the Gemini 3.5 Flash API?

Yes. Google AI Studio provides daily quotas of about 1,500 requests per day with no credit card required. See the Google AI Studio API key guide.

Does Flash support OpenAI-compatible endpoints?

Yes. Google exposes an OpenAI-compatible shim at:

https://generativelanguage.googleapis.com/v1beta/openai/

You can point an OpenAI SDK at it by setting base_url and using your Gemini API key. The model name remains:

gemini-3.5-flash
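
As a rough illustration of what an OpenAI-style client sends to that shim, the sketch below builds a chat completions request with the standard library. It assumes the shim lives under generativelanguage.googleapis.com, exposes the usual chat/completions path, and accepts the Gemini API key as a bearer token; build_chat_request is an illustrative helper.

```python
import json
import urllib.request

OPENAI_COMPAT_BASE = "https://generativelanguage.googleapis.com/v1beta/openai"


def build_chat_request(prompt, api_key, model="gemini-3.5-flash"):
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{OPENAI_COMPAT_BASE}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With an OpenAI SDK, the equivalent is to set base_url to the shim URL and pass the Gemini key as the API key.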

Can I use Flash with LangChain or LlamaIndex?

Yes. Both have native Gemini integrations. Pass the model name in the relevant wrapper:

model = "gemini-3.5-flash"

When does Gemini 3.5 Pro ship?

June 2026, according to Google’s launch announcement. Until then, Flash is the only Gemini 3.5 variant available.

What is the max image size for Flash?

The recommended image size is 3072×3072. Larger images are resampled.

For OCR-heavy workflows, see the Gemini 2.0 Flash OCR workflow. The same general patterns apply.

How do I test streaming endpoints in Apidog?

Use the streaming endpoint suffix:

:streamGenerateContent

Apidog renders the SSE chunks as they arrive, which helps debug partial or incomplete responses.

Where can I see API logs?

For AI Studio, check Activity.

For Vertex AI production deployments, use Logs Explorer in Google Cloud.

What to build first

Good starter projects for Gemini 3.5 Flash:

PDF Q&A bot: load a PDF into the 1M context window and answer questions
Chart-to-JSON pipeline: extract structured data from dashboard screenshots
Customer support agent: use function calling against your CRM
Code review assistant: analyze multi-file diffs and return severity-rated findings
Internal search agent: combine long context with calls to internal APIs

For each project, use the same workflow:

Build the prompt
Wrap it in an SDK or REST call
Validate the response shape with Apidog
Add retries and rate-limit handling
Ship the integration
