Hassann

Posted on • Originally published at apidog.com

How to Use the Gemini 3.5 Flash API?

The Gemini 3.5 Flash API went live with the model launch on May 19, 2026. Flash is currently the only Gemini 3.5 variant available; Pro lands in June. This guide shows how to get an API key, make your first request, send multimodal input, stream responses, use tools, return structured JSON, and test the integration with Apidog.


If you’ve used the Gemini API before, the request pattern is the same. The main change is the model name:

gemini-3.5-flash

If you’re new to Gemini, you can make a working Flash request in about ten minutes.


What you get with the Gemini 3.5 Flash API

Start with the three practical details:

  • Model name: gemini-3.5-flash
  • Endpoint pattern: same Gemini API style used by Gemini 3 and Gemini 3.1
  • Free tier: available through Google AI Studio, with about 1,500 requests per day and no credit card

Capabilities exposed through the Flash API include:

  • 1M token input context
  • 64K output tokens
  • Text and image input
  • Text and structured output
  • Native function calling and tool use
  • Streaming responses
  • Chart and document reasoning

For pricing details, including per-token rates and batch mode discounts, see the Gemini 3.5 Flash pricing guide.

Step 1: Get your Gemini 3.5 Flash API key

You have two common setup paths:

  • Google AI Studio for free-tier development
  • Vertex AI for production workloads

Path A: Google AI Studio free tier

  1. Go to aistudio.google.com
  2. Sign in with a Google account
  3. Click Get API key in the left navigation
  4. Select an existing project or create a new one
  5. Click Create API key
  6. Copy the key and store it as an environment variable

Example:

export GEMINI_API_KEY="your-api-key"

This flow is also covered in the free Gemini API key guide. The key works with gemini-3.5-flash immediately within the free daily quota.
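
A small guard at startup can make a missing key fail with a clear message instead of an authentication error later in the run. The sketch below is one way to do that; load_api_key is a hypothetical helper name, not part of the Google GenAI SDK.

```python
import os


def load_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    load_api_key is a hypothetical helper, not part of the Google GenAI SDK.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running.")
    return key
```

The returned value can be passed as api_key when constructing the client.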


Path B: Vertex AI production setup

Use Vertex AI when you need billing, IAM, audit logs, and organization-level controls.

  1. Open Google Cloud Console
  2. Enable the Vertex AI API
  3. Create a service account
  4. Grant it the aiplatform.user role
  5. Download the JSON credentials
  6. Authenticate with either:
gcloud auth application-default login

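Or point your application to the JSON credentials by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the downloaded file.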

Most teams start with AI Studio, then migrate to Vertex AI when they need production controls.

Step 2: Install the SDK

The official Google GenAI SDK is available for Python, Node.js, Go, and Java.

Python

pip install -U google-genai

Node.js

npm install @google/genai

Go

go get google.golang.org/genai

You can also skip the SDK and call the REST endpoint directly with curl.

Step 3: Make your first Flash call

Python

import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain how OAuth 2.0 PKCE flow works in 3 short paragraphs."
)

print(response.text)

Node.js

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({
  apiKey: process.env.GEMINI_API_KEY,
});

const response = await ai.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "Explain how OAuth 2.0 PKCE flow works in 3 short paragraphs.",
});

console.log(response.text);

REST with curl

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "Explain how OAuth 2.0 PKCE flow works in 3 short paragraphs."
          }
        ]
      }
    ]
  }'

That is the minimum working request. From here, add streaming, multimodal input, tools, and structured output as needed.
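
If you want the REST call without the SDK or curl, the same request can be assembled with the Python standard library. This is a minimal sketch: build_generate_request and extract_text are hypothetical helper names, and extract_text assumes the response carries its text under candidates[0].content.parts.

```python
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(prompt, api_key, model="gemini-3.5-flash"):
    """Build (but do not send) the same request as the curl example."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-goog-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response_json):
    """Join the text parts of the first candidate in a decoded response."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(p.get("text", "") for p in parts)
```

To send it, pass the request to urllib.request.urlopen, decode the body with json.load, and hand the result to extract_text.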

Stream responses

Use streaming when you want users to see output as it is generated.

Python

stream = client.models.generate_content_stream(
    model="gemini-3.5-flash",
    contents="Write a 5-step tutorial on writing a REST API client in Go."
)

for chunk in stream:
    # chunk.text can be None for chunks that carry no text
    print(chunk.text or "", end="", flush=True)

Node.js

const stream = await ai.models.generateContentStream({
  model: "gemini-3.5-flash",
  contents: "Write a 5-step tutorial on writing a REST API client in Go.",
});

for await (const chunk of stream) {
  // chunk.text can be undefined for chunks that carry no text
  process.stdout.write(chunk.text ?? "");
}

For raw REST calls, change the endpoint suffix from:

:generateContent

To:

:streamGenerateContent
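
When consuming the raw stream yourself, the chunks need to be parsed before the text can be displayed. The sketch below assumes the stream was requested as server-sent events (the REST API accepts an alt=sse query parameter for this) and that each event is a single data: line holding JSON in the same shape as a non-streaming response. iter_stream_text is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from server-sent event lines.

    Assumes each event is a single `data: <json>` line whose JSON has the
    same shape as a non-streaming response.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = json.loads(line[len("data:"):])
        for candidate in payload.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    yield part["text"]
```

Any iterable of decoded lines works as input, so the same function can be exercised against recorded responses in tests.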

Send multimodal input

Gemini 3.5 Flash accepts images alongside text. This is useful for dashboards, screenshots, diagrams, scanned documents, and chart extraction.

Python: image from disk

import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

with open("dashboard.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=[
        types.Part.from_bytes(
            data=image_bytes,
            mime_type="image/png"
        ),
        "Extract every metric in this dashboard as a JSON object."
    ]
)

print(response.text)

Supported image MIME types include:

  • image/png
  • image/jpeg
  • image/webp
  • image/heic
  • image/heif
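
For raw REST calls, the image travels inside the JSON body as base64 rather than as bytes. A minimal sketch of building that part, assuming the inline_data field shape used by the REST API; inline_image_part is a hypothetical helper name, and the allow-list simply mirrors the MIME types listed above.

```python
import base64

# Mirrors the MIME types listed in this guide; the API may accept others.
SUPPORTED_IMAGE_TYPES = {
    "image/png", "image/jpeg", "image/webp", "image/heic", "image/heif",
}


def inline_image_part(image_bytes, mime_type):
    """Return a REST `parts` entry carrying a base64-encoded image."""
    if mime_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"Unsupported image MIME type: {mime_type}")
    return {
        "inline_data": {
            "mime_type": mime_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }
```

The returned dictionary goes into the parts array next to the text part that holds the prompt.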

PDFs and video can also be passed by reference through types.Part.from_uri(), which takes a file URI instead of raw bytes.

Use function calling and tools

Tool calling lets the model request a function call instead of answering directly. Your application executes the function, sends the result back, and continues the conversation.

Python

from google.genai import types

weather_tool = types.Tool(
    function_declarations=[
        {
            "name": "get_current_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["city"]
            }
        }
    ]
)

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What's the weather in Singapore right now?",
    config=types.GenerateContentConfig(
        tools=[weather_tool]
    )
)

for part in response.candidates[0].content.parts:
    if part.function_call:
        print(f"Call: {part.function_call.name}")
        print(f"Args: {dict(part.function_call.args)}")

Flash returns a function_call object with the function name and arguments. The same tool-calling pattern applies to the Gemini 3 Flash API.
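
Once the model has requested a call, your application has to route it to real code. The sketch below shows one possible dispatcher; get_current_weather is a stand-in implementation, and TOOL_REGISTRY and run_function_call are hypothetical names rather than SDK features.

```python
def get_current_weather(city, unit="celsius"):
    """Stand-in implementation; a real one would call a weather service."""
    return {"city": city, "unit": unit, "temperature": None}


# Maps declared function names to local callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_function_call(name, args, registry=TOOL_REGISTRY):
    """Execute a requested function and return a JSON-serializable result."""
    handler = registry.get(name)
    if handler is None:
        return {"error": f"Unknown function: {name}"}
    try:
        return {"result": handler(**args)}
    except TypeError as exc:
        # Typically means the model sent missing or unexpected arguments.
        return {"error": str(exc)}
```

Returning errors as data, rather than raising, lets you send the failure back to the model so it can correct its arguments.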

Return structured JSON

Use JSON mode when downstream code needs predictable output.

import json
from google.genai import types

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="List 3 popular API testing tools with their pricing.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price_per_month": {"type": "number"},
                    "free_tier": {"type": "boolean"}
                },
                "required": ["name", "free_tier"]
            }
        }
    )
)

data = json.loads(response.text)
print(data)

This avoids regex parsing and reduces retry logic caused by malformed free-form output.
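
Even with a schema in place, a cheap check before the data reaches downstream code can catch drift early. This is a minimal standard-library sketch against the schema used above; validate_tools is a hypothetical helper name, and a dedicated JSON Schema validator would be the more complete option.

```python
import json


def validate_tools(raw_text):
    """Parse the response text and check it against the schema above."""
    data = json.loads(raw_text)
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array")
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"Item {i} is not an object")
        if not isinstance(item.get("name"), str):
            raise ValueError(f"Item {i}: 'name' must be a string")
        if not isinstance(item.get("free_tier"), bool):
            raise ValueError(f"Item {i}: 'free_tier' must be a boolean")
        price = item.get("price_per_month")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if price is not None and (
            isinstance(price, bool) or not isinstance(price, (int, float))
        ):
            raise ValueError(f"Item {i}: 'price_per_month' must be a number")
    return data
```

Calling validate_tools(response.text) in place of json.loads keeps the parsing step and adds the shape check.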

Pricing as of May 2026

Pay-as-you-go rates for gemini-3.5-flash:

| Tier | Input | Output |
| --- | --- | --- |
| Standard | ~$1.50 / 1M tokens | ~$9.00 / 1M tokens |
| Cached input | reduced rate | n/a |
| Batch mode | ~50% off | ~50% off |

For batch workloads, Gemini API batch mode provides the 50% discount for jobs that do not require real-time latency.

For more pricing examples, see the Flash pricing breakdown. For the official Google reference, see Gemini Developer API pricing.
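
To turn these rates into a budget figure, a rough estimator is enough. The sketch below hard-codes the approximate standard rates and batch discount quoted above, so treat the result as an estimate; estimate_cost_usd is a hypothetical helper name.

```python
INPUT_USD_PER_MILLION = 1.50   # approximate standard input rate quoted above
OUTPUT_USD_PER_MILLION = 9.00  # approximate standard output rate quoted above
BATCH_DISCOUNT = 0.50          # approximate batch mode discount quoted above


def estimate_cost_usd(input_tokens, output_tokens, batch=False):
    """Estimate the cost of one request in US dollars."""
    cost = (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return cost
```

For example, 200,000 input tokens and 20,000 output tokens come to about $0.48 at the standard rate, or about $0.24 in batch mode.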

Test your Gemini 3.5 Flash integration with Apidog

A working SDK call is only the first step. Production integrations also need to handle:

  • Streaming chunks
  • Tool-call validation
  • Multimodal payloads
  • Error retries
  • Rate limits
  • JSON schema drift


Apidog lets you test the Gemini Flash API surface in one workspace.

Practical setup:

  1. Create a new request in Apidog
  2. Paste the Gemini Flash endpoint:
https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent
  3. Add the x-goog-api-key header
  4. Set Content-Type to application/json
  5. Paste the request body from the curl example
  6. Send the request
  7. Save it for reuse

You can also use Apidog to:

  • Replay the same request across model versions
  • Swap gemini-3.5-flash for gemini-3-flash and compare outputs
  • Render streamed chunks inline
  • Validate JSON schema output with assertions
  • Mock the Flash endpoint for downstream tests
  • Build test scenarios for agent loops

To get started, download Apidog, create a new request pointing at the Flash endpoint, and import the earlier curl snippet.

Handle errors and rate limits

Common HTTP status codes:

| Status | Meaning | Recommended action |
| --- | --- | --- |
| 400 | Bad request | Check the contents array, MIME type, and JSON body |
| 401 | Invalid API key | Verify GEMINI_API_KEY |
| 403 | Permission denied or model unavailable | Check API key permissions and model access |
| 429 | Rate limit or quota exceeded | Retry with backoff and check your quota |
| 500 / 503 | Server-side issue | Retry with exponential backoff |

Python retry wrapper

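import time
from google.genai import errors

# Only rate limits and server-side errors are worth retrying.
RETRYABLE_CODES = {429, 500, 503}

def call_with_retry(client, model, prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.models.generate_content(
                model=model,
                contents=prompt
            )
        except errors.APIError as e:
            # Re-raise client errors such as 400 or 401 immediately,
            # and give up after the last attempt.
            if e.code not in RETRYABLE_CODES or attempt == max_retries - 1:
                raise

            # Exponential backoff: 1s, 2s, 4s, ...
            time.sleep(2 ** attempt)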

Free tier quotas reset daily. Production tier quotas reset per minute and per day.

For high-throughput jobs, consider batch mode or a fallback to Gemini 3 Flash when you hit limits.
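
When many workers hit a rate limit at once, fixed backoff steps can make them retry in lockstep. A common refinement is a capped delay with random jitter. The sketch below shows that calculation on its own; backoff_delay and its default base and cap values are assumptions to tune, not values from the Gemini documentation.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Return a delay in seconds using capped exponential backoff.

    Applies "full jitter": a random value between 0 and the capped
    exponential ceiling, so clients do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

In a retry loop, time.sleep(backoff_delay(attempt)) would take the place of a fixed time.sleep(2 ** attempt).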

Migrate from Gemini 3.1 to 3.5 Flash

For most projects, migration is a one-line model name change.

# Before
model = "gemini-3.1-pro"  # or "gemini-3.1-flash"

# After
model = "gemini-3.5-flash"

After switching, verify:

  1. Tool schemas: rerun your tool-call tests
  2. Streaming UI: faster chunks may need throttling
  3. Token budgets: confirm actual input and output usage
  4. Refusal behavior: safety behavior may differ on edge cases
  5. Structured output: rerun schema validation tests
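
The checklist above can be turned into a small regression harness that runs before the switch goes live. This is a sketch under stated assumptions: run_migration_checks is a hypothetical helper, generate is any callable you supply that takes (model, prompt) and returns text, and each check is your own predicate on that text.

```python
def run_migration_checks(generate, model, cases):
    """Return the prompts whose output fails its check on `model`.

    `generate` is any callable taking (model, prompt) and returning text;
    `cases` is a list of (prompt, check) pairs where check(text) -> bool.
    """
    failures = []
    for prompt, check in cases:
        try:
            passed = bool(check(generate(model, prompt)))
        except Exception:
            passed = False  # treat API or parsing errors as failures
        if not passed:
            failures.append(prompt)
    return failures
```

Running the same cases against the old and the new model name shows which prompts regress after the change.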

For a deeper SDK walkthrough, see the Gemini 3.1 Pro API guide.

Common implementation patterns

Long-context document analysis

from google.genai import types

with open("large_report.pdf", "rb") as f:
    pdf_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=[
        types.Part.from_bytes(
            data=pdf_bytes,
            mime_type="application/pdf"
        ),
        "Summarize the financial outlook from this report in 5 bullet points."
    ]
)

print(response.text)

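Flash’s 1M token context can handle many full PDFs without manual chunking. For files too large to send inline with the request, upload them through the Files API and pass the returned file reference instead.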

Agent loop with tool calls

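from google.genai import types

# flight_search_tool, booking_tool, and execute_tool are placeholders
# that your application defines. execute_tool must return a dict.
conversation = [
    types.Content(
        role="user",
        parts=[types.Part.from_text(text="Book me a flight to Tokyo")]
    )
]

MAX_TURNS = 10  # guard against an endless tool loop

for _ in range(MAX_TURNS):
    response = client.models.generate_content(
        model="gemini-3.5-flash",
        contents=conversation,
        config=types.GenerateContentConfig(
            tools=[flight_search_tool, booking_tool]
        )
    )

    content = response.candidates[0].content
    # A single turn can contain more than one function call
    calls = [p.function_call for p in content.parts if p.function_call]

    if not calls:
        print(response.text)
        break

    # Keep the model's full turn, including every function_call part
    conversation.append(content)

    # Run each requested tool and send all results back in one turn
    conversation.append(
        types.Content(
            role="user",
            parts=[
                types.Part.from_function_response(
                    name=call.name,
                    response=execute_tool(call)
                )
                for call in calls
            ]
        )
    )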

Use this pattern when Flash needs to call APIs, validate tool responses, and continue until it has enough information to produce a final answer.

FAQ

Is there a free tier for the Gemini 3.5 Flash API?

Yes. Google AI Studio provides daily quotas of about 1,500 requests per day with no credit card required. See the Google AI Studio API key guide.

Does Flash support OpenAI-compatible endpoints?

Yes. Google exposes an OpenAI-compatible shim at:

/v1beta/openai/

You can point an OpenAI SDK at it by setting base_url and using your Gemini API key. The model name remains:

gemini-3.5-flash
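
To see what such a request looks like on the wire, here is a minimal standard-library sketch that builds an OpenAI-style chat completion request against the compatibility path. It assumes the path above hangs off the generativelanguage.googleapis.com host used elsewhere in this guide and that the key is sent as a Bearer token; build_chat_request is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed full base URL for the OpenAI-compatible path.
OPENAI_COMPAT_BASE = "https://generativelanguage.googleapis.com/v1beta/openai/"


def build_chat_request(prompt, api_key, model="gemini-3.5-flash"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=OPENAI_COMPAT_BASE + "chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With an OpenAI SDK, the same base URL would be the value you pass as base_url.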

Can I use Flash with LangChain or LlamaIndex?

Yes. Both have native Gemini integrations. Pass the model name in the relevant wrapper:

model = "gemini-3.5-flash"

When does Gemini 3.5 Pro ship?

June 2026, according to Google’s launch announcement. Until then, Flash is the only Gemini 3.5 variant available.

What is the max image size for Flash?

The recommended image size is 3072×3072. Larger images are resampled.

For OCR-heavy workflows, see the Gemini 2.0 Flash OCR workflow. The same general patterns apply.
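
If you prefer to downscale on the client before uploading, the target dimensions are simple to compute. The sketch below assumes the recommended size applies to the longest side and that the aspect ratio should be preserved; how the service itself resamples is not specified here, and fit_within is a hypothetical helper name.

```python
def fit_within(width, height, max_side=3072):
    """Return (width, height) scaled down to fit max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, leave untouched
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting size can be passed to whichever image library you already use for resizing.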

How do I test streaming endpoints in Apidog?

Use the streaming endpoint suffix:

:streamGenerateContent

Apidog renders the SSE chunks as they arrive, which helps debug partial or incomplete responses.

Where can I see API logs?

For AI Studio, check Activity.

For Vertex AI production deployments, use Logs Explorer in Google Cloud.

What to build first

Good starter projects for Gemini 3.5 Flash:

  • PDF Q&A bot: load a PDF into the 1M context window and answer questions
  • Chart-to-JSON pipeline: extract structured data from dashboard screenshots
  • Customer support agent: use function calling against your CRM
  • Code review assistant: analyze multi-file diffs and return severity-rated findings
  • Internal search agent: combine long context with calls to internal APIs

For each project, use the same workflow:

  1. Build the prompt
  2. Wrap it in an SDK or REST call
  3. Validate the response shape with Apidog
  4. Add retries and rate-limit handling
  5. Ship the integration
