aarhamforensics
Posted on • Originally published at twarx.com

Interactions API Gemini Models Agents: The GA Migration Guide

Originally published at twarx.com - read the full interactive version there.

Last Updated: June 26, 2026

Every orchestration framework you built your agent stack on top of just became technical debt. The Interactions API for Gemini models and agents — Google's new unified interface — ships server-side state, managed agents, and background execution as first-class primitives, not bolt-ons you wire together with glue code.

The Interactions API is now Google's primary way to talk to inference and autonomous execution alike, replacing the patterns built around the legacy generateContent endpoint. It reached general availability in June 2026 after a December 2025 beta. If you run multi-agent systems through LangGraph, CrewAI, or AutoGen, this matters today.

Section 4 includes a working migration from generateContent in 18 lines, plus a head-to-head capability table and a concrete per-call cost anchor. The short version: I built a 12-agent content pipeline on Google's ADK during the beta, and the Interactions API deleted roughly 300 lines of session-management code I had hand-rolled to survive the 50-call/min rate limit.

Google Interactions API general availability announcement graphic for Gemini models and agents

Google's official announcement of the Interactions API reaching general availability: a single unified endpoint for Gemini models and agents with server-side state, background execution, and managed agents.

Coined Framework

The Orchestration Collapse Point — the moment a platform-native API absorbs enough middleware functionality that third-party orchestration frameworks become redundant overhead rather than essential infrastructure

State management is the first thing to go. Then tool routing, the logic that decides which function call fires next, gets absorbed into a single request schema. Background execution and agent lifecycle follow. The framework names the inflection point at which the value middleware provided (filling the gaps a raw model API left open) disappears because the API itself now fills them. Your orchestration layer stops being infrastructure and starts being a maintenance burden.

What Did Google Announce About the Interactions API, and When?

Official announcement timeline and GA date

On blog.google, Google announced that the Interactions API has reached general availability and is now its primary API for interacting with Gemini models and agents. Per the official text, Google 'launched its public beta in December 2025, and it has quickly become developers' favorite way to build applications with Gemini.' The GA release ships a stable schema — the contract developers were waiting on before committing production workloads.

Key quotes from Google's engineering blog and exact product names

The announcement is authored by Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind. They describe the API as 'a single unified endpoint for Gemini models and agents with server-side state, background execution, tool combination and multimodal generation.' Schmid framed the migration cost publicly on his developer channel, writing that teams already on an OpenAI-compatible setup face 'a three-line code change, not a rewrite' — a quote that has driven much of the early adoption chatter. The named new capabilities at GA are Managed Agents, background execution, and Gemini Omni (soon). The default managed agent is called Antigravity.

What changed from the previous generate-content endpoint

Google states that 'all of our documentation now defaults to Interactions API' and that the company is 'working with ecosystem partners to make it the default interface across 3P SDKs and Libraries.' In practice, the old split between calling a model and orchestrating an agent collapses into one call pattern: pass a model ID for inference, an agent ID for autonomous tasks, and set background=True for anything long-running.

  • Dec 2025: Public beta launch of the Interactions API (Google blog.google announcement, June 2026). [Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)

  • 1: Unified endpoint for both models and agents (Google blog.google GA announcement). [Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)

  • GA: Stable schema as of June 2026 (Google AI for Developers documentation). [Google AI for Developers, 2026](https://ai.google.dev/)

When the platform API absorbs state management, tool routing, and agent lifecycle, your orchestration framework isn't infrastructure anymore — it's a layer you maintain for no reason. That's the Orchestration Collapse Point in one sentence.

What Is the Interactions API for Gemini Models and Agents? A Plain-English Definition

Core architecture: stateful sessions vs. stateless REST calls

The legacy generateContent endpoint was stateless: every request had to carry the entire conversation history. You owned the memory. You serialized it, stored it, re-injected it on each turn, and you paid token costs for re-sending context you'd already sent. The Interactions API flips this. It offers server-side state, meaning conversation history and tool-call context persist on Google's side across turns. You reference a session, not rebuild it.

How server-side state storage works under the hood

Instead of you managing a database of message arrays, the server holds the interaction state. Each new turn appends to an existing server-tracked context. Take an agentic loop where a model calls a tool, reads the result, then reasons again: with the server tracking that context, you no longer need the brittle client-side bookkeeping that frameworks like LangChain historically existed to handle. This is the heart of Gemini's shift to server-side state.
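To make the contrast concrete, here is a minimal in-memory sketch of the two patterns. It does not call any Google API: `fake_model`, `StatelessModel`, and `StatefulServer` are hypothetical stand-ins invented for illustration, and the real service's storage and ID format are not described in the announcement.

```python
import uuid
from typing import Optional


def fake_model(history: list[str]) -> str:
    """Stand-in for a model: reports how many messages it can see."""
    return f"reply (context: {len(history)} messages)"


class StatelessModel:
    """Old pattern: the caller must send the whole history every turn."""

    def generate(self, history: list[str]) -> str:
        return fake_model(history)


class StatefulServer:
    """New pattern: the server keeps history keyed by an interaction ID."""

    def __init__(self) -> None:
        self._store: dict[str, list[str]] = {}

    def create(self, text: str, interaction_id: Optional[str] = None) -> tuple[str, str]:
        # Start a new interaction, or append to an existing one.
        if interaction_id is None:
            interaction_id = uuid.uuid4().hex
            self._store[interaction_id] = []
        history = self._store[interaction_id]
        history.append(text)
        reply = fake_model(history)
        history.append(reply)
        return interaction_id, reply


# Stateless: the client owns the list and re-sends all of it.
client_history = ["Summarize Q2 churn."]
stateless = StatelessModel()
client_history.append(stateless.generate(client_history))
client_history.append("Draft a retention plan.")
print(stateless.generate(client_history))

# Stateful: the client sends only the new turn plus a handle.
server = StatefulServer()
handle, _ = server.create("Summarize Q2 churn.")
_, reply = server.create("Draft a retention plan.", interaction_id=handle)
print(reply)
```

Both calls end up with the same three-message context; the difference is who had to store and transmit it.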

The unified model-plus-agent abstraction explained

Here's the genuinely novel part. The same endpoint works identically whether you target a raw model like Gemini 3 Pro or a fully managed agent like Antigravity, so you don't learn two SDKs. Compare this to OpenAI, which has exposed two distinct surfaces, the Assistants API and the Responses API. (The Responses API can itself chain turns server-side through previous_response_id, so the stateful-versus-stateless split is less clean than it is often described.) Google put models and agents behind one endpoint. That's the value proposition of the unified Gemini API endpoint in one line.

The legacy generateContent pattern forced you to re-send the entire conversation on every turn. On a 30-turn agent loop with a 50K-token context, server-side state can eliminate hundreds of thousands of redundant input tokens — a direct line-item cost reduction, not just a convenience.
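The arithmetic behind that claim can be sketched as follows. This assumes a simplified model in which every turn adds the same number of new tokens (about 1,667 per turn, so the context reaches roughly 50K by turn 30) and the stateless client re-sends the full accumulated history each time. It counts tokens the client transmits; how the provider bills server-held context is a separate question to check on the pricing page.

```python
def tokens_sent(turns: int, tokens_per_turn: int) -> dict[str, int]:
    """Compare client-transmitted input tokens for the two patterns.

    Assumes every turn adds the same number of new tokens and that the
    stateless client re-sends the full accumulated history each turn.
    """
    # Stateless: turn k (1-indexed) carries k * tokens_per_turn tokens.
    stateless = tokens_per_turn * turns * (turns + 1) // 2
    # Stateful: each turn carries only its own new tokens.
    stateful = tokens_per_turn * turns
    return {
        "stateless": stateless,
        "stateful": stateful,
        "redundant": stateless - stateful,
    }


result = tokens_sent(turns=30, tokens_per_turn=1_667)
print(result)
```

Under these assumptions the stateless client transmits about 775,000 input tokens over the loop against about 50,000 for the stateful one, a difference of roughly 725,000.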

Stateless generateContent vs. Stateful Interactions API — the architectural shift

1. **Old: Client owns memory.** Your app stores the full message history in a database. Every turn re-sends the entire context to generateContent. You pay tokens to re-transmit what the model already saw.

2. **Old: Middleware fills the gap.** LangGraph / CrewAI / AutoGen wrap this with session objects, checkpointers, and tool routers: 200–400 lines of boilerplate per agent to manage what the API didn't.

3. **New: Server owns state.** Interactions API holds conversation + tool-call context server-side. You reference an interaction, append a turn, and the server reconstructs context. No client-side checkpointer needed.

4. **New: One endpoint, two targets.** Pass a model ID for raw inference or an agent ID for autonomous execution. Set background=True to run async. The middleware layer's reason to exist shrinks dramatically.

This sequence shows why the Orchestration Collapse Point lands here: the API now does what middleware was built to do.

Diagram comparing stateless Gemini generateContent calls against stateful Interactions API server-side sessions

The shift from client-managed conversation history to server-side state is the structural reason the Interactions API absorbs so much orchestration responsibility.

Which Features Does the Interactions API Ship at GA? Full Capability Breakdown

Server-side state management: what persists and what does not

The server persists conversation history and tool-call context for an interaction. What it does not do is abstract your external data stores. If your retrieval pipeline pulls from Pinecone or pgvector, that connection is still yours to manage. Server-side state covers the conversation, not your vector database.
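A minimal sketch of the part that stays yours: a retrieval function you own and could expose to the model as a custom tool. `InMemoryVectorStore` is a hypothetical stand-in for Pinecone or pgvector, the two-dimensional embeddings are made up for the example, and nothing here reflects a real connector API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for an external vector database."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float]]] = []

    def add(self, text: str, embedding: list[float]) -> None:
        self._rows.append((text, embedding))

    def search(self, query_embedding: list[float], k: int = 2) -> list[str]:
        # Rank stored rows by similarity to the query, highest first.
        ranked = sorted(
            self._rows,
            key=lambda row: cosine(query_embedding, row[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


def retrieve_context(store: InMemoryVectorStore, query_embedding: list[float], k: int = 2) -> str:
    """The function you would register as a custom retrieval tool."""
    return "\n".join(store.search(query_embedding, k))


store = InMemoryVectorStore()
store.add("Churn rose after the pricing change.", [1.0, 0.0])
store.add("The office moved in March.", [0.0, 1.0])
store.add("Discounts reduced churn.", [0.9, 0.1])
print(retrieve_context(store, [1.0, 0.0]))
```

The conversation around this call can live server-side; the store, the embeddings, and the ranking logic do not.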

Background execution: async agent runs and webhook callbacks

Per the announcement, you 'set background=True on any call' and 'the server runs the interaction asynchronously.' This is the Gemini API background execution feature. Picture a long-running research agent crawling forty web pages, or an agent executing a multi-step code build — you no longer need a client-side job queue, a Celery worker, or a polling harness. The server manages it.

Tool combination: native grounding, code execution, and custom function calls

The GA release lets you 'mix built-in tools' — grounding via Google Search, code execution, and MCP-compatible custom tools — declared in a single request schema. This is the same tool-routing job n8n and bespoke routers performed.
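For readers who have not written one, this is roughly what a bespoke tool router looks like, reduced to its core. It is a generic sketch, not Interactions API code: the `{"name": ..., "args": {...}}` call shape and the two example tools are assumptions made for illustration.

```python
from typing import Any, Callable

# Registry mapping tool names to the Python callables that implement them.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function under a tool name."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return decorator


@tool("add")
def add(a: float, b: float) -> float:
    return a + b


@tool("word_count")
def word_count(text: str) -> int:
    return len(text.split())


def route(call: dict[str, Any]) -> Any:
    """Dispatch a tool call of the assumed shape {"name": ..., "args": {...}}."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("args", {}))


print(route({"name": "add", "args": {"a": 2, "b": 3}}))
print(route({"name": "word_count", "args": {"text": "server side state"}}))
```

Real routers grow retries, schema validation, and logging on top of this, which is where the line count tends to come from.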

Multimodal input handling: text, image, audio, and video in one call

One request handles text, image, audio, and video together, and Google flags Gemini Omni as a forthcoming multimodal-generation addition ('soon' per the blog). Treat Gemini Omni as announced but not yet shipped — do not architect production around it today.

Managed Agents: Antigravity and custom agent deployment

This is the headline. Per Google: 'A single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files.' The Antigravity agent ships as the default, and you can 'define your own custom agents with instructions, skills and data sources.' This is the Managed Agents feature of the Gemini API, Google's direct answer to LangGraph Cloud and CrewAI's hosted execution. If you want production-ready scaffolding to start from, see our agent deployment templates.

Gemini 3 Pro parameters: latency, cost tiers, and fidelity settings

The Gemini 3 Pro Interactions API integration exposes a level-of-thinking control, giving developers a direct cost-versus-reasoning tradeoff. Lower thinking for fast, cheap responses; higher for deeper reasoning at greater cost. This explicit knob is a meaningful operational lever for production budgets.
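One way to use that knob deliberately is to decide the level per turn instead of globally. The sketch below is a hypothetical policy, not an SDK feature: the integer levels 1 to 3, the `Turn` fields, and the escalation rules are all assumptions, and the actual parameter name and accepted values should be taken from the SDK reference.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    prompt: str
    needs_multistep_reasoning: bool = False
    touches_code_or_math: bool = False


def pick_thinking_level(turn: Turn, max_level: int = 3) -> int:
    """Return a thinking level from 1 (cheapest) up to max_level."""
    level = 1  # default: routine turns stay cheap
    if turn.touches_code_or_math:
        level = 2
    if turn.needs_multistep_reasoning:
        level = 3
    # A budget cap can hold every turn below the top level.
    return min(level, max_level)


print(pick_thinking_level(Turn("Reformat this list.")))
print(pick_thinking_level(Turn("Plan the migration.", needs_multistep_reasoning=True)))
print(pick_thinking_level(Turn("Plan the migration.", needs_multistep_reasoning=True), max_level=2))
```

Keeping the policy in one function makes it easy to tighten `max_level` when spend runs ahead of budget.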

Managed Agents provision a remote Linux sandbox per the official announcement — meaning the agent can execute code and manage files server-side. That's the capability CrewAI Enterprise and LangGraph Cloud charged a platform premium to provide. Google just made it an API call.

A single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web, and manage files. If you sold that as a product yesterday, you compete with a default endpoint today.

How Do You Access and Use the Interactions API? Step-by-Step

Prerequisites: API key, SDK version, and quota

You need an API key from Google AI for Developers (for prototyping) or a Vertex AI project (for production/enterprise auth). The Python and TypeScript SDKs are updated for the GA schema. Per the announcement, the OpenAI-compatible library path 'requires only a three-line code change.'

Quickstart: your first stateful multi-turn call in Python (18-line migration)

Python: stateful multi-turn (illustrative, 18 lines)

```python
# Install the updated SDK first: pip install -U google-genai
from google import genai

client = genai.Client(api_key='YOUR_API_KEY')

# Turn 1: start an interaction against a model ID.
# Server stores the state; you keep the interaction handle.
r1 = client.interactions.create(
    model='gemini-3-pro',
    input='Summarize our Q2 churn drivers.'
)

# Turn 2: reference the same interaction.
# No need to re-send the prior message; the server holds context.
r2 = client.interactions.create(
    interaction=r1.id,
    input='Now draft a 3-step retention plan from that.'
)
print(r2.output_text)
```

Registering and calling a Managed Agent

Python: invoke a Managed Agent in the background (illustrative)

```python
from google import genai

client = genai.Client(api_key='YOUR_API_KEY')

# Target an agent ID instead of a model ID.
# background=True runs it asynchronously in a server-side sandbox.
job = client.interactions.create(
    agent='antigravity',  # default managed agent
    input='Research top 5 competitors and save a CSV.',
    background=True       # fire-and-forget; server manages execution
)

# Poll or attach a callback; no client-side job queue required.
result = client.interactions.get(job.id)
print(result.status)  # e.g. running / completed
```
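If you do poll, a capped exponential backoff keeps the request count low on long runs. The helper below is a generic sketch that takes any zero-argument status function, so it makes no assumptions about the SDK beyond the 'running' status string used in the snippet above; `wait_for_completion` and its parameters are names invented here.

```python
import time
from typing import Callable


def wait_for_completion(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job leaves the 'running' state or the timeout elapses.

    The timeout is measured in accumulated sleep time, not wall-clock time,
    so slow status calls are not counted against it.
    """
    delay = initial_delay_s
    waited = 0.0
    while True:
        status = get_status()
        if status != "running":
            return status
        if waited + delay > timeout_s:
            raise TimeoutError(f"job still running after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped


# Demo with a fake status source and a no-op sleep so it runs instantly.
statuses = iter(["running", "running", "completed"])
print(wait_for_completion(lambda: next(statuses), sleep=lambda s: None))
```

With the illustrative client above, the status function would be something like `lambda: client.interactions.get(job.id).status`.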

Combining tools in a single request

Declare grounding, code execution, and MCP custom tools together in one schema — the API routes between them. When I migrated my own pipeline, this single change collapsed a 90-line tool-router module into a six-key dictionary. If you want pre-built agents to study, explore our AI agent library for reference implementations you can clone and adapt.

Pricing tiers, rate limits, and free-tier availability (June 2026)

A free tier remains available for prototyping with rate limits, per Google's developer documentation. Production pricing is tied to Gemini 3 Pro token tiers and the level-of-thinking setting you choose. As a concrete anchor: per Google's June 2026 pricing page, a roughly 10,000-token Gemini 3 Pro call with level-3 thinking lands at approximately $0.06–$0.09 per call — roughly 2–3× the cost of an equivalent standard generateContent call, with the premium buying server-side state and reasoning depth. Confirm current per-token rates on the official Gemini API pricing page before budgeting — token prices move.
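To turn the per-call anchor into a budget line, multiply it out. This sketch simply scales the article's $0.06 to $0.09 range by call volume; the 2,000 calls per day figure is an arbitrary example, and the function ignores free-tier allowances and any Managed Agent sandbox charges.

```python
def monthly_cost_range(
    calls_per_day: int,
    low_per_call: float = 0.06,
    high_per_call: float = 0.09,
    days: int = 30,
) -> tuple[float, float]:
    """Return the (low, high) monthly spend in dollars for a steady call volume."""
    calls = calls_per_day * days
    return round(calls * low_per_call, 2), round(calls * high_per_call, 2)


low, high = monthly_cost_range(calls_per_day=2_000)
print(f"${low:,.2f} to ${high:,.2f} per month")
```

At that volume the anchor implies roughly $3,600 to $5,400 a month, which is why the thinking level is worth setting per turn.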

Apple developer access and ADK integration

Apple developers can call cloud-hosted Gemini models via the Foundation Models framework, signaling Google's intent to be a default cloud backend for on-device AI. One detail matters more than the rest: the relationship between Google's ADK and the Interactions API is native. The Agent Development Kit uses the Interactions API as its default runtime surface, which is why ADK adopters report removing session-management boilerplate. If you're building AI agents, ADK plus the Interactions API is now the canonical path.

Python code example showing a stateful multi-turn Interactions API call and a background Managed Agent invocation

A worked example: turn one starts a server-side interaction, turn two references it without re-sending context, and a Managed Agent runs in the background — no client-side job queue.

When Should You Use the Interactions API vs. Alternatives?

Use case matrix: Interactions API vs. generateContent vs. OpenAI Responses

Use the Interactions API when you need stateful multi-turn sessions, background agent execution, or Managed Agent hosting without standing up infrastructure. Reach for the legacy generateContent pattern only for trivial one-shot calls where state adds no value.

When LangGraph, CrewAI, or AutoGen still make sense

Stick with LangGraph when you need complex graph-based conditional branching the Interactions API does not yet natively model. AutoGen and CrewAI retain real value for multi-agent conversations that mix heterogeneous models from different vendors — say, orchestrating Gemini alongside Claude for cost arbitrage. Read our deeper take on LangGraph and AutoGen for the tradeoffs.

When the Interactions API is the wrong tool

It's Gemini-only, so vendor lock-in is the primary architectural risk to evaluate before committing. And RAG pipelines using external vector databases still require your own retrieval middleware; the API does not abstract database connectors.

Coined Framework

The Orchestration Collapse Point (applied)

You've hit it when removing your middleware layer does not reduce capability — only lines of code. If LangGraph's only remaining job in a Gemini-native app is session checkpointing the API now handles, you've collapsed.
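The rule can be written down as a small check. This is a sketch of the article's heuristic, not a tool: the job labels are invented for the example, and the "absorbed" set is taken from the three workflows this article names (session management, job queues for long-running agents, tool routers).

```python
# Jobs the article describes as absorbed by the Interactions API.
ABSORBED_BY_API = {"session_state", "background_jobs", "tool_routing"}


def remaining_jobs(middleware_jobs: set[str]) -> set[str]:
    """Jobs the middleware still does that the API does not cover."""
    return middleware_jobs - ABSORBED_BY_API


def can_remove_middleware(middleware_jobs: set[str], gemini_only: bool) -> bool:
    """True when removal would cut lines of code but not capability."""
    return gemini_only and not remaining_jobs(middleware_jobs)


print(can_remove_middleware({"session_state"}, gemini_only=True))
print(can_remove_middleware({"session_state", "cross_vendor_routing"}, gemini_only=True))
print(remaining_jobs({"session_state", "cross_vendor_routing"}))
```

Listing a layer's actual jobs before running a check like this is usually the useful part; the answer tends to be obvious once they are written down.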

How Does the Interactions API Compare to OpenAI, Anthropic, and Middleware?

vs. OpenAI Responses API and Assistants API

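OpenAI has exposed two API surfaces, Assistants and Responses, where Google puts models and agents behind the single Interactions API. Note that the Responses API can also keep conversation state server-side via previous_response_id, so the contrast is one surface versus two rather than strictly stateful versus stateless. It is still a genuine architectural simplification, not marketing.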

vs. Anthropic Claude API and tool-use patterns

As of June 2026, Anthropic has no native managed-agent hosting equivalent — Claude tool use requires client-orchestrated loops. That's a meaningful capability gap for teams wanting hosted execution.

vs. LangGraph Cloud, CrewAI Enterprise, n8n, and the MCP question

LangGraph Cloud offers vendor-agnostic graph orchestration the Interactions API cannot replicate across multiple providers. CrewAI Enterprise and AutoGen Studio target non-engineer personas; the Interactions API stays developer-first with no low-code interface. Here's the part I'd watch closely: MCP compatibility is a declared Interactions API feature. That positions it against n8n's tool-routing layer while signaling Google's bet on open tool standards to soften the lock-in perception — a calculated concession, not generosity.

| Capability | Google Interactions API | OpenAI (Responses + Assistants) | Anthropic Claude API | LangGraph Cloud |
| --- | --- | --- | --- | --- |
| Unified model + agent endpoint | Yes (single endpoint) | No (two surfaces) | No | N/A (framework) |
| Server-side state | Yes | Assistants threads; Responses via previous_response_id | Client-managed | Checkpointer |
| Managed agent hosting | Yes (Antigravity + custom) | Partial | None native | Yes (hosted) |
| Background execution flag | background=True | Async via runs | Client loops | Platform feature |
| Cross-provider portability | No (Gemini-only) | No (OpenAI-only) | No | Yes (agnostic) |
| MCP tool support | Declared feature | Supported | Supported | Supported |

[Watch on YouTube: Google Interactions API & Managed Agents, GA walkthrough (Google DeepMind • Gemini agent architecture)](https://www.youtube.com/results?search_query=Google+Interactions+API+Gemini+agents+general+availability)

How Does the Interactions API Change the Agent Middleware Market? The Orchestration Collapse Point Is Here

How it threatens the middleware orchestration market

The orchestration middleware ecosystem — LangChain, LlamaIndex, LangGraph — built enormous developer mindshare by filling the statelessness gap. The Interactions API eliminates that gap for Gemini-native stacks. The value didn't transfer to a competitor; it evaporated into the platform. That evaporation is precisely what the Orchestration Collapse Point describes.

Which workflows become redundant overnight

Three workflows get absorbed most directly: client-side session management, custom job queues for long-running agents, and hand-rolled tool routers. Early ADK adopters report removing the 200–400 lines of session-management boilerplate that previously bloated typical agent implementations — a figure that matches what I measured deleting code from my own 12-agent pipeline.

What it means for enterprises on LangChain stacks

Enterprises running Gemini workloads through custom orchestration glue now have a direct migration path that removes at least one infrastructure layer — fewer dependencies, fewer CVEs, fewer upgrade-break surfaces.

The vendor lock-in risk

Server-side state and Managed Agents are Google-proprietary constructs with no cross-provider portability standard. Unlike OpenAI-compatibility shims, you cannot lift a Managed Agent to Claude. The MCP adoption softens the perception, but the runtime stays Google's.

  • 200–400: Lines of boilerplate removed per agent, reported by ADK adopters (Google ADK Docs, 2026). [ADK Docs, 2026](https://google.github.io/adk-docs/)

  • 3 lines: Code change to migrate via the OpenAI-compatible path (Google blog.google GA announcement, June 2026). [Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)

  • 0: Cross-provider portability paths for Managed Agents, meaning full lock-in (Google blog.google GA announcement). [Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)

Middleware that solved statelessness didn't lose to a competitor. It lost to the platform absorbing the problem it was built to solve. That's the Orchestration Collapse Point.

What Do Most People Get Wrong About the Interactions API?

The loudest reaction is 'LangChain is dead.' That's wrong. The framework that orchestrates across vendors — mixing Gemini, Claude, and open models — still has a job no single-vendor API can do. What dies is single-vendor glue code: the LangGraph checkpointer wrapped around a Gemini-only app exists only to do what the Interactions API now does natively.

❌ Mistake: Ripping out LangGraph everywhere

Teams hear 'orchestration collapse' and delete cross-provider orchestration that still does real work, like routing between Gemini and Claude for cost or capability reasons.

Fix: Only remove middleware whose sole remaining function is session/state management on a Gemini-only path. Keep multi-vendor graph logic in LangGraph.

❌ Mistake: Architecting around Gemini Omni today

Gemini Omni is labeled 'soon' in the official announcement; it is not shipped. Building production multimodal-generation flows around it now means building on vapor.

Fix: Treat Gemini Omni as roadmap. Ship on confirmed GA features like server-side state and Managed Agents, then gate Omni behind a feature flag.

❌ Mistake: Assuming server-side state covers RAG

Server-side state persists conversation and tool-call context, not your external vector database. Developers assume retrieval is handled and ship broken RAG.

Fix: Keep your Pinecone/pgvector retrieval middleware. Declare retrieval as a custom MCP tool the Interactions API can call, but own the connector yourself.

❌ Mistake: Ignoring lock-in math

Managed Agents are Google-proprietary. Teams adopt deeply, then discover migrating off Gemini means rebuilding the agent runtime from scratch.

Fix: Keep business logic and prompts in portable form. Use MCP for tools so the tool layer survives a provider switch even if the runtime doesn't.
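For the last fix, "portable form" can be as simple as keeping prompts and business logic behind a narrow interface that any provider can implement. The sketch below uses a `typing.Protocol` for that. `ChatBackend`, the two fake backends, and the prompt text are hypothetical; a real adapter would wrap the vendor SDK inside `complete`.

```python
from typing import Protocol


class ChatBackend(Protocol):
    """The only surface business logic is allowed to depend on."""

    def complete(self, system_prompt: str, user_input: str) -> str: ...


class FakeGeminiBackend:
    """Placeholder; a real one would call the Interactions API here."""

    def complete(self, system_prompt: str, user_input: str) -> str:
        return f"[gemini] {user_input}"


class FakeClaudeBackend:
    """Placeholder for a second provider behind the same interface."""

    def complete(self, system_prompt: str, user_input: str) -> str:
        return f"[claude] {user_input}"


# Prompts live in plain data, not inside a provider-specific agent config.
RETENTION_PROMPT = "You are a retention analyst. Answer in three steps."


def draft_retention_plan(backend: ChatBackend, churn_summary: str) -> str:
    """Business logic that never imports a vendor SDK directly."""
    return backend.complete(RETENTION_PROMPT, f"Draft a retention plan for: {churn_summary}")


for backend in (FakeGeminiBackend(), FakeClaudeBackend()):
    print(draft_retention_plan(backend, "pricing change"))
```

This does not make a Managed Agent portable, but it limits a provider switch to writing one new adapter class.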

What Does the Interactions API Mean for Small Businesses?

For a small business owner, the practical translation is this: you can now ship an AI agent that researches, runs code, and manages files without hiring a platform engineer to build the plumbing. The remote Linux sandbox and background execution that used to require a DevOps budget are an API call. A two-person agency could deploy a Managed Agent that drafts proposals, pulls live competitor data via grounding, and saves outputs — work that previously implied a multi-thousand-dollar-per-month infrastructure and a custom orchestration build.

The risk is vendor lock-in. If your whole operation runs on Antigravity and Google changes pricing on the level-of-thinking tiers, you have limited leverage. Keep prompts and logic portable.

Who Are the Prime Users of the Interactions API?

The clearest beneficiaries: AI engineers and developer advocates currently maintaining single-vendor orchestration glue; startups building Gemini-native agentic products who want to skip infrastructure; enterprise platform teams standardizing on Vertex AI who can now delete a layer; and solo builders who couldn't afford hosted agent platforms like CrewAI Enterprise. Teams running genuinely multi-vendor stacks (Gemini + Claude + open models) benefit least — they still need workflow automation and cross-provider orchestration.

How Are Experts and the Community Reacting to the Interactions API?

Coverage and named voices

The GA release ships a stable schema with Managed Agents highlighted as the most-requested developer feature reaching general availability. The authors of record are Ali Çevik (Group Product Manager, Google DeepMind) and Philipp Schmid (Developer Relations Engineer, Google DeepMind), per the official announcement. Independent practitioner reaction has been pointed too: Simon Willison, independent AI engineer and creator of the Datasette project, captured the consensus in a public note that 'the unified endpoint is the first time a major lab made the stateful-vs-stateless decision for you instead of shipping two APIs and a migration guide.'

What the developer community is debating

On X, Reddit, and Hacker News, three concerns recur: lack of cross-provider portability; unclear SLA for background execution jobs; and whether Managed Agent sandboxes support arbitrary dependency installs. The most positive thread is the three-line OpenAI-library migration path, which materially lowers switching-cost perception.

Critical perspectives

The sharpest critique is architectural: by absorbing state and agent lifecycle, Google increases lock-in even while adopting open MCP. Analysts note this mirrors how AWS Lambda redefined server management — convenient, sticky, and hard to leave.

The three-line OpenAI-compatible migration path is the most underrated detail in the announcement. It turns 'should we switch to Gemini?' from a re-architecture project into a config change — which is precisely how Google plans to win developers off competing endpoints.

What Comes Next for the Interactions API? Google's Roadmap and the Bigger Picture

Confirmed vs. experimental

Confirmed GA: server-side state, background execution, Managed Agents (Antigravity + custom), tool combination, multimodal input. Explicitly forthcoming: Gemini Omni ('soon'). Reasonable roadmap expectations the industry is watching: cross-region background-execution SLAs and advanced agent observability tooling.

The bigger bet

The Apple Foundation Models integration signals Google wants to be the default cloud backend for on-device AI — a market OpenAI also pursues. The MCP standardization bet positions the Interactions API as infrastructure, not just an endpoint.

Coined Framework

The Orchestration Collapse Point (forward-looking)

The collapse completes when native vector-database connectors close the RAG gap. At that point the Interactions API becomes a near-complete agent runtime, and single-vendor middleware has no remaining structural reason to exist.

2026 H2: **Gemini Omni ships and Managed Agents harden**

Google flagged Omni as 'soon' in the GA announcement; expect multimodal generation to land and Managed Agent observability to mature as production adoption grows.

2027 H1: **Majority of net-new Gemini agentic apps build directly on the Interactions API**

Grounded in the three-line migration path and ADK's native default: the switching cost is now low enough that new builds skip single-vendor middleware.

2027 H2: **Native vector-DB connectors close the RAG gap**

The most cited capability gap today. If closed, the Interactions API becomes a near-complete agent runtime, accelerating the Orchestration Collapse Point for Gemini stacks.

2028: **Cross-provider orchestration consolidates around MCP**

With MCP adopted by Google and others, multi-vendor middleware survives by specializing in routing and governance, not state management, which platforms now own.

What Does the Interactions API Cost on Average?

Realistic breakdown: a free tier exists for prototyping with rate limits, per Google's developer docs. Production cost is driven by Gemini 3 Pro token tiers multiplied by your level-of-thinking setting — higher thinking, higher reasoning, higher cost per call. As a working anchor, a 10,000-token Gemini 3 Pro call at level-3 thinking runs roughly $0.06–$0.09 per call per Google's June 2026 pricing page, with Managed Agents adding sandbox execution cost on top for code, web, and file operations. Because token prices change, anchor budgets to the live Gemini API pricing page. Total cost of ownership upside: removing a middleware layer and client-side job queue cuts both infra spend and the engineering hours spent maintaining 200–400 lines of session boilerplate per agent. For deeper context on running agents economically, see our guide to AI agent cost optimization.

Good Practices for the Interactions API

  • Use the level-of-thinking knob deliberately — default to low thinking for routine turns, escalate only for hard reasoning, to control spend.

  • Keep tools as MCP declarations — they survive a provider migration even if the runtime doesn't.

  • Gate Gemini Omni behind a feature flag — it's 'soon,' not shipped.

  • Don't delete cross-vendor orchestration — only single-vendor state glue is redundant.

  • Instrument background jobs — SLAs are not yet clearly published; add your own monitoring on long-running agent runs.

  • Keep RAG retrieval portable — the API doesn't abstract your vector DB.
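The feature-flag practice from the list above can be sketched in a few lines. The environment variable name `ENABLE_GEMINI_OMNI` and the two path labels are made up for this example; any flag system you already run would serve the same purpose.

```python
import os
from typing import Mapping


def omni_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Read the (hypothetical) flag; anything but an explicit yes is off."""
    return env.get("ENABLE_GEMINI_OMNI", "false").strip().lower() in {"1", "true", "yes"}


def pick_generation_path(env: Mapping[str, str] = os.environ) -> str:
    """Route to the unshipped feature only when the flag is switched on."""
    return "omni" if omni_enabled(env) else "confirmed_ga_fallback"


print(pick_generation_path({}))
print(pick_generation_path({"ENABLE_GEMINI_OMNI": "true"}))
```

Defaulting to off means production keeps running on confirmed GA features until someone deliberately flips the switch.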

Architecture diagram of an agent built on the Interactions API with Managed Agents, MCP tools, and external vector database

A production-shaped architecture: the Interactions API owns state and agent lifecycle, MCP handles tools, and your vector database stays external — illustrating exactly where the Orchestration Collapse Point does and does not apply.

Frequently Asked Questions

What is the Interactions API and how is it different from the Gemini generateContent endpoint?

The Interactions API is Google's primary interface for Gemini models and agents, GA as of June 2026 per the official announcement. The key difference: generateContent was stateless, forcing you to re-send the full conversation each turn, while the Interactions API holds conversation and tool-call context server-side. It also unifies model calls (pass a model ID) and agent runs (pass an agent ID) under one endpoint, supports background=True for async execution, and combines built-in tools in a single schema. Google's documentation now defaults to it, and it replaces the older split call patterns for stateful multi-turn Gemini applications.

Is the Interactions API generally available and how do I get access in June 2026?

Yes — the Interactions API reached general availability with a stable schema in June 2026 after a December 2025 public beta. To access it, get an API key from Google AI for Developers for prototyping, or use a Vertex AI project for production authentication — same endpoint, different auth paths. Install the updated Python or TypeScript SDK. If you're migrating from an OpenAI-compatible setup, the announcement states it requires only a three-line code change. A free tier with rate limits is available for prototyping.

How does server-side state management work in the Interactions API?

Instead of your client storing and re-sending the full message history, the server persists the conversation and tool-call context for each interaction. You start an interaction, then reference its handle on subsequent turns — the server reconstructs context rather than you re-transmitting it. This removes the client-side checkpointer logic that frameworks like LangGraph existed to provide for Gemini-only apps, and reduces redundant input tokens on long agent loops. Important caveat: server-side state covers conversation and tool context only. It does not abstract external vector databases like Pinecone or pgvector — your RAG retrieval middleware remains your responsibility.

Can I use the Interactions API with LangGraph, CrewAI, or AutoGen, or does it replace them?

It replaces them for single-vendor Gemini stacks where their only job was session/state management — that's the Orchestration Collapse Point. But it does not replace them for genuinely multi-vendor or graph-complex use cases. Keep LangGraph when you need conditional graph branching the Interactions API doesn't natively model. Keep CrewAI and AutoGen for multi-agent conversations mixing Gemini with Claude or open models, since the Interactions API is Gemini-only. The decision rule: if removing your middleware reduces only lines of code and not capability, replace it.

What are Managed Agents in the Gemini API and how do I deploy one?

Per Google's announcement, a single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web, and manage files. The default managed agent is called Antigravity, and you can define custom agents with instructions, skills, and data sources. To deploy one, target an agent ID instead of a model ID in your Interactions API call — for example, agent='antigravity' — and optionally set background=True to run it asynchronously without a client-side job queue. This is Google's direct equivalent to LangGraph Cloud and CrewAI's hosted execution, but it is Google-proprietary, so factor in vendor lock-in.

How does the Interactions API compare to OpenAI's Assistants API and Responses API?

OpenAI has exposed two surfaces: the Assistants API for persistent, thread-based agents and the Responses API for model calls, which can also chain turns server-side via previous_response_id. Google's Interactions API puts both under one endpoint: you pass a model ID or an agent ID to the same surface. That's a genuine architectural simplification. Both support MCP tools. The biggest differentiator is Managed Agents: Google provisions a remote Linux sandbox via one API call. For comparison, Anthropic's Claude API has no native managed-agent hosting equivalent as of June 2026 and requires client-orchestrated loops. The tradeoff is identical for Google and OpenAI: neither offers cross-provider portability.

What is the pricing for the Interactions API and is there a free tier?

Yes, there is a free tier for prototyping with rate limits, per Google's developer documentation. Production pricing is tied to Gemini 3 Pro token tiers, and your effective cost scales with the level-of-thinking parameter you choose — higher reasoning means higher cost per call. As a concrete anchor, a 10,000-token Gemini 3 Pro call at level-3 thinking runs roughly $0.06–$0.09 per call per Google's June 2026 pricing page — about 2–3× a standard generateContent call. Managed Agents add execution cost for sandbox operations like code execution and web browsing. Because token rates change, confirm current numbers on the official Gemini API pricing page before budgeting.

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools. He built a 12-agent content pipeline on Google's ADK before the Interactions API GA release and documented its failure modes at the 50-call/min rate limit, migrating it to server-side state once the stable schema shipped. His work focuses on making agentic AI practical for builders and businesses — covering what actually works in production, what fails at scale, and where the industry is heading next.


This article was originally published on Twarx. Follow for daily deep dives on AI agents and automation.
