aarhamforensics

Posted on Jun 26 • Originally published at twarx.com

Interactions API Gemini Models Agents: The 2026 Migration Guide

#ai #machinelearning #automation #productivity

Last Updated: June 26, 2026

Google just quietly deprecated the mental model that powered every serious Gemini integration built in the last two years, and most developers haven't noticed yet. The new Interactions API, Google's interface for Gemini models and agents, isn't an upgrade to GenerateContent. It's a full regime change that moves state, memory, and agent lifecycle off your servers and onto Google's.

As of this week, the Interactions API is generally available. It replaces the fragmented GenerateContent, Chat, and streaming surfaces with a single unified endpoint that ships with Managed Agents and background execution built in.

By the end of this article you'll know exactly what changed, how server-side state works, what it costs, and whether migrating your production system off GenerateContent is worth the re-architecture bill.

Google's official announcement of the Interactions API reaching general availability as the primary interface for Gemini models and agents. Source

Coined Framework

The Stateful Sovereignty Trade-off — the architectural inflection point at which offloading conversation state and agent lifecycle to Google's servers accelerates shipping speed but permanently reduces developer-side observability, portability, and cost predictability in agentic workflows

It names the moment your team stops owning the truth of a conversation and starts renting it. You ship faster — but you can no longer fully see, export, or independently price the state your application runs on.

Breaking: What Google Announced and When — The Exact Facts

This is the single most consequential developer-facing change Google has made to the Gemini surface since the model family launched. Here are the confirmed facts, grounded in the official announcement.

Official announcement timeline: blog.google sources and GA date

Google announced that the Interactions API has reached general availability and is now its primary API for interacting with Gemini models and agents. The post — authored by Ali Çevik (Group Product Manager, Google DeepMind) and Philipp Schmid (Developer Relations Engineer, Google DeepMind) — confirms the API launched in public beta in December 2025 and has, in Google's words, 'quickly become developers' favorite way to build applications with Gemini.' Schmid's own developer notes have tracked this surface since the beta.

The GA release locks in a stable schema — the signal production teams have been waiting for — and adds developer-requested capabilities including Managed Agents, background execution, and Gemini Omni (announced as 'soon').

What changed from the legacy GenerateContent and Chat APIs

Google was explicit: 'All of our documentation now defaults to Interactions API and we are working with ecosystem partners to make it the default interface across 3P SDKs and Libraries.' That sentence is the deprecation flag for the old GenerateContent mental model. The fragmented set of endpoints — text generation, multimodal input, streaming, early agent prototypes — collapses into one. Whether you're calling a model or running an agent, you pass a model ID for inference, an agent ID for autonomous tasks, and set background=True for anything long-running. We break down the broader shift in our Gemini API overview.

The Managed Agents launch bundled into the same release

The headline new capability: Managed Agents. Per Google, 'a single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files.' The Antigravity agent ships as the default, and developers can define custom agents with instructions, skills, and data sources.

Dec 2025: Interactions API public beta launch ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

1 unified endpoint replacing 4+ legacy surfaces ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

Antigravity: default Managed Agent in a provisioned Linux sandbox ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

The phrase 'our primary interface' isn't marketing fluff. In Google's deprecation playbook, designating a new primary API is the first formal step toward sunsetting the old one — historically a 12–18 month runway.

What the Interactions API Actually Is — Technical Definition

Strip away the launch language and the Interactions API is one idea: your conversation no longer lives in your code. It lives on Google's servers.

The stateful paradigm shift: from request-response to persistent sessions

Under GenerateContent, every turn was a fresh, stateless request. You held the entire conversation history client-side and re-sent the full messages array on every call. The model remembered nothing. Your application was the memory.

The Interactions API inverts this completely. It maintains server-side session state: conversation history, tool-call results, and agent memory are stored and managed by Google's infrastructure. You hold a session token, not the transcript.

GenerateContent made your app the source of truth for every conversation. The Interactions API makes Google the source of truth. That is not a feature change — it is a sovereignty change.
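To make the difference concrete, here is a minimal offline sketch that compares how much text each call carries under the two models. It uses character counts as a rough stand-in for tokens, and the function names and the `sess_123` handle are illustrative assumptions, not part of any Google SDK.

```python
def stateless_payload_chars(turns: list[str], replies: list[str]) -> list[int]:
    """Characters sent per call when the client re-sends the full history."""
    sizes = []
    history_chars = 0
    for user_msg, reply in zip(turns, replies):
        # Each call carries all prior messages plus the new user input.
        sizes.append(history_chars + len(user_msg))
        history_chars += len(user_msg) + len(reply)
    return sizes


def stateful_payload_chars(turns: list[str], handle: str) -> list[int]:
    """Characters sent per call when the server keeps the history."""
    # Each call carries only a session handle plus the new user input.
    return [len(handle) + len(user_msg) for user_msg in turns]


turns = ["hello there", "what changed?", "and the cost?"]
replies = ["hi, how can I help?", "state moved server-side.", "it depends on usage."]

print(stateless_payload_chars(turns, replies))
print(stateful_payload_chars(turns, handle="sess_123"))
```

The stateless sizes grow with every turn because the history rides along on each call, while the stateful sizes depend only on the latest message.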

How server-side state differs from client-managed conversation history

This matters more than it sounds. In a typical RAG pipeline built on Pinecone or another vector database, retrieval logic lives entirely with you — you decide what context enters the prompt and when. The Interactions API abstracts memory retrieval behind a managed session object. You gain simplicity. You lose the ability to inspect, version, and independently audit the exact state feeding each turn. I've watched teams realize this tradeoff only after shipping, which is a bad time to realize it. Our AI memory management guide covers the patterns this replaces.

Where the Interactions API sits in the broader Gemini API surface

It's the trunk now, not a branch. The unified endpoint absorbs text generation, multimodal input, streaming, and the early agent prototype endpoints. Architecturally, this is the same move OpenAI made when the Responses API unified Chat Completions and the Assistants API, except that Google's version ships with native agent lifecycle management from day one rather than bolted on afterward.

The core shift: GenerateContent re-sends full history every turn (stateless), while the Interactions API persists session state server-side and passes only a session token — the heart of the Stateful Sovereignty Trade-off.

Full Capability Breakdown: Everything the Interactions API Can Do

Here's the complete capability set confirmed in the GA release, plus the control levers that actually matter in production.

Server-side state and multi-turn session management

The defining feature. A session persists across turns server-side, so you stop rebuilding context on every call. For long conversations this is the single biggest payload reduction available — and if you've ever watched a 40-turn support thread balloon a request payload past 80k tokens, you know exactly why this matters.

Background execution: long-running agent tasks without open connections

Set background=True on any call and 'the server runs the interaction asynchronously,' per Google. Critical for any workflow that exceeds a synchronous response window — deep research, multi-step code generation, agentic browsing tasks that take minutes rather than seconds. No more holding an open HTTP connection hostage to a long task.
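On the client side, a background task usually means a polling loop. The sketch below shows one with exponential backoff and a timeout. It is a generic pattern, not Google's SDK: the `status` field and the `running`, `completed`, and `failed` values are assumptions, and `get_status` stands in for whatever call retrieves the job in the current docs.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], dict],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status() with exponential backoff until the job finishes."""
    delay = initial_delay
    waited = 0.0
    while True:
        job = get_status()
        if job["status"] in ("completed", "failed"):
            return job
        if waited + delay > timeout:
            raise TimeoutError(f"job still {job['status']!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)


# Stand-in for a real status call: reports "running" twice, then "completed".
_responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed", "output": "done"},
])
result = poll_until_done(lambda: next(_responses), sleep=lambda s: None)
print(result["status"])
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. If the platform offers webhooks, they avoid polling altogether.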

Combined tool calling: grounding, code execution, and custom MCP tools in one request

Google confirmed tool improvements that let you 'mix built-in tools' in a single request. In practice: attach Google Search grounding, a Python code-execution sandbox, and a custom MCP-registered tool simultaneously — without chaining separate API calls and manually stitching results back together.

Combined tool calling collapses what used to be a 3-call orchestration loop into a single payload. For teams running LangGraph chains just to sequence Gemini tool use, that's an entire layer of your stack becoming optional.
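As a rough illustration of what "one payload" could look like, the sketch below assembles a single request body carrying all three tool kinds. Every field name here (`session`, `input`, `tools`, `type`, `mcp_server`, `url`) and the example.com address are assumptions made for illustration; check the current docs for the real schema before relying on any of them.

```python
import json


def build_turn_payload(session_id: str, user_input: str, tools: list[dict]) -> dict:
    """Assemble one request body that carries several tools at once."""
    for tool in tools:
        if "type" not in tool:
            raise ValueError(f"tool entry is missing 'type': {tool!r}")
        # A custom MCP tool needs an address; built-in tools do not.
        if tool["type"] == "mcp_server" and "url" not in tool:
            raise ValueError("mcp_server tools need a 'url'")
    return {"session": session_id, "input": user_input, "tools": tools}


payload = build_turn_payload(
    session_id="sess_123",
    user_input="Summarize Q2 support tickets and flag churn risks",
    tools=[
        {"type": "google_search"},   # hypothetical built-in grounding tool
        {"type": "code_execution"},  # hypothetical built-in sandbox tool
        {"type": "mcp_server", "name": "crm", "url": "https://mcp.example.com/crm"},
    ],
)
print(json.dumps(payload, indent=2))
```

Validating the tool list before sending catches a malformed MCP entry locally instead of as a remote error.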

Multimodal input support: audio, video, text, and image in a single interaction

The unified endpoint accepts audio, video, text, and image within one interaction — continuing Gemini's native multimodal posture rather than bolting it on as an afterthought. Gemini Omni is flagged as coming 'soon' in the GA post.

Managed Agents: running verified agents inside the API

A single API call provisions a remote Linux sandbox. The agent reasons, executes code, browses the web, manages files. Antigravity ships as default; custom agents are defined with instructions, skills, and data sources. This is production-grade infrastructure abstraction — you don't manage the sandbox lifecycle at all, which is either a relief or a concern depending on how much you trust Google's ops. If you want ready-made patterns, browse our AI agent library.

Managed Agents turn 'spin up a secure sandbox, give it tools, let it run autonomously' into one API parameter. That used to be a DevOps project. Now it's an agent_id.

Thinking and fidelity control levers

Note on confirmed vs. inferred: Google's GA text explicitly lists Managed Agents, background execution, and tool mixing. The granular 'level of thinking' and multimodal fidelity parameters are consistent with the Gemini 3 control surface but should be verified against current docs before you architect around exact parameter names. The directional point stands: developers get documented levers to trade latency and token spend against reasoning depth — control that orchestration layers like AutoGen can't expose natively.

How to Access and Use the Interactions API: Step-by-Step Guide

Here's the practical path from zero to a working stateful session, plus the cost model you need to budget against before you migrate anything real.

Prerequisites: API key, project setup, and SDK requirements

You need a Google AI Studio project and API key. The session and background-execution primitives require the current Google AI SDK generation — projects pinned to the legacy GenerateContent SDK won't expose session objects. Confirm your SDK version against the GA docs before you start. Don't assume.

Interactions API: From Session Creation to Background Result Retrieval

1. **Create session**: Call the Interactions endpoint to open a session. The server returns a session token. State now lives on Google's infrastructure; your app holds only the handle.

2. **Attach tools + (optional) agent_id**: Mix Google Search grounding, code execution, and custom MCP tools in one payload. Pass an agent_id (e.g. Antigravity) to run a Managed Agent instead of raw inference.

3. **Send turn (set background=True if long-running)**: Pass the session token plus the new user input only, not the full transcript. For tasks over the sync window, background mode runs it asynchronously server-side.

4. **Retrieve result via webhook or polling**: Background interactions post results to a callback or are polled by session ID. Incremental agent output streams back through the same session channel.

The full lifecycle — note that the transcript never leaves Google's servers after step 1, which is exactly where the Stateful Sovereignty Trade-off bites.

Creating your first stateful session

Python — Interactions API (illustrative)

    # Pseudocode aligned to the GA model; verify exact method names in current docs.
    from google import genai

    client = genai.Client(api_key='YOUR_KEY')

    # 1. Open a server-side session; state lives on Google's infra.
    session = client.interactions.create_session(model='gemini-3-pro')

    # 2. Send a turn: pass only the new input, not the whole history.
    resp = client.interactions.send(
        session=session.id,
        input='Summarize Q2 support tickets and flag churn risks',
        tools=['google_search', 'code_execution'],  # combined in ONE call
    )
    print(resp.output)

    # 3. Long-running task? Offload it to the server.
    job = client.interactions.send(
        session=session.id,
        agent_id='antigravity',  # Managed Agent in a Linux sandbox
        input='Build and test a churn-scoring script over the attached CSV',
        background=True,  # async execution
    )

    # 4. Poll or receive via webhook.
    result = client.interactions.get(job.id)

Need pre-built agent patterns to layer on top of this? Explore our production-tested AI agent templates to skip the boilerplate.

Pricing structure: session, background compute, and agent costs

Confirmed vs. inferred: Google's GA post doesn't publish a complete price sheet. Based on the architecture, expect a three-meter model: (1) standard input/output tokens per turn at Gemini 3 Pro rates, (2) a session-persistence cost for server-side state, and (3) a background-compute meter for agent execution time. Verify exact figures on the official pricing page before forecasting spend — treat any per-hour or per-second numbers you see quoted secondhand as estimates until confirmed. I'd budget conservatively and set hard alerts on your first production workload. Our LLM cost optimization guide walks through how to model this.

The hidden cost line is session persistence. With GenerateContent your only meter was tokens. With the Interactions API, an idle-but-open session can accrue cost. Architect session teardown explicitly — don't leak sessions the way teams once leaked DB connections.
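If the three-meter model inferred above holds, a budget forecast is simple arithmetic. The sketch below splits an estimate into those three meters. The `Rates` fields are hypothetical, and the demo numbers are made up purely to keep the arithmetic easy to follow; they are not Google's prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder unit prices; replace with figures from the official pricing page."""
    per_million_input_tokens: float
    per_million_output_tokens: float
    per_session_hour: float
    per_background_second: float


def estimate_cost(
    rates: Rates,
    *,
    input_tokens: int,
    output_tokens: int,
    session_hours: float,
    background_seconds: float,
) -> dict[str, float]:
    """Split an estimated bill into the three meters plus a total."""
    tokens = (
        input_tokens / 1_000_000 * rates.per_million_input_tokens
        + output_tokens / 1_000_000 * rates.per_million_output_tokens
    )
    persistence = session_hours * rates.per_session_hour
    background = background_seconds * rates.per_background_second
    return {
        "tokens": tokens,
        "persistence": persistence,
        "background": background,
        "total": tokens + persistence + background,
    }


# Made-up rates, chosen only to make the arithmetic easy to follow.
demo_rates = Rates(
    per_million_input_tokens=1.0,
    per_million_output_tokens=2.0,
    per_session_hour=0.5,
    per_background_second=0.01,
)
bill = estimate_cost(
    demo_rates,
    input_tokens=2_000_000,
    output_tokens=500_000,
    session_hours=10,
    background_seconds=300,
)
print(bill)
```

Keeping the meters separate in the output shows at a glance when persistence or background compute, rather than tokens, dominates the bill.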

When to Use the Interactions API vs Alternatives — Decision Framework

This is the section your architect actually needs. Migration isn't free, and the right answer depends entirely on your workload.

Use the Interactions API when — five clear production fits

Customer-facing conversational agents with long multi-turn memory.
Long-horizon research or analysis tasks that exceed a synchronous window (use background=True).
Workflows where rebuilding context from a vector DB every turn adds >200ms latency — this was killing us on one pipeline before we moved off the pattern entirely.
Agentic tasks needing a sandboxed environment to run code and browse — Managed Agents remove the infra burden.
Apps combining grounding + code execution + custom tools in one logical step.

Stay on GenerateContent when — four legitimate reasons

Regulated pipelines (healthcare, finance, legal) requiring every token of history in a sovereign, auditable store you control.
On-premise or air-gapped deployments — there's no exportable or self-hosted session store at GA.
Single-shot, stateless inference where session overhead adds cost with zero benefit.
Existing SOC 2 / ISO 27001 scoping that would need re-assessment before state can move off your servers — and your auditors aren't going to move fast on this.

Coined Framework

The Stateful Sovereignty Trade-off in practice

The faster you ship by offloading state, the less you can independently observe, export, or price that state. For consumer chat apps that trade is a bargain; for audited enterprise pipelines it can be a liability you cannot refactor out of.

ADK integration: layer, don't replace

The Agent Development Kit (ADK) is designed to sit on top of the Interactions API, not as an alternative to it. ADK handles agent graph definition and tool registration; the Interactions API handles execution, state, and lifecycle. Teams on LangGraph or CrewAI should decide carefully whether Google's managed session replaces their orchestration layer or merely their Gemini call wrapper. These aren't equivalent substitutions, and conflating them is where I've seen migrations go sideways. If you'd rather start from a working blueprint, our agent template gallery ships patterns for exactly this layering.

Interactions API vs Competitors: Direct Comparison Table

The question every architect is searching: Interactions API vs OpenAI Responses API — and how it stacks against the orchestration frameworks you may already run.

| Capability | Google Interactions API | OpenAI Responses API | Anthropic Claude API | LangGraph |
| --- | --- | --- | --- | --- |
| Server-side session state | Yes (managed) | Partial (threads + manual run objects) | No (client-side) | Developer-defined |
| Managed agent lifecycle | Yes (Antigravity + custom) | Assistants (more manual) | No | You build it |
| Background async execution | Yes (background=True) | Limited | No native | Self-managed |
| Combined tools in one call | Yes | Yes | Tool use (sequential) | Node-level |
| Node-level observability / checkpoints | Limited (managed) | Limited | Full (client-side) | Full |
| Self-hosted / exportable state | No (at GA) | No | Yes (you own it) | Yes |
| MCP tool registration | Yes | Yes | Yes | Yes |

Anthropic's Claude API keeps all state client-side as of mid-2026 — architecturally closer to legacy GenerateContent than to the new model. That's actually a deliberate choice, not a gap. n8n's MCP node can call the Interactions API as an external tool, giving low-code teams stateful Gemini sessions inside visual workflow builders without writing SDK code.

Industry Impact: What the Interactions API Changes for AI Development in 2026

The consolidation of model and agent APIs

If Google successfully pulls stateful session management and agent lifecycle into a first-party managed API, the addressable value of third-party orchestration frameworks shrinks toward complex multi-vendor, multi-model workflows — a real market, but a narrower one. The wrapper-around-Gemini portion of LangChain and AutoGen usage becomes redundant for single-vendor stacks. That's a significant chunk of how those frameworks are actually used today.

The orchestration layer doesn't die in 2026 — it gets squeezed into the gaps managed APIs can't reach: multi-model routing, cross-vendor failover, and human-in-the-loop checkpoints.

What Managed Agents means for the independent agent ecosystem

The Managed Agents launch signals Google is building an agent marketplace inside the Gemini surface — directly competing with OpenAI's GPT-store model and emerging Anthropic agent registries, while offering cloud-sandbox security guarantees neither fully matches today.

The MCP question: compete or complement?

Critically, MCP is supported as a tool-registration mechanism inside the Interactions API. That positions Google as MCP-compatible rather than MCP-replacing — a politically important distinction that preserves goodwill in the open-standard community while still capturing the execution layer. Smart move, frankly.

How the pieces stack: ADK for graph definition, MCP for tool registration, and the Interactions API as the execution-and-state layer underneath — with Managed Agents running in Google's sandbox.

Common Migration Mistakes — and How to Avoid Them

❌ Mistake: Treating it as a drop-in GenerateContent upgrade

Teams swap the endpoint and assume parity. But state now lives server-side, so logging, replay, and audit tooling that read the local transcript suddenly see only a session token. This will burn you at 2am when you need to replay a broken conversation.

✅ Fix: Re-architect observability first. Instrument session IDs, capture inputs/outputs at your boundary, and never rely on Google's session store as your audit log of record.
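One way to capture inputs and outputs at your boundary is a thin wrapper that appends every turn to a local JSONL log keyed by session ID. The sketch below assumes a `send(session_id, user_input)` callable of your own that wraps the real API call; `fake_send` and the record fields are illustrative, not part of any SDK.

```python
import io
import json
import time
from typing import Callable, TextIO


def audited_turn(
    send: Callable[[str, str], str],
    audit_log: TextIO,
    session_id: str,
    user_input: str,
) -> str:
    """Send one turn and append the input/output pair to a local JSONL log."""
    output = send(session_id, user_input)
    record = {
        "ts": time.time(),
        "session_id": session_id,
        "input": user_input,
        "output": output,
    }
    audit_log.write(json.dumps(record) + "\n")
    audit_log.flush()
    return output


def replay(audit_log_text: str, session_id: str) -> list[tuple[str, str]]:
    """Rebuild a conversation's (input, output) pairs from the local log."""
    records = (json.loads(line) for line in audit_log_text.splitlines() if line)
    return [(r["input"], r["output"]) for r in records if r["session_id"] == session_id]


# Stand-in for the real API call so the sketch runs offline.
def fake_send(session_id: str, user_input: str) -> str:
    return f"echo: {user_input}"


log = io.StringIO()
audited_turn(fake_send, log, "sess_123", "first question")
audited_turn(fake_send, log, "sess_456", "other conversation")
audited_turn(fake_send, log, "sess_123", "follow-up")
print(replay(log.getvalue(), "sess_123"))
```

In production the `StringIO` would be an append-only file or log pipeline you control. Note that this sketch records only successful turns; failed calls would need their own record.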

❌ Mistake: Leaking sessions and absorbing persistence cost

Opening sessions without explicit teardown is the new connection leak. Idle server-side state can accrue persistence charges that never existed under the token-only GenerateContent model.

✅ Fix: Set TTLs, close sessions on conversation end, and monitor active-session count as a first-class metric alongside token spend.
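A small client-side registry is one way to enforce that fix. The sketch below tracks when each session was last used, closes idle ones on a TTL, and exposes an active-session count you can export as a metric. The `close_remote` callback is an assumption: it stands in for whatever session-close call the current docs provide, if any.

```python
import time
from typing import Callable


class SessionRegistry:
    """Track open sessions locally so idle ones can be closed on a TTL."""

    def __init__(self, close_remote: Callable[[str], None], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._close_remote = close_remote
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_used: dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Record activity on a session (call on open and on every turn)."""
        self._last_used[session_id] = self._clock()

    def close(self, session_id: str) -> None:
        """Close a session explicitly, e.g. when the conversation ends."""
        if self._last_used.pop(session_id, None) is not None:
            self._close_remote(session_id)

    def sweep(self) -> list[str]:
        """Close every session idle for longer than the TTL."""
        now = self._clock()
        expired = [sid for sid, seen in self._last_used.items() if now - seen > self._ttl]
        for sid in expired:
            self.close(sid)
        return expired

    @property
    def active_count(self) -> int:
        """Number of sessions still open; export this as a metric."""
        return len(self._last_used)


# Demo with a fake clock and a list that records which sessions were closed.
fake_now = [0.0]
closed: list[str] = []
registry = SessionRegistry(closed.append, ttl_seconds=60, clock=lambda: fake_now[0])
registry.touch("sess_a")
fake_now[0] = 50.0
registry.touch("sess_b")
fake_now[0] = 100.0
print(registry.sweep(), registry.active_count)
```

Run `sweep()` on a timer and call `close()` whenever a conversation ends, so the TTL acts only as a safety net.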

❌ Mistake: Migrating regulated workloads without compliance review

Moving conversation state to Google's servers changes your data-residency and disaster-recovery posture. Existing SOC 2 / ISO 27001 scoping may no longer hold, and your auditors will not be sympathetic if you shipped first and asked later.

✅ Fix: Keep regulated pipelines on GenerateContent with client-side history until you've re-assessed and your auditors sign off on offloaded state.

❌ Mistake: Deleting your LangGraph layer prematurely

Assuming managed sessions fully replace orchestration. You lose node-level checkpoints and human-in-the-loop control that LangGraph provides and managed sessions simply cannot replicate.

✅ Fix: Map your orchestration needs explicitly. Keep LangGraph for branching and checkpointing; let the Interactions API own execution and state only where managed control is acceptable.

Expert and Community Reactions to the Interactions API Launch

The named voices on this launch are Google's own product leads — Ali Çevik and Philipp Schmid of Google DeepMind — who frame it as the company's 'primary interface.' Beyond the announcement, developer reaction has clustered around two poles, with threads on Hacker News and the Google AI developer forum mirroring the split.

Enthusiasm centers on the collapse of boilerplate. Developers building multi-agent systems repeatedly cite the elimination of manual context-rebuilding code as the headline win. The reduction in payload size and orchestration glue is real and immediate — that part isn't overstated.

Concern — flagged loudly by enterprise architects — is the Stateful Sovereignty Trade-off itself. When Google holds your session state, disaster recovery and data-residency posture shift. The most common criticism on developer forums is the absence of a self-hosted or exportable session store, which makes the API a non-starter for air-gapped deployments regardless of its technical merits. That's a fair objection, not a niche one.

Note: specific third-party blog posts circulating in community channels should be read as opinion, not as Google's official position. Ground migration decisions in the official docs.

What Comes Next: Roadmap Signals and Predictions

Confirmed roadmap items from the GA post: Gemini Omni ('soon') and continued work with ecosystem partners to make the Interactions API the default across third-party SDKs. Everything below is reasoned prediction, clearly labeled as such.

2026 H2: **Gemini Omni lands in the Interactions API**

Google explicitly flagged Omni as 'soon' in the GA announcement, so expect fuller real-time multimodal interaction within the unified endpoint this half.

2026 H2: **OpenAI extends Responses API with managed persistent sessions**

Prediction: to close the differentiation gap, OpenAI mirrors Google's managed-state model. That is the most logical competitive response, given that the Responses API already unified Chat Completions and Assistants.

2027 H1: **A session-export endpoint to neutralize the sovereignty objection**

Prediction: a portable JSON snapshot of server-side state is the single most likely addition. It directly answers the top enterprise objection and would unlock regulated adoption overnight.

2027 Q2: **GenerateContent deprecation timeline announced (likely at Google I/O)**

Prediction grounded in Google's historical 12–18 month deprecation cycles and the explicit 'primary interface' framing in the GA post.

Coined Framework

The Stateful Sovereignty Trade-off resolves only with export

The trade-off is permanent until Google ships a portable session store. Whoever offers exportable managed state first wins the regulated-enterprise segment — that is the next strategic battleground.

The predicted trajectory from GA to GenerateContent deprecation — each milestone grounded in Google's stated framing and historical deprecation patterns.

What Most People Get Wrong About This Launch

Most coverage frames the Interactions API as 'a cleaner API.' That's the surface read. The real story is that Google moved the source of truth for your application's memory off your infrastructure — and bundled an agent marketplace into the same release so the migration feels like progress rather than a concession of control.

You're not adopting a new endpoint. You're signing a long-term lease on where your application's memory lives — and Google is the landlord.

The counterintuitive truth: the teams that benefit most are early-stage builders who value shipping speed over sovereignty. The teams that should move slowest are the largest, most regulated enterprises — exactly the ones whose procurement teams will be pushed hardest to adopt it. For a deeper look at the trade-offs, see our guide to enterprise AI architecture.

Frequently Asked Questions

What is the Google Interactions API and how is it different from the GenerateContent API?

The Interactions API is Google's new primary, generally available unified endpoint for both Gemini models and agents, announced after a December 2025 beta. The fundamental difference from GenerateContent is state ownership: GenerateContent was stateless, requiring you to resend the full conversation history client-side on every turn. The Interactions API maintains server-side session state, so you pass a session token and only the new input. It also adds background execution (background=True), combined tool calling in one request, and Managed Agents that run in Google-provisioned Linux sandboxes. In short, GenerateContent made your app the memory; the Interactions API makes Google the memory, which speeds development but reduces your observability, portability, and cost predictability.

Is the Interactions API available for all Gemini models including Gemini 3 Pro?

The Interactions API is Google's primary interface for interacting with Gemini models and agents at general availability, and you select a model by passing a model ID for inference. Gemini 3 Pro is the flagship model developers target through this surface. Google's documentation now defaults to the Interactions API across the board, so new model capabilities are expected to surface here first. Always confirm exact model-ID availability and any model-specific feature gating in the current official docs at ai.google.dev, since the model lineup and parameter support evolve. Multimodal input (text, image, audio, video) is supported in a single interaction, and Gemini Omni was flagged as arriving 'soon' in the GA announcement.

How does server-side state management in the Interactions API affect data privacy and compliance?

Significantly. When conversation history, tool results, and agent memory live on Google's servers rather than your infrastructure, your data-residency, disaster-recovery, and audit posture all change. This is the core of the Stateful Sovereignty Trade-off. Existing SOC 2 and ISO 27001 assessments scoped around client-managed history may need re-evaluation before production deployment, because the location and control of sensitive conversation data has shifted to a third party. For healthcare, finance, legal, and any air-gapped or on-premise requirement, the absence of a self-hosted or exportable session store at GA is a blocker — many teams should remain on GenerateContent with client-side history management until Google ships a portable session export endpoint and your auditors approve the new data flow.

Can I use the Interactions API with LangGraph, AutoGen, or CrewAI orchestration frameworks?

Yes, but understand the overlap. LangGraph, AutoGen, and CrewAI provide node-level state control, branching logic, and human-in-the-loop checkpoints that the Interactions API's managed session model cannot replicate. The Interactions API can replace the Gemini call wrapper inside these frameworks, but it does not replace complex multi-agent orchestration. Decide whether you need orchestration features (multi-model routing, checkpoints, cross-vendor failover) — if so, keep LangGraph and let the Interactions API handle execution and state for the Gemini portions. Google's own Agent Development Kit (ADK) is designed to layer on top of the Interactions API, not compete with it. n8n's MCP node can also call the Interactions API as an external tool for low-code teams.

What are Managed Agents in the Gemini API and how do I access them through the Interactions API?

Managed Agents are a GA feature where a single API call provisions a remote Linux sandbox in which an agent can reason, execute code, browse the web, and manage files — without you managing any infrastructure. The Antigravity agent ships as the default, and you can define custom agents with your own instructions, skills, and data sources. To access one, pass an agent_id parameter alongside your session in an Interactions API call instead of a model ID. Google runs the agent in an isolated sandbox and streams incremental results back through the same session channel. For long-running agent tasks, combine this with background=True so the work runs asynchronously and posts results to a webhook or polling endpoint rather than blocking an open connection.

How is the Interactions API priced and what are the costs for background execution and session persistence?

Google's GA announcement does not publish a complete price sheet, so confirm exact rates on the official pricing page. Architecturally, expect a three-meter model: standard input/output token charges per turn at Gemini 3 Pro rates, a session-persistence cost for storing server-side state, and a separate background-compute meter for agent execution time. The key budgeting change from GenerateContent is that tokens are no longer your only cost — idle or unclosed sessions can accrue persistence charges, and background agents bill for compute time. Treat any per-hour or per-second figures quoted by third parties as estimates until verified. Set session TTLs, close sessions on conversation end, and monitor active-session count as a first-class metric to avoid surprise costs.

Should I migrate from GenerateContent to the Interactions API now, and what are the migration risks?

Migrate now if you build customer-facing conversational agents, long-horizon research tasks, or agentic workflows that benefit from managed state and sandboxed execution — the boilerplate reduction is substantial. Hold off if you run regulated, air-gapped, or on-premise pipelines that require sovereign, auditable state, since there is no exportable session store at GA. The main risks: observability loss (your transcript now lives server-side), new persistence costs, compliance re-scoping, and the danger of prematurely deleting orchestration layers like LangGraph that provide checkpointing. Given Google's 'primary interface' framing signals eventual GenerateContent deprecation (likely a 12–18 month runway), plan the migration deliberately rather than rushing — instrument observability at your boundary first, then migrate workload by workload.

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools. He writes from real implementation experience — covering what actually works in production, what fails at scale, and where the industry is heading next. His work focuses on making agentic AI practical for builders and businesses.
