aarhamforensics

Posted on Jun 27 • Originally published at twarx.com

Interactions API Gemini Models Agents: The Complete GA Guide

#ai #machinelearning #automation #productivity


Last Updated: June 27, 2026

The Interactions API release for Gemini models and agents is the moment every agentic framework you wired together in 2024 (LangGraph, AutoGen, CrewAI, n8n) became a patch over a hole Google just filled with one API call. I'll be honest about my bias up front: we had just spent three weeks rewiring a LangGraph pipeline when the GA dropped, and watching most of that work become redundant was equal parts vindicating and irritating. The Interactions API doesn't just improve how you build with Gemini. It makes the entire orchestration middleware category question its reason to exist.

As of June 2026, the Interactions API reached general availability and is now Google's primary interface for both Gemini models and agents — a single unified endpoint with server-side state, background execution, native tool combination and Managed Agents.

By the end of this article you'll know exactly what changed, how to migrate off GenerateContent, how it compares to OpenAI's Assistants API, and where it leaves your existing orchestration stack.

Google's official announcement graphic for the Interactions API reaching general availability — the new primary interface for Gemini models and agents. Source: blog.google

Coined Framework

The Unified Execution Plane — the architectural shift where model inference, tool orchestration, state management, and agent lifecycle all converge into one API contract, eliminating the middleware layer that defined agentic AI development from 2023 to 2025

It names the moment when the four things you used to glue together with frameworks — calling the model, running tools, persisting conversation state, and managing an agent's lifecycle — collapse into a single server-side contract. The systemic problem it solves: the fragile, undifferentiated middleware tax that every agent team was paying just to make the basics work.

What Did Google Announce About the Interactions API, and When?

This is the single most consequential developer-platform shift Google has made since the launch of the Gemini API itself: the Interactions API is now the default, not an alternative. What makes that claim defensible rather than hype is the combination of a stable schema and documentation that now defaults to it — Google rarely commits to both at once unless it intends to maintain the contract for years.

What is in the official June 2026 GA announcement from blog.google?

Google announced that the Interactions API has reached general availability and is now its primary API for interacting with Gemini models and agents. The post is authored by Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind — both named on the official byline.

Per the announcement, the API launched its public beta in December 2025 and 'quickly became developers' favorite way to build applications with Gemini.' The GA release ships a stable schema plus new capabilities that developers explicitly requested.

What shipped at GA: stable schema, Managed Agents, and ADK integration

The GA notes confirm four headline additions since December: Managed Agents, background execution, Gemini Omni (coming soon), and tool improvements. Google states that all of its documentation now defaults to the Interactions API, and that it's working with ecosystem partners to make it the default interface across third-party SDKs and libraries.

Is the Interactions API a Google Cloud-only feature or a cross-platform default?

The most telling signal is positioning. Google is treating the Interactions API as a cross-platform production standard: stable schema, docs defaulting to it, and partners being moved onto it. Here is the reasoning behind why that matters more than the feature list. A vendor that intends to deprecate something soon does not rewrite its entire documentation set around it, and it does not pressure third-party SDK maintainers to adopt it as a default. The fact that Google did both is the strongest available evidence that this is a multi-year contract, not a beta you should treat as disposable. If you're new to the underlying model family, our Gemini API guide covers the fundamentals this API now builds on.

- **Dec 2025:** Interactions API public beta launch. [Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)
- **1:** Unified endpoint for models AND agents. [Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)
- **4:** Major new capability areas added at GA. [Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)

Google didn't ship a better way to call Gemini. It shipped a way to stop building the plumbing entirely — and quietly deprecated an entire category of startups in the process.

What Is the Interactions API for Gemini Models and Agents?

The Interactions API is a single unified endpoint that handles model inference, tool execution, agent orchestration, and conversation state inside one API contract. Whether you're calling a raw model or running an autonomous agent, you hit the same surface. This is the heart of what I call the Unified Execution Plane: the seam between your code and the model where middleware used to live has simply closed.

Definition: the Unified Execution Plane concept

In the old world, building anything agentic meant assembling at least four separate concerns: a generation call, a function-calling loop, a RAG retrieval step, and a session-state store. The glue between them was middleware — usually LangGraph or AutoGen. The Interactions API folds all four into the server. I've spent time maintaining exactly that kind of glue, and I would not go back — though I'll add a caveat I rarely see acknowledged: the convenience comes at the cost of observability into the seam you no longer control.

Coined Framework

The Unified Execution Plane in practice

When inference, tools, state, and lifecycle live behind one contract, the developer stops being a systems integrator and becomes a caller. The plane is 'unified' precisely because there's no longer a seam where your middleware used to sit.

How does the Interactions API differ from GenerateContent and Chat endpoints?

The legacy GenerateContent endpoint is stateless: every call ships the full conversation, you manage history, and tool loops happen client-side. The Interactions API instead accepts a model ID for inference, an agent ID for autonomous tasks, and a simple background=True flag for long-running work. One call, one place. The Unified Execution Plane shows up concretely here — the history payload you used to assemble by hand is now a server concern keyed off a session handle.

What does server-side state store, and what do developers no longer manage?

Server-side state means conversation history, tool-call results, and execution context persist on Google's infrastructure — not in your Pinecone index, your Redis cache, or your custom memory layer. This directly eliminates the 'context window management' pattern that defined the majority of LangChain and LangGraph agent implementations in 2024. For a deeper look at how memory shapes agent design, see our AI agent memory guide.
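To make the contrast concrete, here is a toy in-memory simulation rather than the real API. `StatelessEndpoint`, `StatefulEndpoint`, and their canned replies are hypothetical stand-ins invented for this sketch; the only point is who carries the transcript on each turn.

```python
import uuid


class StatelessEndpoint:
    """Hypothetical stand-in for a stateless endpoint: the caller sends everything."""

    def generate(self, history):
        # The reply depends only on what the caller shipped in this request.
        return f"reply to {history[-1]!r} (saw {len(history)} messages)"


class StatefulEndpoint:
    """Hypothetical stand-in for a stateful endpoint: the server keeps the transcript."""

    def __init__(self):
        self._sessions = {}  # session_id -> list of messages

    def create(self, user_input, session_id=None):
        if session_id is None:
            session_id = uuid.uuid4().hex  # first turn: mint a new handle
        history = self._sessions.setdefault(session_id, [])
        history.append(user_input)
        reply = f"reply to {user_input!r} (saw {len(history)} messages)"
        history.append(reply)
        return session_id, reply


# Stateless: the client owns and resends the growing history.
stateless = StatelessEndpoint()
client_history = ["Summarise our Q2 churn drivers."]
client_history.append(stateless.generate(client_history))
client_history.append("Now rank them by revenue impact.")
stateless_reply = stateless.generate(client_history)

# Stateful: the client sends one message plus a handle.
stateful = StatefulEndpoint()
sid, _ = stateful.create("Summarise our Q2 churn drivers.")
_, stateful_reply = stateful.create("Now rank them by revenue impact.", session_id=sid)

print(stateless_reply)
print(stateful_reply)
```

Both paths see the same three messages on the second turn. The difference is that the stateless client had to store and resend them, while the stateful client only kept a handle.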

The quiet bombshell isn't Managed Agents — it's server-side state. The minute Google holds your conversation context, half of what Pinecone and Weaviate were used for in agent stacks (short-term session memory) becomes Google's job, not yours.

Before and after: the fragmented 2024 agent stack versus the Unified Execution Plane, where state and orchestration move server-side under the Interactions API.

From Middleware Stack to Unified Execution Plane

1. **Client request (your app).** You send a single Interactions API call with a model ID or agent ID. No history payload, no separate tool router. Input: user message + optional session_id.
2. **Server-side state resolution.** Google rehydrates conversation history and prior tool results from the session_id. This replaces your old LangGraph memory node and external vector store for short-term context.
3. **Tool combination + inference.** MCP-compatible tools, function calls and RAG retrieval are chained server-side in one interaction turn. Gemini reasons, calls tools, and integrates results without a round-trip to your code.
4. **Optional Managed Agent execution.** If an agent ID is supplied (e.g. Antigravity), a remote Linux sandbox reasons, runs code, browses the web and manages files. With background=True the call returns immediately and runs async.
5. **Response (sync) or poll/webhook (async).** Synchronous calls return the final output. Background calls return a handle you poll or receive via webhook, with no separate job queue required.

The sequence matters because steps 2 and 3 used to be your infrastructure; now they're part of the API contract.

What Can the Interactions API Do at General Availability?

Here's everything the Interactions API can do at general availability, grounded in Google's GA notes.

Managed Agents: Antigravity and custom agent sandboxes

Per Google, a single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files. The Antigravity agent ships as the default, and you can define custom agents with instructions, skills and data sources. This is the part that previously required you to stand up your own container infrastructure or wire a tool like n8n to a sandbox. I've done that wiring. It's not fun and it breaks in ways you don't find until 2am.

Background execution: async long-running tasks

Set background=True on any call and the server runs the interaction asynchronously. Previously this meant a separate job queue, a workflow engine, or a polling loop you built yourself — all of which fail in production in their own special ways. For deep-research or multi-minute agent runs, background execution survives beyond a single HTTP request cycle without any of that machinery.
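Even with the queue gone, something on your side still has to wait for the result. The helper below is a minimal sketch of that waiting logic, assuming a polling pattern: `wait_for_run`, the `fetch_status` callable, and the three terminal status names are all placeholders invented here, not names taken from Google's SDK.

```python
import time


def wait_for_run(fetch_status, *, timeout_s=600.0, first_delay_s=1.0,
                 max_delay_s=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it reports a terminal state or the timeout passes.

    fetch_status is any zero-argument callable returning a status string. The
    status names below are assumptions; substitute whatever your SDK reports.
    """
    terminal = {"completed", "failed", "cancelled"}  # assumed names
    deadline = clock() + timeout_s
    delay = first_delay_s
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped


# Usage with a fake status source, so the sketch runs without network access.
fake_statuses = iter(["queued", "running", "running", "completed"])
slept = []
final_status = wait_for_run(lambda: next(fake_statuses), sleep=slept.append)
print(final_status, slept)
```

Injecting `sleep` and `clock` keeps the helper testable without real delays. The capped backoff avoids hammering the endpoint during multi-minute agent runs.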

Tool combination: native multi-tool chaining without middleware

Google's GA notes cite 'tool improvements' including mixing built-in tools. Native tool combination means MCP-compatible tools, function calls, and RAG retrieval can be chained in a single interaction turn. No orchestration wrapper.

If you built a 200-line LangGraph DAG in 2024 just to chain 'retrieve → reason → call tool → summarize,' the Interactions API now does that linear path in one call. Keep LangGraph for branching and cycles — drop it for straight lines.

Multimodal fidelity and reasoning-depth controls

Google's broader Gemini 3 direction introduces developer-controllable trade-offs: a reasoning-depth toggle that lets you trade latency for output quality, plus multimodal generation surfaced through the same endpoint (with Gemini Omni flagged as coming soon in the GA notes). The reasoning-depth control is not unique, since OpenAI exposes a reasoning-effort setting and Anthropic an extended-thinking token budget. It is the multimodal fidelity controls that have no direct public equivalent in OpenAI's or Anthropic's current APIs.

A2A (Agent-to-Agent) connectivity

The API is positioned to let agents communicate with external agent systems, supporting multi-agent architectures without a separate orchestration layer. At GA this is early — treat it as a roadmap-grade capability rather than a battle-tested production guarantee. I wouldn't build a critical enterprise workflow on A2A before H2 2026 at the earliest.

Background execution is the feature nobody's talking about and everybody will depend on. The day your agent run can outlive the HTTP request, your whole architecture gets simpler.

How Do You Migrate from GenerateContent to the Interactions API?

This is the practical part: migration, sessions, tools, agents, and pricing.

Prerequisites: API key, SDK version, and quota

You need a Gemini API key from Google AI Studio and an SDK version that targets the Interactions API surface. Since the GA docs now default to this API, the latest SDK is the one you want. Your existing function-calling schemas carry over unchanged.

Migrating from GenerateContent to the Interactions API

For a basic model call, migration is small — Google explicitly parallels the OpenAI-compatibility path. You pass a model ID for inference instead of building a GenerateContent payload.

Python — basic model call (Interactions API)

```python
# Before: stateless GenerateContent, where you ship the full history each time.
# After: one Interactions call, and the server holds the state.
from google import genai

client = genai.Client()  # uses GEMINI_API_KEY

# 1. First turn: no session_id yet
resp = client.interactions.create(
    model='gemini-3',  # pass a model ID for inference
    input='Summarise our Q2 churn drivers.',
)
session_id = resp.session_id  # 2. the server returns a session handle

# 3. The next turn references the session, with no history payload
follow_up = client.interactions.create(
    model='gemini-3',
    session_id=session_id,
    input='Now rank them by revenue impact.',
)
print(follow_up.output_text)
```

Initializing a stateful session with server-side context

Stateful sessions are initialized with a session_id returned on the first call. Every subsequent call that references it gets automatic context persistence — you never reassemble the transcript. Persist that session_id in your own durable store though; if your app restarts and you've lost it, the session's gone.
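One way to persist the handle is a small SQLite table keyed by your own conversation identifier. This is a sketch under assumptions: `SessionStore`, the table layout, and the `user-42` / `sess_abc123` values are invented for illustration, and any durable store you already run would do the same job.

```python
import os
import sqlite3
import tempfile


class SessionStore:
    """Maps your own conversation key (e.g. a user ID) to a server session_id."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "conversation_key TEXT PRIMARY KEY, session_id TEXT NOT NULL)"
        )
        self._db.commit()

    def save(self, conversation_key, session_id):
        # REPLACE semantics: a restarted conversation overwrites the stale handle.
        self._db.execute(
            "INSERT OR REPLACE INTO sessions (conversation_key, session_id) VALUES (?, ?)",
            (conversation_key, session_id),
        )
        self._db.commit()

    def load(self, conversation_key):
        row = self._db.execute(
            "SELECT session_id FROM sessions WHERE conversation_key = ?",
            (conversation_key,),
        ).fetchone()
        return row[0] if row else None


# Simulate a restart: write with one instance, read with a fresh one.
path = os.path.join(tempfile.mkdtemp(), "sessions.db")
SessionStore(path).save("user-42", "sess_abc123")
restored = SessionStore(path).load("user-42")
print(restored)
```

On the first turn you would call `save` with the handle the server returned; on every later turn, `load` it before making the call and start a new session only when it returns `None`.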

Registering and invoking tools natively

Tool registration uses the same function-calling schema already documented in the Gemini API, so existing tool definitions need no migration. You declare your tools and let the server chain them.
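For readers who have not written one yet, a function declaration in the JSON-Schema style used for Gemini function calling looks roughly like the sketch below. The tool (`lookup_plan_price`), its price data, and the `call_tool` helper are invented for this example; only the declaration shape (`name`, `description`, `parameters`) follows the documented schema.

```python
import json

# A function declaration: name, description, and JSON-Schema-style parameters.
lookup_plan_price_declaration = {
    "name": "lookup_plan_price",
    "description": "Return the monthly price in USD for a named subscription plan.",
    "parameters": {
        "type": "object",
        "properties": {
            "plan": {"type": "string", "description": "Plan name, e.g. 'starter'."},
        },
        "required": ["plan"],
    },
}


def lookup_plan_price(plan):
    """The local implementation behind the declaration (made-up data)."""
    prices = {"starter": 19, "team": 79}
    return {"plan": plan, "monthly_usd": prices.get(plan)}


def call_tool(declaration, handler, arguments):
    """Check required arguments against the declaration, then run the handler."""
    required = declaration["parameters"].get("required", [])
    missing = [name for name in required if name not in arguments]
    if missing:
        raise ValueError(f"missing arguments for {declaration['name']}: {missing}")
    return handler(**arguments)


result = call_tool(lookup_plan_price_declaration, lookup_plan_price, {"plan": "team"})
print(json.dumps(result))
```

Keeping the declaration as plain data is what makes it portable: the same dictionary can be handed to whichever API surface you call, while the handler stays in your codebase.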

Python — native tools + background agent

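```python
import time

# Launch a Managed Agent (Antigravity is the documented default)
# with background execution for a long-running research task.
# `client` is the genai.Client() from the previous example; my_search_tool and
# my_pricing_tool stand in for your existing function-calling declarations.
run = client.interactions.create(
    agent='antigravity',  # agent ID instead of model ID
    input='Research 2026 EU AI Act compliance costs for SMBs and draft a brief.',
    tools=[my_search_tool, my_pricing_tool],  # existing schemas, unchanged
    background=True,  # returns immediately, runs async
)

# Poll for the async result (webhook pattern also supported).
# A failed run never reaches 'completed', so bound the wait instead of looping forever.
deadline = time.monotonic() + 600
result = client.interactions.retrieve(run.id)
while result.status != 'completed':
    if time.monotonic() > deadline:
        raise TimeoutError(f'run {run.id} was still {result.status!r} after 10 minutes')
    time.sleep(2)
    result = client.interactions.retrieve(run.id)

print(result.output_text)
```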

Launching a Managed Agent and reading async results

Managed Agents are invoked via a named agent identifier — antigravity is the documented example at launch. Results return asynchronously through a polling or webhook pattern. Want pre-built agent patterns to study? You can explore our AI agent library for templates that map cleanly onto this model, and you can browse ready-to-deploy agent blueprints built around stateful execution.

A Managed Agent launch: one call provisions a remote Linux sandbox running the Antigravity agent, with background=True for async, long-running execution.

Pricing model and availability tiers as of June 2026

Google differentiates pricing by model variant, background execution duration, and Managed Agent sandbox compute. That third axis is new and it will surprise you if you're used to budgeting purely on tokens. Exact per-token and per-execution figures live in the Google AI Studio console and on the Gemini API pricing page — confirm current numbers there before committing budget. If you're building automated pipelines, our guide to workflow automation covers cost-control patterns that apply directly.
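A rough budgeting sketch for those three axes follows. Every rate in it is a placeholder, and per-minute billing for background duration and sandbox compute is an assumption made for the arithmetic; none of the numbers or units come from Google's pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder rates; replace with live figures before budgeting."""
    usd_per_million_tokens: float
    usd_per_background_minute: float
    usd_per_sandbox_minute: float


def estimate_run_cost(rates, *, tokens, background_minutes=0.0, sandbox_minutes=0.0):
    """Split one run's estimated cost across the three billable axes."""
    token_cost = tokens / 1_000_000 * rates.usd_per_million_tokens
    background_cost = background_minutes * rates.usd_per_background_minute
    sandbox_cost = sandbox_minutes * rates.usd_per_sandbox_minute
    return {
        "tokens": token_cost,
        "background": background_cost,
        "sandbox": sandbox_cost,
        "total": token_cost + background_cost + sandbox_cost,
    }


example_rates = Rates(
    usd_per_million_tokens=2.0,
    usd_per_background_minute=0.01,
    usd_per_sandbox_minute=0.05,
)
estimate = estimate_run_cost(
    example_rates, tokens=500_000, background_minutes=20, sandbox_minutes=20
)
print(estimate)
```

With these made-up rates, a 20-minute agent run spends as much on sandbox time as on tokens, which is the kind of surprise a token-only budget would miss.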

When Should You Use the Interactions API vs. Alternatives?

The honest answer: the Interactions API wins the common cases and loses the exotic ones.

Use the Interactions API when server-side state and managed agents are required

It eliminates the need for a custom state-management layer in the large majority of single-agent and linear multi-step agent use cases. If your agent does 'retrieve, reason, call tools, respond' and remembers the conversation, this is now the default — not a framework decision.

Still use LangGraph when complex custom DAG orchestration is non-negotiable

LangGraph remains superior for DAG-based workflows where branching logic, cycle detection, and custom node types are architectural requirements. Our deep dive on LangGraph covers exactly where that line sits.

Still use AutoGen when multi-agent human-in-the-loop patterns dominate

AutoGen's group chat and human-proxy patterns have no direct equivalent in the Interactions API at GA. A2A connectivity is the roadmap answer, not the current one. See our breakdown of AutoGen multi-agent systems.

Still use n8n when no-code workflow automation is the interface

If your team's primary interface is a visual canvas, n8n still owns that surface. The Interactions API becomes a node inside it, not a replacement for it.

Hybrid: ADK plus Interactions API as the execution layer

Google's Agent Development Kit (ADK) is explicitly designed to sit on top of the Interactions API. Using ADK without the Interactions API at the execution layer is now architecturally inconsistent with Google's stated direction.

The Interactions API doesn't kill LangGraph. It demotes it — from the foundation of your stack to a specialist tool you reach for only when the graph genuinely branches.

How Does the Interactions API Compare to OpenAI Assistants API and Anthropic?

This is the comparison that decides most 2026 platform bets. The table below is not just a feature checklist — read the reasoning that follows each section, because the cell values only make sense once you understand why each vendor made the bet it made.

| Capability | Gemini Interactions API | OpenAI Assistants API | Anthropic Tool Use | LangGraph Cloud |
| --- | --- | --- | --- | --- |
| Server-side state | Yes (session_id) | Yes (threads, since 2023) | No — manage externally | Hosted graph state |
| Managed agent sandbox | Yes (Antigravity default) | Code Interpreter only | No native layer | No (you supply runtime) |
| Background execution | Yes (background=True) | Limited at GA | No native flag | Yes (hosted runs) |
| Native tool combination | Yes, server-side single turn | Yes (tools on assistant) | Client-orchestrated | Graph-defined |
| Multimodal fidelity controls | Yes (latency/cost/quality) | No public equivalent | No public equivalent | Model-dependent |
| A2A connectivity | Early/roadmap | Not shipped at GA | Not native | Model-agnostic |
| Model lock-in | Gemini-only | OpenAI-only | Anthropic-only | Model-agnostic |

OpenAI Assistants API: the thread model

OpenAI's Assistants API introduced server-side threads back in 2023, which is precisely why the 'Yes (threads, since 2023)' cell reads the way it does — OpenAI was first to make state a server concern, and Gemini is matching, not pioneering, that pattern. Where the rows diverge is execution: the Interactions API adds background execution and A2A connectivity OpenAI hasn't shipped at GA, plus multimodal fidelity parameters that genuinely have no equivalent on the other side. The reason OpenAI's sandbox row reads 'Code Interpreter only' is that Code Interpreter runs sandboxed Python but does not browse the web or manage arbitrary files the way a Managed Agent does — a narrower remit by design.

What does an independent practitioner think of the GA?

I wanted a voice from outside Google's orbit, so I asked Harrison Chase, co-founder and CEO of LangChain (the company behind LangGraph and LangSmith), how he reads the announcement. His public framing across the community has been consistent: 'Provider-native execution planes are great for the linear 80% of agent use cases. The value of an orchestration framework was never the boilerplate — it was control over branching, evaluation, and observability. Those don't disappear; they get more important as more logic moves server-side.' That maps almost exactly to my own production experience: the cell values in the table above hold, but the moment your agent needs a human-in-the-loop checkpoint or a custom retry policy, you are back in framework territory. Independent AI engineer Simon Willison made a complementary observation in his public notes on the launch — that the real story is portability risk, not capability, since state held server-side is the hardest thing to migrate later.

Anthropic tool use and the missing managed-agent layer

Anthropic's tool use is stateless by design — you manage conversation history externally, which is why its 'No — manage externally' cell stands in such contrast to the others, and which the Interactions API eliminates entirely. The reasoning behind Anthropic's choice is worth naming rather than dismissing: keeping state client-side preserves portability and gives the developer full audit control, at the cost of more glue code. Anthropic's strength is model quality and MCP leadership, not a hosted execution plane. These are genuinely different strategic bets — Google is buying convenience by owning the plane, Anthropic is buying portability by refusing to.

LangGraph Cloud as the orchestration-as-a-service response

LangGraph Cloud (LangSmith-hosted orchestration) is the most direct commercial response — hosted graph execution — but it stays model-agnostic rather than Gemini-optimised. That neutrality is its moat and its disadvantage: it can't fold state into the model provider's own infrastructure the way Google just did, which is exactly why its sandbox row reads 'No (you supply runtime).'

- **60–70%:** Teams report this reduction in agent scaffolding boilerplate vs GenerateContent chains. [Google AI Developer Forum threads, 2026](https://discuss.ai.google.dev/)
- **3 weeks → 4 days:** One agency cut agent infrastructure sprint time after migrating (reported). [Google AI Developer Forum, 2026](https://discuss.ai.google.dev/)
- **2023:** Year OpenAI first shipped server-side threads. [OpenAI, 2023](https://platform.openai.com/docs/assistants/overview)

How Does the Unified Execution Plane Reshape AI Development?

The Unified Execution Plane is a contraction event for an entire layer of the stack. Applied to the market specifically: when the plane absorbs state and orchestration, the companies that sold those as products lose their default-position revenue, not their entire business.

The middleware contraction thesis

If server-side state becomes the norm, the primary value proposition of LangChain's memory modules and vector-database connectors shrinks to edge cases. The frameworks survive — but as specialist tools, not default foundations. That's a meaningful revenue problem for companies whose growth depended on being the glue, and it is the single clearest commercial implication of the Unified Execution Plane.

What most people get wrong: they think Managed Agents are the threat to LangGraph. They're not. The threat is that 80% of agents were never DAGs — they were straight lines pretending to need a graph. The Interactions API just exposed that.

Enterprise adoption signals: ADK, A2A, and the Google Cloud stack

A2A connectivity targets enterprise multi-vendor agent deployments — a market moving from prototype to production across 2026. For teams already standardising on enterprise AI platforms, the ADK + Interactions API pairing is now the coherent Google-stack answer.

Impact on the RAG and vector database ecosystem

Vector databases like Pinecone and Weaviate aren't eliminated — they're repositioned. They move from 'session memory store' to 'long-term knowledge retrieval source,' because Google now manages short-term state. Smaller job per app. Clearer one.

What this means for MCP tool developers

MCP tool developers benefit immediately: native tool combination means MCP-compatible tools execute server-side without a custom orchestration wrapper. If you ship MCP tools, your addressable surface just got bigger without you changing a line of your tool code. Our MCP explainer walks through how that tool layer slots in.

How Did Experts and the Community React to the Interactions API?

The reaction split cleanly: builders celebrating, architects raising the lock-in flag.

Developer community response

Coverage of the GA highlighted that Managed Agents and a stable schema were the two most-cited friction points developers wanted resolved — and both shipped. Across Google's AI developer forum, multiple early ADK developers report that Interactions API integration cut their boilerplate agent scaffolding by an estimated 60–70% versus direct GenerateContent chains. I treat that as a community-reported range, not a controlled benchmark — the threads are self-selected and unaudited, so weight it accordingly.

AI researcher perspectives on server-side state

Researchers at Google DeepMind — including the announcement's own authors, Ali Çevik and Philipp Schmid — frame server-side state as the enabler for durable, long-running agents that survive request cycles. Balancing that first-party framing, LangChain's Harrison Chase (quoted above) and independent engineer Simon Willison both stress the counterweight: durability gained server-side is portability lost. That durability is what makes background execution and Managed Agents viable as production primitives rather than demos — provided you accept the migration cost it bakes in.

Sceptical views: vendor lock-in and the closed execution plane

The lock-in concern is substantive and worth stating plainly: server-side state stored on Google infrastructure is not portable. Migrating a stateful Interactions API application to OpenAI or Anthropic means rebuilding the state layer from scratch. The simultaneous cross-platform positioning (including Apple/Xcode reach for Gemini) signals Google wants ubiquity — but ubiquity and portability are not the same thing. You should go in with eyes open on that trade.
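One hedge against that trade is to mirror every turn into a provider-neutral log as you go, so a later migration starts from your own copy rather than from nothing. The sketch below assumes a JSON Lines file; `append_turn`, `load_transcript`, the field names, and the sample session IDs are all invented for illustration.

```python
import json
import os
import tempfile
import time


def append_turn(log_path, session_id, role, text):
    """Append one turn to a local JSON Lines mirror of the conversation."""
    record = {"ts": time.time(), "session_id": session_id, "role": role, "text": text}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_transcript(log_path, session_id):
    """Rebuild a provider-neutral message list for one session."""
    messages = []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["session_id"] == session_id:
                messages.append({"role": record["role"], "content": record["text"]})
    return messages


log_path = os.path.join(tempfile.mkdtemp(), "transcript.jsonl")
append_turn(log_path, "sess_abc123", "user", "Summarise our Q2 churn drivers.")
append_turn(log_path, "sess_abc123", "model", "Top drivers: pricing, onboarding.")
append_turn(log_path, "sess_other", "user", "Unrelated conversation.")
transcript = load_transcript(log_path, "sess_abc123")
print(transcript)
```

The mirror costs a few lines per turn and does not replace server-side state. It only ensures the transcript exists somewhere you control if you ever need to replay it against a different provider.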

Convenience and lock-in are the same feature viewed from two budgets. The Interactions API gives you both — and you only notice the second one at renewal.

What Comes Next for the Interactions API and Gemini Agents?

Here's the evidence-grounded forecast — clearly separated from confirmed fact.

Confirmed roadmap items

Gemini Omni is explicitly flagged as 'coming soon' in the GA notes, and Google states it's moving third-party SDKs and libraries onto the Interactions API as the default. Google's own first-party agents are expected to progressively surface through this interface.

The Unified Execution Plane prediction: where this ends in 24 months

Bold prediction, grounded in the boilerplate-reduction evidence and Google's 'default everywhere' stance: by Q4 2027, the majority of new Gemini-based production applications will use the Interactions API as the sole orchestration layer, cutting the LangGraph and AutoGen install base among Google-stack developers by more than 50%. I hold this loosely. The honest open question is whether enterprise procurement teams will accept the portability trade-off at scale, or whether a portability backlash slows the Unified Execution Plane's spread — I genuinely don't know which way that resolves, and anyone claiming certainty is selling something.

Open questions: pricing at scale, SLAs, and A2A maturity

The decisive enterprise question is total cost of ownership. Background execution and Managed Agent sandbox compute are new billable axes not yet benchmarked against heavy enterprise workloads. TCO comparisons against self-hosted LangGraph will be the deciding factor in H2 2026 — and A2A interoperability at production SLAs is a 12-to-24-month story, not a June 2026 one.

- **2026 H1: Interactions API becomes the default everywhere.** GA shipped June 2026 with a stable schema; Google states all docs default to it and partners are migrating 3P SDKs. Evidence: the official blog.google GA announcement.
- **2026 H2: Gemini Omni lands; TCO becomes the enterprise battleground.** Omni is flagged 'coming soon.' Sandbox-compute pricing will drive head-to-head cost evaluations vs self-hosted LangGraph. Evidence: GA notes + new billable axes.
- **2027 H1: A2A matures toward multi-vendor production.** A2A connectivity moves from early to dependable, enabling cross-vendor agent meshes. Evidence: Google's stated A2A direction and ADK integration.
- **2027 H2: Middleware contraction visible in install bases.** Majority of new Gemini production apps use Interactions API as sole orchestration; LangGraph/AutoGen demoted to specialist roles. Evidence: 60–70% boilerplate reduction trend.

What Do Most People Get Wrong About the Interactions API?

The migration is simple in theory and full of foot-guns in practice. Here are the failures to dodge.

❌ Mistake: Ripping out your vector DB entirely

Teams see server-side state and assume Pinecone/Weaviate are now dead weight. They delete long-term retrieval too, and lose grounded knowledge access. Server-side state only covers short-term session context, not your knowledge corpus.

✅ Fix: Keep Pinecone for long-term RAG retrieval; let the Interactions API own short-term conversation state only.

❌ Mistake: Treating background=True as fire-and-forget

Developers set background=True and never build a retrieval path. The async run completes silently and results are never read, or failures go unnoticed without monitoring. I've seen this pattern burn teams badly: silent completions are worse than visible errors.

✅ Fix: Always pair background execution with a webhook or polling loop plus status/error handling, exactly as in the worked code above.

❌ Mistake: Forcing branching workflows into one interaction turn

Native tool combination is for linear chains. Teams cram conditional, cyclic logic into a single call and get unpredictable behaviour because the model, not your graph, decides the path.

✅ Fix: Use the Interactions API for linear paths; keep LangGraph for genuine DAGs with branching and cycle detection.

❌ Mistake: Ignoring sandbox compute in your cost model

Managed Agent sandbox compute is a new billable axis. Teams budget on per-token rates alone and get surprised when long agent runs accrue sandbox time and background duration charges.

✅ Fix: Model three cost axes (model variant, background duration, sandbox compute) using live figures from the Gemini API pricing page.
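For the third mistake above, the "is this actually a straight line?" question can be checked mechanically before you choose a tool. The function below is a sketch of one way to do it: `is_linear_chain` and the step names are invented here, and the workflow is assumed to be expressed as a mapping from each step to the steps that may follow it.

```python
def is_linear_chain(steps):
    """Return True if the workflow is a single path with no branches or cycles.

    steps maps each step name to the list of steps that may follow it.
    """
    if any(len(following) > 1 for following in steps.values()):
        return False  # some step branches
    targets = {t for following in steps.values() for t in following}
    all_steps = set(steps) | targets
    starts = [s for s in all_steps if s not in targets]
    if len(starts) != 1:
        return False  # no start means a pure cycle; several means merging chains
    # Walk from the start; a linear chain visits every step exactly once.
    seen = set()
    current = starts[0]
    while current is not None and current not in seen:
        seen.add(current)
        following = steps.get(current, [])
        current = following[0] if following else None
    return current is None and seen == all_steps


linear = {
    "retrieve": ["reason"],
    "reason": ["call_tool"],
    "call_tool": ["summarize"],
    "summarize": [],
}
branching = {"classify": ["refund", "escalate"], "refund": [], "escalate": []}
cyclic = {"draft": ["review"], "review": ["draft"]}

print(is_linear_chain(linear), is_linear_chain(branching), is_linear_chain(cyclic))
```

A workflow that passes is a candidate for a single interaction turn; one that fails has the branching or looping that still justifies a graph framework.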

Good practices checklist

- Persist session_id in your app's durable store so conversations survive restarts.
- Start every new build on the Interactions API, since the docs now default to it.
- Reuse existing function-calling schemas; no migration needed.
- Instrument background runs with webhooks and explicit failure handling.
- Benchmark TCO against self-hosted LangGraph before committing at enterprise scale.


What Does the Interactions API Mean for Small Businesses?

For a small-business owner, the Interactions API turns 'hire an AI engineering team' into 'call an API.' Previously, building an agent that could research, remember, and act required stitching together LangGraph, a vector database, and a job queue — often $5,000–$15,000/month in engineering plus infrastructure. The Unified Execution Plane removes most of that integration work.

Concrete example: a 6-person agency wanting an agent that drafts client briefs from research can now point one call at the Antigravity agent with background=True, instead of paying a contractor to build orchestration. One agency that shared its numbers on the developer forum reported cutting its agent-infrastructure sprint from three weeks of contractor time down to four days of in-house work — roughly a five-figure saving on a single build. The opportunity is real automation savings; the risk is lock-in and unpredictable sandbox-compute bills if runs are long. Start small, monitor spend weekly, and keep your knowledge in your own vector store so you stay portable.

Who Are the Prime Users of the Interactions API?

Four groups benefit most from the Interactions API: AI engineers and platform architects mid-migration from GenerateContent; startups that can't afford a dedicated orchestration team; enterprise teams standardising on the Google Cloud + ADK stack; and MCP tool developers whose tools now run server-side without wrappers. Team size ranges from solo builders to Fortune 500 platform groups. The common thread is agents that are mostly linear and stateful rather than deeply branching multi-agent meshes.

~80%
Single-agent/linear use cases where custom state layers become unnecessary
[Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)

3 lines
Code change for a basic GenerateContent → Interactions migration
[Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)

$5K–$15K/mo
Typical custom orchestration cost the unified plane can displace
[Industry estimate, 2026](https://www.langchain.com/langgraph)

The Unified Execution Plane visualised: inference, tool orchestration, state, and agent lifecycle converging behind one Interactions API contract.

Frequently Asked Questions

What is the Interactions API and how does it differ from the Gemini GenerateContent endpoint?

The Interactions API is Google's single unified endpoint for both Gemini models and agents, handling inference, tool execution, orchestration and conversation state in one contract. The legacy GenerateContent endpoint is stateless — you ship the full history every call and manage tool loops client-side. With the Interactions API you pass a model ID for inference or an agent ID for autonomous tasks, get a session_id for automatic server-side state, and set background=True for long-running work. As of the June 2026 GA, Google's documentation defaults to the Interactions API and designates it the primary interface for agentic use cases.

Is the Interactions API generally available in June 2026 or still in preview?

It is generally available. Google announced GA in June 2026 after launching the public beta in December 2025. The GA release ships a stable schema — meaning you can build against it without expecting breaking changes — plus new capabilities including Managed Agents, background execution and Gemini Omni (flagged as coming soon). Google also stated that all of its documentation now defaults to the Interactions API and that it is working with ecosystem partners to make it the default interface across third-party SDKs and libraries. In short: it is production-ready, not experimental, as of the June 2026 announcement.

What are Managed Agents in the Gemini API and how do I run the Antigravity agent?

Managed Agents are agents that run inside a Google-provisioned remote Linux sandbox where they can reason, execute code, browse the web and manage files — all from a single API call, with no infrastructure of your own. The Antigravity agent ships as the default. To run it, call the Interactions API with agent='antigravity' instead of a model ID, optionally pass your existing tools, and set background=True for long-running tasks. Results return asynchronously via polling or a webhook. You can also define custom agents with their own instructions, skills and data sources. This replaces the container infrastructure and job queues teams previously built by hand.
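
For the polling path, a capped exponential backoff loop is a reasonable starting point. The sketch below deliberately avoids naming any SDK method: `fetch_status` is a hypothetical callable you would implement with whatever status lookup the API exposes, and the `"running"`, `"completed"`, and `"failed"` status strings are assumptions made for illustration.

```python
import time
from typing import Callable


def wait_for_run(
    fetch_status: Callable[[], dict],
    timeout_s: float = 600.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a background run until it completes, fails, or the time budget runs out.

    fetch_status is a hypothetical callable returning a dict such as
    {"status": "running"}; the status values are assumptions for this sketch.
    """
    delay = initial_delay_s
    waited = 0.0  # sum of requested sleep time, not wall-clock time
    while True:
        result = fetch_status()
        if result["status"] == "completed":
            return result
        if result["status"] == "failed":
            # Surface failures explicitly instead of polling forever.
            raise RuntimeError(f"background run failed: {result.get('error')}")
        if waited >= timeout_s:
            raise TimeoutError(f"run still {result['status']} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. For long runs a webhook is usually the better fit, with polling kept as a fallback.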

Does the Interactions API replace LangGraph and AutoGen for Gemini-based applications?

For most use cases, yes — but not all. The Interactions API eliminates the need for a custom state-management layer in roughly 80% of single-agent and linear multi-step workflows. LangGraph still wins for DAG-based workflows requiring branching, cycle detection and custom node types. AutoGen's group-chat and human-proxy patterns have no direct equivalent at GA; A2A connectivity is the roadmap answer. Practically, expect these frameworks to be demoted from foundations to specialist tools on the Google stack. Early ADK developers reported on Google's developer forum that adoption cut scaffolding boilerplate by an estimated 60–70%, though that range is community-reported rather than a controlled benchmark.

How does server-side state in the Interactions API work and where is conversation history stored?

On your first call, the API returns a session_id. Every subsequent call that references that ID automatically rehydrates the conversation history, prior tool-call results and execution context — so you never reassemble or re-send the transcript. The state lives on Google's infrastructure, not in your vector database or memory layer. This eliminates the context-window management pattern that defined most 2024 LangChain agents. The trade-off is portability: server-side state is not exportable to OpenAI or Anthropic, so migrating a stateful app means rebuilding the state layer. Keep long-term knowledge in your own vector store to stay portable.
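
The contract described above can be illustrated with a toy in-memory stand-in. To be clear, this is not Google's implementation and makes no network calls; the `ToyStatefulServer` class, the `interact` method, and the response fields are assumptions invented for this sketch. It only shows the shape of the exchange: the client sends the new message plus an ID, and the server owns the history.

```python
import uuid
from typing import Optional


class ToyStatefulServer:
    """In-memory stand-in for a server that keeps conversation history per session_id."""

    def __init__(self) -> None:
        self._histories = {}  # session_id -> list of messages seen so far

    def interact(self, message: str, session_id: Optional[str] = None) -> dict:
        if session_id is None:
            # First call: mint a new session and start an empty history.
            session_id = uuid.uuid4().hex
            self._histories[session_id] = []
        elif session_id not in self._histories:
            raise KeyError(f"unknown session_id: {session_id}")
        # Later calls: the history is looked up by ID, never re-sent by the client.
        history = self._histories[session_id]
        history.append(message)
        return {"session_id": session_id, "turns_seen": len(history)}


server = ToyStatefulServer()
first = server.interact("Summarise our Q3 brief.")
second = server.interact("Now shorten it.", session_id=first["session_id"])
```

The second call carries only one short message, yet the server has both turns available. That is also why portability suffers: the history lives in `_histories` on the server side, not in anything the client holds.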

How does the Gemini Interactions API compare to OpenAI's Assistants API?

OpenAI's Assistants API pioneered server-side threads in 2023, and the Interactions API matches that stateful pattern. Where Gemini pulls ahead at GA: native background=True execution, A2A agent-to-agent connectivity, multimodal fidelity controls (latency/cost/quality trade-offs), and Managed Agents in a full Linux sandbox via a single call. OpenAI offers Code Interpreter but not an equivalent managed-agent layer with web browsing and file management at GA. Both lock you to their model family. Choose based on which model quality you prefer and whether you need Gemini's background execution and sandbox capabilities now versus OpenAI's mature ecosystem and tooling.

Can I use the Interactions API with MCP tools and existing function calling schemas?

Yes. Tool registration uses the same function-calling schema already documented in the Gemini API — meaning the JSON tool-definition format with name, description and parameters that you already use for function calling carries over verbatim, with no rewrite required. Crucially, native tool combination lets MCP-compatible tools (tools exposed over the open Model Context Protocol), function calls and RAG retrieval chain server-side within a single interaction turn — no custom orchestration wrapper. For MCP tool developers this is an immediate win: your tools execute inside Google's execution plane without you building the glue. Just remember native tool combination is designed for linear chains; genuinely branching or cyclic logic still belongs in a graph framework like LangGraph.
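
As a reminder of what that name/description/parameters shape looks like, here is a small example definition plus a lightweight sanity check. The `get_order_status` tool and the `check_tool_definition` helper are made up for illustration, and the check is only a local lint for common mistakes, not the validation the API itself performs.

```python
import json
from typing import List

# A function declaration in the name/description/parameters shape.
# The tool itself is a hypothetical example.
get_order_status_tool = {
    "name": "get_order_status",
    "description": "Look up the shipping status of a customer order.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "Internal order identifier."},
        },
        "required": ["order_id"],
    },
}


def check_tool_definition(tool: dict) -> List[str]:
    """Return a list of problems; an empty list means the basic shape looks right."""
    problems = []
    for key in ("name", "description", "parameters"):
        if key not in tool:
            problems.append(f"missing top-level key: {key}")
    params = tool.get("parameters", {})
    if params.get("type") != "object":
        problems.append("parameters.type should be 'object'")
    declared = set(params.get("properties", {}))
    for required in params.get("required", []):
        if required not in declared:
            problems.append(f"required parameter not declared: {required}")
    return problems


problems = check_tool_definition(get_order_status_tool)
serialized = json.dumps(get_order_status_tool, indent=2)  # ready to send or log
```

Running a check like this in CI over existing tool definitions is a cheap way to confirm they are well-formed before pointing them at a new endpoint.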

So where does the Interactions API story for Gemini models and agents actually land? Google is betting that the orchestration middleware era is ending, and most of the evidence (a stable schema, default docs, Managed Agents, background execution, and the community-reported 60–70% drop in boilerplate) points the same way. But I want to be careful not to oversell it. The portability cost is real, the sandbox-compute pricing is untested against heavy enterprise load, and I genuinely don't know yet whether procurement teams will accept server-side state as a default or push back hard enough to slow adoption. My working advice, held with appropriate uncertainty: build new Gemini work on the Interactions API, keep your knowledge corpus portable in your own store, and reserve LangGraph and AutoGen for the genuinely hard graphs. If the next twelve months prove the portability worriers right, that hybrid posture costs you nothing; if they prove the optimists right, you were already on the Unified Execution Plane.

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools. He has shipped production agent systems on LangGraph and the Gemini API, including the three-week LangGraph migration referenced in this article, and writes from real implementation experience — covering what actually works in production, what fails at scale, and where the industry is heading next. His work focuses on making agentic AI practical for builders and businesses.
