aarhamforensics

Posted on Jun 27 • Originally published at twarx.com

Interactions API Gemini Models Agents: The 2026 GA Breakdown

#ai #machinelearning #automation #productivity


Last Updated: June 27, 2026

Google just made LangGraph, AutoGen, and most of your custom agent scaffolding architecturally unnecessary, and almost nobody in the developer community has fully processed what that means yet. The reason is the new Interactions API for Gemini models and agents: one interface with server-side state, background execution, and Managed Agents shipped as managed primitives.

The Interactions API reached general availability in June 2026 as Google's primary interface for Gemini models and agents — one endpoint with server-side state, background execution, tool combination, and Managed Agents. It matters now because the legacy Generate Content API is no longer the strategic default.

By the end of this article you'll know exactly what changed, how to migrate, what it costs, and whether your orchestration framework still earns its keep.

The Interactions API reaches general availability as Google's primary interface for Gemini models and agents.

Coined Framework

The Middleware Collapse Event — the moment a foundation model provider absorbs the orchestration, state, and tool-routing layer that third-party frameworks previously owned, rendering external agent scaffolding redundant for the majority of production use cases

It names the structural shift where the value that lived in LangGraph, AutoGen, and CrewAI — session state, tool routing, and execution loops — migrates inside the model provider's own API. When that happens, the framework stops being infrastructure and becomes optional glue.

The Interactions API is not just a new endpoint. It is the moment Google stopped letting third-party frameworks own the orchestration layer and started shipping it as a managed primitive.

Breaking: What Google Announced and When

Official announcement date, source, and GA status

On its official blog, Google announced that the Interactions API has reached general availability and is now 'our primary API for interacting with Gemini models and agents.' The post — authored by Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind — is titled 'Interactions API: our primary interface for Gemini models and agents.'

The public beta launched in December 2025 and 'quickly became developers' favorite way to build applications with Gemini.' The GA release ships a stable schema — and that's the single most important detail for anyone running this in production, because schema instability during preview was the main reason teams held off on migrating. I talked to several teams still sitting on Generate Content for exactly that reason. Now there's no excuse.

Key facts from blog.google and the developer changelog

The announcement describes a single unified endpoint for Gemini models and agents with four headline capabilities: server-side state, background execution, tool combination, and multimodal generation. Google states that all of its documentation now defaults to the Interactions API and that it is 'working with ecosystem partners to make it the default interface across 3P SDKs and Libraries.' For a wider view of where this fits, see our coverage of the state of AI agents.

What changed from the preview to the Generally Available release

Since December 2025, Google added 'major new capabilities that developers asked for, including Managed Agents, background execution, Gemini Omni (soon) and more.' The mechanics, straight from the source: pass a model ID for inference, an agent ID for autonomous tasks, set background=True for anything long-running. That's it. The complexity that used to justify an entire orchestration framework fits in three parameters.

Dec 2025: Interactions API public beta launch ([Google Blog, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

1: Unified endpoint replacing fragmented Gemini APIs ([Google Blog, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

3: Lines of code to switch from OpenAI-compatible Gemini calls ([Google AI for Developers, 2026](https://ai.google.dev/gemini-api/docs))

What Is the Interactions API and How Does It Work

Core architecture: a single unified endpoint replacing Generate Content

The Interactions API collapses the previously fragmented Generate Content, Chat, and streaming endpoints into one interface. The mental model is deliberately simple: one call type, two routing decisions. Pass a model ID when you want raw inference. Pass an agent ID when you want an autonomous task. Everything that used to live in your orchestration code (the state machine, the loop, the tool dispatcher) now lives behind that single endpoint. The complexity hasn't gone away. You're just not the one managing it anymore.

Server-side state and multi-turn session management explained

This is the headline architectural shift, and it's worth sitting with for a moment. Under the old Generate Content API, every turn was stateless — you re-sent the full conversation history, tool-call results, and context window on each request. That pattern is expensive. It's slow at scale. And it's the reason so many teams reached for LangGraph in the first place: they needed something to hold the state so they didn't have to. The Interactions API keeps conversation memory, turn history, and tool results on Google's servers, so each turn just references the session rather than re-transmitting everything that came before it.

For high-turn-count applications, moving state server-side eliminates redundant context re-transmission on every call — the single biggest hidden token cost in stateless multi-turn apps built on Generate Content.
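To make that cost argument concrete, here is a rough back-of-the-envelope sketch. It counts only the input tokens the client transmits, under the simplifying assumption that every turn adds the same number of tokens to the conversation. The function names and the 500-token figure are illustrative and do not come from Google's documentation; what you are actually billed depends on the provider's pricing rules.

```python
def stateless_tokens_sent(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent when every request re-sends the full history.

    Turn k carries k * tokens_per_turn tokens, so the total is the
    arithmetic series 1 + 2 + ... + turns, scaled by tokens_per_turn.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


def stateful_tokens_sent(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent when each request carries only the new turn."""
    return tokens_per_turn * turns


if __name__ == "__main__":
    turns, per_turn = 20, 500  # illustrative numbers, not measurements
    print(stateless_tokens_sent(turns, per_turn))  # 105000
    print(stateful_tokens_sent(turns, per_turn))   # 10000
```

Under these assumptions the per-turn payload grows linearly in the stateless design, which makes the cumulative total grow quadratically. That gap is what widens as conversations get longer.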

The role of background execution in long-running agent tasks

Set background=True and the server runs the interaction asynchronously. No open connection held by the client while an agent reasons, browses, or executes code for several minutes. This is a direct architectural competitor to LangGraph's persistence layer and AutoGen's runtime loop — which, if we're honest, were the two things those frameworks existed to provide. Everything else was secondary.

How the Middleware Collapse Event reframes third-party orchestration

Here's the part the developer community hasn't fully metabolized: server-side state plus background execution plus native tool routing is the exact feature set that justified external orchestration frameworks. When the model provider ships those as managed primitives, the framework stops being load-bearing infrastructure and becomes optional glue. Sometimes you still want optional glue. But you need a real reason for it now.

Coined Framework

The Middleware Collapse Event in practice

When a team rips out LangGraph persistence and AutoGen loops because background=True and server-side sessions now do the job natively, they have experienced a Middleware Collapse Event. The orchestration layer didn't get better — it got absorbed.

Stateless Generate Content vs Stateful Interactions API: the request flow that changed everything

1. **Old: Generate Content (stateless).** Client assembles full history + tool results + context window, re-sends on every turn. Per-turn token cost grows linearly with conversation length, so the cumulative cost grows roughly quadratically.

2. **Old: External orchestration (LangGraph / AutoGen).** Framework holds state, manages the agent loop, routes tool calls. You operate and observe this layer yourself.

3. **New: Interactions API (stateful).** Client sends only the new turn referencing a session ID. State, turn history, and tool results live server-side at Google.

4. **New: background=True absorbs the runtime loop.** Long-running agent tasks execute server-side asynchronously. No open client connection, no external runtime to maintain.

The sequence shows how two responsibilities — state and the execution loop — moved from your codebase into Google's API surface.

One more thing worth calling out: the API natively accepts multimodal inputs — text, images, audio, video, documents — within a single request schema. You don't bolt on separate handling for each modality. It's one schema, all modalities, same call.

Visualizing the Middleware Collapse Event: orchestration responsibilities shifting from external frameworks into the Interactions API surface.

Full Capability Breakdown: Every Feature in the Interactions API

Managed Agents: what they are and how they differ from custom agents

Per Google's announcement, Managed Agents mean 'a single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files.' The Antigravity agent ships as the default. You can also 'define your own custom agents with instructions, skills and data sources.' I want to be direct about what this actually removes: container orchestration, sandbox security engineering, file-system plumbing — that whole layer of infrastructure that nobody enjoys building but everybody needs. It's gone. You request it instead.

A single API call now provisions a remote Linux sandbox that reasons, runs code, browses the web, and manages files. The agent runtime is no longer something you build — it's something you request.

Built-in and custom tool combination including Function Calling and MCP

The GA release lets you mix built-in managed tools — Google Search grounding, code execution, RAG retrieval — with custom Function Calling definitions and Model Context Protocol (MCP) tool servers in a single request. Function Calling supports parallel tool calls, required tool enforcement, and compositional chaining. Those capabilities previously pushed teams toward CrewAI or n8n — not because the teams wanted another framework, but because there was no native alternative. Our breakdown of function calling patterns goes deeper on the tradeoffs.
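Custom functions still run in your own code even when the routing is native: the model asks for a call, and your side executes it and hands back the result. The sketch below shows one way to dispatch a batch of such requests locally. The request shape (`id`, `name`, `arguments`) and the two example tools are assumptions made for illustration, not the Interactions API's wire format.

```python
from typing import Any, Callable

# Hypothetical local tool registry: maps a tool name to a Python callable.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so it can be dispatched."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def log_to_crm(account: str, note: str) -> dict:
    # Stand-in for a real CRM write; returns what would have been logged.
    return {"account": account, "note": note, "status": "logged"}


@tool
def get_churn_rate(segment: str) -> dict:
    # Stand-in for a real lookup; the figures are made up for the example.
    rates = {"smb": 0.031, "enterprise": 0.012}
    return {"segment": segment, "monthly_churn": rates.get(segment)}


def dispatch(calls: list[dict]) -> list[dict]:
    """Run each requested call and pair the result with the call's id.

    Unknown tool names and raised exceptions are reported as error results
    rather than crashing, so one bad call does not lose the others.
    """
    results = []
    for call in calls:
        fn = TOOLS.get(call["name"])
        if fn is None:
            results.append({"id": call["id"], "error": f"unknown tool: {call['name']}"})
            continue
        try:
            results.append({"id": call["id"], "result": fn(**call["arguments"])})
        except Exception as exc:  # report the failure back instead of raising
            results.append({"id": call["id"], "error": str(exc)})
    return results
```

Keying results by call id matters once the model issues several calls in one turn, because the results have to be matched back to the requests that produced them.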

Native MCP tool-server support inside the Interactions API does more than add convenience — it accelerates Model Context Protocol toward de facto standard status, because the most-used Gemini interface now treats MCP as a first-class citizen.

Multimodal input support and grounding with Google Search

Grounding with Google Search is a built-in tool you compose with your own functions. Combined with native multimodal inputs, you can send a screenshot plus a question plus a custom tool definition in one schema and let the agent decide whether to search, call your tool, or both. The decision logic isn't yours to write. That's the point.

RAG integration, vector database connectivity, and memory types

RAG retrieval is exposed as a managed built-in tool, integrating directly with Google's managed vector search infrastructure. That's a meaningful contrast with the pattern where you wire an external Pinecone or Weaviate index through custom retrieval functions. For teams building production RAG systems, this collapses a multi-component pipeline into a managed tool call. Whether that's always the right trade depends on how much you need to own the retrieval logic — but for most teams, you don't need to own it as much as you think.

Streaming, synchronous, and background execution modes

Three execution modes: synchronous (immediate response), streaming (token-level output), and background (async job via webhook or polling). Background mode is what makes agents performing tasks over minutes or hours actually viable without brittle open connections. I'd use nothing else for long-horizon work.

| Execution Mode | Trigger | Best For | Client Connection |
| --- | --- | --- | --- |
| Synchronous | Default call | Short single-shot inference | Open until response |
| Streaming | Stream flag | Chat UIs, token-level UX | Open, incremental |
| Background | background=True | Long-horizon agent tasks | None (async job) |

How to Access and Use the Interactions API: Step-by-Step

Prerequisites: API key, SDK version, and model availability

The Interactions API is accessible through the Google AI for Developers platform using the same API key infrastructure as previous Gemini endpoints. Update the Python or TypeScript SDK to the latest version targeting the /interactions endpoint path. Gemini 3 Pro is the flagship model available at GA.

Quickstart: initialising a stateful session with Gemini 3 Pro

Python: stateful session

    # Initialise a stateful Interactions API session with Gemini 3 Pro
    from google import genai

    client = genai.Client(api_key='YOUR_API_KEY')

    # Server-side state: no need to resend history each turn
    session = client.interactions.create(
        model='gemini-3-pro',  # pass a model ID for inference
        input='Summarise our Q2 churn drivers.'
    )

    # Follow-up references the session; history lives server-side
    follow_up = client.interactions.create(
        session=session.id,
        input='Now draft a retention email for the top driver.'
    )
    print(follow_up.output_text)

Adding tools: combining Function Calling with managed built-ins

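Python: mixing built-in + custom tools

    # Combine Google Search grounding with a custom function and an MCP server.
    # Reuses `client` from the quickstart above.

    # Hypothetical function declaration for illustration; swap in your own schema.
    log_to_crm_schema = {
        'name': 'log_to_crm',
        'description': 'Write a benchmark note to the CRM.',
        'parameters': {'type': 'object', 'properties': {'note': {'type': 'string'}}},
    }

    response = client.interactions.create(
        model='gemini-3-pro',
        input='Find current churn benchmarks and log them to our CRM.',
        tools=[
            {'type': 'google_search'},  # managed built-in
            {'type': 'function', 'function': log_to_crm_schema},  # custom Function Calling
            {'type': 'mcp', 'server': 'https://crm.internal/mcp'}  # MCP tool server
        ]
    )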

Launching a Managed Agent in a cloud sandbox

Python: Managed Agent + background execution

    # Provision a remote Linux sandbox agent and run it asynchronously
    job = client.interactions.create(
        agent='antigravity',  # pass an agent ID for autonomous tasks
        input='Research three competitors and produce a comparison doc.',
        background=True  # server runs it async; poll or use a webhook
    )
    print('Job started:', job.id)  # no open connection held by the client
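If you poll rather than register a webhook, a capped exponential backoff keeps the request volume reasonable. The helper below is a minimal sketch of that pattern. The status strings are assumptions, and the status lookup is passed in as a callable because this article does not show the SDK call that fetches a job by ID.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed", "cancelled"}  # assumed status names


def poll_until_done(
    fetch_status: Callable[[], str],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal state or time runs out.

    The wait doubles after each non-terminal check, capped at max_delay.
    Raises TimeoutError if the accumulated wait would exceed timeout.
    """
    delay, waited = initial_delay, 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if waited + delay > timeout:
            raise TimeoutError(f"job still '{status}' after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    # Fake job that finishes on the third check; sleep is stubbed out.
    states = iter(["in_progress", "in_progress", "completed"])
    print(poll_until_done(lambda: next(states), sleep=lambda s: None))
```

Injecting `sleep` keeps the helper testable without real waiting. In production you would leave the default in place and pass a `fetch_status` that wraps whatever lookup your SDK version provides.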

Need pre-built agent patterns rather than rolling your own? You can explore our AI agent library for reference architectures that map cleanly onto Managed Agents and custom agent definitions.

Pricing, rate limits, and quota tiers as of June 2026

Pricing follows Google AI Studio and Vertex AI token-based tiers. Background execution is billed on compute-time in addition to token consumption. Managed Agents incur additional sandbox compute costs — budget for that before you run anything long-horizon in production. Crucially, custom agent definitions reuse standard Function Calling pricing, and there is no premium for the stateful session layer itself. OpenAI library compatibility is maintained, so teams already calling Gemini via OpenAI-compatible endpoints can switch by updating three lines of code.
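For budgeting, the billing components described above can be combined into a simple estimate. This is a sketch only: every rate is a placeholder you must replace with figures from the live pricing pages, and billing compute by the second is an assumption about the unit, not something the announcement specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder rates in USD. Replace with figures from the live pricing page."""
    per_million_input_tokens: float
    per_million_output_tokens: float
    background_per_compute_second: float
    sandbox_per_compute_second: float


def estimate_cost(
    rates: Rates,
    input_tokens: int,
    output_tokens: int,
    background_seconds: float = 0.0,
    sandbox_seconds: float = 0.0,
) -> float:
    """Sum the token charges and the two compute-time charges.

    No term is added for session state, matching the article's statement
    that the stateful layer carries no separate premium.
    """
    token_cost = (
        input_tokens / 1_000_000 * rates.per_million_input_tokens
        + output_tokens / 1_000_000 * rates.per_million_output_tokens
    )
    compute_cost = (
        background_seconds * rates.background_per_compute_second
        + sandbox_seconds * rates.sandbox_per_compute_second
    )
    return round(token_cost + compute_cost, 6)
```

Running this across a few realistic job lengths before launch shows whether tokens or sandbox time dominates the bill for a given workload.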

Migration friction is intentionally low: OpenAI SDK compatibility means a three-line switch to the Interactions API for many teams.


When to Use the Interactions API vs Alternatives

Interactions API vs Generate Content API: migration decision tree

The Interactions API supersedes the Generate Content API for all stateful, multi-turn, and agentic workloads. Single-turn, stateless generation can still use the legacy endpoint during the transition window — but Google explicitly frames the Interactions API as 'our primary interface,' and I'd treat that language seriously. If your app holds conversation state or runs tools across turns, migrate. Don't wait for the deprecation notice.

When you still need LangGraph, AutoGen, or CrewAI

LangGraph and AutoGen retain genuine value for: complex graph-based workflows with human-in-the-loop approval nodes; cross-model orchestration spanning non-Gemini providers like Anthropic Claude or OpenAI GPT models; and on-premises deployments where server-side state on Google infrastructure isn't acceptable. Those are real reasons. 'We already have it' is not.
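The guidance in this section can be written down as a small decision function. It is one reading of the rules stated above, not an official migration tool, and the field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    holds_conversation_state: bool = False
    runs_tools_across_turns: bool = False
    agentic: bool = False
    needs_non_gemini_models: bool = False
    needs_human_approval_graph: bool = False
    must_run_on_premises: bool = False


def recommend(w: Workload) -> str:
    """Return one of 'framework', 'interactions_api', 'generate_content'."""
    # The three cases the article lists as genuine reasons to keep a framework.
    if w.needs_non_gemini_models or w.needs_human_approval_graph or w.must_run_on_premises:
        return "framework"
    # Stateful, multi-turn, or agentic Gemini-only work: migrate.
    if w.holds_conversation_state or w.runs_tools_across_turns or w.agentic:
        return "interactions_api"
    # Single-turn stateless generation can stay put during the transition window.
    return "generate_content"
```

The framework checks come first on purpose: a cross-provider or on-premises constraint overrides everything else, however stateful the workload is.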

Interactions API vs Vertex AI Agent Builder

Vertex AI Agent Builder targets no-code and low-code enterprise deployments. The Interactions API is the programmatic substrate underneath it. If you need full control, build directly on the Interactions API — don't go through the higher-level abstraction and then fight it when you hit its limits. Teams comparing build paths often start with our agent templates and reference builds before committing.

Choosing between n8n, MCP servers, and native tool calls

n8n remains relevant for connecting Gemini agents to non-API business systems — CRMs, ERPs, the usual suspects — via workflow automation. Its role as an orchestration layer, though, shrinks considerably when the Interactions API chains multi-step tool calls natively. Use n8n where the integration problem is the hard part. Don't use it to compensate for orchestration gaps that no longer exist.

❌ Mistake: Re-sending full history on every Interactions API turn. Teams migrating from Generate Content reflexively keep passing the whole conversation, defeating the server-side state model and paying for tokens twice.

✅ Fix: Reference the session ID and send only the new turn. Let the server hold turn history and tool results.

❌ Mistake: Holding an open connection for long-running agents. Running a multi-minute Managed Agent task synchronously leads to client timeouts and fragile retries.

✅ Fix: Set background=True and poll the job ID or register a webhook.

❌ Mistake: Keeping LangGraph for single-provider Gemini workflows. Maintaining a full orchestration framework for what is now a native capability adds operational overhead and observability burden for no architectural gain.

✅ Fix: Reserve LangGraph for genuine cross-provider or human-in-the-loop graphs; use native tool chaining for single-provider Gemini agents.

❌ Mistake: Ignoring background-execution observability gaps. Background agents run server-side with less tracing detail than LangSmith gives over LangGraph, so silent failures go unnoticed.

✅ Fix: Instrument webhooks with structured logging and persist job IDs and outputs to your own store for audit.
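A minimal sketch of that last fix, using only the standard library: each webhook delivery is written to a local SQLite table and also emitted as one JSON log line. The payload keys (`id`, `status`) and the `"failed"` status value are assumptions about what a job notification might contain, so adapt them to the payload you actually receive.

```python
import json
import logging
import sqlite3
from datetime import datetime, timezone

logger = logging.getLogger("agent_jobs")


def init_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table that keeps one row per job event."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_events ("
        "job_id TEXT NOT NULL, status TEXT NOT NULL, "
        "received_at TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def record_event(conn: sqlite3.Connection, payload: dict) -> None:
    """Persist a webhook payload and emit one structured log line for it.

    Assumes the payload carries 'id' and 'status' keys; a payload missing
    either raises KeyError so malformed deliveries fail loudly.
    """
    job_id, status = payload["id"], payload["status"]
    received_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO job_events VALUES (?, ?, ?, ?)",
            (job_id, status, received_at, json.dumps(payload)),
        )
    log = logger.error if status == "failed" else logger.info
    log(json.dumps({"event": "job_update", "job_id": job_id, "status": status}))
```

Storing the raw payload alongside the parsed fields means you can still audit a job after the provider's own retention window has passed.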

Interactions API vs Closest Competitors: Direct Comparison

Google Interactions API vs OpenAI Responses API and Assistants API

OpenAI's Responses API and Assistants API offer comparable server-side thread management and tool use, and the Responses API has its own background mode for individual long-running requests. What they lack is an equivalent to Google's Managed Agents sandbox with pre-built personas like Antigravity. OpenAI is closer than it was a year ago, but it hasn't caught up.

Google Interactions API vs Anthropic Claude tool use and memory

Anthropic's Claude supports sophisticated tool use and extended thinking — genuinely impressive on both counts. But as of June 2026, there's no unified stateful session API at GA. Teams building multi-turn Claude agents manage state client-side or via LangGraph, which is exactly the overhead the Interactions API eliminates on the Gemini side. That's a real gap, and it matters for production systems.

Native agent APIs vs third-party orchestration: the build-vs-buy shift

The competitive gap Google is exploiting here is specific: both OpenAI and Anthropic still require developers to own the orchestration layer for production multi-agent systems. Managed Agents and server-side state in the Interactions API are a first-mover advantage in API-native agent hosting. Whether it holds depends on how fast the others move. Watch this space — but don't wait on it.

| Capability | Google Interactions API | OpenAI Responses/Assistants | Anthropic Claude |
| --- | --- | --- | --- |
| Unified stateful session at GA | Yes (June 2026) | Partial (threads) | No unified API at GA |
| Session-level background execution | Yes (background=True) | Per-response background mode, not session-level | Client-side / framework |
| Managed Agent sandbox + personas | Yes (Antigravity default) | No equivalent | No equivalent |
| Native MCP tool servers | Yes | Varies | Yes (tool use) |
| Managed vector search for RAG | Built-in | Hosted vector stores via file search | External |
| OpenAI SDK compatibility | Yes (3-line switch) | Native | No |

For RAG specifically, the Interactions API uses Google's managed vector search directly. OpenAI offers hosted vector stores through its file search tool, so the sharper contrast is with stacks that wire an external vector database such as Pinecone or Weaviate through custom retrieval functions. That's more moving parts. More things to break at 2am.

Industry Impact: What the Middleware Collapse Event Means for AI Development

How the Interactions API threatens the agent framework ecosystem

The orchestration framework market — LangChain (LangGraph), Microsoft AutoGen, CrewAI, and others — faces structural disruption as hyperscaler APIs absorb the stateful session and tool-routing capabilities that justified them. This isn't gradual erosion. It's the Middleware Collapse Event in macro form, and the Interactions API GA is the trigger.

Coined Framework

The Middleware Collapse Event at market scale

When the absorption of orchestration becomes a provider's default API rather than an experimental feature, the framework market doesn't shrink gradually — it bifurcates into 'cross-provider only' survivors and 'single-provider redundant' casualties. The Interactions API is the trigger event for Gemini.

Impact on enterprise AI budgets and build-vs-vendor decisions

For enterprises running Gemini at scale, internalising state management server-side cuts per-interaction infrastructure cost and removes the engineering overhead of maintaining orchestration layers — conservatively worth tens of thousands of dollars annually for high-volume deployments. A team paying two senior engineers to babysit a LangGraph runtime can reallocate that toward enterprise AI product work. I've seen that reallocation happen. It's not a small thing.

What this means for MCP adoption and standardisation

Native MCP support in the most-used Gemini interface accelerates Model Context Protocol toward de facto standard status, and it potentially marginalises proprietary tool-definition formats from competing providers. This is how standards actually get made: not by committee, but by the most-trafficked API treating something as first-class.

Implications for AI engineer roles and required skill sets

Engineers who specialised in LangGraph graph design or AutoGen agent configuration will re-skill toward direct API integration, prompt engineering for managed tool selection, and Interactions API session lifecycle management. The job is shifting from 'build the runtime' to 'design the interaction.' That's a meaningful change in what expertise looks like on a resume — and in what gets valued in an interview. We track this shift in our AI engineering skills guide.

The most valuable AI engineering skill of 2027 won't be building agent runtimes — it'll be knowing exactly when the provider's native runtime is enough and when it isn't.

Expert and Community Reactions to the Interactions API Launch

Developer community response on X, Reddit, and Hacker News

Community coverage — including #TheGenAIGirl on Medium — framed the Interactions API as a meaningful architectural shift toward stateful, multi-turn interaction design, highlighting ADK (Agent Development Kit) integration as a key companion capability. The tone wasn't hype. It was closer to 'we've been waiting for this and now we need to figure out what it breaks.'

Analysis from AI practitioners and framework maintainers

Developer feedback points consistently to the stable schema at GA as the primary blocker that's now gone. Schema instability during preview caused breaking changes that forced teams to delay migration — I heard this from multiple teams who were genuinely interested in the Interactions API but couldn't justify building on a shifting contract. With GA, that's resolved. Framework maintainers in the LangChain ecosystem have acknowledged the competitive pressure, with some repositioning LangGraph as a cross-cloud orchestration layer for teams running models across Google, OpenAI, and Anthropic simultaneously. That's a defensible niche. It's a smaller one than before.

Criticism: what developers say is still missing

Three gaps come up consistently: limited observability and tracing for background agent executions compared to LangSmith for LangGraph; state that locks to Google infrastructure, which makes cross-provider portability harder; and Managed Agent customisation that's still early-stage relative to fully custom agent graphs. These are real gaps. Don't ship background agents into production without your own structured logging in place — the native observability won't save you when something fails silently.

The honest trade: the Interactions API removes orchestration overhead but adds provider lock-in. Your state now lives on Google's servers — a feature for velocity, a risk for portability.

Framework maintainers are repositioning toward cross-provider orchestration as the Interactions API absorbs single-provider use cases.

What Comes Next: Roadmap, Predictions, and Strategic Recommendations

Google's signalled roadmap for the Interactions API post-GA

Google's announcement explicitly flags Gemini Omni (soon) and continued expansion of the Managed Agents catalogue beyond the initial Antigravity agent, with ADK positioned as the companion framework for programmatic agent definition. The roadmap signals are clear enough that you can plan against them.

Predicted expansion: more Managed Agents and deeper ADK integration

Expect a growing roster of pre-built agent personas and tighter ADK-to-Interactions binding. The bar for teams who want managed runtimes without writing their own agent loops drops further with each new addition to that catalogue.

Strategic recommendations for development teams in H2 2026

Teams in production on the Generate Content API should begin migration planning now — not because it's broken, but because Google's framing of the Interactions API as 'our primary interface' strongly implies the legacy endpoint enters deprecation within 12–18 months, consistent with Google's historical API lifecycle patterns. Starting migration planning after the deprecation notice is too late to do it cleanly. Our API migration playbook walks through staging this safely.

Bold prediction: the 18-month horizon for native agent APIs

By end of 2027, the majority of Gemini-based production AI agents will run entirely on the Interactions API with zero external orchestration frameworks. The Middleware Collapse Event will be complete. Framework adoption consolidates around cross-provider use cases only — and that's a smaller market than what these frameworks were built for.

2026 H1: **Interactions API GA with stable schema.** Confirmed: Google announces GA with Managed Agents, background execution, and a frozen schema that unblocks production adoption.

2026 H2: **Gemini Omni ships; Managed Agent catalogue expands.** Grounded in the announcement's 'Gemini Omni (soon)' and 'define your own custom agents' framing, plus deeper ADK integration.

2027 H1: **Generate Content API enters deprecation signalling.** Based on Google's historical 12–18 month lifecycle once an API is declared the 'primary interface.'

2027 H2: **Framework consolidation around cross-provider only.** LangGraph and AutoGen survive as multi-cloud orchestration layers; single-provider Gemini agents run framework-free.

Frequently Asked Questions

What is the Interactions API for Gemini models and agents, and how does it differ from Generate Content?

The Interactions API is Google's single unified endpoint for Gemini models and agents, now the primary interface as of its June 2026 general availability. Unlike the stateless Generate Content API — where you re-send full conversation history on every turn — the Interactions API keeps state server-side. You pass a model ID for inference, an agent ID for autonomous tasks, and set background=True for long-running work. It also adds Managed Agents, native tool combination (Function Calling plus MCP), and multimodal inputs in one schema. The practical difference: less code, lower token cost on multi-turn apps, and no need for an external runtime loop.

Is the Interactions API generally available and which Gemini models does it support?

Yes. Google announced general availability in June 2026, after a public beta that launched in December 2025. The GA release ships a stable schema — the key signal that production teams can build without fear of breaking changes. Gemini 3 Pro is the flagship model available through the API at GA, and Google notes Gemini Omni is coming soon. All Google documentation now defaults to the Interactions API, and Google is working with ecosystem partners to make it the default across third-party SDKs and libraries.

How do Managed Agents work in the Interactions API and what is the Antigravity agent?

Per Google's announcement, a single API call provisions a remote Linux sandbox where a Managed Agent can reason, execute code, browse the web, and manage files. The Antigravity agent ships as the default Managed Agent, and you can define custom agents with your own instructions, skills, and data sources. You invoke an agent by passing its agent ID instead of a model ID. Pair it with background=True for tasks that run over minutes or hours without holding an open client connection. This removes the infrastructure overhead — container orchestration, sandbox security, file handling — that previously pushed teams toward frameworks like CrewAI or custom runtimes.

Can I use the Interactions API with OpenAI SDKs or do I need to rewrite my code?

You do not need a rewrite. OpenAI library compatibility is maintained, so teams already calling Gemini through OpenAI-compatible endpoints can switch to the Interactions API by updating roughly three lines of code — base URL, model identifier, and the call pattern. This deliberately low migration friction is part of Google's strategy to make the Interactions API the default. For teams using the native Google AI for Developers Python or TypeScript SDKs, the change is updating to the latest SDK version that targets the /interactions endpoint path. The same API key infrastructure as previous Gemini endpoints applies.

How does server-side state management in the Interactions API reduce token costs?

In the stateless Generate Content pattern, each turn re-sends the entire conversation history, prior tool-call results, and context window — so input tokens grow with every exchange. The Interactions API holds memory, turn history, and tool results server-side, so a follow-up references the session ID and transmits only the new turn. For long multi-turn applications, this eliminates the largest hidden cost in stateless designs: redundant context re-transmission. Beyond tokens, it reduces latency because the model isn't reprocessing the full history each time. There is also no premium for the stateful session layer itself — custom agent definitions reuse standard Function Calling pricing.

Should I migrate from LangGraph or AutoGen to the Interactions API for my agent workflows?

If your agents run on Gemini only and your needs are state, tool chaining, and long-running execution, migrating to the Interactions API removes operational overhead — that's the Middleware Collapse Event in action. Keep LangGraph or AutoGen when you need complex graph-based workflows with human-in-the-loop approval nodes, cross-provider orchestration spanning Anthropic Claude or OpenAI, on-premises deployment where Google-hosted state is unacceptable, or rich tracing via LangSmith. The pragmatic move: migrate single-provider Gemini agents now, reserve frameworks for genuinely cross-cloud architectures.

What is the pricing model for the Interactions API including background execution and Managed Agents?

Pricing follows existing Google AI Studio and Vertex AI token-based tiers. Background execution adds compute-time billing on top of token consumption, since the server runs the interaction asynchronously. Managed Agents incur additional sandbox compute costs because each call provisions a remote Linux environment. Importantly, custom agent definitions reuse standard Function Calling pricing, and there is no premium for the stateful session layer itself — you pay for tokens and any compute you actually consume, not for state management as a separate line item. Check the live pricing pages for current per-token rates by tier and model.

Bottom line: the Interactions API isn't a convenience release — it's a strategic repositioning of where orchestration value lives. Migrate your single-provider Gemini agents, keep your frameworks for cross-cloud, and instrument your background jobs before you trust them in production.

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools. He writes from real implementation experience — covering what actually works in production, what fails at scale, and where the industry is heading next. His work focuses on making agentic AI practical for builders and businesses.

