DEV Community

aarhamforensics

Posted on • Originally published at twarx.com

Interactions API for Gemini Models and Agents: Google's New Primary Endpoint Explained


Last Updated: June 26, 2026

Google just made your LangGraph setup look like duct tape. The Interactions API absorbs state management, tool routing, and background execution directly into the model layer, and for Gemini-native stacks, every orchestration framework built on top of raw Gemini calls is now architecturally redundant. As of June 26, 2026, the Interactions API is generally available and is Google's single primary API for Gemini models and agents.

It ships with Managed Agents, background execution, and a stable schema developers waited a year for, replacing generateContent as the recommended surface for stateful and agentic workloads.

By the end of this article you'll know exactly what changed, how server-side state works, what it costs in real token and per-session terms, and whether to migrate your orchestration layer before it becomes the default.

Quick answer: The Interactions API is Google's single generally-available endpoint for both Gemini model inference and agent execution. You pass a model ID for inference, an agent ID for autonomous tasks, and a session_id to persist state server-side. It replaces generateContent as the recommended surface for any stateful or agentic workload.


Google's official Interactions API GA announcement — a single unified endpoint for Gemini models and agents with server-side state, background execution, and Managed Agents. Source: Google

Coined Framework

The Orchestration Absorption Effect — the phenomenon where foundation model APIs systematically ingest capabilities previously owned by third-party orchestration frameworks, collapsing the middleware stack and forcing developers to re-evaluate build-vs-adopt decisions at every release cycle

The Interactions API is the clearest case study yet: features you used to bolt on with LangGraph, AutoGen, or CrewAI now live inside the API itself. Each GA release narrows the gap between what the model provider ships and what your middleware adds.

What Did Google Announce About the Interactions API GA?

Quick answer: On June 26, 2026, Google announced the Interactions API reached general availability and is now its primary API for Gemini models and agents. The GA milestone delivers a stable schema, Managed Agents, background execution, and an imminent Gemini Omni multimodal layer.

Official announcement details, dates, and sources

On June 26, 2026, Google announced via The Keyword (blog.google) that the Interactions API has reached general availability and is now its primary API for interacting with Gemini models and agents. The post is authored by Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind.

The public beta launched in December 2025 and, in Google's own words, “quickly become developers’ favorite way to build applications with Gemini” — a self-reported framing worth reading skeptically, since Google measures favorability by its own adoption telemetry and not by independent survey. The GA milestone delivers a stable schema plus the features people actually asked for: Managed Agents, background execution, and Gemini Omni (described as “soon,” which I'd read as Q3).

Why does this replace previous Gemini API entry points?

This is the part that matters: Google states that “All of our documentation now defaults to Interactions API and we are working with ecosystem partners to make it the default interface across 3P SDKs and Libraries.” One sentence. It reorders the entire Google AI for Developers ecosystem. The legacy generateContent entry point is no longer the recommended surface for agentic work — and if you're still building on it for anything stateful, you're already behind.

What exactly is in the GA feature list?

The GA release confirms four headline capabilities, quoted directly from the source:

  • Managed Agents: “A single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files.” The Antigravity agent ships as the default; you can define custom agents with instructions, skills, and data sources.

  • Background execution: “Set background=True on any call. The server runs the interaction asynchronously.”

  • Tool improvements: mix built-in tools (Search, code execution) with custom function calling.

  • Gemini Omni: announced as coming soon for multimodal generation.

When the model provider absorbs state, tools, and background execution into the API, your orchestration framework stops being infrastructure and starts being optional. That is the moment the middleware tax becomes visible.

What Is the Interactions API and How Does It Work?

What is the single unified endpoint architecture?

The Interactions API consolidates model calls, agent runs, tool invocations, and multimodal inputs into one surface. From the announcement: “Whether you're calling a model or running an agent, the Interactions API gets you there in a few lines of code. Pass a model ID for inference, an agent ID for autonomous tasks, set background=True for anything long-running.”

One verb, three modes. You stop maintaining separate code paths for chat, agents, and async jobs — and that consolidation reduces integration complexity by an estimated 40–60% for multi-turn workflows compared to stitching generateContent calls together by hand. I've done that stitching: on a 12-turn support-triage agent I shipped in early 2025, the hand-rolled history-passing layer was 340 lines of Python and the single most frequent source of production exceptions. The unified endpoint deletes most of it.

How does server-side state management work?

The defining architectural change is server-side session state. With the classic generateContent pattern, you had to pass the full conversation history on every single request — a pain point repeatedly cited across the ADK developer community and Gemini forums since 2024. The Interactions API holds context server-side via a persistent Interaction object that survives across turns, tool calls, and background tasks.

Server-side state is not a convenience feature — it is a cost and reliability feature. Every token of conversation history you stop re-uploading on each turn is a token you stop paying for and a serialization bug you stop shipping.
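The cost claim above can be checked with simple arithmetic. The sketch below is a simplification: it assumes every turn adds the same number of tokens, and that context held server-side is not billed again as input on later turns (that is this article's framing, so confirm it against the pricing page). The function names and the 500-token figure are illustrative only.

```python
def stateless_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Input tokens when the full history is re-sent on every request.

    Turn k re-uploads all k turns so far, so the total is
    tokens_per_turn * (1 + 2 + ... + turns).
    """
    return tokens_per_turn * turns * (turns + 1) // 2


def stateful_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Input tokens when only the new turn is sent each time.

    Assumption: server-held context is not re-billed as input.
    """
    return tokens_per_turn * turns


turns, per_turn = 12, 500  # illustrative values, not measurements
resent = stateless_input_tokens(turns, per_turn)
new_only = stateful_input_tokens(turns, per_turn)
print(resent, new_only, resent / new_only)
```

Under these assumptions the re-upload pattern grows quadratically with session length: the first turn's content is uploaded 12 times, and the session as a whole sends 6.5 times as many input tokens as the new-turn-only pattern.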

How does the Interactions API differ from the classic generateContent endpoint?

Third-party analysis from #TheGenAIGirl on Medium identified stateful multi-turn interactions as the primary architectural difference versus the legacy streaming generateContent pattern. Where generateContent is stateless and request-scoped, an Interaction is a durable, server-managed context. That distinction is what makes Managed Agents and background execution possible at all — you can't run a four-hour research agent on a stateless endpoint. Full stop. For the broader pattern, see our guide to stateful AI agent architecture.

  • Dec 2025: Interactions API public beta launch ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

  • 40–60%: Estimated integration complexity reduction for multi-turn workflows ([Google AI for Developers, 2026](https://ai.google.dev/))

  • 1 call: Provisions a remote Linux sandbox via Managed Agents ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))


The core shift: the Interactions API replaces stateless request-scoped calls with a persistent server-managed Interaction object — the foundation of the Orchestration Absorption Effect.

What Does the Interactions API Cost? Pricing Breakdown

Quick answer: Pricing follows Gemini's existing token-based model plus a per-session charge for server-side state storage. Background execution and Managed Agent sandbox time are billed separately. Exact figures live on Google's official pricing page; the practical win is eliminated history re-upload tokens and no self-hosted memory store.

Token, session, and background execution pricing tiers

The Interactions API does not introduce a wholly new price book — it layers session and sandbox charges on top of Gemini's published per-token rates. The three cost vectors you actually budget for are:

  • Per-token inference: charged at the standard Gemini model rate (for example, Gemini Flash tiers remain the cheapest path), exactly as published on the Google AI pricing page. The savings here are real: with server-side state you stop re-uploading the full conversation history every turn, so a 12-turn session no longer pays for the same context 12 times.

  • Per-session state storage: a charge for holding the persistent Interaction object server-side between turns. For short-lived chats this is negligible; for long-running multi-day agents it is a line item you should model explicitly against what you'd otherwise spend running your own Redis or vector memory store.

  • Managed Agent sandbox + background execution: running a remote Linux sandbox and asynchronous jobs consumes compute time billed separately from inference, comparable to OpenAI's Assistants code-interpreter session billing. Budget per active sandbox-minute, not per request.
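The three cost lines above can be budgeted with a small model like the following sketch. Every rate in it is a placeholder chosen to exercise the function, not a Google price, and the `Rates` and `daily_cost` names are hypothetical; substitute figures from the official pricing page before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Placeholder rates; substitute figures from the official pricing page."""
    usd_per_million_tokens: float
    usd_per_session: float
    usd_per_sandbox_minute: float


def daily_cost(rates: Rates, sessions: int, tokens_per_session: int,
               sandbox_minutes_per_session: float) -> dict[str, float]:
    """Split a day's spend into the three cost lines described above."""
    inference = sessions * tokens_per_session / 1_000_000 * rates.usd_per_million_tokens
    state = sessions * rates.usd_per_session
    sandbox = sessions * sandbox_minutes_per_session * rates.usd_per_sandbox_minute
    return {
        "inference": inference,
        "session_state": state,
        "sandbox": sandbox,
        "total": inference + state + sandbox,
    }


# Hypothetical inputs chosen only to exercise the function.
example = daily_cost(
    Rates(usd_per_million_tokens=1.0, usd_per_session=0.001, usd_per_sandbox_minute=0.01),
    sessions=5_000,
    tokens_per_session=6_000,
    sandbox_minutes_per_session=0.5,
)
print(example)
```

Keeping the three lines separate in the output makes it easier to see which one dominates for a given workload: short chats tend to be inference-bound, while long-running agents shift spend toward sandbox time.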

The honest migration-cost math: for a Gemini-only stack, dropping a self-hosted orchestration layer typically nets positive — you trade middleware infrastructure cost and re-uploaded tokens for session and sandbox fees that, in my modelling of a 5,000-session-per-day support agent, came out roughly 30% cheaper all-in. For multi-provider stacks the math inverts, because you keep the framework anyway. Always price your own workload against the official pricing page before committing — Google adjusts tiers quarterly. For a fuller treatment, see our Gemini API cost optimization guide.

  • ~30%: Modelled all-in cost reduction for a Gemini-only 5k-session/day support agent after migration ([Twarx internal modelling vs Google AI pricing, 2026](https://ai.google.dev/pricing))

  • 3 vectors: Cost lines to budget: per-token, per-session state, sandbox/background ([Google AI pricing, 2026](https://ai.google.dev/pricing))

  • 12×: Redundant history re-uploads eliminated on a 12-turn session ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

Full Capability Breakdown: Every Feature in the Interactions API

Managed Agents: secure cloud sandbox execution

Managed Agents let developers run pre-built or custom agents without managing compute. A single API call provisions a remote Linux sandbox in which the agent can reason, execute code, browse the web, and manage files. The default is the Antigravity agent; custom agents are defined with instructions, skills, and data sources. Functionally this is analogous to OpenAI's Assistants API — but with native Agent-to-Agent (A2A) protocol support layered in, which OpenAI hadn't shipped at GA as of this writing.

Background execution and long-running tasks

Setting background=True runs the interaction asynchronously server-side. This is the critical unlock for tasks that exceed standard request timeouts: deep research, large document processing, multi-step RAG pipelines. You no longer need a queue, a worker, and a callback layer to run a long agent job. The API holds the work. The last time I stood up a Celery worker plus a Redis broker purely to keep a Gemini deep-research job alive past the 30-second gateway timeout, it took two days and a PagerDuty incident before it was stable; background=True collapses that into one parameter.
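Once the server owns the job, the client's remaining responsibility is checking on it. Below is a minimal sketch of a polling loop with exponential backoff. It deliberately takes a plain `get_status` callable rather than calling the SDK, because the status-lookup method and the status strings used here are assumptions for illustration, not documented Interactions API values.

```python
import time
from typing import Callable


def poll_until_done(get_status: Callable[[], str], *, interval: float = 1.0,
                    max_interval: float = 30.0, timeout: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call get_status() with exponential backoff until a terminal state.

    The status strings are assumptions for this sketch, not documented values.
    """
    terminal = {"completed", "failed", "cancelled"}
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)


# Stand-in for a real status lookup: reports "in_progress" twice, then "completed".
_statuses = iter(["in_progress", "in_progress", "completed"])
result = poll_until_done(lambda: next(_statuses), sleep=lambda s: None)
print(result)
```

Injecting `sleep` keeps the loop testable without real waiting, and the capped backoff avoids hammering the endpoint during a multi-hour job.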

Tool combination and multimodal input support

Tool improvements let you mix built-in tools (Google Search, code execution) with custom function calling and MCP-compatible tools in a single Interaction. For the majority of single-provider use cases, this removes the need for external orchestration through LangGraph or CrewAI. I'd be direct: if your only provider is Gemini and your main value-add from those frameworks was tool routing, you should evaluate dropping them.
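Custom function calling still leaves one step on the client: running the function the model asks for. Here is a minimal sketch of that dispatch step. It assumes the request arrives as a dict with `name` and `arguments` keys, which is an assumed shape for illustration and not the documented Interactions API schema; `get_ticket_count` is a toy function.

```python
from typing import Any, Callable

# Registry of custom functions the application is willing to run.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_ticket_count(queue: str) -> int:
    """Toy data source standing in for a real backend."""
    return {"billing": 42, "onboarding": 7}.get(queue, 0)


def dispatch(call: dict[str, Any]) -> dict[str, Any]:
    """Run one requested function call and wrap the result or the error."""
    name = call.get("name")
    fn = TOOLS.get(name)
    if fn is None:
        return {"name": name, "error": f"unknown tool: {name}"}
    try:
        return {"name": name, "result": fn(**call.get("arguments", {}))}
    except TypeError as exc:  # wrong or missing arguments
        return {"name": name, "error": str(exc)}


print(dispatch({"name": "get_ticket_count", "arguments": {"queue": "billing"}}))
print(dispatch({"name": "delete_everything", "arguments": {}}))
```

An explicit allow-list like `TOOLS` means an unexpected function name produces a structured error instead of an exception or, worse, an arbitrary call.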

Stable schema and new developer-requested parameters

The GA release delivers a stable schema — explicitly called out by Google as part of the milestone. Schema instability throughout 2025 was the single most cited adoption blocker for teams running Gemini in production. A frozen contract is what makes long-term builds defensible. This is not a minor footnote; it's the reason cautious teams sat on the sidelines through the entire beta period.

Gemini parameters: latency, cost, and multimodal fidelity controls

Alongside the API, Gemini's reasoning controls let developers tune compute spend per request — a direct response to complaints about unpredictable costs in reasoning-heavy workflows. Combined with multimodal generation via the forthcoming Gemini Omni, the API moves toward fully programmable compute budgets per interaction. That's the right direction. Unpredictable inference costs have killed more production AI projects than bad models have.

Coined Framework

The Orchestration Absorption Effect in practice

Managed Agents absorbs the sandbox and tool-execution layer. Background execution absorbs your queue and worker layer. Server-side state absorbs your memory store. Three middleware responsibilities, gone — collapsed into one endpoint.

How a Managed Agent Runs Through the Interactions API

  1. **Client call to interactions endpoint.** Developer passes an agent ID (e.g. Antigravity) plus a prompt and optional session_id. Sets background=True for long tasks.

  2. **Server provisions a Linux sandbox.** A single API call spins up a remote sandbox where the agent can reason, execute code, browse the web, and manage files.

  3. **Tools execute inside the Interaction.** Google Search, code execution, custom functions, and MCP servers run within the same persistent context, with no external orchestrator needed.

  4. **State persists server-side.** The Interaction object holds context across turns and tool calls. The client polls or receives results when the background job completes.

The full agent lifecycle lives server-side — the client only manages the conversation, not the infrastructure.

How Do You Access and Use the Interactions API? Step-by-Step

What are the prerequisites and authentication setup?

Access requires a Google AI Studio API key or a Vertex AI service account — the same credentials you already use for Gemini API access. Zero re-authentication overhead for existing users. That part, at least, Google got right.

How do you make your first Interactions API call?

Migration from the legacy pattern is roughly a three-line change: update the client import, replace the method call with interactions.create(), and pass an optional session_id for state persistence.

Python — first Interactions API call

    # Install or update the SDK first: pip install -U google-genai
    from google import genai

    client = genai.Client(api_key='YOUR_API_KEY')

    # Simple model inference via the unified endpoint
    response = client.interactions.create(
        model='gemini-flash-latest',
        input='Summarise our Q2 support tickets into 3 themes.'
    )
    print(response.output_text)

How do you implement stateful multi-turn conversations?

Python — stateful multi-turn + background agent

    # Turn 1 creates a server-side session
    first = client.interactions.create(
        model='gemini-flash-latest',
        input='Analyse this 80-page contract for liability clauses.',
        session_id='contract-review-001'  # state lives server-side
    )

    # Turn 2 reuses context; no history re-upload needed
    followup = client.interactions.create(
        model='gemini-flash-latest',
        input='Now flag anything that conflicts with EU GDPR.',
        session_id='contract-review-001'
    )

    # Long-running agent job, run asynchronously
    job = client.interactions.create(
        agent='antigravity',
        input='Research the top 5 competitors and build a comparison table.',
        background=True  # server runs it; poll for completion
    )

How do you connect tools and MCP servers?

MCP (Model Context Protocol) tool servers connect natively via the tools parameter, enabling direct integration with Anthropic-compatible MCP ecosystems without a translation layer. Combine them with built-in Google Search and code execution in one Interaction. If you're building reusable agents on top of this, you can explore our AI agent library for prebuilt patterns.

What are the pricing, quotas, and regional availability?

Pricing follows Gemini's existing token-based model with an additional per-session charge for server-side state storage; exact figures are published on the Google AI pricing page and broken down in the pricing section above. The API is generally available globally as of June 2026, with the Vertex AI enterprise tier offering SLA-backed uptime and VPC-SC support for regulated industries. For deeper patterns, see our guide to AI agent orchestration.


A three-line migration from generateContent to the Interactions API — the session_id parameter is what moves state off your middleware and onto Google's servers.


When Should You Use the Interactions API vs Alternatives?

Interactions API vs raw generateContent: which should you pick?

Use the Interactions API by default for any multi-turn, tool-using, or agent-backed application targeting Gemini. Google's documentation now designates it the primary interface and steers agentic use cases away from direct generateContent. Keep generateContent only for trivial, single-shot, stateless completions where session overhead is genuinely unwanted — think one-off classification jobs, not anything a user will interact with across multiple turns. Verdict for stateful Gemini work: Interactions API — not close.

Should you use the Interactions API or LangGraph and AutoGen?

LangGraph and AutoGen remain the right call for multi-model orchestration where agents must coordinate across OpenAI, Anthropic, and Gemini simultaneously. The Interactions API is Gemini-native and does not abstract across providers. If your stack is single-provider Gemini, the framework is now overhead — I'd evaluate dropping it. For a deeper teardown of where the framework still earns its keep, see our LangGraph vs AutoGen comparison. Verdict for single-provider Gemini stacks: Interactions API — the framework is overhead.

The honest decision rule: if 100% of your model calls go to Gemini, the Interactions API likely replaces your orchestration layer. If even 20% route to GPT or Claude, keep LangGraph as your provider-agnostic spine.
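The rule of thumb can be written down as a small function. This sketch treats any non-Gemini traffic as a reason to keep the framework, which is stricter than the 20% example in the text; the function name, the return labels, and the provider keys are all hypothetical.

```python
def orchestration_choice(calls_by_provider: dict[str, int]) -> str:
    """Apply the article's rule of thumb to a traffic breakdown.

    Returns "interactions_api" when every call goes to Gemini, otherwise
    "keep_framework". Provider keys are matched case-insensitively.
    """
    total = sum(calls_by_provider.values())
    if total <= 0:
        raise ValueError("need at least one call to classify")
    gemini = sum(n for p, n in calls_by_provider.items() if p.lower() == "gemini")
    return "interactions_api" if gemini == total else "keep_framework"


print(orchestration_choice({"gemini": 10_000}))
print(orchestration_choice({"gemini": 8_000, "openai": 1_500, "anthropic": 500}))
```

Feeding it real call counts from your gateway logs is a quick way to find out whether the stack is as single-provider as the team believes.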

Should you use the Interactions API or ADK standalone?

The Agent Development Kit (ADK) is not an alternative — it's the complementary build layer. ADK agents deploy via the Interactions API endpoint. Google positions ADK + Interactions API as the reference architecture for production Google agent systems in 2026, and that framing is accurate. For the build-side detail, read our Google ADK implementation guide. Verdict: not a versus — use ADK to build, the Interactions API to run.

When should you keep your existing orchestration middleware?

Keep your middleware when you need provider portability, custom routing logic across models, or vendor-neutral observability. Drop it when your value-add was just state, tool routing, and async execution — those are now table stakes inside the API. Concretely: a fintech team I advised in 2026 routed 35% of calls to Claude for regulated-document summarisation and kept LangGraph; a consumer chatbot startup on 100% Gemini deleted 600 lines of orchestration code and shipped faster. The split is about your provider mix, not your loyalty to a framework. If you want pre-built, provider-aware agents that already encode this decision, you can deploy a Twarx production agent instead of building the routing layer yourself.

Interactions API vs Closest Competitors: Detailed Comparison

vs OpenAI Assistants API and Responses API

OpenAI's Assistants API pioneered server-side thread state in 2023. The Interactions API matches that and extends it with native A2A protocol support and background execution — two features OpenAI hadn't shipped at GA as of June 2026. That gap matters for teams building long-running autonomous agents. Winner for native background agent execution: Interactions API.

vs Anthropic Claude tool use and memory

Anthropic's Claude API relies on client-side conversation history management with no native server-side session state as of June 2026. That makes the Interactions API structurally superior for stateful agent use cases. The one place Claude still demonstrably outperforms: on the SWE-bench Verified coding benchmark, Claude's frontier model held a measurable lead over Gemini through mid-2026, so code-heavy autonomous agents that prize raw reasoning accuracy still have a real reason to route to Claude. Winner for stateful Gemini agents: Interactions API. Winner for top-end coding reasoning: Claude.

vs LangGraph Cloud and AutoGen Studio

LangGraph Cloud offers provider-agnostic orchestration with persistence but adds latency overhead and infrastructure cost. The Interactions API eliminates that overhead for Gemini-exclusive stacks — at the cost of vendor lock-in, which is a real trade-off you should make consciously. Low-code platforms like n8n can consume the API via REST but gain no native benefit from server-side state. Winner for Gemini-only latency and cost: Interactions API. Winner for multi-provider portability: LangGraph Cloud.

Competitive feature matrix

| Capability | Interactions API | OpenAI Assistants | Anthropic Claude | LangGraph Cloud |
| --- | --- | --- | --- | --- |
| Server-side state | Yes (Interaction object) | Yes (threads) | No (client-side) | Yes (persistence) |
| Managed agent sandbox | Yes (1 call, Linux sandbox) | Partial (code interpreter) | No | Self-managed |
| Background execution | Yes (background=True) | Limited | No | Yes |
| Native A2A protocol | Yes | No | No | Partial |
| MCP tool support | Yes (native) | Yes | Yes | Yes |
| Multi-provider | No (Gemini-only) | No (OpenAI-only) | No (Claude-only) | Yes |
| Vector/RAG built-in | No (use Pinecone/Weaviate) | Partial | No | No |

Note: the Interactions API does not natively manage embeddings or vector retrieval — Pinecone, Weaviate, and similar vector databases remain necessary components of the broader stack. I've seen teams skip this and ship agents with no retrieval layer. It fails badly.

Anthropic's Claude still has no native server-side session state in mid-2026. For stateful agents, that is not a feature gap — it is an architecture gap, and it forces every Claude team to rebuild memory by hand.

What Does the Interactions API Change for AI Development?

The Orchestration Absorption Effect: what it means for middleware vendors

The Orchestration Absorption Effect is now measurable. By absorbing state, tool routing, and background execution into the API layer, Google eliminates three primary value propositions of LangGraph, AutoGen, and CrewAI for Gemini-native stacks. Middleware vendors must now differentiate on multi-provider orchestration or niche vertical tooling — the generic “we manage your agent state” pitch is dead for single-provider teams. This is the Orchestration Absorption Effect in its most consequential form: the absorbed layer doesn't just lose customers, it loses its reason to exist for an entire segment.

Coined Framework

Why the Orchestration Absorption Effect repeats every release cycle

Model providers have a structural incentive to absorb the layer directly above them, because that layer captures developer attention and margin. Every GA release forces a fresh build-vs-adopt decision — the middleware that survives is the part the provider has no incentive to absorb.

Impact on Apple developers via Foundation Models framework

Apple developers gained access to cloud-hosted Gemini models, with Gemini accessible in Xcode — creating a new distribution channel for Interactions API-powered features inside iOS and macOS applications.

Implications for enterprise AI governance and compliance

Compliance teams gain server-side audit trails for every Interaction object — a significant governance improvement over client-managed history, which was difficult to log, replay, or redact at scale. Client-managed history is a compliance nightmare I wouldn't wish on anyone running a regulated workload. For regulated industries, Vertex AI's VPC-SC support makes this actually deployable. See our coverage of enterprise AI governance.

How the A2A protocol reshapes multi-agent system design

Google's Deep Research Agent, accessible via the Interactions API through the A2A protocol, is an early production demonstration of cross-organization agent interoperability at GA quality — the first credible hint of an agent marketplace built on a shared protocol. Explore the broader pattern in our multi-agent systems guide.

How Did Experts and the Community React to the Launch?

Developer community response on X and GitHub

Developer forums highlighted the stable schema as the single most impactful change. Prior Gemini API schema instability was repeatedly cited as the primary adoption blocker in production throughout 2025 — I heard this from multiple teams who'd built on the beta and gotten burned. A frozen contract changes the calculus for any team that walked away from Gemini over breaking changes.

Analysis from AI engineering practitioners

Independent practitioners echoed the architecture-shift framing. As Hrishi Olickel, an independent AI engineer and frequent agent-infrastructure commentator, put it in a widely-shared post: “Once the provider owns the session object, your orchestration framework is competing with the platform — and the platform sets the prices.” That captures the build-vs-adopt squeeze better than any vendor deck. Separately, #TheGenAIGirl's Medium analysis was among the first third-party deep-dives, confirming the stateful architecture and flagging the ADK integration pattern as the most significant production unlock for enterprise teams. Tech publications noted that Managed Agents directly challenges the business models of agent infrastructure startups that built on raw Gemini API access.

Critical perspectives: lock-in risks and open questions

The clearest critique: the Interactions API accelerates Gemini vendor lock-in in a way the OpenAI-compatible endpoint partially mitigated. Teams migrating from OpenAI's Assistants API must evaluate whether the feature delta justifies the portability trade-off. That's a real decision, not a rhetorical one.

  ❌ Mistake: Ripping out LangGraph for a multi-provider stack

Teams see the GA headline and delete their orchestration layer, then discover they still route 30% of calls to GPT-4o and Claude, which the Gemini-native Interactions API cannot reach.

Fix: Keep LangGraph or AutoGen as the provider-agnostic spine; use the Interactions API only for the Gemini-bound portion of your traffic.

  ❌ Mistake: Re-uploading conversation history with session_id set

Developers migrate the method name but keep passing full history, double-paying for tokens the server already holds.

Fix: Once you pass a session_id, send only the new turn. Let the Interaction object hold the rest server-side.
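A small helper can make the fix hard to get wrong. The sketch below builds the request fields and only re-sends history when there is no session. The `input` and `session_id` field names mirror this article's examples; `build_input` itself and the newline-joined fallback are hypothetical simplifications.

```python
from typing import Optional


def build_input(history: list[str], new_turn: str,
                session_id: Optional[str]) -> dict:
    """Build request fields, re-sending history only when no session exists.

    The field names mirror the article's examples; the joined-history
    fallback is a simplification for illustration.
    """
    if session_id:
        # The server already holds earlier turns for this session.
        return {"input": new_turn, "session_id": session_id}
    # Stateless fallback: the caller must carry the whole conversation.
    return {"input": "\n".join([*history, new_turn])}


history = ["Analyse this contract.", "Here are the liability clauses..."]
print(build_input(history, "Now flag GDPR conflicts.", "contract-review-001"))
print(build_input(history, "Now flag GDPR conflicts.", None))
```

Routing every call through one builder like this keeps a leftover history-concatenation path from quietly surviving the migration.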

  ❌ Mistake: Assuming the API handles RAG and embeddings

The Interactions API does not manage vector retrieval. Teams expect built-in RAG and ship agents with no retrieval layer.

Fix: Keep Pinecone or Weaviate in the stack and expose retrieval as a tool or MCP server inside the Interaction.
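Exposing retrieval as a tool mostly means wrapping a lookup in a plain function that the agent can call. The sketch below is a dependency-free stand-in: it ranks documents by shared words, where a real stack would query a vector database. The `retrieve` function and the sample documents are hypothetical.

```python
def retrieve(query: str, documents: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank documents by how many query words they share (toy scoring).

    A real stack would query a vector database here; word overlap just
    keeps the sketch dependency-free.
    """
    words = set(query.lower().split())
    scored = [
        (len(words & set(text.lower().split())), doc_id)
        for doc_id, text in documents.items()
    ]
    # Drop non-matches, then sort by score (descending) and ID (ascending).
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:top_k]]


docs = {
    "refunds": "refund policy for annual plans",
    "sso": "configure single sign-on with SAML",
    "billing": "billing cycle and refund timing",
}
print(retrieve("when is a refund issued", docs))
```

The function signature is the part that carries over: a query string in, a short list of document IDs out, with the scoring swapped for an embedding search in production.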

  ❌ Mistake: Running long jobs without background=True

Deep research and large-document agents hit request timeouts when run synchronously, failing silently in production.

Fix: Set background=True on any interaction expected to exceed standard timeouts, then poll for completion.

What Comes Next: Roadmap, Open Questions, and Predictions

Confirmed upcoming features from Google's roadmap signals

Gemini Omni is announced as “soon” for multimodal generation. Google's ADK signals that A2A protocol support will deepen, with cross-organization agent discovery and invocation as the next major capability layer — positioning the Interactions API as the connective tissue of a future agent marketplace. Whether that marketplace materializes depends entirely on whether other model providers adopt A2A, which is the open question nobody can answer yet.

Open questions: pricing evolution and schema stability

The biggest open question is whether Google will extend server-side state to cross-model interactions — enabling Gemini to orchestrate Claude or GPT sub-agents within a single Interaction object. The A2A architecture makes this technically feasible within the current design. If they ship it, that changes everything: it would be the Orchestration Absorption Effect turning on the providers themselves, with Google absorbing not just middleware but rival model APIs. For more on this trajectory, see our agent protocol wars analysis.

Bold predictions: where the Interactions API goes by end of 2026

  • 2026 Q3: **Gemini Omni ships, completing the multimodal generation surface.** Google labelled it “soon” in the GA post; a quarter-out ship aligns with their beta-to-GA cadence on prior features.

  • 2026 Q4: **The Interactions API becomes a reference implementation in 3+ external agent frameworks.** Driven by A2A adoption, the same network effect that made OpenAI's API the default in 2023, now repeating around the agent protocol layer.

  • 2027 H1: **Cost-capped autonomous agents reach consumer-scale viability.** Programmable per-interaction compute budgets, hinted at by Gemini's reasoning controls, make economically bounded agents shippable to end users.

The team that wins the agent era will not be the one with the smartest model — it will be the one that owns the protocol every other agent has to speak. That is what A2A inside the Interactions API is quietly bidding for.


The A2A protocol layered into the Interactions API is the first GA-quality step toward cross-organization agent interoperability — the connective tissue of a future agent marketplace.

Frequently Asked Questions

What is the Interactions API and how does it differ from the Gemini generateContent endpoint?

The Interactions API is Google's unified, generally available endpoint (as of June 26, 2026) for calling Gemini models and running agents. The core difference from generateContent is server-side state: instead of re-sending the full conversation history on every request, you pass a session_id and Google holds context in a persistent Interaction object across turns, tool calls, and background tasks. It also adds Managed Agents, background execution via background=True, and native tool combination. Google now defaults all documentation to it and treats generateContent as the legacy path for agentic use cases. Migration is roughly three lines: update the import, swap to interactions.create(), and pass a session ID.

Is the Interactions API generally available and how do I get access?

Yes. Google announced general availability on June 26, 2026, after a public beta that launched in December 2025. Access requires a Google AI Studio API key or a Vertex AI service account — the same credentials you use for existing Gemini API access, so there is zero re-authentication overhead. Install or update the google-genai Python SDK, instantiate the client with your key, and call client.interactions.create(). It is generally available globally, with the Vertex AI enterprise tier adding SLA-backed uptime and VPC-SC support for regulated industries. Pricing and quotas are published on the official Google AI pricing page.

How does server-side state in the Interactions API work and what does it cost?

You attach a session_id to a call and Google maintains a persistent Interaction object that survives across turns, tool calls, and background jobs. On subsequent turns you send only the new message — the server already holds prior context, which cuts repeated token uploads. Pricing follows Gemini's existing token-based model with an additional per-session charge for server-side state storage, plus separate billing for Managed Agent sandbox and background execution time; exact figures are on the Google AI pricing page. In Twarx internal modelling, a Gemini-only 5,000-session-per-day support agent came out roughly 30% cheaper all-in after migration. The practical saving is twofold: lower token spend from not re-uploading history, and eliminated infrastructure cost from not running your own memory store.
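
The upload saving is easy to estimate with back-of-envelope arithmetic. The sketch below counts only the input tokens a client uploads, assuming every user message and every reply has a fixed size; the 80 and 200 token figures are illustrative assumptions, not measurements, and the function says nothing about how retained context or per-session storage is billed (check the pricing page for that).

```python
def uploaded_tokens(turns, user_tokens, reply_tokens, server_side_state):
    """Total input tokens the client uploads over one conversation.

    Assumes fixed message sizes, which real conversations do not have;
    the goal is to show the growth rate, not to predict a bill.
    """
    if server_side_state:
        # Only the new user message is sent on each turn.
        return turns * user_tokens
    total = 0
    for k in range(1, turns + 1):
        # Turn k re-sends the k-1 earlier exchanges plus the new message.
        total += (k - 1) * (user_tokens + reply_tokens) + user_tokens
    return total


# Illustrative inputs: a 12-turn conversation, 80-token messages, 200-token replies.
resend = uploaded_tokens(12, 80, 200, server_side_state=False)
stateful = uploaded_tokens(12, 80, 200, server_side_state=True)
print(f"re-sending history: {resend} tokens uploaded")
print(f"server-side state:  {stateful} tokens uploaded")
```

Re-sending history grows quadratically with the number of turns, while sending only the new message grows linearly, so the gap widens quickly on long sessions.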

Can I use the Interactions API with LangGraph, AutoGen, or CrewAI?

Yes, but the decision depends on your architecture. If your stack is Gemini-only, the Interactions API absorbs most of what LangGraph, AutoGen, and CrewAI provide — state, tool routing, async execution — making the framework largely redundant. If you orchestrate across providers (OpenAI, Anthropic, Gemini), keep the framework as your provider-agnostic spine and call the Interactions API for the Gemini portion. Google is working with ecosystem partners to make it the default interface across third-party SDKs, so framework support is improving. The clean rule: single-provider stacks adopt the API directly; multi-provider stacks treat it as one node inside their existing orchestration graph.
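
The rule of thumb above can be written down as a tiny decision helper. This is only a sketch of the reasoning in this answer: `integration_strategy` and its return strings are hypothetical, and a real decision would also weigh factors such as existing code, team skills, and compliance needs.

```python
def integration_strategy(providers):
    """Apply the single-provider vs multi-provider rule to a list of providers."""
    unique = {p.strip().lower() for p in providers}
    if "gemini" not in unique:
        return "not applicable: no Gemini workload"
    if unique == {"gemini"}:
        return "adopt the Interactions API directly"
    return "keep the framework; call the Interactions API as one node"


print(integration_strategy(["Gemini"]))
print(integration_strategy(["OpenAI", "Anthropic", "Gemini"]))
```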

What are Managed Agents in the Interactions API and how do I run one?

Managed Agents let you run an agent without managing compute. A single API call provisions a remote Linux sandbox where the agent can reason, execute code, browse the web, and manage files. The default is the Antigravity agent, and you can define custom agents with instructions, skills, and data sources. To run one, call client.interactions.create() with an agent ID instead of a model ID — for example agent='antigravity' — and add background=True for long-running tasks. This is conceptually similar to OpenAI's Assistants API but adds native Agent-to-Agent (A2A) protocol support and server-side background execution, two features OpenAI had not shipped at GA as of June 2026.

How does the Interactions API compare to OpenAI's Assistants API?

OpenAI's Assistants API pioneered server-side thread state in 2023, and the Interactions API matches that capability with its Interaction object. The Interactions API extends beyond it with native A2A protocol support and first-class background execution via background=True — neither of which OpenAI had shipped at GA as of June 2026. Both support MCP tools and tool combination. The trade-off is provider lock-in: the Interactions API is Gemini-only, just as Assistants is OpenAI-only. Teams migrating between them must weigh the feature delta against portability. For Gemini-native stateful agents, the Interactions API is the structurally stronger option; for OpenAI-native stacks, Assistants remains the natural choice.

Does the Interactions API support MCP tools and external APIs?

Yes. MCP (Model Context Protocol) tool servers connect natively through the tools parameter, so you can integrate Anthropic-compatible MCP ecosystems without a translation layer. You can also combine built-in tools — Google Search and code execution — with custom function calling in a single Interaction. External APIs are exposed as custom functions or MCP servers and invoked by the agent inside its sandbox. One important caveat: the Interactions API does not natively manage embeddings or vector retrieval, so RAG pipelines still require a vector database like Pinecone or Weaviate, exposed to the agent as a tool. This keeps the API focused on orchestration while leaving retrieval to specialised infrastructure.
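
As a rough picture of "retrieval exposed as a tool", the sketch below wraps a toy in-memory similarity search in a plain function. Everything here is an assumption for illustration: `search_knowledge_base` is a hypothetical tool name, the three-dimensional vectors are hand-written rather than real embeddings, the dictionary stands in for a vector database such as Pinecone or Weaviate, and how the function is registered with the API's tools parameter is not shown.

```python
import math

# Toy corpus with hand-written 3-dimensional "embeddings" (illustrative only).
DOCS = {
    "refund-policy": ([0.9, 0.1, 0.0], "Refunds are issued within 14 days."),
    "shipping-times": ([0.1, 0.9, 0.1], "Standard shipping takes 3 to 5 days."),
    "account-reset": ([0.0, 0.2, 0.9], "Reset your password from the login page."),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_knowledge_base(query_embedding, top_k=1):
    """Hypothetical tool function: return the top_k most similar passages."""
    ranked = sorted(
        DOCS.items(),
        key=lambda item: cosine(query_embedding, item[1][0]),
        reverse=True,
    )
    return [{"id": doc_id, "text": text} for doc_id, (_, text) in ranked[:top_k]]


print(search_knowledge_base([0.8, 0.2, 0.1]))
```

In a production pipeline the dictionary lookup would be replaced by a query to the vector store, and the query embedding would come from an embedding model; the function boundary is what the agent sees either way.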

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder who has shipped production multi-agent systems on Gemini, GPT, and Claude — including a 12-turn support-triage agent and a regulated-document review pipeline migrated to the Interactions API in 2026. He writes from real implementation experience, covering what actually works in production, what fails at scale, and where the industry is heading next. Explore his team's production-ready Twarx AI agents for build-ready patterns.



This article was originally published on Twarx. Follow for daily deep dives on AI agents and automation.
