aarhamforensics

Posted on Jun 27 • Originally published at twarx.com

Interactions API Gemini Models Agents: The 2026 GA Guide

#ai #machinelearning #automation #productivity

Last Updated: June 27, 2026

The launch of the Interactions API for Gemini models and agents quietly made half your agent infrastructure stack redundant, and most developers have not realised it yet. The Interactions API reaching general availability is not a feature drop. It is a deliberate land-grab on the orchestration middleware market that LangGraph, AutoGen, and CrewAI spent two years building.

The Interactions API is now Google's single unified endpoint for Gemini models and agents — server-side state, background execution, native tool combination, and Managed Agents included. It replaces the stateless generateContent workflow that forced you to bolt external orchestration onto every multi-turn agent just to keep context alive between turns.

By the end of this piece, you'll know exactly what changed, how to migrate in three to five lines of code, what it costs, and whether to commit your stack to Google or hold your existing setup. If you're already building multi-agent systems, our AI agent library pairs directly with everything below.

Google's official Interactions API GA announcement: a single unified endpoint for Gemini models and agents with server-side state and background execution.

Coined Framework

The State Sovereignty Shift — the architectural moment when foundation model providers absorb stateful session management, tool routing, and background execution directly into the API layer, collapsing the need for external orchestration middleware and fundamentally redrawing the AI stack boundary

It names the structural pressure now bearing down on the independent orchestration market: when state lives on the model provider's servers, the middleware that existed solely to manage that state loses its reason to exist. The Interactions API is the clearest expression of this shift yet.

What Google Announced: Interactions API Now Generally Available

Official announcement date, source, and exact wording

On June 27, 2026, Google announced via the official blog.google post that the Interactions API had reached general availability and is now its primary API for interacting with Gemini models and agents. The post was co-authored by Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind.

The exact wording matters: Google describes it as "A single unified endpoint for Gemini models and agents with server-side state, background execution, tool combination and multimodal generation." The public beta launched in December 2025, and per Google, "it has quickly become developers' favorite way to build applications with Gemini." That last line is doing a lot of work — it's the kind of framing that precedes a deprecation announcement.

Why June 2026 GA status matters for production teams

The single most consequential line for engineers: the GA release ships a stable schema. During the December 2025 preview, teams wouldn't commit production workloads to a schema that could break under them — reasonably so. GA removes that blocker. As Google states, "With this GA release, the API now has a stable schema and we also added major new capabilities that developers asked for, including Managed Agents, background execution, Gemini Omni (soon) and more." Schema stability is the difference between an interesting prototype and something you can actually put in front of customers.

The blog.google announcement vs the developer documentation release

Google confirmed that "All of our documentation now defaults to Interactions API and we are working with ecosystem partners to make it the default interface across 3P SDKs and Libraries." That coordinated push — docs, SDKs, and third-party libraries all re-defaulting to one surface simultaneously — is the tell. This isn't an isolated API update. It's a platform repositioning, and the docs move is the part most developers miss.

When a foundation model provider declares one endpoint its "primary interface" and re-defaults all documentation to it, that is not a feature announcement. That is a deprecation signal wearing a launch costume.

Dec 2025: Interactions API public beta launch ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

1: Unified endpoint for both models and agents ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

3-5: Lines of code to migrate simple use cases ([Google AI Docs, 2026](https://ai.google.dev/gemini-api/docs))

What the Interactions API Is and How It Works

The core architectural difference: stateful sessions vs stateless generation

The legacy generateContent endpoint is stateless. Every request is independent. To run a multi-turn conversation, you re-sent the entire history on every call — inflating token costs and forcing client-side context management onto your team. I've seen this blow up billing estimates by 40% on turn 15 of a support conversation. The Interactions API flips it: it maintains server-side conversation state. The session remembers tool-call history, grounding context, and model memory between turns. If you're new to agent context management, our guide to how AI agents work covers the fundamentals.

Server-side state management explained for engineers

Think of it as the difference between a stateless request and a server-side session that you reference by a handle. With generateContent, you were the database: you stored the transcript, the tool results, and the RAG chunks, and re-injected them every turn. With the Interactions API, a session object on Google's infrastructure holds that state. You pass a model ID for inference or an agent ID for autonomous tasks, and set background=True for anything long-running. Per Google: "Whether you're calling a model or running an agent, the Interactions API gets you there in a few lines of code." That's not just marketing copy; it holds for straightforward cases.
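To make the "you were the database" point concrete, here is a minimal toy sketch of the two patterns. Nothing in it calls Google's API: `fake_model`, `stateless_turn`, and `ToySessionServer` are hypothetical stand-ins that only show who owns the transcript in each design.

```python
import uuid


def fake_model(history: list[str]) -> str:
    """Stand-in for a model call: reports how many messages it was given."""
    return f"reply to {history[-1]!r} (saw {len(history)} messages)"


# Stateless pattern: the client owns the transcript and re-sends it each turn.
def stateless_turn(history: list[str], user_msg: str) -> tuple[list[str], str]:
    history = history + [user_msg]
    reply = fake_model(history)
    return history + [reply], reply


# Stateful pattern: a server-side store owns the transcript; the client sends
# only a session id and the new message. (Hypothetical toy, not Google's API.)
class ToySessionServer:
    def __init__(self) -> None:
        self._sessions: dict[str, list[str]] = {}

    def create(self) -> str:
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = []
        return session_id

    def send(self, session_id: str, user_msg: str) -> str:
        history = self._sessions[session_id]
        history.append(user_msg)
        reply = fake_model(history)
        history.append(reply)
        return reply


# Client-managed state: the caller carries `history` between turns.
history: list[str] = []
history, r1 = stateless_turn(history, "hello")
history, r2 = stateless_turn(history, "and again")

# Server-managed state: the caller carries only `sid`.
server = ToySessionServer()
sid = server.create()
s1 = server.send(sid, "hello")
s2 = server.send(sid, "and again")
```

Both paths give the model the same context on turn two. The difference is what the client has to store and transmit to get there.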

In stateless multi-turn implementations, re-injecting full conversation history caused measurable token cost inflation — often 30-60% of total spend on turn 10+ of a long conversation went to re-reading context the model had already seen. Server-side state eliminates that line item entirely.

The State Sovereignty Shift: why this collapses the middleware layer

Here's the part most developers miss. Three functions that previously required external infrastructure are now native:

Session memory — previously a vector database or external RAG layer for simple stateful tasks.
Background execution — previously developer-managed queuing like Celery or Cloud Tasks.
Tool routing — previously orchestration logic in LangGraph or AutoGen.

When all three move server-side, the question gets brutal: what exactly is your orchestration middleware still doing for a Gemini-only stack? That's the State Sovereignty Shift in one sentence. If you can't answer it clearly, you're probably paying for infrastructure you no longer need. For a deeper look at how this reshapes builder tooling, see our breakdown of AI orchestration frameworks.

Stateless generateContent vs Stateful Interactions API — the architectural collapse

1. **Client request (old: generateContent)**: You re-send full conversation history + tool results + RAG chunks every single turn. Your app is the state store.

2. **External middleware (old: LangGraph / Celery / Pinecone)**: Orchestration graph manages context injection, tool routing, async queuing, and vector retrieval. High operational overhead.

3. **Interactions API session (new)**: Single endpoint holds server-side state. Pass model ID or agent ID. Set background=True. Tool combination and grounding happen natively.

4. **Response + persistent session handle**: Next turn references the session, with no history re-injection. Background jobs return a job handle for polling or webhook completion.

The middleware layer that existed solely to manage state and async execution is absorbed into the API — the visual heart of the State Sovereignty Shift.

The shift from client-managed state to server-side sessions is what makes external orchestration middleware optional for Gemini-only stacks.

Full Capability Breakdown: What the Interactions API Can Do in 2026

Stateful multi-turn interactions and session persistence

The session object persists across turns, removing the need to manually inject conversation history into each request. This is the friction point that previously caused token cost inflation in stateless multi-turn builds. For a customer-support agent handling 20-turn conversations, this is the difference between linear and quadratic token growth. Not theoretical — I've watched the latter eat through a monthly budget in a week.
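The linear-versus-quadratic claim can be checked with a few lines of arithmetic. The sketch below assumes every turn adds the same number of tokens (the 200-token figure is an illustrative assumption) and counts only the tokens the client has to send. How a provider bills for context held server-side is a separate pricing question.

```python
def stateless_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Turn k re-sends all k turns so far: t + 2t + ... + n*t = t*n*(n+1)/2."""
    return tokens_per_turn * turns * (turns + 1) // 2


def stateful_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Only the new turn is sent each time: n*t."""
    return tokens_per_turn * turns


TURNS = 20             # the 20-turn support conversation from the text
TOKENS_PER_TURN = 200  # assumed average size of one turn, for illustration only

stateless = stateless_input_tokens(TURNS, TOKENS_PER_TURN)
stateful = stateful_input_tokens(TURNS, TOKENS_PER_TURN)

print(f"stateless: {stateless} tokens sent")   # 42000
print(f"stateful:  {stateful} tokens sent")    # 4000
print(f"ratio:     {stateless / stateful}x")   # 10.5
```

Under these assumptions the ratio is (n + 1) / 2, so it keeps growing with conversation length, which is why the effect is easy to miss in short test chats.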

Background execution and async agent tasks

Set background=True on any call. Per Google: "The server runs the interaction asynchronously." The connection can close; the job keeps running. You poll a job handle or receive a webhook on completion. This replaces self-managed Celery or Cloud Tasks queues for long-horizon agent work. If you've ever debugged a Celery worker silently dying at 2am, you understand why that matters.

Native tool combination and MCP integration

The API natively supports combining multiple tools — Google Search grounding, code execution, and MCP (Model Context Protocol)-connected external services — within a single session, without developer-side orchestration logic. MCP compatibility is the real differentiator here. It gives the Interactions API an expanding ecosystem of third-party tool integrations that stateless endpoints can't match without added middleware. Worth paying attention to as the MCP ecosystem matures — our MCP integration guide covers the wiring in detail.

Multimodal input support across the unified endpoint

The unified endpoint handles multimodal generation. Gemini 3 exposes explicit latency, cost, and multimodal fidelity trade-off controls — the "Level of Thinking" parameters documented in the Gemini 3 Developer Guide. You tune the compute-vs-cost dial per call, which is genuinely useful when you're running both quick lookup tasks and heavy reasoning in the same session.

Managed Agents: Antigravity and custom agent support

This is the headline new capability. Per Google: "A single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files. The Antigravity agent ships as the default, and you can define your own custom agents with instructions, skills and data sources." You can also connect to the Gemini Deep Research Agent via the A2A (Agent-to-Agent) protocol — meaning ADK-built agents can delegate to Gemini-hosted agents over a standardised protocol. That's infrastructure you used to build, deploy, and babysit yourself. Browse ready-made patterns in our agent template library.

Managed Agents are the first Google-hosted production agents accessible through a stable API — not demo sandboxes. A single call provisions a Linux sandbox that reasons, runs code, browses, and manages files. That is infrastructure you used to build and babysit yourself.

The Antigravity agent shipping as the default Managed Agent is strategically loaded: it means the path of least resistance for any new Gemini agent build now runs entirely inside Google's infrastructure, from sandbox to state.

How to Access and Use the Interactions API: Step-by-Step Guide

Prerequisites: API key, SDK versions, and model availability

You need a Gemini API key from Google AI Studio or Vertex AI. The Interactions API is available on both surfaces — but Vertex AI carries different enterprise SLA terms, which matters for production teams with real uptime commitments. If you're building multi-agent pipelines and want reusable patterns, you can explore our AI agent library.

Step-by-step: initialising a session and sending your first interaction

Python — first stateful interaction

```shell
# Install the latest Google GenAI SDK
pip install -U google-genai
```

```python
# Method and parameter names are as shown in this guide;
# confirm them against the live Gemini API reference.
from google import genai

client = genai.Client(api_key='YOUR_API_KEY')

# Create a stateful session against a Gemini model
session = client.interactions.create(
    model='gemini-3',       # pass a model ID for inference
    # agent='antigravity',  # OR pass an agent ID for autonomous tasks
)

# Turn 1: no history injection needed; the server holds state
reply = session.send('Summarise our refund policy for a new agent.')
print(reply.text)

# Turn 2: references the same session; history persists server-side
followup = session.send('Now rewrite it in Spanish.')
print(followup.text)
```

Enabling background execution mode

Python — background (async) long-running task

```python
# Long-horizon research task: the connection can close and the job keeps running
job = client.interactions.create(
    agent='deep-research',
    background=True,  # server runs the interaction asynchronously
)

handle = job.send('Compile a competitive teardown of the top 5 vector DBs.')

# Poll the job handle, or register a webhook for completion notification
result = client.interactions.poll(handle.id)
print(result.status)  # 'running' -> 'completed'
```

Per Google, background execution is enabled by setting background=True on the call; the API returns a job handle for polling or webhook-based completion. Confirm exact parameter and method names against the live Gemini API documentation: the schema is stable at GA, but implementation details shift faster than blog posts do.

Connecting MCP tools and external APIs

MCP-connected tools attach to the session natively. You register the MCP server endpoint and the model routes tool calls through it — no custom orchestration glue. This is where the Interactions API meaningfully overlaps with what teams previously built in workflow automation layers, and where the middleware compression effect starts showing up in your infrastructure bill.

Pricing structure and what changes from the generateContent endpoint

The pricing model adds a new billable dimension: session state storage, billed beyond standard per-token input/output costs. Per the GA release notes, session storage is priced per GB-hour — check the live Google AI pricing page for current figures before budgeting. The trade-off is usually favourable: a small storage fee against the token cost of re-injecting history every turn. But long-lived sessions with heavy multimodal context can surprise you. Set TTLs.
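One way to choose a TTL is to find the session lifetime at which storage charges overtake the token cost you avoided. The sketch below is a minimal break-even calculation. Every price and size in it is a placeholder assumption, not a Google figure, so substitute values from the live pricing page.

```python
def token_cost(tokens: int, price_per_million: float) -> float:
    """Cost of a number of input tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


def storage_cost(size_gb: float, hours: float, price_per_gb_hour: float) -> float:
    """Cost of keeping a session of a given size alive for a number of hours."""
    return size_gb * hours * price_per_gb_hour


def break_even_hours(
    tokens_saved: int,
    price_per_million: float,
    size_gb: float,
    price_per_gb_hour: float,
) -> float:
    """Session lifetime at which storage cost equals the token cost avoided."""
    return token_cost(tokens_saved, price_per_million) / (size_gb * price_per_gb_hour)


# Placeholder inputs for illustration only; these are NOT real Google prices.
PRICE_PER_MILLION_INPUT = 1.00  # currency units per 1M input tokens (assumed)
PRICE_PER_GB_HOUR = 0.10        # currency units per GB-hour of session state (assumed)
SESSION_SIZE_GB = 0.001         # roughly 1 MB of stored session state (assumed)
TOKENS_SAVED = 38_000           # re-sent history avoided over one conversation (assumed)

ttl_ceiling = break_even_hours(
    TOKENS_SAVED, PRICE_PER_MILLION_INPUT, SESSION_SIZE_GB, PRICE_PER_GB_HOUR
)
print(f"Storage overtakes token savings after about {ttl_ceiling:.0f} hours")
```

A session holding large multimodal context moves the break-even point much earlier, which is the case the TTL advice is aimed at.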

Apple developer access via Foundation Models framework

In a concurrent June 2026 announcement, Apple developers can now call cloud-hosted Gemini models — including via the Interactions API — through the Foundation Models framework in Xcode. That puts Gemini directly inside the on-device AI developer workflow on Apple platforms. It's a quiet but significant distribution move.

The migration from generateContent to a stateful Interactions API session is a three-to-five line change for simple use cases — the lowest-friction lock-in Google has ever shipped.


When to Use the Interactions API vs Alternatives

Use cases where Interactions API wins clearly

Customer-facing multi-turn chat agents where session memory is the core requirement.
Long-horizon research tasks via the Gemini Deep Research Agent.
Any build where cutting infrastructure operational overhead is the priority — not just a nice-to-have.
Teams already standardised on Google Cloud / Vertex AI.

Use cases where LangGraph, AutoGen, or CrewAI still make sense

Cross-model workflows mixing Gemini with non-Google LLMs like Claude or GPT — the Interactions API won't help you here.
Complex conditional branching logic that exceeds the API's native tool routing.
Compliance requirements that prohibit server-side state on third-party infrastructure.
Highly customised RAG retrieval where you need more control than native grounding gives you.

The hybrid architecture: Interactions API plus external orchestration

A credible pattern is emerging: use the Interactions API as the Gemini execution layer — a stateful node — inside an ADK or LangGraph orchestration graph, rather than ripping out the whole graph. You get native state for Gemini steps while keeping cross-model flexibility where you actually need it. This is the migration I'd recommend to most teams right now.

The smartest migration in June 2026 is not all-in or all-out. It is treating the Interactions API as a stateful Gemini node inside your existing graph — you capture the token savings and operational simplicity without surrendering multi-model freedom.
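The hybrid pattern can be sketched as a thin backend interface plus a node wrapper that keeps its own copy of the transcript. The names here (`ChatBackend`, `MirroredNode`, `EchoBackend`, `run_graph`) are hypothetical and stand in for whatever your graph framework provides. A real backend would wrap a provider session behind the same `send` method.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class ChatBackend(Protocol):
    """Anything that can take a message and return a reply."""

    def send(self, message: str) -> str: ...


@dataclass
class MirroredNode:
    """Graph node that delegates to a backend and mirrors the transcript locally."""

    backend: ChatBackend
    transcript: list[tuple[str, str]] = field(default_factory=list)

    def __call__(self, message: str) -> str:
        reply = self.backend.send(message)
        # Keep a portable copy so the state is not held only by the provider.
        self.transcript.append(("user", message))
        self.transcript.append(("assistant", reply))
        return reply


class EchoBackend:
    """Stand-in backend for the sketch; a real one would wrap a provider session."""

    def __init__(self, name: str) -> None:
        self.name = name

    def send(self, message: str) -> str:
        return f"[{self.name}] {message}"


def run_graph(nodes: list[Callable[[str], str]], message: str) -> str:
    """Minimal linear 'graph': feed each node's output into the next."""
    for node in nodes:
        message = node(message)
    return message


gemini_node = MirroredNode(EchoBackend("gemini-session"))
other_node = MirroredNode(EchoBackend("other-model"))
result = run_graph([gemini_node, other_node], "draft a reply")
print(result)
```

Because each node depends only on `send`, swapping the stateful Gemini backend for another provider is a one-line change, and the mirrored transcript survives the swap.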

Interactions API vs Closest Competitors: Direct Comparison

Interactions API vs OpenAI Assistants API

Both offer server-side thread state and tool calling. The Interactions API's native A2A protocol and Managed Agents integration represent a capability gap the OpenAI Assistants API hasn't closed as of June 2026. Neither solves cross-model orchestration — both are single-provider surfaces.

Interactions API vs Anthropic's API and tool use layer

Anthropic's API remains stateless at the model-call level — developers manage conversation history client-side or via middleware. That gives Google a structural advantage for multi-turn agent use cases without added infrastructure. Whether Claude's model quality offsets that architectural gap depends on your workload.

Interactions API vs self-hosted orchestration: LangGraph, AutoGen, CrewAI, n8n

LangGraph (v0.2+), AutoGen (v0.4), and CrewAI offer more flexible cross-model and conditional branching orchestration — but require self-managed infrastructure. n8n is an integration surface, not a direct competitor; its Google Gemini node will need an Interactions API connector to unlock stateful agent workflows in visual pipelines. That update hasn't shipped yet as of this writing. We compare these surfaces in depth in our best AI agent frameworks guide.

| Capability | Interactions API | OpenAI Assistants API | Anthropic API | LangGraph / AutoGen |
| --- | --- | --- | --- | --- |
| Server-side state | Yes (native sessions) | Yes (threads) | No (client-managed) | Self-managed |
| Background execution | Yes (background=True) | Partial | No | Self-managed (Celery) |
| Managed Agents (cloud sandbox) | Yes (Antigravity default) | No | No | No |
| A2A protocol | Yes | No | No | Via ADK |
| MCP tool support | Native | Partial | Via middleware | Yes |
| Cross-model (non-Gemini) | No | No (OpenAI only) | No (Claude only) | Yes |
| Operational overhead | Low | Low | Medium | High |

Industry Impact: What the Interactions API Changes for the AI Stack

The middleware compression effect on the orchestration layer

The State Sovereignty Shift is accelerating. With Google — and to a lesser extent OpenAI via Assistants — absorbing state management into the API layer, the independent orchestration middleware market faces structural pressure. Companies that built orchestration-only products without a model or deployment layer are the most exposed. This isn't speculation; it's just where the money flows when the problem they solved stops being a problem.

Coined Framework

The State Sovereignty Shift — applied to vendor economics

When state, tool routing, and async execution become native API primitives, the value of standalone middleware compresses toward zero for single-provider stacks. Survival depends on owning either the model, the data, or genuine cross-provider neutrality.

What this means for enterprise AI platform decisions in 2026

Platform decisions are now a genuine three-way split: Google Interactions API + Vertex AI, OpenAI Assistants + Azure OpenAI, or a self-hosted open-source stack. The middle ground — mixing frontier model APIs with independent orchestration middleware — is becoming harder to justify operationally. The teams I see struggling most are the ones who chose middleware first and model second. See our deeper take on enterprise AI platform strategy.

Impact on vector database and RAG vendors

Pinecone, Weaviate, and Chroma face partial demand erosion for simple retrieval now handled by built-in grounding. But they remain essential for proprietary data indexing at scale and hybrid search beyond Gemini's native grounding window. For teams running RAG systems, the vector DB isn't dead — it's repositioned.

Vector databases are not being killed by native grounding. They are being demoted — from default infrastructure for every chatbot to specialised infrastructure for proprietary, large-scale, hybrid-search retrieval. That is still a real business. It is just a smaller TAM.

3-way: Enterprise platform split (Google / OpenAI / self-hosted) ([Twarx analysis, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

12-18mo: Predicted window to generateContent maintenance-only status ([Twarx prediction, 2026](https://ai.google.dev/gemini-api/docs))

End-to-end: ADK-to-production agent workflow captured by Google ([Google ADK Docs, 2026](https://google.github.io/adk-docs/))

Common Mistakes Teams Make Migrating to the Interactions API

❌ Mistake: Ripping out LangGraph entirely on day one

Teams over-rotate and delete their whole orchestration graph, then discover they need a non-Gemini model for one branch and have no escape hatch. I've watched this exact scenario cost a team two weeks of re-architecture work.

✅ Fix: Adopt the hybrid pattern. Use the Interactions API as a stateful Gemini node inside LangGraph or ADK first, then collapse the graph only if you confirm a Gemini-only future.

❌ Mistake: Ignoring session storage costs

The new per-GB-hour session storage dimension is easy to miss in the GA notes. Long-lived sessions with heavy multimodal context can quietly accumulate storage charges that don't show up until the end-of-month bill.

✅ Fix: Set session TTLs, archive completed sessions, and benchmark storage spend against the token savings on the live Google AI pricing page.

❌ Mistake: Underestimating vendor lock-in

Server-side state lives on Google's infrastructure. A team that builds everything around native sessions has no portable transcript if it needs to switch providers. That's a real risk, not a theoretical one.

✅ Fix: Mirror critical conversation state to your own store, and keep a thin abstraction layer so the session backend can be swapped if strategy changes.

❌ Mistake: Treating background execution as fire-and-forget

Setting background=True without wiring up job handle polling or a webhook means long-running agent results silently vanish from the user flow. This will happen to you in production if you don't wire it up before launch.

✅ Fix: Always register a completion webhook or implement deterministic polling with retry/backoff on the returned job handle before shipping background tasks.
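For the polling half of that last fix, here is a minimal sketch of deterministic polling with exponential backoff. It is deliberately independent of any SDK: `fetch_status` is a hypothetical callable that you would wire to whatever status lookup the live API provides. The 'running' and 'completed' strings follow the example earlier in this guide, and 'failed' is an assumed terminal status.

```python
import time
from typing import Callable

TERMINAL_STATUSES = ("completed", "failed")  # 'failed' is an assumed status name


def poll_until_done(
    fetch_status: Callable[[], str],
    *,
    max_attempts: int = 8,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal status, doubling the wait each time."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff with a cap
    raise TimeoutError(f"job still not finished after {max_attempts} checks")


# Usage with a fake job that finishes on the third check.
statuses = iter(["running", "running", "completed"])
waits: list[float] = []
final = poll_until_done(lambda: next(statuses), sleep=waits.append)
print(final, waits)
```

Injecting `sleep` keeps the function testable without real delays. Raising on timeout, instead of returning quietly, is what stops results from vanishing from the user flow.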

Expert and Community Reactions to the Interactions API Launch

Developer community response on X, Reddit, and HackerNews

Response has been broadly positive on schema stability after the December 2025 preview — that was the main blocker for production adoption and it's now resolved. The most cited concern is vendor lock-in risk, since server-side state now lives inside Google's infrastructure. Migration hesitation concentrates among teams running non-Gemini models — the Interactions API's Gemini-only scope means any multi-model workflow still needs a hybrid or external approach. That's the single most-cited limitation in developer forums as of June 2026.

AI engineer perspectives: migration enthusiasm and concern

The Managed Agents announcement drew significant attention from enterprise teams exploring the Antigravity agent and the Gemini Deep Research Agent — the first Google-hosted production agents accessible via a stable API rather than demo environments. Per blog co-author Ali Çevik, Group Product Manager at Google DeepMind, the API has "quickly become developers' favorite way to build applications with Gemini." The qualifier "with Gemini" is doing a lot of work in that sentence.

The #TheGenAIGirl analysis: what it got right and what it missed

The widely shared Medium analysis "Interactions API + ADK: A Closer Look" correctly identified the stateful session model as the key architectural differentiator — but underestimated the competitive implications for the orchestration middleware market. That gap is precisely the State Sovereignty Shift this article names directly. Getting the technical architecture right while missing the market consequence is a common pattern in AI coverage right now.

What Comes Next: Roadmap Signals and Predictions

Likely deprecation timeline for legacy generateContent endpoints

Google hasn't announced a deprecation date for generateContent. But the explicit "primary interface" language plus re-defaulting all documentation is a strong signal the older endpoint moves to maintenance-only status within 12-18 months. Production teams should begin migration planning now. Waiting until a deprecation date is announced is how you end up doing an emergency migration under pressure.

Predicted expansion of Managed Agents catalogue

The catalogue is expected to expand beyond Antigravity and Deep Research. Google's pattern of releasing Workspace, Search, and Maps capabilities as developer surfaces suggests coding, calendar, and document agents are likely near-term additions. Gemini Omni is explicitly flagged as "soon" in the GA post — that's unusually direct language for a roadmap signal.

The Interactions API as Google's platform lock-in play through 2027

By 2027, the Interactions API is likely to become the de facto standard for any Google Cloud AI workload — making early adoption a strategic advantage and delayed migration a risk that compounds into a larger forced migration later. The Apple Foundation Models integration signals Google's intent to be the cloud AI backend of choice for on-device apps. That's a direct challenge to OpenAI's Apple positioning and Anthropic's enterprise API business, and it's a smarter distribution play than most people are giving it credit for. For builders, our production agent templates already account for this shift.

2026 H2: **Gemini Omni ships; Managed Agents catalogue expands.** Google flags Gemini Omni as "soon" in the GA post; its Workspace/Search/Maps surface pattern points to coding and document agents next.

2027 H1: **generateContent moves to maintenance-only.** The "primary interface" language and doc re-defaulting historically precede a 12-18 month deprecation glide path.

2027 H2: **Orchestration-only middleware consolidates.** With state native across Google and OpenAI, standalone orchestration vendors pivot to cross-model neutrality or get acquired: the State Sovereignty Shift's terminal phase.

The predicted glide path: as the Interactions API becomes Google's de facto standard, the window for low-risk migration narrows through 2027.

Frequently Asked Questions

What is the Interactions API and how is it different from the Gemini generateContent endpoint?

The Interactions API is Google's unified, stateful endpoint for Gemini models and agents, generally available since June 27, 2026. Unlike the stateless generateContent endpoint — where you re-send the full conversation history on every call — the Interactions API maintains server-side session state, persisting tool-call history, grounding context, and model memory. It adds background execution (set background=True), native tool combination including MCP, multimodal generation, and Managed Agents. The practical impact: multi-turn conversations no longer suffer token cost inflation from history re-injection, and long-running tasks no longer need self-managed queues. Google now defaults all documentation to it and calls it the "primary interface," signalling generateContent will eventually move to maintenance-only.

Is the Interactions API generally available and production-ready in 2026?

Yes. Google announced general availability on June 27, 2026 via the official blog.google post, co-authored by Ali Çevik (Group Product Manager, Google DeepMind) and Philipp Schmid (Developer Relations Engineer, Google DeepMind). The public beta ran from December 2025. The critical production signal is the stable schema shipped at GA — the schema instability that made teams hesitant during preview is resolved. It is available on both Google AI Studio and Vertex AI, with Vertex AI offering enterprise SLA terms suited to production uptime commitments. New capabilities at GA include Managed Agents, background execution, and Gemini Omni (coming soon). For Gemini-only stacks, it is production-ready today; for multi-model workflows, a hybrid approach remains necessary.

How do I migrate from the Gemini generateContent API to the Interactions API?

Per Google's documentation, simple use cases require only a three-to-five line change. Update your client initialisation and switch from generateContent() to the interactions.create() session method. Pass a model ID for inference or an agent ID for autonomous tasks. Once you have a session, subsequent send() calls reference it automatically, so you stop manually injecting conversation history. To enable async work, set background=True and handle the returned job handle via polling or webhooks. Migration best practice: start with the hybrid pattern, treating the Interactions API as a stateful Gemini node inside your existing LangGraph or ADK graph rather than a full rip-and-replace. Always confirm exact parameter names against the live Gemini API documentation before you ship.

What are Managed Agents in the Gemini API and how do they connect to the Interactions API?

Managed Agents are Google-hosted production agents accessible through the Interactions API. Per Google, a single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web, and manage files. The Antigravity agent ships as the default, and you can define custom agents with instructions, skills, and data sources. You can also connect to the Gemini Deep Research Agent via the A2A (Agent-to-Agent) protocol — meaning ADK-built agents can delegate to Gemini-hosted agents over a standardised interface. This is significant because these are the first Google-hosted agents available through a stable API rather than demo-only environments, removing the need to build and operate your own agent sandbox, code-execution environment, and browsing infrastructure.

How does the Interactions API compare to the OpenAI Assistants API?

Both offer server-side conversation state and tool calling — the Assistants API via threads, the Interactions API via sessions. The key differentiators as of June 2026: the Interactions API ships native A2A protocol support and Managed Agents (cloud Linux sandboxes with the Antigravity agent), capabilities the OpenAI Assistants API does not currently match. Both are single-provider — the Interactions API is Gemini-only, Assistants is OpenAI-only — so neither solves cross-model orchestration without a hybrid approach. The Interactions API also supports MCP-connected tools natively and exposes Gemini 3's "Level of Thinking" latency/cost/fidelity controls. For teams choosing a platform, the decision usually tracks which model family and cloud (Vertex AI vs Azure OpenAI) they have standardised on, rather than raw feature parity.

Does the Interactions API support MCP tools and external API integrations?

Yes. The Interactions API natively supports combining multiple tools within a single session — including Google Search grounding, code execution, and MCP (Model Context Protocol)-connected external services — without developer-side orchestration logic. MCP compatibility is a genuine differentiator: it gives the API an expanding ecosystem of third-party tool integrations that stateless endpoints cannot match without additional middleware. You register the MCP server endpoint and the model routes tool calls through it automatically. The API also supports A2A connections, so external orchestration frameworks like ADK-built agents can delegate to Gemini-hosted agents over a standardised protocol. This native tool combination is precisely what removes the need for an external orchestration layer in Gemini-only stacks — the practical core of the State Sovereignty Shift.

What is the pricing model for the Interactions API and how does server-side state affect costs?

The Interactions API keeps standard per-token input/output pricing but adds a new billable dimension: session state storage, priced per GB-hour per the GA release notes. The trade-off is favourable for most multi-turn workloads — you pay a small storage fee but eliminate the token cost of re-injecting full conversation history on every turn, which in stateless implementations could consume 30-60% of spend on long conversations. To control costs: set session TTLs, archive or delete completed sessions, and benchmark storage spend against token savings. Always verify current figures on the live Google AI pricing page before budgeting, since exact per-GB-hour rates change. On Vertex AI, enterprise pricing and SLA terms differ from the AI Studio surface, which matters for production cost modelling at scale.

The bottom line: The GA release of the Interactions API for Gemini models and agents is the clearest evidence yet that the AI stack boundary is being redrawn. For Gemini-only teams, half your orchestration middleware just became optional. For everyone else, the hybrid node pattern is the move. The State Sovereignty Shift is no longer a prediction; it is shipping in production.

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools. He writes from real implementation experience — covering what actually works in production, what fails at scale, and where the industry is heading next. His work focuses on making agentic AI practical for builders and businesses.
