DEV Community

aarhamforensics
aarhamforensics

Posted on • Originally published at twarx.com

Interactions API Gemini Models Agents: The 2026 GA Guide

Originally published at twarx.com - read the full interactive version there.

Last Updated: June 26, 2026

Google just made your custom state management code obsolete — and most developers building on Gemini today don't realise it yet. The Interactions API for Gemini models and agents reaching General Availability in June 2026 is not a feature drop; it's a platform-level declaration that the era of developer-owned agent orchestration is ending, and the cloud provider is taking the wheel.

The Interactions API is now Google's primary interface for calling Gemini models and running agents — a single unified endpoint with server-side state, background execution, tool combination, and multimodal generation. It replaces the manual orchestration most teams built around GenerateContent.

By the end of this article you'll know exactly what changed, what it costs, how it compares to OpenAI's Responses API, and whether migrating your production agent stack is worth the engineering bill. If you're new to autonomous systems, start with our primer on what AI agents actually are.

Google AI Studio Interactions API general availability announcement banner for Gemini models and agents

Google's official announcement of the Interactions API reaching general availability — the new primary interface for Gemini models and agents. Source

Coined Framework

The State Sovereignty Shift — the emerging architectural divide between AI systems where developers own and manage conversational and agent state locally versus platforms that absorb state server-side, locking in infrastructure dependency in exchange for reduced orchestration complexity

This is the single most important lens for evaluating the Interactions API. When you let Google hold your conversation history, tool results, and agent memory, you trade orchestration code for a new and harder-to-reverse class of vendor dependency.

What Was Announced: Official Facts, Dates, and Sources

Here's the most consequential fact: as of the official blog.google announcement, the Interactions API has reached general availability and is now Google's primary API for interacting with Gemini models and agents. Not experimental. Not preview. GA.

General Availability confirmation and official announcement date

The announcement was authored by Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind. The post confirms the API launched its public beta in December 2025 and has since become, in Google's words, 'developers' favorite way to build applications with Gemini.' Make of that marketing framing what you will — the GA designation is the part that actually matters.

What changed from preview to GA: stable schema and new features

The GA release ships with a stable schema plus a set of developer-requested capabilities: Managed Agents, background execution, Gemini Omni ('soon'), and tool improvements. Per the announcement, all of Google's documentation now defaults to the Interactions API, and Google is working with ecosystem partners to make it the default interface across third-party SDKs and libraries.

A 'stable schema' is the unglamorous detail that matters most for enterprise teams. It means breaking changes now follow a versioned deprecation cycle — the exact guarantee that was missing during the December 2025 beta and that blocked many regulated industries from adopting it. This is the line item your solutions architect actually cares about.

Official sources: blog.google posts and Google AI for Developers documentation

The canonical sources are the blog.google GA post and the Google AI for Developers documentation, which now defaults all examples to the Interactions API. The Gemini 3 model family — including Gemini 3 Pro — is the first generation built around the Interactions API as its primary interface. Managed Agents run inside a secure cloud sandbox, removing the need for developers to host agent runtime infrastructure themselves.

Dec 2025
Public beta launch of the Interactions API
[Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)




1 endpoint
Unified interface replacing separate model, chat and tool calls
[Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)




Gemini 3
First model family built around Interactions API primitives
[Google DeepMind, 2026](https://deepmind.google/models/gemini/)
Enter fullscreen mode Exit fullscreen mode

What Is the Interactions API and How Does It Work

In plain language: one door instead of three. Before this, building a Gemini-powered agent meant manually stitching together GenerateContent for inference, a chat pattern for conversation history, and your own logic for tool calls. I've done this. It's exactly as tedious as it sounds. The Interactions API collapses all of that into a single stateful endpoint.

The single unified endpoint architecture explained

According to the announcement: 'Whether you're calling a model or running an agent, the Interactions API gets you there in a few lines of code. Pass a model ID for inference, an agent ID for autonomous tasks, set background=True for anything long-running.'

That single sentence is the whole design philosophy. The endpoint doesn't care whether you want a one-shot completion or a multi-hour autonomous agent run — the same call shape covers both. This is fundamentally a stateful RPC pattern, not the stateless REST model most developers learned from OpenAI's Chat Completions API. That distinction isn't academic. It changes how you think about cost, failure modes, and what happens when Google has an outage.

Server-side state: what it stores and who owns it

This is the crux of the State Sovereignty Shift. Server-side state means conversation history, tool call results, and agent memory live on Google's infrastructure — not in your Postgres or Redis. You stop passing a giant message-history array on every turn. You reference a session, and Google reconstructs context for you. We unpack this trade in our guide to agent memory architectures.

The moment your agent's memory lives on someone else's servers, you haven't just simplified your code — you've changed who controls your most valuable production asset.

How background execution differs from standard request-response

Setting background=True tells the server to run the interaction asynchronously. The agent keeps working — reasoning, calling tools, writing files — even when no client holds an open connection. This is the feature that lets a multi-step research or coding agent run for minutes or hours without a fragile long-lived HTTP socket. Multimodal inputs (text, image, audio, video, documents) are handled natively inside a single interaction session.

Diagram comparing stateless GenerateContent message passing versus stateful Interactions API server-side session

The architectural shift from passing full message history on every call (stateless) to referencing a server-held session (stateful) — the foundation of the State Sovereignty Shift.

How a Single Interactions API Call Resolves an Agentic Task

  1


    **Client sends one request**
Enter fullscreen mode Exit fullscreen mode

You pass either a model ID (for inference) or an agent ID (for autonomous tasks), plus your input and an optional background=True flag. No message-history array required.

↓


  2


    **Server resolves session state**
Enter fullscreen mode Exit fullscreen mode

Google reconstructs prior turns, tool results, and agent memory from its own infrastructure. Latency note: first turn cold-starts the session; subsequent turns reuse stored context.

↓


  3


    **Managed Agent sandbox executes**
Enter fullscreen mode Exit fullscreen mode

For agent runs, a remote Linux sandbox reasons, executes code, browses the web, and manages files. The default is the Antigravity agent; custom agents carry their own instructions, skills, and data sources.

↓


  4


    **Tools chain natively**
Enter fullscreen mode Exit fullscreen mode

Web search, code execution, function calling, and RAG retrieval combine within the interaction — no developer-side orchestration loop required.

↓


  5


    **Result returns (or polls)**
Enter fullscreen mode Exit fullscreen mode

Synchronous calls return immediately. Background calls return a handle you poll, letting long-running workflows survive client disconnects.

The sequence matters because state and orchestration that used to live in your code now live inside Google's infrastructure — that is the entire trade.

Full Capability Breakdown: Every Feature in the GA Release

Managed Agents: cloud-sandboxed agent execution

Per the announcement, a single API call 'provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files.' The Antigravity agent ships as the default, and you can define custom agents with instructions, skills, and data sources. The practical implication: you no longer need an external orchestration framework like LangGraph or CrewAI for the most common agentic patterns. Whether that's a relief or a red flag depends entirely on how much you trust Google's infrastructure uptime and your own data-residency requirements. Browse ready-made designs in our AI agent library.

Tool combination: how native tool chaining works

The GA release lets you mix built-in tools — web search, code execution, function calling, RAG retrieval — chained inside a single interaction without writing the orchestration loop yourself. In the legacy pattern, you'd intercept each tool call, execute it, feed the result back manually, and pray you didn't drop a result mid-chain. The Interactions API absorbs that loop server-side. It's genuinely less code. The tradeoff is you've also lost visibility into exactly what's happening inside that loop.

Multimodal input support and session continuity

Session continuity persists context across turns server-side, replacing the pattern of passing full message-history arrays on every call. Multimodal inputs are first-class inside the same session, and Gemini Omni is flagged as 'soon' in the announcement for expanded generation capabilities.

Background execution and async task handling

Background execution is the feature most relevant to replacing n8n or AutoGen-style workflow automation for AI-native tasks. Long-running interactions execute asynchronously on Google's servers. No persistent client connection. No babysitting a socket.

Developers migrating report eliminating 200–400 lines of state-management and message-history handling code per agent. That's not a vanity metric — it's the exact surface area where the most subtle production bugs historically lived. Truncated history, lost tool results, race conditions on concurrent turns. All of that moves to Google's problem. Which is great, until it isn't.

Stable schema: what developers can now rely on in production

The stable schema commitment means breaking changes follow a versioned deprecation cycle — the enterprise requirement that was conspicuously absent during preview. For solutions architects, this is the green light to build SLAs on top of the API rather than treating it as a moving target. Read the broader context in our coverage of enterprise AI adoption.

Python — Interactions API (stateful, conceptual)

Legacy pattern: you owned the whole history array

messages = [...] # you stored, truncated, re-sent this every turn
response = client.models.generate_content(model='gemini-pro', contents=messages)

Interactions API: the server owns the session

interaction = client.interactions.create(
agent='antigravity', # or model='gemini-3-pro' for inference
input='Research Q2 competitor pricing and draft a summary',
background=True # long-running task survives disconnect
)

Poll for completion instead of holding a socket open

result = client.interactions.get(interaction.id)

How to Access and Use the Interactions API: Step-by-Step Guide

Prerequisites: API key, SDK version, and model access requirements

Access requires the latest Google AI Python or JavaScript SDK. Legacy SDK versions targeting GenerateContent will not automatically migrate — you upgrade the SDK and adjust your call shape. Don't assume otherwise; I've seen teams waste a week discovering this. Grab an API key from Google AI Studio.

Quickstart: creating your first stateful interaction session

  • Install the latest SDK and authenticate with your API key.

  • Choose your mode: pass a model ID for inference or an agent ID for autonomous tasks.

  • Create an interaction — the server opens a session and holds state.

  • Continue the conversation by referencing the session, not resending history.

  • Set background=True for anything long-running, then poll for the result.

Setting up Managed Agents with custom tools

The default Antigravity agent works out of the box. For custom agents, you define instructions, skills, and data sources, then reference the agent ID in your interaction call. The remote Linux sandbox handles code execution and web browsing. If you're prototyping agent designs, you can explore our AI agent library for reusable patterns before committing to a managed implementation.

Step-by-step Managed Agents setup flow showing custom agent definition with instructions skills and data sources

Setting up a custom Managed Agent: the cloud sandbox replaces the agent-runtime infrastructure teams previously self-hosted.

Pricing structure and what state storage costs in production

Pricing follows the established Gemini per-token model for generation, plus a separate charge for managed state storage duration. Exact per-session storage figures live on the official Gemini API pricing page. Model your total cost of ownership as tokens generated + (session storage × session lifetime). Here's the part teams miss: long-lived agent sessions that sit idle still accrue storage cost. You'll find out at month-end if you don't account for it upfront.

Availability by region and platform including Apple Foundation Models integration

The Interactions API is available via Google AI for Developers (consumer tier) and Vertex AI (enterprise tier) with different SLA and data-residency guarantees. Notably, Apple developers can now call cloud-hosted Gemini models via the Foundation Models framework and access Gemini in Xcode — a clean native path that previously didn't exist for iOS and macOS apps.

[

Watch on YouTube
Google DeepMind: Building agents with the Gemini Interactions API
Google DeepMind • Gemini agent architecture
Enter fullscreen mode Exit fullscreen mode

](https://www.youtube.com/results?search_query=Google+Gemini+Interactions+API+agents)

When to Use the Interactions API vs Alternatives

Use cases where Interactions API wins

The Interactions API is the clear winner for teams that want to cut orchestration code and don't have strict data-sovereignty constraints. Agentic workflows, long multi-turn sessions, multimodal pipelines where managing context manually is the dominant engineering cost — these are the scenarios where it earns its place. If your agents are eating most of your engineering time just maintaining state, this is a real improvement.

When to stay on raw GenerateContent or OpenAI-compatible endpoints

Simple single-turn or low-context applications don't need server-side sessions. Full stop. A classification call or a one-shot summarisation doesn't need server-side memory — stateful sessions just add latency and storage cost for zero benefit. Raw GenerateContent or an OpenAI-compatible endpoint stays leaner for that work.

When LangGraph, CrewAI, or AutoGen still make more sense

LangGraph stays superior for complex, graph-based multi-agent workflows where you need fine-grained control over state transitions and human-in-the-loop approval steps. CrewAI and AutoGen offer portable, framework-agnostic agent definitions not locked to one provider's infrastructure. See our deeper comparison of multi-agent systems.

The State Sovereignty Shift: evaluating the trade-off

If your compliance policy says agent memory must never leave your own infrastructure, Managed Agents in their current form are a non-starter — no amount of saved boilerplate changes that calculus.

Interactions API vs Closest Competitors: A Direct Comparison

    Capability
    Google Interactions API
    OpenAI Responses API
    Anthropic Tool Use
    LangGraph + Any Model






    Server-side state
    Yes (GA, June 2026)
    Yes (introduced early 2025)
    No (stateless by design)
    Developer-managed




    Managed agent sandbox
    Yes (Antigravity default)
    Partial / evolving
    No
    Self-hosted




    Background execution
    Yes (background=True)
    Limited
    Developer-built
    Developer-built




    Model portability
    Gemini only
    OpenAI only
    Anthropic only
    Fully portable




    Orchestration code burden
    Lowest
    Low
    High
    Highest




    Lock-in risk
    High (state + model)
    High (state + model)
    Medium (model only)
    Lowest
Enter fullscreen mode Exit fullscreen mode

Interactions API vs OpenAI Responses API

OpenAI's Responses API introduced server-side conversation state in early 2025 — the Interactions API is Google's structural response. The differentiator right now is GA status combined with Managed Agents shipping a default sandbox, which gives Google a current production advantage over some OpenAI agentic features still in beta. That gap will close. But 'still in beta' is a real constraint if you're shipping to customers today.

Interactions API vs Anthropic's architecture

Anthropic's tool-use architecture remains stateless at the API level — a deliberate philosophical choice. You own your state, which means you manage your state. If that sounds like a burden, the Interactions API is more appealing. If it sounds like a guarantee, Anthropic's approach is a feature, not a gap.

Interactions API vs LangGraph + any model

LangGraph plus any model — including Gemini via the OpenAI-compatible endpoint — is the most portable solution. It's also the most engineering-intensive path to production parity with Managed Agents. You're trading setup time for freedom, and that's a legitimate trade depending on your team's constraints.

Interactions API vs MCP server pattern

MCP (Model Context Protocol) is complementary, not a competitor. The Interactions API can consume MCP-compatible tool servers, and that integration path is likely to expand. RAG pipelines on Pinecone, Weaviate, or pgvector integrate as tools within sessions, though retrieval logic lives outside Google's managed infrastructure. See our primer on RAG architectures.

Industry Impact: What the Interactions API GA Means for AI Development in 2026

The consolidation of orchestration layers

Orchestration middleware and open-source frameworks face real pressure as Managed Agents absorb the most common agentic workflow patterns at the infrastructure level. The reusable building blocks that frameworks monetised are increasingly free primitives inside the API. That's not a subtle shift — it's the kind of thing that changes which companies raise their Series B and which ones quietly wind down.

When the cloud provider gives away orchestration for free, the question stops being 'which framework?' and becomes 'how much of my architecture am I willing to rent?'

Impact on the ADK ecosystem and third-party agent frameworks

Google's Agent Development Kit (ADK) is now positioned as the developer-facing layer above the Interactions API — a two-tier abstraction: raw API for power users, ADK for rapid agent development. Explore how this fits broader orchestration trends.

What this means for enterprise AI procurement

Procurement teams must now model total cost of ownership including state storage, session-duration fees, and the switching cost of cloud-side state lock-in. A migration that saves engineer-hours but adds recurring storage cost and makes future provider switches expensive is a different ROI conversation than a simple per-token comparison. I'd want that analysis in writing before signing a multi-year commitment.

200–400
Lines of state-handling code eliminated per agent (early adopters)
[Developer reports, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)




2-tier
ADK over Interactions API abstraction model
[Google ADK, 2026](https://google.github.io/adk-docs/)




2
Access tiers — Google AI for Developers + Vertex AI
[Google Cloud, 2026](https://cloud.google.com/vertex-ai)
Enter fullscreen mode Exit fullscreen mode

The broader platform lock-in debate

The GA signals that at least one major lab now treats stateful agentic execution as a solved infrastructure problem — not a research problem — which will accelerate competitive responses. Meanwhile n8n, Zapier AI, and similar workflow automation platforms face medium-term disruption as background execution can replace many AI-native automation patterns. Not immediately. But the pressure is real.

What It Means for Small Businesses

If you run a 10-person company, here's the plain-English version: the Interactions API lets you build an AI assistant or automation that remembers previous conversations and runs long tasks on its own — without hiring an engineer to build memory plumbing. A small e-commerce shop could deploy a support agent that recalls a customer's prior orders across sessions, or a marketing consultancy could run an overnight competitor-research agent with background=True and review results in the morning.

The opportunity: dramatically less engineering means a solo founder can ship an agent that previously needed a team. The risk: your business data — customer chats, agent memory — now lives on Google's servers, and migrating away later is harder than swapping a model. For a bakery, that's fine. For a law firm or clinic, it's a compliance question to answer before you build, not after. Our AI for small business guide goes deeper on this.

Who Are Its Prime Users

  • AI developers and solutions architects building production agentic systems who want to cut orchestration code.

  • Startups (1–50 people) that need to ship fast and lack a dedicated platform team.

  • iOS/macOS developers newly able to integrate Gemini via Apple's Foundation Models framework.

  • Enterprises on Vertex AI that need SLA and data-residency guarantees for regulated workloads.

  • Automation builders currently stitching n8n or AutoGen flows for AI-native tasks — though they should think hard about what they're giving up.

How to Use It: A Worked Demonstration

Let's run a concrete example: an overnight competitor-pricing research agent.

Python — Worked example (conceptual)

INPUT: ask a Managed Agent to research and summarise — in the background

interaction = client.interactions.create(
agent='antigravity', # default cloud-sandboxed agent
input='Find current pricing for the top 3 competitors '
'in project-management SaaS and draft a 5-bullet summary.',
background=True # runs async on Google's servers
)
print(interaction.id) # -> 'intx_8f2a...'

STEP: agent browses the web + executes code in the sandbox (no client needed)

STEP: you poll later — even after closing your laptop

result = client.interactions.get('intx_8f2a...')
print(result.output)

Actual output (illustrative):

Agent output

  • Competitor A: $12/user/mo (annual), free tier capped at 5 users
  • Competitor B: $9.80/user/mo, no free tier, 14-day trial
  • Competitor C: $15/user/mo, includes time-tracking add-on
  • Pricing gap: your $10 tier undercuts A and C, sits above B
  • Recommendation: emphasise free-tier generosity vs B in positioning

What used to require a scraping script, a vector store, a scheduler, and roughly 300 lines of glue code now fits in one stateful call. That's the State Sovereignty Shift in action — convenience for dependency. Whether that swap is worth it is a decision only you can make for your specific stack.

Good Practices and Common Pitfalls

  ❌
  Mistake: Leaving sessions open indefinitely
Enter fullscreen mode Exit fullscreen mode

Server-side state has a storage-duration cost. Idle long-lived sessions silently accrue charges, surprising teams at month-end on the Gemini pricing bill.

Enter fullscreen mode Exit fullscreen mode

Fix: Set explicit session lifetimes and close interactions when a task completes; monitor storage cost as a first-class metric.

  ❌
  Mistake: Migrating compliance-sensitive workloads blindly
Enter fullscreen mode Exit fullscreen mode

Moving conversation and agent memory to Managed Agents may violate data-residency policies if that data must stay on your infrastructure. I would not ship regulated data to Managed Agents without a signed data processing agreement and explicit residency confirmation from your Vertex AI rep.

Enter fullscreen mode Exit fullscreen mode

Fix: Use the Vertex AI enterprise tier for residency guarantees, or keep regulated workloads on developer-owned state via LangGraph.

  ❌
  Mistake: Using stateful sessions for one-shot calls
Enter fullscreen mode Exit fullscreen mode

Wrapping a simple classification or summarisation in a session adds latency and storage overhead for zero benefit.

Enter fullscreen mode Exit fullscreen mode

Fix: Reserve stateful interactions for multi-turn or agentic tasks; use plain inference calls for single-turn work.

  ❌
  Mistake: Assuming legacy SDKs auto-upgrade
Enter fullscreen mode Exit fullscreen mode

Old SDK versions targeting GenerateContent do not automatically migrate to the Interactions API call shape. This failure is silent — your code won't throw an obvious error, it'll just keep using the old pattern.

Enter fullscreen mode Exit fullscreen mode

Fix: Upgrade to the latest Google AI SDK and update your three core call parameters before testing in production.

Average Expense to Use It

Realistic cost model: generation tokens follow standard Gemini per-token rates on the official pricing page, and managed state storage is billed by session duration. A rough TCO formula:

Monthly cost ≈ (tokens generated × per-token rate) + (active sessions × storage rate × average session lifetime)

There's a free developer tier via Google AI Studio for experimentation. For production, a small business running a handful of agent sessions daily should budget primarily for tokens. Teams running thousands of long-lived sessions must watch storage-duration fees closely — that's the cost component critics say is under-documented, and they're right. Always validate exact figures against the live pricing page before committing to an architecture built around this.

Expert and Community Reactions to the Interactions API Launch

Developer community response

Early reaction across X, HackerNews, and GitHub centres on dramatic boilerplate reduction balanced against lock-in anxiety. The recurring theme: developers love shipping faster but distrust handing state to a single vendor. Both reactions are rational.

Analysis from TheGenAIGirl and other commentators

A Medium analysis titled 'Interactions API + ADK: A Closer Look' by #TheGenAIGirl was among the earliest detailed technical write-ups, highlighting the stateful multi-turn interaction model as the defining architectural feature. Read the official engineering framing from authors Ali Çevik (Group Product Manager, Google DeepMind) and Philipp Schmid (Developer Relations Engineer, Google DeepMind) in the GA announcement.

Concerns raised: state ownership, lock-in, and pricing transparency

Engineers on HackerNews flagged that handing session state to Google creates a new class of vendor lock-in harder to reverse than model lock-in — you can swap a model with a string change; you cannot trivially export months of accumulated agent memory. That's a fair critique, not paranoia. Pricing transparency for managed state at scale is a recurring criticism, and the docs don't currently make it easy to estimate costs before you're already committed.

Positive reception

The Apple Foundation Models integration drew strong positive sentiment from iOS and macOS developers who previously lacked a clean path to integrate Gemini into native Apple apps.

Developer community sentiment chart on Interactions API showing boilerplate reduction praise versus vendor lock-in concerns

The community's split verdict on the Interactions API: enthusiasm for reduced engineering effort, caution over the State Sovereignty Shift's lock-in implications.

What Comes Next: Roadmap, Open Questions, and Bold Predictions

Expected features based on current gaps

The most-requested missing capability is developer-controlled state export and portability — the ability to snapshot a session's state and replay it outside Google's infrastructure. That single feature would neutralise the lock-in concern for most teams. Gemini Omni is already flagged as 'soon' for expanded multimodal generation, which is the only concrete roadmap signal in the GA announcement.

How the Gemini 3 family will depend on this architecture

Because Gemini 3 Pro is built around Interactions API primitives, future capability releases — new modalities, longer context, improved tool use — will ship as Interactions API features first. If you want the newest capabilities the day they drop, staying on this API is the path.

Will OpenAI and Anthropic follow with full server-side state GA?

2026 H2


  **OpenAI accelerates Responses API agentic GA**
Enter fullscreen mode Exit fullscreen mode

With Google shipping a stable schema and Managed Agents, OpenAI faces pressure to move its Responses API stateful agent features to full GA within 12 months.

2027 H1


  **MCP and Interactions API converge**
Enter fullscreen mode Exit fullscreen mode

Google plausibly adopts MCP as the tool-definition standard within sessions to reduce developer fragmentation.

2027 H2


  **State portability becomes table stakes**
Enter fullscreen mode Exit fullscreen mode

Lock-in pressure forces providers to ship session export/import, partially defusing the State Sovereignty Shift's hardest objection.

End of 2027


  **Managed state becomes the default**
Enter fullscreen mode Exit fullscreen mode

Bold prediction grounded in current adoption velocity: more than 50% of new production agentic systems run on cloud-provider-managed state rather than developer-owned orchestration — the dominant architectural story of the next infrastructure cycle.

The most reversible lock-in is model lock-in (a string change). The least reversible is state lock-in. The Interactions API quietly moves you from the first category to the second — which is precisely why the State Sovereignty Shift deserves a line in your architecture review, not just your pricing spreadsheet. For more, see our breakdown of vendor lock-in in AI.

Coined Framework

The State Sovereignty Shift in practice

Every team adopting the Interactions API is implicitly choosing a side: trade orchestration complexity for infrastructure dependency, or retain control at the cost of engineering effort. There is no neutral choice — only an informed one. If you want help mapping that choice to reusable patterns, our agent template library is a practical starting point.

Frequently Asked Questions

What is the Interactions API and how is it different from the Gemini GenerateContent endpoint?

The Interactions API is Google's unified, stateful endpoint for both model inference and agent execution, announced GA in June 2026. The key difference from GenerateContent is server-side state: instead of passing a full message-history array on every call, you reference a session that Google maintains. You pass a model ID for inference, an agent ID for autonomous tasks, and set background=True for long-running work. GenerateContent is stateless REST; the Interactions API is a stateful RPC pattern. It also natively combines tools (web search, code execution, function calling, RAG) and supports multimodal input within one session, eliminating 200–400 lines of orchestration code per agent according to early adopters.

Is the Interactions API generally available or still in preview as of 2026?

It is generally available. Google announced GA in June 2026, confirming it as the primary API for Gemini models and agents. The public beta launched in December 2025. The GA release ships a stable schema — meaning breaking changes follow a versioned deprecation cycle — plus new capabilities including Managed Agents, background execution, and tool improvements, with Gemini Omni flagged as coming soon. All Google documentation now defaults to the Interactions API, and Google is working with partners to make it the default across third-party SDKs. The stable schema is the signal that enterprise teams can now build SLAs on top of it rather than treating it as experimental.

How does server-side state in the Interactions API work and what data does Google store?

Server-side state means Google's infrastructure stores conversation history, tool-call results, and agent memory tied to a session ID. On each turn you reference the session rather than resending context, and Google reconstructs it. This is the core of the State Sovereignty Shift: your most valuable production data now lives on Google's servers, not your own database. The data residency and SLA guarantees differ between the consumer tier (Google AI for Developers) and the enterprise tier (Vertex AI). For regulated workloads with strict residency requirements, use Vertex AI or keep state developer-owned via a framework like LangGraph. Always confirm current retention and residency terms in the official documentation before storing sensitive data.

What are Managed Agents in the Gemini API and do they replace LangGraph or CrewAI?

Managed Agents provision a remote Linux sandbox with one API call where an agent can reason, execute code, browse the web, and manage files. The Antigravity agent ships as the default; you can define custom agents with instructions, skills, and data sources. For common agentic patterns, they remove the need for external frameworks. However, they don't fully replace LangGraph, CrewAI, or AutoGen for every case. LangGraph remains superior for complex graph-based workflows with fine-grained state-transition control and human-in-the-loop steps. CrewAI and AutoGen offer model-agnostic portability. Choose Managed Agents when you want minimal code and accept Gemini lock-in; choose frameworks when you need control or portability.

How does the Interactions API compare to OpenAI's Responses API?

Both adopt server-side conversation state. OpenAI's Responses API introduced this in early 2025; Google's Interactions API is the structural response and reached GA in June 2026 with Managed Agents shipping a default cloud sandbox and explicit background execution via background=True. The current differentiator is GA maturity plus the bundled agent sandbox, which gives Google a production edge over some OpenAI agentic features still in beta. The trade-off is symmetric, though: both create state-plus-model lock-in. If portability matters most, neither managed option beats running an orchestration framework like LangGraph against models through compatible endpoints. Evaluate based on your tolerance for the State Sovereignty Shift.

What does the Interactions API cost and how is state storage priced?

Pricing combines the standard Gemini per-token model for generation with a separate charge for managed state storage duration. A useful TCO formula is: tokens generated × per-token rate, plus active sessions × storage rate × average session lifetime. Exact figures are on the official Gemini API pricing page. There is a free developer tier via Google AI Studio for experimentation. A key budgeting caution flagged by the community: idle long-lived sessions still accrue storage cost, and managed-state pricing at scale is considered under-documented. Teams running thousands of persistent sessions should monitor storage as a first-class cost metric and set explicit session lifetimes to avoid month-end surprises.

Can I use the Interactions API with the OpenAI Python SDK without rewriting my code?

Google maintains OpenAI-compatible endpoints, and per its compatibility documentation, OpenAI-compatible library users can switch with a roughly three-line code change (base URL, API key, model name). However, that compatibility path targets the inference layer — it does not automatically grant the full stateful Interactions API experience like Managed Agents, server-side sessions, and background execution. To use those advanced features you must adopt the latest Google AI SDK and update three core parameters; legacy SDKs targeting GenerateContent do not auto-migrate. In short: minimal change for basic inference, deliberate migration for the stateful agentic capabilities that make the Interactions API distinctive.

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools. He writes from real implementation experience — covering what actually works in production, what fails at scale, and where the industry is heading next. His work focuses on making agentic AI practical for builders and businesses.

LinkedIn · Full Profile


This article was originally published on Twarx. Follow for daily deep dives on AI agents and automation.

Top comments (0)