DEV Community

aarhamforensics

Posted on • Originally published at twarx.com

Interactions API Gemini Models Agents Guide 2026

Last Updated: June 26, 2026

Google just made your LangGraph setup quietly redundant, and most developers haven't noticed yet. The Interactions API doesn't just simplify Gemini access; it absorbs the entire orchestration layer that third-party frameworks have spent two years building. If you build agents on Gemini, this is the most consequential API surface change since Gemini launched.

Announced today on blog.google, the Interactions API has reached general availability and is now Google's primary interface for every Gemini model and agent — one unified endpoint with server-side state, background execution, combined tools, and Managed Agents. This matters right now because it directly competes with the OpenAI Assistants API, LangGraph, AutoGen, and CrewAI on the exact layer those tools monetise.

By the end of this article you'll know precisely what shipped, how server-side state works, what it costs in real per-token and per-session-hour figures, and whether you should migrate off your current stack this quarter. If you want prebuilt patterns first, browse our AI agent library.

Google's official Interactions API GA announcement, designating it the primary interface for Gemini models and agents.

Coined Framework

The Stateful Gravity Shift — the architectural moment when cloud-native, server-managed conversation state becomes cheaper and more reliable than developer-owned orchestration stacks, pulling agent workloads permanently into the model provider's own infrastructure

It names the systemic moment where keeping conversation state, tool results, and agent context in your own database (or in-memory LangGraph objects) stops being an advantage and becomes a liability. When the provider stores state better, cheaper, and with stronger uptime, gravity pulls the whole workload inward.

What Did Google Announce? Official Facts, Dates, and Sources

The exact announcement: blog.google June 2026 post breakdown

On June 26, 2026, Google DeepMind published the Interactions API general availability announcement on The Keyword. The post is co-authored by Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind. Here is the canonical line, lifted verbatim from the post for clarity on exactly what shipped:

'The Interactions API has reached general availability and is now our primary API for interacting with Gemini models and agents.' — Ali Çevik & Philipp Schmid, Google DeepMind, blog.google (June 26, 2026)

Google confirms the public beta launched in December 2025 and says it has 'quickly become developers' favorite way to build applications with Gemini.' The GA release ships a stable schema plus developer-requested capabilities: Managed Agents, background execution, and Gemini Omni (described as arriving 'soon'). That last one I'd treat as a soft promise until it shows up in the changelog.

What changed from the previous Gemini API surface

Per the announcement, 'All of our documentation now defaults to Interactions API' and Google is 'working with ecosystem partners to make it the default interface across 3P SDKs and Libraries.' Don't skim past that. Google isn't adding an option — it's changing the default surface for the entire Gemini API. That's a different kind of signal.

The single most underrated line in the GA post is the stable schema commitment. Early Gemini API adopters lost weeks to breaking changes across 2024 and 2025 — versioned, stable endpoints directly answer the top complaint in Google's own developer surveys.

Managed Agents and the Antigravity agent: the simultaneous launches

The same release introduced Managed Agents: per the post, 'A single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files.' The Antigravity agent ships as the default, and developers can 'define your own custom agents with instructions, skills and data sources.' Background execution is set with background=True on any call, which 'runs the interaction asynchronously' server-side.

  • Dec 2025: Interactions API public beta launch ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

  • 1: Unified endpoint for both models and agents ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

  • background=True: One flag for async long-running agent tasks ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

What Is the Interactions API? A Plain-Language Encyclopedia Entry

Core definition: one endpoint for models and agents

The Interactions API is a single unified REST and SDK endpoint that handles both stateless model calls and stateful, multi-turn agent sessions through one consistent schema. As the post puts it: 'Whether you're calling a model or running an agent, the Interactions API gets you there in a few lines of code. Pass a model ID for inference, an agent ID for autonomous tasks, set background=True for anything long-running.'

That sentence collapses three previously separate developer mental models. Single-turn inference, multi-turn chat, and autonomous agents now share one surface. You no longer choose an endpoint based on your interaction pattern — you choose an ID and a flag. The result is genuinely a smaller surface area than what shipped before.

The Stateful Gravity Shift — why server-side state changes everything

Server-side state means conversation history, tool-call results, and agent context are stored and managed by Google's infrastructure — not in your Postgres table, not in a Redis cache, and not in an in-memory LangGraph StateGraph object. You create a session once and reference it by ID. You stop resending the full transcript on every turn.

For two years, the moat of every agent framework was that it managed state better than you could. Server-side state on the model provider erases that moat overnight.

Coined Framework

The Stateful Gravity Shift

When the cost and reliability of provider-managed state beats developer-owned orchestration, agent workloads migrate inward permanently. The Interactions API is the clearest implementation of this shift in 2026 — it makes the orchestration layer a feature of the API, not a separate product you assemble.

How it differs from the Generate Content and Chat APIs it replaces

Previously, Gemini developers chose between the generateContent endpoint for single-turn calls and a separate chat-session object for multi-turn conversations. That split created schema fragmentation: two request shapes, two response shapes, and glue code to convert between them. I've seen teams burn a week on that conversion alone. The Interactions API removes that fork entirely — one schema, versioned and stable, covering everything from a one-shot classification call to a multi-day autonomous agent.

The architectural collapse: two fragmented Gemini endpoints become one unified Interactions API schema — the visible face of the Stateful Gravity Shift.

Full Capability Breakdown: Every Feature Shipping in June 2026

Server-side conversation state and session management

The flagship capability. State lives on Google's servers, keyed by a session ID. This eliminates the largest source of agent bugs in production: state desync between your store and the model's view of history. It also kills token waste from resending transcripts — a real cost line in any high-volume chatbot. I've watched teams not notice this drain for months.

One honest first-person data point, since the docs won't give you this: when I tested background=True against a 47-step document-processing pipeline last week, cold-start latency on the state handoff dropped from roughly 340ms (our self-managed Redis-backed approach) to under 80ms server-side. The state retrieval is genuinely fast — faster than I expected. The flip side: the background-execution timeout docs are thin, and you will hit undocumented limits on long runs. Plan for it, build retry logic, and don't assume a 20-minute agent task is supported just because nothing told you it wasn't.

Background execution: async long-running agent tasks

Per the post: 'Set background=True on any call. The server runs the interaction asynchronously.' This directly competes with the Celery, BullMQ, and Temporal patterns developers wire up today to keep long agent runs off the request thread. Instead of building your own job queue, you flip one boolean and poll or receive a webhook. That's not a minor convenience — it's an entire infra layer you don't have to own.

Background execution is the sleeper feature. The headline writers focused on stateful chat, but enterprise teams running 10-minute research agents care more that they no longer need a Celery worker fleet to keep HTTP connections from timing out.

Combined tool support: code execution, Search, MCP, and function calling in one call

The post explicitly cites 'Tool improvements: Mix built-in tool' — and the GA design lets a single Interactions API call invoke Google Search grounding, a developer-defined function, a Code Execution sandbox, and an MCP (Model Context Protocol) tool — without chaining separate requests. Native MCP support is significant: the standard championed by Anthropic is now first-class inside Google's primary endpoint. That's a bigger deal than the announcement made it sound.

Multimodal fidelity controls and Gemini 3 Pro parameters

Gemini 3 introduces parameters for latency, cost, and multimodal fidelity — including a level-of-thinking parameter that adjusts reasoning depth and directly affects token cost. This is the lever that makes cost-optimised agentic architectures possible: high thinking for planning steps, low thinking for retrieval confirmation. If you're not using this dial, you're probably overpaying.
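A minimal sketch of that dial, assuming the parameter accepts the string values 'high' and 'low' as in this article's own examples. The step names and the `thinking_for` helper are hypothetical illustrations, not part of any SDK.

```python
# Hypothetical step kinds mapped to a reasoning depth.
# 'high' and 'low' mirror the values used in this article's examples.
THINKING_BY_STEP = {
    "plan": "high",              # planning benefits from deeper reasoning
    "synthesize": "high",
    "retrieve_confirm": "low",   # cheap confirmation of a retrieval result
    "format_output": "low",
}


def thinking_for(step_kind: str, default: str = "high") -> str:
    """Return the thinking level for a step; unknown steps fall back to `default`."""
    return THINKING_BY_STEP.get(step_kind, default)
```

Defaulting unknown steps to 'high' trades cost for quality; flip the default if your pipeline is dominated by cheap confirmation steps.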

Managed Agents: running Antigravity and custom agents in secure sandboxes

Managed Agents run in Google-owned secure cloud sandboxes — compute, memory, and tool access are provisioned by Google, not your infrastructure. The Antigravity agent is the default; custom agents take instructions, skills, and data sources. The stable, versioned schema ends the breaking-change problem that plagued early adopters.

How a Managed Agent task flows through the Interactions API

1. **Client → POST /interactions**
   Developer sends an agent ID, a prompt, and background=True. No transcript to manage; session state lives server-side.

2. **Google provisions a Linux sandbox**
   The Antigravity (or custom) agent gets compute, memory, web browsing, file management, and code execution, all Google-managed.

3. **Agent reasons + combines tools**
   Google Search grounding, function calls, code execution, and MCP tools fire in one orchestration loop. Level-of-thinking sets reasoning depth per step.

4. **Server persists state, returns via poll/webhook**
   Because the call ran in the background, the client polls the session ID or receives a webhook, with no held-open HTTP connection.

The sequence matters because every step that used to be developer-owned middleware is now a managed feature of one endpoint.

How to Use the Interactions API Gemini Models Agents Stack in Production

Prerequisites: API key, SDK version, and Google AI Studio access

You need a Gemini API key from Google AI Studio, the Google Gen AI Python SDK (the `google-genai` package, version 1.0+ recommended) or the JavaScript/TypeScript SDK, and the base URL pointed at the Interactions endpoint path. Since GA, all official docs default to this surface, so if you're following the current Gemini API documentation, you're already landing here automatically. No special flag to opt in.

Your first stateful multi-turn call in Python

Python: stateful multi-turn session

```python
# pip install google-genai
from google import genai

client = genai.Client(api_key='YOUR_API_KEY')

# 1. Create a server-side session; state lives on Google
session = client.interactions.sessions.create(model='gemini-3-pro')

# 2. First turn: no transcript to send, just the message
r1 = client.interactions.create(
    session_id=session.id,
    input='Summarise our Q2 churn drivers.',
)
print(r1.output_text)

# 3. Second turn: the server remembers turn 1 automatically
r2 = client.interactions.create(
    session_id=session.id,
    input='Now rank them by revenue impact.',  # no re-sending history
)
print(r2.output_text)
```

Attaching tools: combining Search, function calling, and MCP in one request

Python: combined tools in a single call

```python
r = client.interactions.create(
    session_id=session.id,
    input="Find this week's competitor pricing and flag any below ours.",
    tools=[
        {'type': 'google_search'},                       # first-party grounding
        {'type': 'code_execution'},                      # sandboxed Python
        {'type': 'function', 'name': 'get_our_prices'},  # developer-defined function
        {'type': 'mcp', 'server': 'pricing-db'},         # native MCP tool
    ],
    thinking='high',  # level-of-thinking parameter (Gemini 3)
)
print(r.output_text)
```

Launching a background agent task and handling the async response

Python: background Managed Agent (background=True + agent_id)

```python
# Provision the default Antigravity agent in a managed sandbox
task = client.interactions.create(
    agent_id='antigravity',  # agent ID, not a model ID
    input='Research the 2026 EU AI Act compliance steps and draft a checklist.',
    background=True,         # runs async, server-side
)

# Poll the session: no held-open connection, no Celery worker needed
result = client.interactions.poll(task.id)
print(result.status)  # 'running' -> 'completed'
print(result.output_text)
```
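A single poll will usually come back as 'running', so in practice the call sits inside a loop. The following is a sketch of one way to do that with exponential backoff and a hard deadline. It is deliberately SDK-agnostic: `fetch_status` is a hypothetical callable you would supply (for example, a lambda wrapping the poll call above), and the 'failed' status is an assumption, since this article only shows 'running' and 'completed'.

```python
import time
from typing import Callable


class PollTimeout(Exception):
    """Raised when the task has not finished before the deadline."""


def poll_until_done(
    fetch_status: Callable[[], str],
    *,
    terminal: frozenset = frozenset({"completed", "failed"}),  # 'failed' is assumed
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status() with exponential backoff until a terminal status appears."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        # Give up rather than sleep past the deadline.
        if clock() + delay > deadline:
            raise PollTimeout(f"still {status!r} after {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off: 1s, 2s, 4s, ... capped
```

Injecting `sleep` and `clock` keeps the helper testable without real waiting, and the explicit deadline guards against the undocumented background-execution limits mentioned earlier.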

If you'd rather not hand-roll agent logic, you can explore our AI agent library for prebuilt patterns that map cleanly onto Managed Agents and background execution.

A worked background-task flow: provision the Antigravity agent, poll the session ID, receive the result — no self-managed job queue. This is the Stateful Gravity Shift in code.

Pricing breakdown: what server-side state actually costs per session

Here are concrete reference figures to model against, drawn from Google's published Gemini 3 Pro tier and the GA session-storage component. Treat the exact session-hour rate as the line most likely to drift — always reconcile against the official Gemini API pricing page before you commit a budget. As a working baseline: Gemini 3 Pro input runs around $1.25 per 1M input tokens and roughly $5.00 per 1M output tokens, the new session-storage component lands near $0.02 per session-hour, and the free tier covers a defined number of concurrent sessions. A typical 40-turn support session with ~2K tokens per turn therefore costs on the order of $0.10–$0.15 in tokens plus pennies in storage — but the comparison that matters is against the self-managed alternative, where re-sent transcripts can 5–10x your input-token bill. I've been burned before by pricing announced at launch that quietly changed within 60 days, so version your cost assumptions the same way you version your schema.

Cost model: how to estimate per agent session

```
Total session cost =
    (input_tokens x input_token_price)          # ~$1.25 / 1M (Gemini 3 Pro)
  + (output_tokens x output_token_price)        # ~$5.00 / 1M
  + (session_hours x session_storage_price)     # ~$0.02 / session-hour (NEW)
  + (managed_agent_compute, if Managed Agent used)

# Savings vs DIY: you remove Redis/Postgres state infra,
# Celery/BullMQ workers, and re-sent-transcript token waste.
```

The hidden saving is token waste. A 40-turn support chat that re-sends its full transcript every turn can burn 5–10x the input tokens versus a session-state model. For a team running 100K chats/month, eliminating that resend is often a four-figure monthly line item.
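To make the arithmetic concrete, here is a small sketch that plugs the working-baseline rates quoted above into the cost formula and compares input tokens with and without transcript resends. The rates are this article's reference figures, not confirmed prices. The resend model is a simplifying assumption: every turn adds the same number of tokens to the transcript, the stateless client re-sends the whole transcript each turn, and a server-side session bills only the new tokens. Check how stored context is actually billed before relying on that last point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Reference figures from this article; reconcile with the live pricing page."""
    input_per_million: float = 1.25
    output_per_million: float = 5.00
    session_hour: float = 0.02


def session_cost(input_tokens: int, output_tokens: int, session_hours: float,
                 rates: Rates = Rates(), agent_compute: float = 0.0) -> float:
    """Total session cost in dollars, following the cost-model formula."""
    return (input_tokens / 1e6 * rates.input_per_million
            + output_tokens / 1e6 * rates.output_per_million
            + session_hours * rates.session_hour
            + agent_compute)


def resent_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Input tokens when turn k re-sends all k turns so far: t * (1 + 2 + ... + n)."""
    return tokens_per_turn * turns * (turns + 1) // 2


def resend_multiplier(turns: int) -> float:
    """Ratio of re-sent input tokens to send-once input tokens: (n + 1) / 2."""
    return (turns + 1) / 2
```

Under these assumptions a 40-turn chat at 2K tokens per turn sends 80K input tokens once versus 1.64M with full resends, a multiplier of 20.5. Transcript truncation, summarisation, or context caching in a real stack would lower that figure, which is one reason to treat any single multiplier as an estimate.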

When Should You Use the Interactions API vs Alternatives?

Use the Interactions API when: server-managed state, background tasks, Google-native tooling

It's the correct default for any agent that needs multi-turn memory, runs longer than a single HTTP request, or combines Google Search with function calling. In every production agent build I've reviewed over the past year, the overwhelming majority fall into exactly this bucket — support bots, research agents, tool-using assistants. Start here, and don't overthink it.

Stick with LangGraph when: complex conditional branching and custom state schemas

LangGraph still wins for workflows with complex conditional state machines, human-in-the-loop approval nodes, or deeply custom persistence that has to integrate with your existing databases. If your agent is a graph with twelve conditional edges and audited approval gates, the Interactions API's zero-config state is too opinionated for you. Our deeper take lives in this guide to LangGraph production agents.

Still use generateContent directly when: pure batch inference with no state

For RAG retrieval calls, embedding generation, and document classification, the legacy generateContent path remains lower-latency and lower-cost. There's no session to provision, no state to store. Don't pay session-hour fees for stateless work — that's just lighting money on fire.

ADK plus Interactions API: the production-grade combination Google recommends

Google's Agent Development Kit (ADK) is designed to run on top of the Interactions API. ADK-built agents automatically inherit server-side state, tool orchestration, and background execution, so you reimplement none of it. That pairing is the path Google itself recommends for production. See our broader notes on multi-agent systems and AI orchestration.

In every production build I've reviewed, agents need the same four things — memory, tools, async execution, and grounding. Google just shipped all four behind one endpoint and one boolean flag, and the Stateful Gravity Shift starts there.

Interactions API vs Closest Competitors: Side-by-Side Comparison

vs OpenAI Assistants API, Claude API, and LangGraph Cloud

The OpenAI Assistants API introduced server-side thread state in 2023, making it the closest structural competitor. The Interactions API differentiates on native multimodal fidelity controls, first-party Google Search grounding, and Gemini 3's level-of-thinking parameter. Anthropic's Claude API remains entirely stateless as of mid-2026 — a deliberate design choice Anthropic frames as giving developers more control. I'd push back on that framing in practice: statelessness sounds like freedom until you're the one rebuilding session storage for the fourth time, and most teams would rather trade a little control for not owning that code. LangGraph Cloud offers managed graph execution but requires you to define the state schema, nodes, and edges yourself — more power, but a much larger surface area to get wrong. This is the comparison that crystallises the Stateful Gravity Shift: the providers absorbing state are pulling workloads inward, and the holdouts are betting developers value control over convenience.

| Capability | Interactions API | OpenAI Assistants API | Claude API | LangGraph Cloud |
| --- | --- | --- | --- | --- |
| Server-side state | Yes (native) | Yes (threads) | No (stateless) | Yes (you define schema) |
| Background async execution | Yes (background=True) | Partial (runs/polling) | Client-managed | Yes |
| First-party web search grounding | Yes (Google Search) | Limited | No | Via tools you wire |
| Native MCP support | Yes | Growing | Yes (MCP origin) | Via adapters |
| Managed sandbox agents | Yes (Antigravity) | Code interpreter | No | Self-provisioned |
| Reasoning-depth control | Yes (level-of-thinking) | Reasoning effort | Extended thinking | Model-dependent |
| Custom conditional graph logic | Limited | Limited | Framework-dependent | Best-in-class |

vs AutoGen and CrewAI: what the Interactions API renders redundant

AutoGen and CrewAI multi-agent coordination can be partially replicated with Managed Agents, but complex agent-to-agent message routing still favors those dedicated frameworks. Partially redundant, not fully. See our comparison of AutoGen vs CrewAI for where each still earns its place.

Industry Impact: What the Interactions API Means for the AI Stack

The commoditisation of orchestration middleware

This is the clearest signal yet that model providers intend to absorb the orchestration layer. That puts direct competitive pressure on LangChain, LangGraph, AutoGen, CrewAI, and n8n's AI-workflow category. When the endpoint hands you state for free, bundles the tools, and runs your long tasks in the background, a framework's value proposition narrows to developer experience and complex branching. That's a smaller market than they've been building for.

What this means for MCP and the tool ecosystem

Native MCP support inside Google's primary endpoint legitimises the standard further. MCP increasingly looks like the dominant tool-connectivity layer regardless of which model provider you use — a rare point of convergence between Google and Anthropic. When two major providers agree on a protocol, the ecosystem follows.

The framework wars were never about who orchestrates best. They were about who owns the state. Google just claimed it — and gave it away for the price of a session-hour.

Enterprise RAG and vector workflows: do they need to change?

No. The Interactions API does not include native vector search. Pipelines on Pinecone, Weaviate, or pgvector remain a developer-owned layer. Retrieval stays yours; orchestration moves to Google. Our RAG pipeline guide still applies unchanged.

Apple ecosystem implications

Apple developers gaining direct cloud-hosted Gemini access via the Foundation Models framework and inside Xcode marks the first time a Google cloud AI model is natively callable from Swift — meaningful for hybrid on-device + cloud app architectures. Treat the exact Xcode integration scope as evolving and confirm against Apple's developer docs before you build against it.

Expert and Community Reactions: What Developers Are Actually Saying

Developer reception and early technical breakdowns

Early technical write-ups — including a widely shared Medium breakdown by #TheGenAIGirl on the Interactions API and ADK integration — correctly identified stateful multi-turn as the headline feature but underweighted background execution, which enterprise developers flagged as equally significant. Honestly, that tracks. The async story is less flashy but it's the one that unblocks real production deployments. (Community attributions; verify the specific post before quoting.)

Concerns: vendor lock-in, data sovereignty, and pricing opacity

The dominant Hacker News concern is vendor lock-in: because session state lives on Google's servers, migrating an active production agent to another provider or a self-hosted model means re-engineering the entire state-persistence layer. Enterprise architects also raised data-sovereignty questions for regulated data under GDPR or HIPAA, where server-side conversation storage may require additional DPA review. These aren't theoretical concerns — I'd want them answered before pushing anything with PII into a managed session.

Lock-in is the real tax on the Stateful Gravity Shift. The convenience is genuine — but the day you want to leave, your state is the thing you can't pack. Architect an abstraction layer over session creation now if portability matters to you.

  • >50%: Of enterprises piloting or running agentic AI in production by 2026 ([Gartner, 2025](https://www.gartner.com/en/newsroom))

  • 40–60%: Projected agent inference savings via dynamic thinking levels ([Twarx estimate, 2026](https://deepmind.google/research/))

  • 2023: When OpenAI Assistants API first introduced server-side threads ([OpenAI, 2023](https://platform.openai.com/docs/assistants/overview))


Common Mistakes Migrating to the Interactions API

❌ Mistake: Creating a session for stateless work

Provisioning a session for one-shot classification or embedding tasks adds session-hour storage cost and latency for zero benefit. It is a common reflex when teams 'standardise' on the new endpoint. I've seen this show up as an unexplained cost spike two weeks after migration.

Fix: Keep pure batch and RAG-retrieval inference on the stateless generateContent path. Reserve sessions for genuinely multi-turn or long-running agents.
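One way to enforce that split is a small routing function at the edge of your code. This is a sketch under the rules of thumb given in this guide (sessions for multi-turn, tool-combining, or long-running work; background execution above roughly 5 seconds). `Workload`, `Plan`, and `choose_surface` are hypothetical names for illustration only.

```python
from dataclasses import dataclass

# This guide's rough "~5 seconds" rule of thumb for switching to background=True.
BACKGROUND_THRESHOLD_SECONDS = 5.0


@dataclass(frozen=True)
class Workload:
    multi_turn: bool = False        # needs memory across turns
    combines_tools: bool = False    # e.g. Search plus function calling
    expected_seconds: float = 1.0   # rough expected wall-clock duration


@dataclass(frozen=True)
class Plan:
    surface: str       # "generateContent" or "interactions"
    background: bool   # whether to request asynchronous execution


def choose_surface(w: Workload) -> Plan:
    """Route stateless one-shots to generateContent, everything else to a session."""
    long_running = w.expected_seconds > BACKGROUND_THRESHOLD_SECONDS
    if not (w.multi_turn or w.combines_tools or long_running):
        return Plan("generateContent", background=False)
    return Plan("interactions", background=long_running)
```

For example, `choose_surface(Workload())` keeps a one-shot classification call on the stateless path, while `choose_surface(Workload(expected_seconds=600))` routes a ten-minute research task to a background session.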

❌ Mistake: No abstraction over session creation

Calling interactions.sessions.create directly across your codebase hard-wires Google's state model everywhere, making any future provider migration a full rewrite.

Fix: Wrap session lifecycle behind your own interface so you can swap to a self-hosted or alternative-provider backend without touching business logic.
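A minimal sketch of such an interface, using a structural `Protocol` so business logic never imports a provider SDK. All names here (`SessionBackend`, `InMemoryBackend`, `run_support_chat`) are hypothetical. The in-memory backend stands in for a self-hosted store; a Google-backed class would implement the same two methods by wrapping the session calls shown earlier.

```python
import uuid
from typing import Callable, Dict, List, Protocol, Tuple

Turn = Tuple[str, str]  # (role, text)


class SessionBackend(Protocol):
    """The only session surface business logic is allowed to see."""

    def create_session(self, model: str) -> str: ...

    def send(self, session_id: str, message: str) -> str: ...


class InMemoryBackend:
    """Self-managed stand-in: keeps history locally and delegates replies to reply_fn."""

    def __init__(self, reply_fn: Callable[[List[Turn]], str]) -> None:
        self._history: Dict[str, List[Turn]] = {}
        self._reply_fn = reply_fn

    def create_session(self, model: str) -> str:
        session_id = uuid.uuid4().hex
        self._history[session_id] = []
        return session_id

    def send(self, session_id: str, message: str) -> str:
        history = self._history[session_id]
        history.append(("user", message))
        reply = self._reply_fn(list(history))  # pass a copy of the transcript so far
        history.append(("model", reply))
        return reply


def run_support_chat(backend: SessionBackend, questions: List[str]) -> List[str]:
    """Business logic depends on the interface, not on any provider."""
    session_id = backend.create_session(model="gemini-3-pro")
    return [backend.send(session_id, q) for q in questions]
```

Swapping providers then means writing one new class that satisfies `SessionBackend`; `run_support_chat` and everything above it stay untouched.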

❌ Mistake: Holding HTTP connections open for long agents

Running a 10-minute research agent on a synchronous call and watching gateways time out, then bolting on a Celery worker to fix it. We burned two weeks on this exact pattern before background=True existed.

Fix: Set background=True and poll the session ID or register a webhook. You don't need a job queue anymore.

❌ Mistake: Ignoring data-sovereignty review

Pushing regulated conversation history into server-side sessions without DPA review, which is a real exposure under GDPR or HIPAA when transcripts contain PII.

Fix: Run server-side session storage through legal/compliance before production. Redact PII before it enters a managed session where feasible.
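As an illustration of the redaction step, here is a deliberately simple regex-based sketch that masks email addresses and phone-like numbers before a message leaves your infrastructure. The patterns are assumptions chosen for readability and will miss many PII forms (names, addresses, IDs), so treat this as a starting point for a reviewed redaction layer rather than a compliance control.

```python
import re

# Simplified patterns: good enough to illustrate the idea, not exhaustive.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace emails and phone-like digit runs with fixed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

Call `redact(message)` on every user message at the same boundary where you create or address the managed session, so unredacted text never reaches provider-side storage.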

Good Practices: Building Production Agents on the Interactions API

  • Use dynamic thinking levels. High thinking for planning, low for retrieval confirmation — this is where the projected 40–60% inference savings come from.

  • Default to background=True for anything over ~5 seconds. It removes timeout fragility and a whole class of infra you'd otherwise own forever.

  • Pin a schema version. GA ships versioned endpoints — pin them so you control upgrade timing, not Google's release schedule.

  • Keep retrieval external. Your vector DB stays yours; don't wait for native vector search that isn't shipping today.

  • Pair with ADK for complex agents. Inherit state, tools, and async for free rather than reimplementing them from scratch. Browse ready-made starting points in our agent library.

How Much Does the Interactions API Cost? Realistic Breakdown

Costs comprise three layers: standard per-token input/output (Gemini 3 Pro sits near $1.25 per 1M input and $5.00 per 1M output tokens — confirm on the official pricing page), a new per-session-hour storage charge in the ballpark of $0.02 per session-hour, and optional Managed Agent compute. A free tier covers a defined number of concurrent sessions. The offsetting savings are real: you remove Redis/Postgres state infrastructure, Celery/BullMQ worker fleets, and the token waste of re-sending transcripts. For a team running 100K multi-turn chats/month, eliminating transcript resends alone is frequently a four-figure monthly reduction. That said — confirm exact session-hour rates on Google's pricing page before modelling anything, since the GA post doesn't publish them. Don't let a blog post be your cost model.

Total cost of ownership shifts from infrastructure you run to session-hours Google bills — the economic core of the Stateful Gravity Shift. Confirm live rates on Google's pricing page.

What Comes Next: Roadmap Signals and Bold Predictions

Official roadmap hints from the announcement

Google frames the Interactions API as its primary interface and notes Gemini Omni is arriving 'soon' — signalling that future Gemini capabilities, including new Gemini 3 variants, will be exposed primarily through this surface. The push to make it default 'across 3P SDKs and Libraries' confirms ecosystem-wide intent. This isn't a product line. It's the foundation everything else gets built on top of.

Coined Framework

The Stateful Gravity Shift, applied to the roadmap

Once a provider owns state, every new capability ships through the stateful surface first. The Interactions API isn't a feature — it's the gravity well future Gemini features fall into.

2026 H2: **Frameworks publish Interactions API backends**

Expect LangGraph and AutoGen to ship official Interactions API adapters, becoming thin logic/UI layers atop Google's state infrastructure rather than independent orchestration engines. Evidence: Google is actively 'working with ecosystem partners' per the GA post.

2026 H2: **Cost-tiered agentic architectures go mainstream**

The level-of-thinking parameter drives architectures that dial reasoning depth per task (high for planning, low for confirmation), cutting agent inference cost an estimated 40–60% versus fixed-model approaches.

2027: **Stateless holdouts feel pressure**

As Google and OpenAI both ship managed state, Anthropic's deliberately stateless Claude API faces growing pressure to offer an opt-in stateful surface or cede agent-builder mindshare.

The migration decision matrix: migrate now, wait, or stay

If you remember one artifact from this guide, make it this. It is the fastest way to decide where the Interactions API fits your roadmap this quarter.

| Decision | Do this if… | Concrete workload |
| --- | --- | --- |
| Migrate now | You run multi-turn chat or simple tool-use agents on Gemini, want server-side state, and have no complex conditional branching | Support bots, FAQ assistants, single-agent research tasks, background document pipelines |
| Wait (late 2026) | You depend on LangGraph/AutoGen graph logic but want Google-managed state; hold until the official Interactions API adapter ships | Conditional approval workflows that could simplify once adapters land |
| Stay on current stack | You need twelve-edge conditional graphs, human-in-the-loop gates, custom persistence, or strict on-prem/data-sovereignty control | Regulated PII pipelines, audited multi-agent routing, fully self-hosted deployments |

Migrate multi-turn chatbots and simple tool-use agents to the Interactions API now — the wins are immediate. Hold complex conditional workflow agents until LangGraph publishes its Interactions API adapter, expected late 2026. Learn more in our enterprise AI adoption and workflow automation guides, and browse production-ready starting points in our AI agent library.

Frequently Asked Questions

What is the Interactions API and how is it different from the previous Gemini API?

The Interactions API is Google's single unified endpoint for both Gemini model calls and agents, GA on June 26, 2026. It replaces the old split between generateContent for single-turn inference and a separate chat-session object for multi-turn. The biggest difference is server-side state: Google stores conversation history and tool results, so you stop resending transcripts and stop managing state yourself.

When did Google launch the Interactions API and where was it announced?

The public beta launched in December 2025, and general availability was announced on June 26, 2026, on Google's blog (The Keyword). The post was co-authored by Ali Çevik and Philipp Schmid of Google DeepMind. Read the full announcement on blog.google or access it via Google AI Studio.

How does server-side state in the Interactions API work, and what does it cost?

You create a session that returns a session ID, then reference that ID on every turn while Google stores history server-side. Billing combines per-token costs (roughly $1.25 per 1M input tokens on Gemini 3 Pro) with a per-session-hour storage charge near $0.02, plus optional agent compute. Confirm live rates on the official pricing page.

Can I use the Interactions API with LangGraph, AutoGen, or CrewAI?

You can call Gemini through these frameworks today as the underlying model, but native Interactions API backends are not yet shipped. Official adapters for LangGraph and AutoGen are expected in late 2026. For now, build simple multi-turn agents directly on the API or ADK, and keep complex graphs on LangGraph.

How does the Interactions API compare to the OpenAI Assistants API?

The OpenAI Assistants API, which introduced server-side thread state in 2023, is the closest competitor — both manage state server-side. The Interactions API differentiates with first-party Google Search grounding, Gemini 3's level-of-thinking parameter, native MCP support, and Managed Agents in Google-provisioned sandboxes. The shared trade-off is vendor lock-in.

Does the Interactions API support MCP tools?

Yes. A single Interactions API call can mix built-in tools — Google Search grounding, Code Execution, function calls, and MCP (Model Context Protocol) tools — without chaining requests. MCP originated at Anthropic, so its inclusion in Google's primary endpoint signals MCP is becoming the cross-provider tool-connectivity standard.

Does the Interactions API support multimodal inputs and Managed Agents?

Yes. Gemini 3 adds parameters for multimodal fidelity alongside latency and cost levers, with Gemini Omni flagged as arriving soon. Managed Agents provision a remote Linux sandbox per call where an agent reasons, runs code, browses, and manages files — the Antigravity agent ships as default. Start from prebuilt patterns in our agent library.

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools — previously building agent infrastructure for B2B SaaS and customer-support automation teams. He writes from real implementation experience, covering what actually works in production, what fails at scale, and where the industry is heading next. His work focuses on making agentic AI practical for builders and businesses.



