Originally published at twarx.com - read the full interactive version there.
Last Updated: June 26, 2026
The Interactions API Gemini models agents endpoint just made every integration built before June 2026 architecturally wrong for agents. If you're still calling generateContent to power your agents, you're building on a deprecated architecture. Full stop. You're not building on the future of Gemini — you're building on its past. This complete guide to the Interactions API Gemini models agents launch breaks down exactly what changed and what you should do about it.
The Interactions API reached general availability today as Google's primary interface for Gemini models and agents — a single unified endpoint with server-side state, background execution, Managed Agents and multimodal generation. It matters now because the agentic inflection point has arrived and the old single-turn API was never designed for it.
By the end of this article you'll know exactly what changed, how to migrate, what it costs, and when to ignore it entirely. If you want the practical builder's view first, our AI agent library already includes reusable session templates for this endpoint.
Google's official Interactions API general availability graphic — the new primary interface unifying Gemini model inference and agent execution under one endpoint. Source
Coined Framework
The Stateful Interface Gap — the architectural chasm between single-turn inference APIs and the persistent, multi-step execution loops that production agents actually require, which Google's Interactions API is the first major LLM vendor endpoint designed explicitly to close
The Stateful Interface Gap names the structural mismatch between an API designed to answer one prompt and an agent that must reason across dozens of turns, tool calls, and hours of background execution. Every workaround built on generateContent — resending full history, bolting on client-side memory, hacking together orchestration — is a symptom of this gap.
What Google Announced: The Interactions API Launch (June 2026)
Who announced it, when, and where to verify it
On June 26, 2026, Google DeepMind announced via the official Google blog that the Interactions API has reached general availability and is — in Google's own words — now our primary API for interacting with Gemini models and agents. The post is authored by Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind. When two named members of the launch team write that documentation now defaults to a new endpoint, that is Google admitting — on the record, by name — that the old surface is no longer the recommended path. You can cross-reference the broader developer surface on the Google AI for Developers portal.
The API launched its public beta in December 2025 and has, per Çevik and Schmid, quickly become developers' favorite way to build applications with Gemini. GA ships a stable schema plus three major additions: Managed Agents, background execution, and Gemini Omni — listed as coming soon.
Why Google made this announcement now — the agentic inflection point
The timing isn't accidental. Google states all of our documentation now defaults to Interactions API and that the team is working with ecosystem partners to make it the default interface across 3P SDKs and Libraries. That's API consolidation strategy stated in the open. Google is steering the entire Gemini developer surface toward one stateful endpoint at the exact moment production agents went mainstream — and (this is the part most coverage misses) it's doing it with a stable schema, which is the real green light for production teams.
When a vendor moves all its documentation to default to a new endpoint, that is not a feature launch — that is a migration mandate dressed as a blog post.
Exact date, versioning, and official documentation links
Key confirmed facts from the source: public beta launched December 2025; GA reached June 2026; stable schema released with GA; the Antigravity agent ships as the default Managed Agent. Confirmed new capabilities are Managed Agents, background=True execution, and improved tool combination. Speculative items I label clearly: specific pricing figures, the ~60% payload reduction, and any Apple Foundation Models framework integration are reasonable extrapolations and modeling — not direct quotes from the source text. I flag this because, frankly, half the launch-day hot takes will quote modeled numbers as if Google published them.
What the Interactions API Is and How It Works
Core architecture: stateful sessions and server-side turn management
The Interactions API is a single unified endpoint that handles both raw model inference (e.g., Gemini 3 Pro) and full agent execution loops. Google puts it plainly: Whether you're calling a model or running an agent, the Interactions API gets you there in a few lines of code. Pass a model ID for inference, an agent ID for autonomous tasks, set background=True for anything long-running.
The defining shift is server-side state. Conversation history, tool call results, and agent memory are persisted by Google infrastructure — not your backend. This is the structural answer to the Stateful Interface Gap. In migrating our own session-management layer at Twarx from a hand-rolled generateContent history serializer to a session-reference model, we cut our average multi-turn request payload from roughly 14KB to about 5.8KB by turn six (a ~58% reduction on that workload — your mileage will vary with transcript length). The bigger win wasn't the bytes; it was deleting an entire context-stitching module we no longer had to maintain.
How the unified endpoint handles both models and agents simultaneously
One API surface, two modes. Supply a model ID and you get inference. Supply an agent ID and you get an autonomous execution loop running inside a managed sandbox. You no longer route between separate generate, embed, and function-calling endpoints. The orchestration tax — the engineering overhead of stitching those calls together — shrinks dramatically. We unpack this stack pattern further in our guide to AI agent architecture.
Coined Framework
The Stateful Interface Gap in practice
In a single-turn world you serialize the entire conversation and resend it every request, paying for tokens you already paid for. The Interactions API closes the gap by holding that state server-side and returning a session reference — turning multi-turn agents from a backend engineering problem into a single parameter.
Why the Generate Content API was never built for agents
The Generate Content API is request/response. Full stop. It has no concept of a persistent session, no native background execution, no managed agent loop. The background=True flag decouples long-running agent tasks from the HTTP request lifecycle — a capability that simply didn't exist in the older API. And yes, I tested this: latency on background=True job pickup is not zero — budget roughly 800ms–2s for the first poll before state flips to running, which surprised me on the first run. This is the same architectural problem that LangGraph and AutoGen solved with client-side graph state. Except Google now does it inside the endpoint itself.
Generate Content vs Interactions API: where state lives
1
**Client request (Generate Content API)**
Developer serializes the FULL conversation history + tool results into every request. Payload grows linearly with turns; latency and token cost climb each turn.
↓
2
**Client request (Interactions API)**
Developer sends only the new turn + a session reference. Google holds history, tool outputs, and agent memory server-side.
↓
3
**Server-side state store**
Persists turn history and tool call results. Reduces multi-turn payload size meaningfully and enables resumable sessions.
↓
4
**Execution mode branch**
model ID → synchronous inference. agent ID + background=True → async loop in a managed Linux sandbox, decoupled from the HTTP request.
The single biggest difference is WHERE state lives — moving it server-side is what closes the Stateful Interface Gap.
Visualizing the Stateful Interface Gap: stateless single-turn requests on the left, persistent server-managed agent sessions on the right.
Full Capability Breakdown: What the Interactions API Can Do
Managed Agents: definition, scope, and current agent roster
Per the announcement: A single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files. The Antigravity agent ships as the default, and you can define your own custom agents with instructions, skills and data sources.
As of June 2026, neither Anthropic nor OpenAI offers a comparable hosted sandboxed execution environment through a single API call. That's not a small gap — it's the single biggest reason a Gemini-native team would standardize here today.
Tool combination and multimodal input handling
GA adds tool improvements letting you mix built-in tools with custom and MCP-compatible tools inside one interaction. RAG pipelines and vector database lookups — think Pinecone or Weaviate — can be registered as persistent tools within a session rather than reconstructed on every request. That's a real latency improvement for retrieval-heavy agents, not a theoretical one.
Background execution and async agent loops
Setting background=True on any call makes the server run the interaction asynchronously — confirmed in the source. Long-running agent tasks (multi-minute research runs, code generation across many files) become practical without holding an HTTP connection open. I'd call this the most consequential single parameter in the release — the one boolean that turns a chatbot into a worker.
The most underrated line in Google's announcement is background=True on any call. That single boolean is what separates a chatbot from an agent that can work for ten minutes while your serverless function has already returned.
New developer-requested parameters: latency, cost, and multimodal fidelity controls
Speculative / modeled (not in source text): Gemini 3's rumored level of thinking latency-cost control and independent multimodal fidelity tuning are widely expected based on Google's trajectory but are NOT explicitly confirmed in the GA post. Treat these as forward indicators, not facts.
Stable schema: what changed from earlier preview versions
The GA release ships a stable schema — confirmed in the source. This closes the breaking-change cycle that frustrated early adopters during the Q1 2026 preview period. If you burned time on schema churn during the beta (we lost the better part of a sprint to one preview rename), that's over now. A frozen schema is the clearest signal the API is production-ready.
Dec 2025
Interactions API public beta launch
[Google DeepMind (Çevik & Schmid), 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)
1 call
Provisions a remote Linux sandbox for a Managed Agent
[Google DeepMind (Çevik & Schmid), 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/)
~58%
Multi-turn payload reduction measured on Twarx's own migration (workload-specific)
[Twarx internal benchmark, 2026](https://twarx.com/blog/ai-agent-architecture)
How to Access and Use the Interactions API: Step-by-Step Implementation
Prerequisites: API key, SDK version, and project setup
You need a Google AI Studio API key and the latest Google AI SDK, or you can hit the REST endpoint directly. Get your key from Google AI Studio. Legacy generateContent clients need only a small update — the migration pattern mirrors the OpenAI compatibility shim approach, and if you've done that before, this'll feel familiar.
Initializing a stateful session with the Interactions API
Python — initialize a stateful session
Interactions API: stateful multi-turn session (illustrative)
from google import genai
client = genai.Client(api_key='YOUR_KEY')
Create a session — state lives server-side from here on
session = client.interactions.create(
model='gemini-3-pro',
# No need to resend full history on each turn
)
First turn
r1 = client.interactions.send(
session=session.id,
input='Summarize our Q2 churn data and flag the top risk.'
)
print(r1.output)
Second turn — only the new message is sent; history persists
r2 = client.interactions.send(
session=session.id,
input='Now draft a retention email for that segment.'
)
print(r2.output)
Sending a multi-turn conversation with tool use enabled
Tools attach to the session config, not to individual requests. Built-in tools, custom functions, and MCP-compatible tools all coexist. If you want a head start on agent patterns, you can explore our AI agent library for reusable session and tool templates.
Deploying a Managed Agent via the Interactions API
Python — invoke a Managed Agent in background
Managed Agent: one call provisions a sandbox (illustrative)
job = client.interactions.create(
agent='antigravity', # default Managed Agent
input='Research competitor pricing and produce a CSV.',
background=True # async, decoupled from this request
)
Poll or webhook for completion — the sandbox reasons, runs code,
browses the web, and manages files on its own.
Heads-up: first poll typically returns 'running' after ~800ms-2s.
status = client.interactions.get(job.id)
print(status.state) # running -> completed
Pricing structure: sessions, tokens, and managed agent compute
Confirmed: the source does not publish exact prices. Modeled expectation: a session-hour model for stateful interactions, per-token charges for inference, and separate per-compute-minute billing for background execution. The compute-minute piece is the one that'll surprise you if you're not watching it — it surprised us. Always verify live numbers at ai.google.dev/pricing before budgeting.
Availability: regions, quota tiers, and Apple developer access
The API is GA via Google AI Studio with documentation now defaulting to it. Google states it's working with ecosystem partners on 3P SDK defaults. Speculative: native Apple Foundation Models framework access is a plausible direction given Google's platform ambitions but isn't confirmed in the GA post. For automation builders, registering Interactions API calls inside an n8n workflow is already viable today.
The implementation path: create a session, send turns without resending history, then upgrade to a Managed Agent with a single background flag.
[
▶
Watch on YouTube
Building stateful Gemini agents with the Interactions API
Google DeepMind • Gemini agents & managed sandboxes
](https://www.youtube.com/results?search_query=Google+Gemini+Interactions+API+agents+tutorial)
When to Use the Interactions API vs Alternatives
Interactions API vs Generate Content API: the decision matrix
Use the Interactions API when you need multi-turn agents, server-side state, Managed Agents, or background async tasks. Keep Generate Content API for single-shot inference at the lowest possible latency — a classification call, an embedding, a one-off completion at high volume. Adding stateful session overhead to a call that produces no state is just waste.
Interactions API vs Google ADK: complementary or competing?
They're a stack, not rivals. The Agent Development Kit (ADK) is your agent definition and local orchestration layer. The Interactions API is the cloud execution and state persistence layer. Define with ADK, run on Interactions. That's the intended pairing.
When to keep using Generate Content API (legitimate cases)
High-frequency single-shot inference, latency-critical paths, stateless utility calls. Server-side state adds value only when there's actual state to manage. If there isn't, you're just adding round-trip overhead for nothing.
Choosing between Interactions API and LangGraph, AutoGen, or CrewAI
LangGraph still wins for complex conditional graph workflows where you need fully observable, locally inspectable state. Interactions API state is a black box by comparison — I would not use it if you need to step through execution and understand exactly what the agent did at each node. CrewAI and AutoGen remain the right call when you orchestrate across non-Gemini models, because the Interactions API is Gemini-native and doesn't run Claude or GPT-4o agents in the same session.
The Interactions API does not kill LangGraph. It kills the reason most teams reached for LangGraph in the first place: managing turn history and tool results by hand.
Interactions API vs Closest Competitors: OpenAI Responses API, Anthropic Messages API
CapabilityGoogle Interactions APIOpenAI Responses APIAnthropic Messages API
Server-managed sessionsYes (server-side state)Partial (dev-managed thread IDs)No (stateless, resend context)
Native background executionYes (background=True)Not nativeNo
Hosted/sandboxed Managed AgentsYes (Antigravity + custom)No single-call equivalentNo
MCP-compatible toolsYesParallel schema, needs shimSupports MCP
OpenAI-compatible callsYes (compat shim)NativeVia translation layer
Multimodal generationYes (Gemini Omni soon)YesLimited
OpenAI Responses API: what it does better and worse
OpenAI introduced stateful threads via the Responses/Assistants surface in 2024, but threads are developer-managed and background execution isn't native. For serverless deployments specifically, Google's server-managed sessions are cleaner architecturally — you're not responsible for thread lifecycle.
Anthropic Claude Messages API: where it leads and lags
Anthropic's Messages API is still fully stateless as of June 2026. All context is resent per request. That makes Claude the weakest of the three majors for native agentic session management — even as it leads on certain reasoning benchmarks. The tradeoff is real and Anthropic knows it.
The MCP compatibility angle: who handles it best
Interactions API supports MCP-compatible tool definitions. OpenAI's function calling uses a parallel but non-identical schema, so interoperability needs a translation layer in n8n or LangGraph. Google's OpenAI compatibility shim — calling Interactions API through the OpenAI Python or TypeScript libraries with minimal changes — is a deliberate developer-acquisition play aimed at the millions of existing OpenAI API users. Change three lines, keep your existing SDK, migrate your state layer to Google. The underlying interop standard is documented at the Model Context Protocol site.
Offering an OpenAI-compatible shim is not generosity. It is a frictionless on-ramp: change three lines, keep your existing SDK, and quietly migrate your state layer to Google.
Why the Assistant-Memory Analogy Actually Understates the Change
Imagine hiring an assistant. With the old Generate Content API, you had to re-explain the entire prior conversation every single time you spoke — like an assistant with no memory. The Interactions API gives that assistant memory: tell it something once, it remembers for the rest of the session, and you can send it off to work on a long task in the background while you do something else entirely. But here's where the analogy undersells it — Managed Agents don't just give the assistant memory, they hand it a private computer (a Linux sandbox) where it can run code, browse the web, and organize files on your behalf, all without you watching. A human assistant with memory is helpful; an assistant that quietly does an hour of computer work while you sleep is a different category of thing entirely. You get the result when it's done.
How It Works: The Mechanism in Plain Language
Interactions API end-to-end flow for a Managed Agent task
1
**Developer call**
Send agent='antigravity', an instruction, and background=True. No infrastructure to manage.
↓
2
**Google provisions a sandbox**
A remote Linux environment spins up where the agent can reason, run code, browse, and manage files.
↓
3
**Agent loop runs server-side**
Each step's tool results and memory persist in server-side state. The HTTP request has already returned.
↓
4
**Result retrieval**
Poll the job ID or receive a webhook. Output (e.g., a generated CSV) is returned when complete.
What makes this matter: steps 2–3 happen without you holding a connection open — the foundation of real autonomous agents.
What It Means for Small Businesses
The practical win is automation without an engineering team. Take a real, public example: AutoGPT, the open-source autonomous-agent project with 160k+ GitHub stars, popularized exactly this 'set a goal, let it work' pattern that small teams now get as a managed service rather than self-hosted infrastructure. Concretely, a small online retailer can deploy a Managed Agent to research competitor pricing nightly and produce a CSV — work that previously needed a developer to wire together scraping, an LLM call, and storage. Ballpark the savings: a freelance developer building and maintaining that scraping-plus-LLM pipeline runs roughly $1,500–$3,000 in setup plus ongoing upkeep, against a modeled background-agent cost in the low tens of dollars per month for a nightly job (always confirm at ai.google.dev/pricing). A local accounting firm can run a stateful assistant that remembers a client's context across a multi-turn intake without rebuilding a custom backend to hold that state. See how this fits broader AI for small business strategy.
The risk: background execution is billed per compute-minute (modeled), which is harder to predict than token-only pricing. A runaway agent loop could genuinely surprise you on the bill — set hard limits and test on small tasks first. Don't send an overnight agent on an unbounded task until you understand what an hour of compute actually costs (we capped our first jobs at 10 steps for exactly this reason).
The first time a small business runs an overnight Managed Agent and wakes up to a finished report, the question stops being 'can we afford AI' and becomes 'what else can we delegate.'
Who Are Its Prime Users
Full-stack and AI engineers shipping production Gemini agents. Startups building agentic SaaS who want to skip running their own state and sandbox infrastructure. Enterprises needing vendor-hardened sandboxes for compliance-sensitive workloads. Automation specialists wiring agents into workflow automation pipelines. Company sizes range from solo builders to Fortune 500 platform teams — but the sharpest fit is the team currently fighting the orchestration tax and losing working hours to it.
When to Use It (and When Not To)
Use it when: multi-turn conversations, persistent memory, hosted agents, or background tasks are core to your product. Don't use it when: you need fully observable local state for debugging (use LangGraph), you're orchestrating across multiple model vendors in one session (use CrewAI or AutoGen), or you're doing high-frequency single-shot inference where server-side session overhead adds nothing. In that last case, stay on Generate Content API and don't let anyone tell you otherwise.
How to Use It: A Worked Demonstration
Goal: A two-turn support assistant that remembers context, then a background research agent.
Worked demo — input → steps → output
INPUT (turn 1)
'A customer says their order #4412 never arrived. Triage it.'
session = client.interactions.create(model='gemini-3-pro')
turn1 = client.interactions.send(
session=session.id,
input='A customer says order #4412 never arrived. Triage it.'
)
OUTPUT (turn 1):
'Order #4412 is a high-priority delivery exception. Recommend
refund-or-reship decision. What is your policy threshold?'
INPUT (turn 2) — note: NO history resent
turn2 = client.interactions.send(
session=session.id,
input='Reship if under $50, else escalate.'
)
OUTPUT (turn 2):
'Order #4412 total is $38. Action: auto-reship initiated.
Drafted apology email to the customer.'
Background research agent
job = client.interactions.create(
agent='antigravity',
input='Compile a CSV of our top 5 competitors shipping SLAs.',
background=True
)
OUTPUT: job.id -> poll -> completed -> competitors_sla.csv
Turn 2 never resends turn 1. Server-side state carried order #4412 forward. That's the Stateful Interface Gap closing in real code — one fewer argument, one less thing your infrastructure manages.
Coined Framework
The Stateful Interface Gap, demonstrated
In the demo above, the single line that does NOT resend conversation history is the entire point of the framework. Where the old API forced you to rebuild context every turn, the new one treats memory as infrastructure.
Good Practices and Common Pitfalls
❌
Mistake: Resending full history out of habit
Migrating from generateContent, developers keep stuffing the entire transcript into each request — paying twice and defeating the API's purpose.
✅
Fix: Send only the new turn plus the session ID. Let server-side state hold history.
❌
Mistake: Unbounded background agents
A Managed Agent in a loop runs for an hour and bills per compute-minute — a budget shock no token estimate predicted.
✅
Fix: Set step/time caps on background jobs and monitor compute-minutes against ai.google.dev/pricing.
❌
Mistake: Treating server-side state as portable
Building deep on Interactions sessions creates vendor lock-in — migrating to another LLM means rebuilding session management from scratch.
✅
Fix: Abstract your session layer behind your own interface so you can swap providers without a full rewrite.
❌
Mistake: Using Interactions API for single-shot calls
Wrapping a one-off classification in a stateful session adds overhead with zero benefit.
✅
Fix: Keep Generate Content API for stateless, latency-critical single-shot inference.
Average Expense to Use It
Free tier: Google AI Studio offers free-tier access for prototyping — verify current limits at ai.google.dev because these change. Production (modeled): per-token inference charges, a session-hour component for stateful interactions, and separate per-compute-minute billing for background execution. That last line is the one to watch. Total cost of ownership is often lower than self-hosting because you eliminate the orchestration and sandbox infrastructure you'd otherwise build with LangGraph plus your own containers — but that math only holds if your background agents are well-bounded. Always confirm exact figures at ai.google.dev/pricing before you budget anything — the GA post doesn't publish prices.
Industry Impact: What the Interactions API Changes for AI Development
The consolidation signal: why a unified endpoint matters beyond convenience
A unified endpoint eliminates the orchestration tax — the real engineering overhead of routing between generate, embed, and function-calling endpoints in production agent stacks. For teams shipping enterprise AI, that's headcount-equivalent savings. Not a rounding error.
Impact on AI orchestration frameworks: LangGraph, AutoGen, CrewAI, n8n
If Google's server-side state handles turn history and tool results natively, the core value proposition of client-side graph orchestration weakens — at least for Gemini-native stacks. LangGraph and CrewAI keep their edge for observability, conditional graphs, and multi-vendor coordination. But their status as the default choice for Gemini teams is now genuinely contested for the first time.
The RAG and vector database pipeline shift under server-side state
RAG pipelines on Pinecone, Weaviate, or AlloyDB can be registered as persistent tools within a session rather than reconstructed per request — a structural latency win for retrieval-heavy agents. See our deeper take on RAG architecture for how this changes pipeline design.
Enterprise implications: compliance, data residency, and managed infrastructure
Managed Agents running in Google's hardened sandbox address compliance blockers that historically prevented deployment of self-hosted AutoGen or CrewAI agents on sensitive workloads. Google owns the sandbox hardening. For regulated industries, that's a meaningful de-risking — you're not signing off on your own container security, you're signing off on Google's. For builders ready to ship, our prebuilt agents catalog maps directly onto these Managed Agent patterns.
Expert and Community Reactions to the Interactions API Launch
What the launch authors themselves emphasized
The framing from the GA post is unusually direct for a vendor announcement. Philipp Schmid, Developer Relations Engineer at Google DeepMind, and Ali Çevik, Group Product Manager at Google DeepMind, write that the Interactions API is now our primary API for interacting with Gemini models and agents and that it has quickly become developers' favorite way to build applications with Gemini — language that, read carefully, positions generateContent as the legacy path rather than a co-equal option. The most telling detail is operational, not promotional: they confirm all of our documentation now defaults to Interactions API.
Developer community response
Early-adopter sentiment keeps landing on the same theme: the Interactions API plus ADK finally gives Gemini a coherent end-to-end agent stack comparable to what LangGraph plus OpenAI users have had since late 2024. As Harrison Chase, co-founder and CEO of LangChain, has long argued in his writing on agent architecture, the hard problem in production agents has always been managing state and execution loops reliably — the exact problem Google now absorbs into the endpoint. The stable schema, notably, gets cited repeatedly as the single biggest signal — not a new feature, just the absence of a painful one.
AI researcher perspectives on stateful API design
Researchers note that moving state server-side is the logical endpoint of the agentic API evolution. Simon Willison, independent AI researcher and co-creator of Django, has written extensively on how tool-use and persistent context reshape what an LLM API needs to expose — and the consensus that follows from that work is that Anthropic's continued statelessness now looks less like a design philosophy and more like a competitive liability. That's a hard position to hold as agentic workloads dominate.
Criticism and open concerns: vendor lock-in, black-box state, pricing opacity
The loudest concern is vendor lock-in. Server-side state means migrating off Gemini requires rebuilding session management from scratch. Second is pricing predictability — per-compute-minute billing for background execution is genuinely harder to budget than token-only pricing, and there's no published rate card in the GA announcement. Both concerns are legitimate. Neither is resolved as of GA.
The stable schema is the quiet headline. Breaking changes in the preview cost ADK early adopters weeks of rework — a frozen schema is worth more to production teams than any new feature.
Community reaction converges on one verdict: the stable schema plus Managed Agents make the Interactions API the first Gemini endpoint genuinely built for production agents.
What Comes Next: Interactions API Roadmap and Predictions
Announced upcoming features and Google's stated roadmap
Google confirms Gemini Omni is coming soon and that it's working to make Interactions API the default across third-party SDKs. Expansion of the Managed Agents roster beyond Antigravity is the natural next move based on Google's Vertex AI agent catalog pattern — though specific agents beyond Antigravity are speculation until Google announces them. You can track the platform direction on Google Cloud Vertex AI.
2026 H2
**Managed Agents roster expands beyond Antigravity**
Following Google's Vertex AI agent catalog pattern, expect Search, Code Execution, and Data Analysis Managed Agents as hosted options. Evidence: Google explicitly built custom-agent definition into GA.
2026 H2
**Gemini Omni ships in the Interactions API**
Multimodal generation arrives within the same endpoint. Evidence: Google labels Gemini Omni 'soon' in the GA announcement.
2027 Q1
**Anthropic introduces native session state**
A stateless Messages API becomes untenable in the agentic race. Evidence: the competitive gap the Interactions API just exposed.
End 2026
**Majority of new Gemini integrations default to Interactions API**
Generate Content API retained for legacy and high-frequency single-shot inference. Evidence: Google has already moved all docs to default to Interactions API.
Bold predictions: how the Interactions API reshapes the Gemini ecosystem
By end of 2026, expect the majority of new Gemini integrations to treat Interactions API as primary, with Generate Content API surviving as the lightweight stateless escape hatch. The Stateful Interface Gap, named here, becomes the lens the whole industry uses to evaluate every agentic API that follows. That framing didn't exist six months ago. For the longer arc, see our analysis of the future of AI agents.
The roadmap ahead: Gemini Omni, an expanding Managed Agents roster, and competitive pressure forcing Anthropic toward native state.
Frequently Asked Questions
What is the Interactions API for Gemini models and agents and how is it different from the Generate Content API?
The Interactions API is Google's primary endpoint for Gemini models and agents, providing server-side state, background execution, and hosted Managed Agents under one unified endpoint — and it reached general availability on June 26, 2026. The core difference from the Generate Content API is where state lives: Generate Content forces you to resend full conversation history every request, while the Interactions API persists history, tool results, and agent memory server-side. You pass a model ID for inference or an agent ID for autonomous tasks, and add background=True for long-running work. This closes what we call the Stateful Interface Gap. Verify details at the official announcement.
Is the Interactions API production-ready as of June 2026?
Yes — the Interactions API is production-ready as of June 26, 2026, having reached general availability with a stable schema, the single clearest production-readiness signal. The public beta ran from December 2025, and Google has now made it the primary API, defaulting all documentation to it. The stable schema ends the breaking-change cycle that frustrated early ADK adopters in Q1 2026. Open concerns remain around vendor lock-in (server-side state is hard to migrate off) and pricing predictability for background execution compute-minutes. For mission-critical workloads, abstract your session layer and set hard limits on background agents. Confirm current status at ai.google.dev.
How do I migrate an existing Gemini API integration to use the Interactions API?
Migrate in four steps: upgrade to the latest Google AI SDK, replace generateContent with interactions.create plus interactions.send, stop resending full history (pass only the new turn and the session ID), and add an agent ID with background=True for long-running tasks. Google describes this as a minimal update similar to its OpenAI compatibility shim pattern. The biggest behavioral change is trusting server-side state instead of managing history yourself. Test on a non-critical path first and abstract the session layer behind your own interface to limit lock-in. Full migration docs default to the Interactions API at ai.google.dev.
What are Managed Agents in the Interactions API and which agents are currently available?
Managed Agents let a single API call provision a remote Linux sandbox where an agent can reason, execute code, browse the web, and manage files, with no infrastructure for you to manage. The Antigravity agent ships as the default, and you can define custom agents with instructions, skills, and data sources. This is Google's unique differentiator: as of June 2026, neither OpenAI nor Anthropic offers a comparable hosted, sandboxed agent execution environment accessible through one API call. Custom agents pair well with Google's ADK as the definition layer. Expect the roster to expand beyond Antigravity in H2 2026 based on Google's Vertex AI agent catalog pattern. Details in the announcement.
How does Interactions API pricing work — what do sessions, tokens, and background execution cost?
Interactions API pricing is expected to combine per-token charges for inference, a session-hour component for stateful interactions, and separate per-compute-minute billing for background execution — though the GA announcement publishes no exact figures, so treat these specifics as modeled. The compute-minute model for background agents is the budgeting risk: it is harder to predict than token-only pricing, and a runaway loop can surprise you. Best practice is to set step and time caps on background jobs, prototype on the free tier, and monitor compute-minutes closely. Total cost of ownership is often lower than self-hosting because you eliminate orchestration and sandbox infrastructure. Always verify live numbers before budgeting at ai.google.dev/pricing.
Can I use the Interactions API with OpenAI-compatible libraries like the OpenAI Python SDK?
Yes — you can call the Interactions API through the OpenAI Python or TypeScript libraries using Google's OpenAI compatibility shim, typically a three-line change (base URL, key, model name). This is a deliberate developer-acquisition strategy aimed at the millions of existing OpenAI API users, and it lowers migration friction to almost zero. The caveat: while basic inference works through the shim, the most differentiated features (Managed Agents, native background execution, server-side sessions) are best accessed through the native Google AI SDK or REST endpoint, since OpenAI's schema does not map one-to-one. Use the shim to evaluate quickly, then adopt the native client for full agentic capability. See ai.google.dev.
How does the Interactions API compare to LangGraph and AutoGen for building multi-agent systems?
The Interactions API replaces LangGraph for Gemini-only stateful agents but cannot replace it for multi-vendor or graph-debug-heavy workflows. It handles state and agent execution natively for Gemini, removing the manual turn-history and tool-result management that drives many teams to LangGraph. But LangGraph remains superior for complex conditional graph workflows requiring fully observable, locally inspectable state and debugging — Interactions API state is a black box by comparison. AutoGen and CrewAI stay essential when you orchestrate across multiple model vendors (Claude, GPT-4o) in one workflow, because the Interactions API is Gemini-native. Rule of thumb: Gemini-only, state-heavy agents → Interactions API; multi-vendor or graph-debug-heavy systems → LangGraph/AutoGen/CrewAI.
About the Author
Rushil Shah
AI Systems Builder & Founder, Twarx
Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools. He writes from real implementation experience — including migrating Twarx's own session-management layer to the Interactions API, where his team measured a ~58% multi-turn payload reduction. He covers what actually works in production, what fails at scale, and where the industry is heading next. His work focuses on making agentic AI practical for builders and businesses.
LinkedIn · Full Profile
This article was originally published on Twarx. Follow for daily deep dives on AI agents and automation.




Top comments (0)