Originally published at twarx.com - read the full interactive version there.
Last Updated: June 26, 2026
The Interactions API that Gemini models and agents now depend on just made LangGraph, AutoGen, and CrewAI partially obsolete for Gemini-only builds, and most developers building agentic systems haven't realised it yet. The Interactions API doesn't just simplify how you call Gemini; it structurally eliminates the orchestration gap that third-party frameworks were built to fill.
As of today, the Interactions API is generally available and is Google's primary API for interacting with Gemini models and agents — a single unified endpoint with server-side state, background execution, tool combination, and Managed Agents like Antigravity running in a remote Linux sandbox.
Last month I tore out a Redis session store and a LangGraph retry loop from a Gemini-only research pipeline and replaced both with a single Interactions session config — the app shrank by roughly 600 lines and stopped timing out on long agent runs. That is the shift this article documents. If you're new to the space, our guide to AI agents sets the foundation first.
Google's official launch graphic for the Interactions API reaching general availability, now the default interface for Gemini models and agents.
Coined Framework — Definition
The Orchestration Collapse Layer
The Orchestration Collapse Layer is the architectural moment when a model provider absorbs enough middleware functionality (state management, tool routing, and agent sandboxing) that external orchestration frameworks lose their primary value proposition. At that point developers must choose between provider lock-in and unnecessary complexity. The Interactions API is the first production-grade example: state, tool routing, and sandboxing migrate from the developer's stack into the provider's API surface, and the middleware that developers once wired together stops being essential and becomes overhead.
What Did Google Announce About the Interactions API?
Exact announcement date, source, and official framing
Google announced via the official Keyword blog that the Interactions API has reached general availability and is now its primary API for interacting with Gemini models and agents. The post is authored by Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind. Per the announcement: "Today we're announcing that the Interactions API has reached general availability and is now our primary API for interacting with Gemini models and agents."
The public beta launched in December 2025, and Google states it "has quickly become developers' favorite way to build applications with Gemini." This GA release ships a stable schema plus major new capabilities including Managed Agents, background execution, and Gemini Omni (coming soon).
Why did Google call this its 'primary interface'?
The word "primary" is the entire story. Google confirms: "All of our documentation now defaults to Interactions API and we are working with ecosystem partners to make it the default interface across 3P SDKs and Libraries." That is not a side product — it is a replacement of the previous generate-content paradigm as the recommended default. When a provider re-points every doc and SDK at a new surface, the old surface enters quiet legacy status.
What changed from the previous Gemini API surface?
The core architectural break: the shift from stateless generate-content calls to a stateful, session-aware interaction model. Previously you serialised conversation history and tool results yourself. Now, as the post describes: "Whether you're calling a model or running an agent, the Interactions API gets you there in a few lines of code. Pass a model ID for inference, an agent ID for autonomous tasks, set background=True for anything long-running." Managed Agents — including the Antigravity agent shipped as the default — are now first-class citizens runnable in a secure cloud sandbox without external orchestration infrastructure.
**Dec 2025**: Interactions API public beta launch ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

**1**: Unified endpoint replacing chat, function-calling, grounding & agent surfaces ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

**Antigravity**: Default Managed Agent shipped at GA ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))
What Is the Interactions API and How Does It Work?
The unified endpoint architecture explained
The Interactions API is a single REST and SDK endpoint that replaces what previously required three to five separate API surfaces or SDK modules: chat, function calling, grounding/search, and agent orchestration. You pass a model ID for inference or an agent ID for autonomous tasks. One config object, one call, one schema.
The collapse from as many as five surfaces to one is the headline. Most teams running Gemini today maintain a state store (Redis/Postgres), an orchestration framework, a tool registry, and a deployment layer just to glue stateless calls together. The Interactions API absorbs the state store and the orchestration framework, and its native tool combination covers much of what the tool registry did.
What does server-side state store, and for how long?
Server-side state means conversation history, tool-call results, and agent memory are managed by Google's infrastructure. This eliminates developer-side state serialisation overhead — the exact friction that pushed engineers toward LangGraph and AutoGen in the first place. You stop shipping the entire context window on every turn; the server keeps the session. See our deep dive on agent memory for why this matters at scale.
How does background execution handle async agent runs?
Set background=True on any call and the server runs the interaction asynchronously. This is a direct answer to the timeout and context-window problems that drove developers to external orchestration: long-running agent tasks continue server-side instead of dying at a 60-second request window or blocking your application thread.
How does the Orchestration Collapse Layer absorb middleware?
Tool combination is native. Per Google, you can mix built-in tools (web search, code execution, and RAG over vector databases) with custom function calling inside a single interaction config object rather than chaining them externally. That is the Orchestration Collapse Layer made concrete.
Coined Framework — Operational Test
The Orchestration Collapse Layer in practice
A framework has entered the Orchestration Collapse Layer when state, tool routing, and sandboxing become provider responsibilities, eliminating the orchestration framework's primary job of gluing stateless calls into stateful workflows. What remains is application logic and the provider API. The practical test: if your orchestration layer exists only to hold conversation state and retry long-running calls, the provider primitive has already replaced it. If it expresses multi-provider routing or complex non-linear agent graphs, it survives.
If your orchestration framework exists mainly to hold conversation state and retry long-running calls, the Interactions API just ate its lunch — and it did it in a few lines of code.
Before vs After: The Gemini Agentic Stack Collapses
1. **BEFORE: Model API (generate-content).** Stateless call. You send the full context window every turn. Latency and token cost grow with conversation length.
2. **BEFORE: State store (Redis/Postgres).** You serialise and rehydrate history, tool results, and memory yourself. DevOps owns it.
3. **BEFORE: Orchestration (LangGraph/AutoGen).** Graph logic chains tool calls, manages retries, and handles long-running tasks outside request limits.
4. **AFTER: Interactions API + your app logic.** Server-side state, background=True async execution, native tool combination, Managed Agents. Two components remain.
The five-layer Gemini agentic stack collapses to two for Gemini-native developers — the visible mechanism of the Orchestration Collapse Layer.
How the unified Interactions API endpoint routes a single config object to model inference, Managed Agents, and native tools — the structural core of the Orchestration Collapse Layer.
What Features Are in the Interactions API at General Availability?
Managed Agents: what they are and what Antigravity does
Per Google: "A single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files." The Antigravity agent ships as the default, and "you can define your own custom agents with instructions, skills and data sources." Google manages the execution environment, scaling, and tool-access permissions — you provide config and goals.
Multimodal fidelity controls and new Gemini 3 parameters
The Interactions API exposes Gemini 3 Pro alongside current production models under one endpoint. The GA release adds the developer-requested ability to dial response behaviour per request, trading faster, cheaper responses against deeper reasoning. (Confirmed: GA adds "major new capabilities that developers asked for." The exact per-request parameter names are not quoted here; confirm them in the live docs.)
Native tool integration: search, code, MCP, and custom functions
Built-in tools — web search, code execution, file management, and custom function calling — combine inside the interaction config. For teams already invested in the Model Context Protocol (MCP), the direction is to consume MCP tool servers within sessions, letting your existing tool ecosystem plug in directly. (MCP integration direction is consistent with Google's stated tool-combination roadmap; confirm exact support in the live docs.)
Stable schema and the developer-requested features now shipping
The GA milestone delivers a stable schema. That commitment is a direct response to developer complaints about Gemini API surface instability — breaking changes now follow a versioned deprecation cycle rather than landing without warning.
The stable-schema promise is worth more than any single feature. Teams that abandoned Gemini after prior breaking changes cited stability — not capability — as the reason. GA + versioned deprecation is Google buying back trust.
[Watch on YouTube: Google DeepMind walkthroughs of the Gemini Interactions API and Managed Agents (Google DeepMind • Gemini agents & Antigravity)](https://www.youtube.com/results?search_query=Google+DeepMind+Gemini+Interactions+API+agents)
How Do You Access and Use the Interactions API Step by Step?
Prerequisites and API key setup in Google AI Studio
Access the Interactions API via Google AI Studio for rapid prototyping or the Vertex AI console for enterprise billing and VPC controls. Both paths share the same endpoint but differ in governance. Grab an API key from AI Studio to start.
Your first stateful multi-turn interaction — code walkthrough
Here is a worked demonstration. The server holds history — notice the second turn does not resend the first.
Python: stateful multi-turn with Interactions API

    # Install the Google GenAI SDK first (it provides `from google import genai`):
    #   pip install --upgrade google-genai
    # Illustrative SDK usage; confirm exact class and method names in the live docs.
    from google import genai

    client = genai.Client(api_key='YOUR_KEY')

    # Create a session; the server holds state for you
    session = client.interactions.create(model='gemini-3-pro')

    # Turn 1
    r1 = client.interactions.send(
        session=session.id,
        input='I run a 4-person bakery. Draft a weekly Instagram plan.',
    )
    print(r1.output_text)

    # Turn 2: no need to resend turn 1; the server remembers
    r2 = client.interactions.send(
        session=session.id,
        input='Now make Tuesday a gluten-free promo.',
    )
    print(r2.output_text)

    # Expected behaviour: an updated plan that references the original 7-day
    # structure and folds in the gluten-free Tuesday promo, with context
    # carried automatically by server-side state.
Sample input: the two messages above. Expected output behaviour: turn 2's response modifies the existing plan rather than starting over, because conversation history lives server-side. You avoid a full context-window resend on turn 2, which is where the multi-turn token savings come from.
Enabling Managed Agents and running Antigravity in sandbox
Python: Managed Agent + background execution

    # Continues from the previous snippet (reuses `client`); names remain illustrative.
    # Provision a remote Linux sandbox agent in one call
    agent_session = client.interactions.create(
        agent='antigravity',  # default Managed Agent
        config={'tools': ['web', 'code', 'files']},
    )

    run = client.interactions.send(
        session=agent_session.id,
        input='Research 2026 small-business AI grants and save a CSV.',
        background=True,  # async, survives request timeouts
    )
    print(run.status)  # 'running' -> poll until 'completed'
The sandbox lets Antigravity browse, execute code, and write files server-side. With background=True, the task continues even past normal request windows. Want pre-built agent configs to start from? Explore our AI agent library for patterns you can adapt.
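The "poll until 'completed'" step in the snippet above can be sketched as a small polling loop with exponential backoff. This is a minimal sketch rather than the SDK's own helper: `poll_until_done` and its parameters are hypothetical names, `get_status` stands in for whatever call returns the run's current status, and every terminal status other than 'completed' is an assumption to check against the live docs.

```python
import time
from typing import Callable, Iterable

# 'completed' comes from the snippet above; 'failed' and 'cancelled' are
# assumed terminal states, so confirm the real status values in the live docs.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def poll_until_done(
    get_status: Callable[[], str],
    *,
    interval: float = 1.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    terminal: Iterable[str] = TERMINAL_STATUSES,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal status or time runs out.

    The wait doubles after every check, capped at max_interval, so a long
    agent run is not hammered with status requests.
    """
    terminal_set = set(terminal)
    deadline = time.monotonic() + timeout
    delay = interval
    while True:
        status = get_status()
        if status in terminal_set:
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"run still '{status}' after {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_interval)


if __name__ == "__main__":
    # Stand-in for a real status call: reports 'running' twice, then 'completed'.
    fake_statuses = iter(["running", "running", "completed"])
    result = poll_until_done(lambda: next(fake_statuses), sleep=lambda _: None)
    print(result)
```

In real use, `get_status` would wrap the SDK's status lookup for the background run, and the loop would live in a worker or scheduled job rather than inside a blocking HTTP request.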
Pricing model, quotas, and availability by region as of June 2026
Pricing follows a per-interaction-token model, with a separate billing dimension for background execution compute. Google's published Gemini pricing lives on the official pricing page — always confirm current rates there. The single most reliable saving is mechanical: on every turn after the first, you no longer resend the full conversation history, so input-token cost on multi-turn sessions drops in direct proportion to the history you previously re-transmitted. Model that against your own average turn count rather than a headline percentage. For broader budgeting context, see our AI cost optimisation guide.
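To model that saving against your own turn count, the arithmetic can be sketched as follows. This is a back-of-envelope sketch under stated assumptions: every user message and every reply has a fixed token size, the stateless client resends the whole history on each turn, and the function names are hypothetical. It counts tokens transmitted by the client; how server-held history is billed is something to confirm on the official pricing page.

```python
def stateless_input_tokens(turns: int, user_tokens: int, reply_tokens: int) -> int:
    """Input tokens sent over a conversation when each turn resends all history."""
    total = 0
    history = 0  # tokens accumulated from earlier user messages and replies
    for _ in range(turns):
        total += history + user_tokens         # resend history plus the new message
        history += user_tokens + reply_tokens  # this turn joins the history
    return total


def stateful_input_tokens(turns: int, user_tokens: int) -> int:
    """Input tokens sent when the server keeps history: only new messages travel."""
    return turns * user_tokens


def resent_history_tokens(turns: int, user_tokens: int, reply_tokens: int) -> int:
    """History tokens a stateless client re-transmits that a stateful one does not."""
    stateless = stateless_input_tokens(turns, user_tokens, reply_tokens)
    return stateless - stateful_input_tokens(turns, user_tokens)


if __name__ == "__main__":
    # Hypothetical workload: 100-token questions and 400-token answers.
    for turns in (2, 5, 10):
        resent = resent_history_tokens(turns, user_tokens=100, reply_tokens=400)
        print(f"{turns} turns: {resent} history tokens re-sent by a stateless client")
```

Under these assumptions the re-sent total equals (user_tokens + reply_tokens) × turns × (turns − 1) / 2, so it grows with the square of the turn count. That is why the average turn count matters more than any headline percentage.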
The minimal path from API key to a stateful, multi-turn Gemini session — the Interactions API removes the state-serialisation boilerplate that dominated earlier integrations.
When Should You Use the Interactions API Instead of Alternatives?
Interactions API vs raw Gemini generate-content endpoint
Use the Interactions API when you need server-side state, background execution beyond 60-second windows, or managed agent sandboxing without DevOps overhead. The raw generate-content path now only makes sense for ultra-simple, single-shot, stateless calls where you want zero session lifecycle.
Interactions API vs LangGraph and LangChain
Stick with LangChain and LangGraph when you need model-agnostic orchestration across OpenAI, Anthropic Claude, and Gemini simultaneously, or when your workflow graph complexity exceeds what a single session config can express. For Gemini-only builds, much of that machinery is now redundant.
Interactions API vs AutoGen and CrewAI multi-agent frameworks
AutoGen and CrewAI retain value for multi-agent role-play architectures where agent-to-agent conversation is the core design. The Interactions API currently handles single Managed Agents, not dynamic agent-to-agent negotiation at the same granularity — see our multi-agent systems primer for where that line sits.
Interactions API vs n8n and no-code agent builders
n8n and visual no-code tools stay superior for non-developer teams building business automation workflows. The Interactions API is a developer primitive, not a workflow UI.
Practitioner View
Harrison Chase, co-founder and CEO of LangChain, has consistently argued that LangGraph's durable value is in orchestration that spans providers and expresses non-trivial control flow — not in being a thin state cache. Read against that framing in the LangChain blog, the Interactions API doesn't kill LangGraph; it removes the use case LangGraph was least differentiated on — single-provider state holding — and pushes the framework toward its actual moat.
The right question is no longer 'which framework?' It's 'am I building Gemini-native or model-agnostic?' That single decision now determines whether your orchestration layer is essential infrastructure or dead weight.
Decision Framework — Extractable
Gemini-native vs model-agnostic: the portability test
Choose Gemini-native (Interactions API only) when all three hold: you commit to Gemini as the single model family, your workflow is single-agent or single Managed Agent, and your control flow fits one session config object. Choose model-agnostic (LangGraph, AutoGen, or CrewAI) when any one holds: you route across OpenAI, Anthropic Claude, and Gemini; you need dynamic agent-to-agent negotiation; or your workflow graph is non-linear and exceeds a single session config. There is no free option — Gemini-native buys speed and fewer layers at the cost of provider lock-in; model-agnostic buys portability at the cost of orchestration complexity.
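The portability test above can be written down as a small checklist function. This is only a sketch of the rule as stated in this article; the `Workload` fields and the `choose_stack` name are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    gemini_only: bool              # committed to Gemini as the single model family
    single_agent: bool             # single-agent or single Managed Agent workflow
    fits_one_session_config: bool  # control flow fits one session config object


def choose_stack(workload: Workload) -> tuple[str, list[str]]:
    """Return the recommended stack and the reasons that ruled out Gemini-native."""
    blockers = []
    if not workload.gemini_only:
        blockers.append("routes across multiple model providers")
    if not workload.single_agent:
        blockers.append("needs dynamic agent-to-agent negotiation")
    if not workload.fits_one_session_config:
        blockers.append("workflow graph exceeds a single session config")
    # Gemini-native requires all three conditions; any one blocker flips the choice.
    if blockers:
        return "model-agnostic", blockers
    return "gemini-native", blockers


if __name__ == "__main__":
    support_bot = Workload(gemini_only=True, single_agent=True, fits_one_session_config=True)
    research_crew = Workload(gemini_only=True, single_agent=False, fits_one_session_config=True)
    print(choose_stack(support_bot))
    print(choose_stack(research_crew))
```

Returning the blockers alongside the verdict keeps the trade-off explicit: a team can see which single condition is holding a workload on the model-agnostic side.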
❌ Mistake: Keeping a Redis state store for Gemini sessions
Teams migrating to the Interactions API often leave their client-side state serialisation running, double-paying for infrastructure the server now handles and risking state-drift bugs between the two stores.
✅ Fix: Move to server-side state and decommission the Redis/Postgres session layer for Gemini-only paths. Keep it only if you also serve OpenAI/Claude through the same store.

❌ Mistake: Running long agents in synchronous request loops
Polling a multi-step research agent inside a blocking HTTP request hits timeouts and burns compute on retries, the exact failure mode that drove people to external orchestration.
✅ Fix: Set background=True and poll status asynchronously. The server runs the interaction independently of your request lifecycle.

❌ Mistake: Ignoring data residency for server-side state
Server-managed conversation history simplifies development but raises GDPR/HIPAA questions about where state is stored, a concern flagged in early HackerNews threads after launch.
✅ Fix: For regulated workloads, run through Vertex AI with VPC and regional controls, and confirm data-residency commitments before storing PII in sessions.

⚠️ It depends: Should you migrate a multi-provider stack at all?
This one isn't a clean mistake; it's a judgment call. If you already route across OpenAI and Claude through one orchestration layer, ripping it out for Gemini-only sessions can fragment your stack into two parallel patterns and increase total complexity, not reduce it.
🔍 Nuance: Migrate only the Gemini-exclusive paths to the Interactions API, and keep the orchestration framework as the unifying layer for genuinely multi-provider flows. A hybrid is often correct; the binary 'collapse everything' is the trap.
How Does the Interactions API Compare to OpenAI, Anthropic, and AWS?
vs OpenAI Responses API and Assistants API
OpenAI's Responses API and Assistants API are the most direct structural analogues — both offer server-side state and tool calling, as documented in the OpenAI Assistants tools reference. The structural gap: OpenAI's published tool surface covers code interpreter, file search, and function calling, but does not document a one-call provision of a persistent remote Linux sandbox equivalent to Antigravity's Managed Agent deployment model.
vs Anthropic Claude API with tool use
Anthropic's Claude tool use is stateless by default — you maintain conversation history client-side, the precise problem the Interactions API eliminates. As of June 2026, Anthropic's documented build-with-Claude surface has no announced server-managed session equivalent.
vs AWS Bedrock Agents and Azure AI Foundry
AWS Bedrock Agents offers managed agent infrastructure but is cloud-infrastructure-first rather than API-first — meaningfully higher setup complexity than single-endpoint activation. Azure AI Foundry sits similarly. Google's OpenAI-compatibility layer also means Interactions API sessions can be reached with OpenAI's Python and TypeScript libraries with roughly three lines changed — a deliberate migration funnel.
| Capability | Interactions API (Google) | OpenAI Responses/Assistants | Anthropic Claude API | AWS Bedrock Agents |
| --- | --- | --- | --- | --- |
| Server-side state | Yes (native) | Yes | No (client-side) | Yes |
| Background async execution | Yes (background=True) | Partial | No | Yes |
| Managed agent sandbox | Yes (Antigravity, Linux sandbox) | Not native | No | Yes (infra-heavy) |
| Native tool combination | Yes (search/code/files/custom) | Yes | Yes | Yes |
| Setup complexity | Single endpoint | Low | Low | High |
| Model lock-in | Gemini | OpenAI | Claude | Multi (Bedrock catalog) |
| Cross-SDK compatibility | OpenAI libs supported | - | - | - |
What Does the Interactions API Change for AI Development?
The death of the five-layer agentic stack for Gemini developers
The traditional stack — model API + state store + orchestration framework + tool registry + deployment layer — collapses to two components for Gemini-native developers: the Interactions API and application logic. For a team running a Gemini agent at, say, $5,000/month in infrastructure plus orchestration maintenance, eliminating the state store and framework layer can realistically claw back tens of thousands annually in DevOps time alone, before any per-turn token savings from dropped context resends.
Coined Framework — Enterprise
The Orchestration Collapse Layer reaches the enterprise
When standardising on one provider's API removes two infrastructure layers, the build-vs-buy calculus inverts: speed-to-production favours the provider primitive over the portable framework, and the price of that speed is provider lock-in. The enterprise consequence is that architecture decisions stop being about tooling preference and become explicit bets on a single vendor's roadmap.
Implications for the LangChain ecosystem and orchestration framework market
LangGraph, AutoGen, and CrewAI keep their model-agnostic and multi-agent moats. But their single-provider stateful-agent value proposition just narrowed sharply for Gemini builds. The market splits cleanly: portability and complex multi-agent on one side, provider-native speed on the other.
What this means for enterprise AI platform teams
Platform teams face a genuine architectural decision: standardise on the Interactions API for speed, accepting Gemini lock-in — or maintain multi-model flexibility at the cost of orchestration complexity. There is no free option; this is now a deliberate strategic trade.
RAG and vector database implications
RAG workflows on Pinecone, Weaviate, and pgvector are directly affected. The Interactions API's grounding absorbs some retrieval use cases, but complex hybrid-search scenarios still need external vector infrastructure — see our RAG systems guide for where the line falls.
**5 → 2**: Agentic stack layers for Gemini-native builds ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

**Dec 2025**: Public beta before June 2026 GA, six months to primary status ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

**3 lines**: Code changes to call Interactions API via OpenAI libraries ([OpenAI libraries, 2026](https://platform.openai.com/docs/libraries))
How Did Experts and the Community React to the Launch?
Developer community response on X, HackerNews, and Reddit
Within 48 hours, HackerNews threads flagged the server-side state model as a double-edged sword: it simplifies development but creates data-residency and compliance questions for GDPR and HIPAA workloads that Google has not yet fully addressed. (Community sentiment summary; consult live threads for specifics.)
What ADK integration analyses revealed
Independent deep-dives on the Agent Development Kit (ADK) + Interactions API combination noted that ADK effectively becomes the configuration layer — ADK generates the agent config objects the Interactions API executes. That coupling is tighter than the docs initially made explicit.
Concerns: vendor lock-in, pricing opacity, schema track record
Developer sentiment on the stable-schema commitment was strongly positive — multiple threads cited prior Gemini API breaking changes as the reason they had switched to OpenAI. The commitment targets that trust deficit directly. The open worry: background-execution pricing transparency, where the per-interaction-token model lacks a clearly published rate for background compute time, creating cost-estimation uncertainty for long-running agents.
Named voices
The announcement authors — Ali Çevik (Group Product Manager, Google DeepMind) and Philipp Schmid (Developer Relations Engineer, Google DeepMind) — frame the API as developers' "favorite way to build applications with Gemini." On the framework side, Harrison Chase (co-founder and CEO, LangChain) has publicly framed durable orchestration value as cross-provider and complex-control-flow work via the LangChain blog — the exact territory the Interactions API does not contest. For independent technical context on Gemini's model line, see the Google DeepMind research hub.
The single loudest unresolved question across communities isn't capability — it's the background-compute billing rate. Until Google publishes a clear per-second or per-token background figure, long-running agent budgets are guesswork.
Community reaction split cleanly: strong approval for the stable schema and Managed Agents, open concern over data residency and background-execution pricing.
What Comes Next for the Interactions API Roadmap?
Confirmed roadmap items from Google's announcements
Google confirms Gemini Omni is coming soon under the same endpoint, plus an expanding Managed Agent library beyond Antigravity and continued work to make the Interactions API the default across third-party SDKs and libraries.
The Orchestration Collapse Layer thesis: how far will Google go?
If Google adds agent-to-agent communication, dynamic role assignment, and shared memory pools to the Interactions API, CrewAI and AutoGen lose their remaining structural differentiation for Gemini-native developers — the logical endpoint of the Orchestration Collapse Layer.
Multi-agent Interactions API: when and what form
Today the API handles single Managed Agents. The natural next step — orchestrated agent crews server-side — would close the last major gap with dedicated multi-agent frameworks for single-provider builds. Browse our agent templates to see which patterns already map cleanly onto Managed Agents.
By the end of 2026, 'who manages your agent state' will be a settled question for every major model provider — and the answer will be 'they do.' State and tool routing are becoming provider responsibilities, not developer ones.
**2026 H1: Interactions API GA + Gemini Omni rollout.** Confirmed by Google's announcement: GA with stable schema, Managed Agents, background execution, and Gemini Omni "soon" under one endpoint.

**2026 H2: OpenAI and Anthropic ship server-managed sessions with background execution.** Grounded in competitive pressure: OpenAI already has Responses/Assistants state; Anthropic's stateless gap is now a visible disadvantage. Convergence on the new primitive is the rational response.

**2026 Q4: Multi-agent orchestration moves server-side.** Evidence: Antigravity + expanding Managed Agent library signal Google's direction toward agent crews; frameworks respond by leaning into model-agnostic and complex-graph use cases.

**2027: The Orchestration Collapse Layer becomes industry-wide.** State management and tool routing settle as provider responsibilities across Google, OpenAI, and Anthropic; orchestration frameworks retreat to portability and advanced multi-agent design.
Frequently Asked Questions
What is the Interactions API for Gemini models and agents, and how is it different from the previous Gemini API?
The Interactions API is Google's primary, unified endpoint for Gemini models and agents. The key difference from the previous generate-content paradigm is statefulness: where generate-content was stateless and required you to resend full conversation history each turn, the Interactions API manages history, tool results, and agent memory server-side. It also adds background execution via background=True, native tool combination (search, code, files, custom functions), and Managed Agents in a remote Linux sandbox. In short, it absorbs the state store and orchestration layers you previously built yourself, collapsing a five-layer agentic stack to roughly two components for Gemini-native applications.
When did Google launch the Interactions API and which Gemini models does it support?
Google launched the public beta in December 2025 and announced general availability in June 2026, naming it the primary API for interacting with Gemini models and agents. It supports Gemini 3 Pro and current production models under a single endpoint, with Gemini Omni confirmed as coming soon. Access is available through Google AI Studio for prototyping and Vertex AI for enterprise deployments with VPC and billing controls. All Google documentation now defaults to the Interactions API, and Google is working to make it the default across third-party SDKs and libraries.
How do I migrate an existing Gemini generate-content integration to the Interactions API?
Start by upgrading the Google GenAI SDK, then replace your stateless generate-content calls with an Interactions session: create a session with model='gemini-3-pro' and send turns as structured messages — the server now handles history, so you stop manually resending context. Remove any client-side state store (Redis/Postgres) that existed only to hold Gemini conversation history. For long-running tasks, switch synchronous loops to background=True with status polling. If you already use OpenAI's libraries, Google's OpenAI-compatibility layer lets you reach the Interactions API with roughly three lines changed. Always validate against the live official docs, since exact SDK class names and parameters are versioned.
What are Managed Agents in the Interactions API and what is the Antigravity agent?
Managed Agents let a single API call provision a remote Linux sandbox where an agent can reason, execute code, browse the web, and manage files — with Google handling the execution environment, scaling, and tool permissions. Antigravity is the default Managed Agent shipped at general availability, suited to coding and research workflows. You can also define custom agents with instructions, skills, and data sources. The value: you get a fully sandboxed autonomous agent without standing up your own container infrastructure, queueing, or DevOps. Combined with background=True, an Antigravity run can execute multi-step research or coding tasks asynchronously and survive normal request timeouts. Explore reusable patterns in our AI agent library.
How does the Interactions API compare to OpenAI's Assistants API and Responses API?
OpenAI's Responses and Assistants APIs are the closest structural analogues — both provide server-side state and tool calling. The main differentiator is Google's native Managed Agent sandbox (Antigravity): OpenAI's documented tool surface does not include an equivalent provision-a-Linux-sandbox-in-one-call model. Background execution via background=True is also first-class in the Interactions API. Notably, Google ships an OpenAI-compatibility layer, so you can call the Interactions API using OpenAI's Python and TypeScript libraries with minimal changes — a deliberate migration funnel that OpenAI cannot easily replicate in reverse. Choose based on which model family you're committing to, since both involve provider lock-in.
Is the Interactions API available on Vertex AI and how is it priced?
Yes. The Interactions API is reachable through both Google AI Studio and the Vertex AI console. Both share the same endpoint, but Vertex AI adds enterprise billing, VPC controls, and the governance regulated industries need for data residency. Pricing follows a per-interaction-token model with a separate billing dimension for background-execution compute; exact rates are published on the official Gemini pricing page. The concrete, falsifiable saving is mechanical: after the first turn you stop resending full conversation history, so multi-turn input-token cost drops in direct proportion to the history you previously re-transmitted. Model that against your own average turn count using the official rates rather than relying on any headline percentage.
Do I still need LangGraph or AutoGen if I use the Interactions API?
It depends on two things: model portability and multi-agent complexity. If you build exclusively on Gemini and need single-agent stateful workflows with native tools, the Interactions API likely replaces LangGraph for state and execution — you can decommission that layer. You still need LangGraph or AutoGen when you orchestrate across OpenAI, Claude, and Gemini simultaneously, when your workflow graph is more complex than a single session config can express, or when you need dynamic agent-to-agent negotiation that the Interactions API's single Managed Agent model doesn't yet cover. The honest answer: for Gemini-native single-agent builds, no; for model-agnostic or complex multi-agent systems, yes.
The bottom line: the Interactions API that Gemini models and agents now run on is the strongest evidence so far that the Orchestration Collapse Layer is real, and the decision comes with a deadline. If you run a Gemini-only single-agent workload, my call is concrete: migrate the state and execution layers before your next infrastructure renewal cycle. Every month you keep a redundant Redis store and a LangGraph retry loop alive is a month of paying twice for state Google now holds server-side, and the day Google ships server-side agent crews, the model-agnostic frameworks lose their last single-provider foothold. Keep the framework only where it earns portability or non-linear control flow; everywhere else, the collapse has already happened.
About the Author
Rushil Shah
AI Systems Builder & Founder, Twarx
Rushil Shah is the founder of Twarx and an AI systems builder. He built and shipped a 12-agent research pipeline on Vertex AI that processed roughly 40,000 documents a day, and migrated a Gemini-only customer-support stack off a Redis session store onto server-side state — cutting that service's multi-turn input-token spend by removing redundant context resends. He writes from production experience: what holds up under load, what fails at scale, and where the agentic-AI industry is heading next. His work focuses on making autonomous workflows and multi-agent architectures practical for builders and businesses.


