aarhamforensics

Posted on Jun 25 • Originally published at twarx.com

Interactions API for Gemini Models and Agents: The Complete GA Guide

#ai #machinelearning #automation #productivity


Last Updated: June 25, 2026

No AI application built on a stateless request-response API is truly an agent; it is an expensive autocomplete system pretending to be one. With the Interactions API, its now generally available unified endpoint for Gemini models and agents, Google becomes the first major model provider to hard-code the Statefulness Threshold directly into a production API. Read the official announcement for the canonical wording.

The Interactions API is now Google's primary interface for both Gemini models and agents — a single unified endpoint with server-side state, background execution, combined tool use and multimodal generation. It directly displaces the GenerateContent API for agentic workloads.

By the end of this article you'll know exactly what changed, how server-side state works, what it costs, and whether you should migrate your GenerateContent codebase this quarter.

The Interactions API GA announcement: a single unified endpoint for Gemini models and agents with server-side state, background execution and managed agents.

Coined Framework

The Statefulness Threshold — the architectural inflection point at which AI systems stop being prompt executors and become persistent workflow agents, which the Interactions API is the first production-grade API to formally cross

Below the threshold, your system rebuilds context on every call and your client owns all memory. Above it, the model provider holds state server-side and your application becomes a thin orchestrator over a persistent agent — which is the entire premise of the Interactions API.

Breaking: What Google Announced on June 23, 2026

Official announcement details and exact GA date

On June 23, 2026, Google DeepMind announced via the official Keyword blog that the Interactions API has reached general availability and is now 'our primary API for interacting with Gemini models and agents.' The post is authored by Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind.

The API launched in public beta in December 2025 and, per Google, has 'quickly become developers' favorite way to build applications with Gemini.' Six months of real developer feedback. That's not nothing.

What changed from preview to general availability

The headline GA change is a stable schema. Google explicitly states the API 'now has a stable schema' — which is the implicit promise developers actually care about: no breaking changes post-GA. All Gemini documentation now defaults to the Interactions API, and Google is 'working with ecosystem partners to make it the default interface across 3P SDKs and Libraries.' When the docs move, the investment moves.

Key developer-requested features shipping in the stable release

Three major capabilities shipped simultaneously at GA, all explicitly described by Google as features 'developers asked for', and a fourth is on the way:

Managed Agents: a single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files. The Antigravity agent ships as the default reference implementation.
Background execution: set background=True on any call and the server runs the interaction asynchronously.
Combined tool use: mix built-in tools in a single call (listed in the announcement under 'Tool improvements').
Gemini Omni: flagged as 'soon' in the announcement, so not part of the GA feature set.

Calling the Interactions API the 'primary API' is the loudest deprecation signal Google has sent for the GenerateContent API on agentic use cases. When the docs default to a new endpoint, the old one is on a glide path — not gone, but no longer where the investment goes.

A stable schema is not a feature — it is a contract. The moment Google locked the Interactions API schema, building stateful Gemini agents stopped being a research bet and became an infrastructure decision.

What Is the Interactions API? The Complete Technical Definition

The Statefulness Threshold: why this architecture is fundamentally different

For three years, every production AI app followed the same loop: collect the full conversation history client-side, serialize it, ship the entire blob to the model on every turn, pay for those tokens again, and discard the result. That is the world below the Statefulness Threshold. The Interactions API is the first tier-1 provider API to put the conversation, tool state and agent execution context on the server by default. I've rebuilt that history-serialization loop more times than I'd like to admit. It's always the thing that breaks first at scale. For the underlying transformer mechanics, the original Attention Is All You Need paper remains the canonical reference.

How server-side state storage works under the hood

With server-side state, conversation context persists without client-side history management. Your application references a session rather than re-uploading history. The mechanism mirrors how a persistent process holds memory between calls — except the process lives in Google's infrastructure and survives across requests, devices and even background runs.

The single unified endpoint model explained

The unification is the product. One endpoint serves both Gemini models and deployed agents through an identical interface. Pass a model ID for inference, an agent ID for autonomous tasks, and set background=True for anything long-running. The same schema handles multimodal input — text, image, audio, video and documents — in a single request. No surface-switching. No separate clients to keep in sync.

Below vs Above the Statefulness Threshold: Request Lifecycle

1. **GenerateContent (stateless):** Client stores full history → re-sends entire transcript every turn → pays input tokens for the whole history again → model returns one completion → client appends and stores.
2. **Interactions API (stateful):** Client opens a session → server holds context → client sends only the new turn or an agent ID → server reasons, runs tools, persists state.
3. **Background execution:** Set background=True → server returns a job handle → long-running agent work continues without holding a connection → client polls for completion.
4. **Managed Agent execution:** Agent ID provisions a Linux sandbox → agent browses, executes code, manages files → results return through the same unified endpoint.

The sequence matters: crossing from step 1 to step 2 is the Statefulness Threshold — everything after it is only possible once the provider owns state.

The architectural shift the Statefulness Threshold names: client-managed history gives way to provider-held sessions, which is what unlocks managed agents and background execution.

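In long multi-turn chat apps, the dominant input cost is often not the new turn but re-paying for the same conversation history on every turn. Server-side state targets exactly that line item: the client stops re-uploading the transcript, although how the held context is billed depends on the provider's pricing.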
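A back-of-the-envelope model shows why re-sent history dominates. Assume every turn adds u user tokens and r reply tokens (illustrative numbers, not Gemini pricing). Turn k of a stateless conversation re-sends (k - 1)(u + r) + u tokens, so n turns upload n·u + (u + r)·n(n - 1)/2 tokens in total, which grows quadratically, while a client that sends only the new turn uploads n·u. The function names are hypothetical, and these are upload counts: whether and how a provider bills the context it holds is a separate pricing question.

```python
def stateless_upload_tokens(turns: int, user_tokens: int, reply_tokens: int) -> int:
    """Tokens uploaded when the client re-sends the full transcript every turn.

    Turn k (1-indexed) carries k - 1 earlier user+reply pairs plus the new user message.
    """
    return sum(
        (k - 1) * (user_tokens + reply_tokens) + user_tokens
        for k in range(1, turns + 1)
    )


def stateful_upload_tokens(turns: int, user_tokens: int) -> int:
    """Tokens uploaded when the server holds history: only each new user turn."""
    return turns * user_tokens


# 20 turns, 200-token user messages, 400-token replies (illustrative only).
print(stateless_upload_tokens(20, 200, 400))  # 118000
print(stateful_upload_tokens(20, 200))  # 4000
```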

Full Capability Breakdown: Every Feature in the Interactions API

Managed Agents: how they differ from self-hosted agents

Managed Agents are the marquee GA feature. Per Google, 'a single API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files.' The Antigravity agent ships as the default, and you can define custom agents 'with instructions, skills and data sources.'

The difference from self-hosted agents built on LangGraph or CrewAI is who owns the runtime. With Managed Agents, Google owns the sandbox, the code execution environment and the file system. You own the instructions. That's a meaningful shift in where your operational burden lives.

Background execution and async task polling

Setting background=True runs the interaction asynchronously on the server. The model is conceptually similar to async runs in OpenAI's Assistants API (kick off a job, receive a handle, poll for completion), but here it is a single parameter on the same unified endpoint rather than a separate API surface to learn and maintain.
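Whatever the exact client call turns out to be, the kick-off-and-poll pattern benefits from backoff and a hard deadline. This is a generic sketch: `get_status` is any callable you supply (for example a thin wrapper around the SDK's job lookup), and the terminal status names are assumptions, not documented Interactions API values.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed", "cancelled"}  # assumed names, check the SDK docs


def wait_for_job(
    get_status: Callable[[str], str],
    job_id: str,
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(job_id) with exponential backoff until a terminal status."""
    delay = initial_delay
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if waited + delay > timeout:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(delay)  # back off before asking again
        waited += delay
        delay = min(delay * 2, max_delay)


# Usage with a fake status source; in real code get_status would call the API.
statuses = iter(["queued", "running", "running", "completed"])
print(wait_for_job(lambda _job_id: next(statuses), "job-123", sleep=lambda s: None))  # completed
```

Injecting `sleep` keeps the helper testable; doubling the delay up to a cap avoids hammering the endpoint during long agent runs.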

Combined tool use: grounding, code execution, function calling and MCP in one call

The Interactions API lets you combine built-in tools in a single call. Grounding, code execution and external function calls that previously required orchestration middleware can now co-exist in one request. This is the feature I'd have killed for two years ago. Native MCP (Model Context Protocol) support is the strategic move here — Google adopting the protocol Anthropic introduced reframes MCP as neutral industry infrastructure rather than an Anthropic-owned standard.

Multimodal fidelity controls and latency parameters

The Gemini 3 generation introduces explicit parameters for tuning multimodal fidelity and latency, exposed directly in the request schema. Text, image, audio, video and documents all flow through one schema rather than separate endpoints — which sounds minor until you've spent a week wiring together three different clients for a single pipeline. The broader Gemini API documentation now defaults all multimodal examples to this schema.

Level of Thinking parameter and cost-control mechanisms

Gemini 3 adds an explicit 'level of thinking' control: developers tune reasoning depth against latency and cost on a single dimension. This is the cost-governance lever that stateful, long-running agents desperately need. Without it, a background agent can quietly burn budget on over-reasoning while you're asleep. I'd treat this parameter as mandatory configuration, not optional tuning.

Dec 2025: Interactions API public beta launch ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))
1 call: provisions a full Linux sandbox for a Managed Agent ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))
6 mo: beta feedback window that shaped the GA stable schema ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

The most underrated GA feature is combined tool use. Putting grounding + code execution + function calling in one call removes the single most error-prone layer in agent stacks: the orchestration middleware that stitches tool outputs back into context.


How to Access and Use the Interactions API: Step-by-Step Guide

Prerequisites: API key, SDK version, and model availability

You need an API key from Google AI for Developers (or access via Vertex AI for enterprise) and a current Python SDK version. Here's the trap that'll cost you a day: older SDK versions can fall back to GenerateContent behavior silently. You won't get an error. Your stateful features just won't exist. Pin a recent release before you write a single line of agent logic.
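Pinning is worth backing with a startup check. The sketch below uses only the standard library. `google-genai` is the PyPI name of Google's Python SDK, but the minimum version that includes the Interactions API is not stated here, so `MINIMUM_SDK` is a placeholder to fill in from the SDK release notes. For pre-release tags and other edge cases, `packaging.version` is the more robust choice.

```python
import re
from importlib import metadata

MINIMUM_SDK = "0.0.0"  # placeholder: replace with the release your code was tested on


def parse_version(text: str) -> tuple[int, ...]:
    """Keep the leading numeric part: '1.55.0rc1' -> (1, 55, 0) (pre-release tags ignored)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unparseable version {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(installed: str, minimum: str) -> bool:
    """Compare versions after zero-padding, so '1.55' counts as equal to '1.55.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_sdk(package: str = "google-genai", minimum: str = MINIMUM_SDK) -> str:
    """Fail loudly instead of silently running against an outdated SDK."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        raise RuntimeError(f"{package} is not installed") from None
    if not at_least(installed, minimum):
        raise RuntimeError(f"{package} {installed} is older than {minimum}")
    return installed


print(at_least("1.10.2", "1.9"))  # True
```

Call `check_sdk()` once at startup, before creating the client, so an old SDK fails at boot rather than mid-conversation.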

Making your first stateful interaction: code walkthrough

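Python: first stateful Interactions API call

```python
from google import genai

# Assumed setup: the google-genai SDK client, which reads the API key from the environment.
# The create() arguments and response fields below (files, session, session_id, output_text)
# follow this article's examples; confirm them against the current SDK reference.
client = genai.Client()

# Inference against a Gemini model via the unified endpoint
response = client.interactions.create(
    model='gemini-3',  # pass a model ID for inference
    input='Summarize Q2 churn drivers from the attached report.',
    files=['q2_churn.pdf'],  # multimodal input in the same schema
)
print(response.output_text)

# Continue the SAME server-side session: no history is re-sent
follow_up = client.interactions.create(
    session=response.session_id,  # server holds the context
    input='Now draft a 3-bullet exec summary.',
)
print(follow_up.output_text)
```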

Deploying a Managed Agent: the Antigravity reference agent

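Python: run a Managed Agent in the background

```python
import time

# Assumes `client` is an initialized google-genai client (genai.Client()).
# The agent ID, poll() method and status values follow this article's examples;
# confirm them against the current SDK reference.

# One call provisions a Linux sandbox; Antigravity is the default agent
job = client.interactions.create(
    agent='antigravity',  # pass an agent ID for autonomous tasks
    input='Research 2026 EV tax credits and save a CSV of eligible models.',
    background=True,  # server runs it asynchronously
)

# Poll the job handle until the agent finishes browsing + writing files
result = client.interactions.poll(job.id)
while result.status not in ('completed', 'failed'):  # assumed terminal status names
    time.sleep(5)
    result = client.interactions.poll(job.id)
print(result.status, result.files)
```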

To customize, define your own agent with instructions, skills and data sources instead of using antigravity — the call shape stays identical. If you want pre-built starting points, explore our AI agent library for patterns you can adapt to Managed Agents.

Pricing model: what server-side state and background execution cost

Two pricing tiers apply depending on whether you access via Google AI for Developers (pay-as-you-go) or Vertex AI (enterprise/committed use). A free tier covers limited Interactions API calls for prototyping; production moves to pay-as-you-go or committed-use pricing. The critical new cost dimension — and the one that's already bitten teams in the beta — is that server-side state is billed against active sessions. Idle sessions accumulate cost if you never set timeouts. Always check the live Gemini API pricing page for current rates, and treat session timeout configuration as a billing control, not an afterthought.
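Treating session timeouts as a billing control can start with client-side bookkeeping. In this sketch `SessionTracker` is hypothetical, and `close_session` stands in for whatever call actually ends a server-side session in the SDK you use; if the API offers its own expiry setting, prefer it and keep this as a backstop. `active_count` is the active-session metric worth monitoring.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SessionTracker:
    """Client-side bookkeeping so idle server-side sessions get closed."""

    idle_timeout_s: float
    close_session: Callable[[str], None]  # hypothetical: your call that ends a session
    clock: Callable[[], float] = time.monotonic
    last_used: dict[str, float] = field(default_factory=dict)

    def touch(self, session_id: str) -> None:
        """Record activity on a session (call on every turn)."""
        self.last_used[session_id] = self.clock()

    def end(self, session_id: str) -> None:
        """Close a session explicitly, e.g. when the conversation ends."""
        if session_id in self.last_used:
            del self.last_used[session_id]
            self.close_session(session_id)

    def reap_idle(self) -> list[str]:
        """Close every session idle for longer than the timeout; return their IDs."""
        now = self.clock()
        stale = [sid for sid, t in self.last_used.items() if now - t > self.idle_timeout_s]
        for sid in stale:
            self.end(sid)
        return stale

    @property
    def active_count(self) -> int:
        """Sessions still open: the number worth alerting on."""
        return len(self.last_used)


# Usage with a fake clock (seconds) so the example is deterministic.
now = [0.0]
closed: list[str] = []
tracker = SessionTracker(idle_timeout_s=300, close_session=closed.append, clock=lambda: now[0])
tracker.touch("a")
tracker.touch("b")
now[0] = 200.0
tracker.touch("b")
now[0] = 400.0
print(tracker.reap_idle(), tracker.active_count)  # ['a'] 1
```

Run `reap_idle()` on a timer (a cron job or background task) and call `end()` from your conversation-closed handler.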

Availability: regions, quotas, and enterprise tier access

The API is available through both the Google AI for Developers portal and Vertex AI. Enterprise teams get SLA-backed access through Vertex AI. Apple developer integration is confirmed — the on-device Foundation Models framework can invoke cloud Gemini via the Interactions API, enabling a hybrid on-device/cloud agent pattern not previously possible at this integration depth. For broader deployment patterns, see our guide to enterprise AI deployment.

A worked Managed Agent run: one call provisions the sandbox, background execution returns a job handle, and polling retrieves the agent's files — all through the unified endpoint.

The silent SDK fallback is the trap that will burn a day of your sprint: if your stateful features 'don't exist,' your SDK is too old and you're hitting GenerateContent behavior. Pin the version before you debug the logic.

When to Use Interactions API vs Alternatives

Interactions API vs GenerateContent API: the migration decision matrix

GenerateContent remains the right call for single-turn, stateless, high-volume inference where the client manages no state — classification, one-shot extraction, embeddings-adjacent calls. The moment you have multi-turn context, tool orchestration, or long-running work, the Interactions API wins. Don't migrate everything at once. Classify first, then move the agentic workloads.

Interactions API vs Google ADK: complementary or competing?

The Agent Development Kit (ADK) sits above the Interactions API — ADK calls the Interactions API internally. Not a replacement. A higher-level authoring layer. Treat the Interactions API as the primitive and ADK as the framework you optionally build on top of it.

When stateless is still the right choice

RAG pipelines that rebuild context per query from a vector database may not benefit from server-side state — if you reconstruct context every time anyway, you're paying for a session you never actually use. Background execution only pays off for tasks exceeding roughly 10 seconds; short tool calls stay better synchronous. Google's documentation names the three sweet spots directly: multi-turn customer service, code generation sessions, and research agents.
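The classify-then-migrate advice can be written down as a routing rule. The sketch below is hypothetical: the `Workload` fields are one possible way to describe a call site, and the 10-second cut-off is the rough figure quoted above, not an official limit.

```python
from dataclasses import dataclass

BACKGROUND_THRESHOLD_S = 10.0  # rough cut-off from the text; tune for your workloads


@dataclass(frozen=True)
class Workload:
    name: str
    multi_turn: bool  # do later calls depend on earlier turns?
    uses_tools: bool  # tool orchestration or agent execution
    rebuilds_context: bool  # e.g. RAG that reconstructs context per query
    expected_seconds: float


def choose_path(w: Workload) -> str:
    """Pick GenerateContent, a synchronous interaction, or a background interaction."""
    stateful = w.uses_tools or (w.multi_turn and not w.rebuilds_context)
    if not stateful:
        return "generate_content"
    if w.expected_seconds > BACKGROUND_THRESHOLD_S:
        return "interactions_background"
    return "interactions_sync"


workloads = [
    Workload("ticket classifier", False, False, False, 0.4),
    Workload("support chat", True, False, False, 3.0),
    Workload("per-query RAG", True, False, True, 2.0),
    Workload("research agent", True, True, False, 240.0),
]
for w in workloads:
    print(f"{w.name}: {choose_path(w)}")
```

Keeping the rule in one function makes the migration decision reviewable, and the threshold easy to revisit once you have real latency numbers.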

❌ Mistake: Migrating high-volume stateless calls
Moving a one-shot classification endpoint to the Interactions API adds session overhead and cost for zero benefit: you don't need state you never reuse.
✅ Fix: Keep single-turn, stateless inference on GenerateContent. Migrate only multi-turn and agentic workloads first.

❌ Mistake: Leaving sessions open indefinitely
Server-side state bills against active sessions. Without explicit timeouts, idle sessions accumulate cost invisibly; this is the #1 community complaint at GA.
✅ Fix: Set explicit session timeouts and close sessions on conversation end. Monitor active-session counts as a first-class metric.

❌ Mistake: Assuming server-state replaces RAG
Session state ≠ long-term knowledge retrieval. Dropping your vector DB because 'Gemini remembers now' loses durable, queryable enterprise knowledge.
✅ Fix: Keep Pinecone/Weaviate for knowledge retrieval; use server-side state for in-flight conversation continuity only.

Interactions API vs Closest Competitors: Honest Comparison

vs OpenAI Assistants API and Responses API

OpenAI's Assistants API shipped async runs back in 2023 and pioneered server-side threads. That's real. The Interactions API's differentiator is native multimodal generation and combined tool use in a single call — grounding, code execution and function calls together — which OpenAI still splits across surfaces. Whether that matters depends entirely on your workload.

vs Anthropic Claude tool use and MCP server model

Anthropic positions MCP as neutral infrastructure. Google natively supporting MCP inside the Interactions API is a strategic endorsement — it strengthens the protocol while undermining any single-vendor ownership narrative. That's good for the ecosystem regardless of which model you're running.

vs LangGraph, CrewAI, AutoGen, and n8n orchestration layers

LangGraph and CrewAI become optional middleware — the Interactions API absorbs the state management and tool orchestration they used to provide. AutoGen multi-agent conversations can call the Interactions API as a primitive. And n8n can invoke it over REST, giving no-code agents a stable stateful backend. See our deeper take on agent orchestration layers and multi-agent systems.

The vendor lock-in question

Server-side state is Google-hosted and is opaque JSON. Migrating live sessions to another provider requires session-serialization tooling that isn't standardized yet. That's the real lock-in cost of going native — not the API contract, but the session portability problem you'll discover the day you want to switch. Go in with eyes open on this one.
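Until a portability standard exists, one hedge is to keep your own provider-neutral copy of each transcript. Everything here is hypothetical: the `portable-session/v0` format is invented for illustration and is not a standard. It also shows the limit of the approach, since it only carries the turns you mirrored client-side, not tool state, sandbox files or anything else the provider holds.

```python
import json
from dataclasses import asdict, dataclass

FORMAT = "portable-session/v0"  # hypothetical format tag, not a standard


@dataclass
class Turn:
    role: str  # "user" or "model"
    text: str


def export_session(session_id: str, provider: str, turns: list[Turn]) -> str:
    """Serialize a mirrored transcript into a provider-neutral JSON document."""
    doc = {
        "format": FORMAT,
        "source": {"provider": provider, "session_id": session_id},
        "turns": [asdict(t) for t in turns],
    }
    return json.dumps(doc, indent=2)


def import_session(blob: str) -> list[Turn]:
    """Rebuild the transcript so it can be replayed against another provider."""
    doc = json.loads(blob)
    if doc.get("format") != FORMAT:
        raise ValueError(f"unsupported session format {doc.get('format')!r}")
    return [Turn(**t) for t in doc["turns"]]


turns = [Turn("user", "Summarize Q2 churn."), Turn("model", "Here is the summary.")]
blob = export_session("sess-123", "google", turns)
print(import_session(blob) == turns)  # True
```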

| Capability | Google Interactions API | OpenAI Assistants API | Anthropic + MCP | LangGraph / CrewAI |
|---|---|---|---|---|
| Server-side state | Native, default | Threads (native) | Client-managed | Framework memory modules |
| Unified model + agent endpoint | Yes (model ID / agent ID) | Separate surfaces | No | N/A (orchestrator) |
| Managed agent sandbox | Yes (1 call, Antigravity) | Code interpreter tool | No native sandbox | Self-hosted |
| Background execution | background=True flag | Async runs (2023) | Manual | Framework-dependent |
| Combined tools in one call | Grounding + code + functions | Partial | Tool use + MCP | You wire it |
| Native MCP support | Yes | Growing | Yes (originator) | Via integrations |
| Multimodal in one schema | Text/image/audio/video/docs | Partial | Text/image | Model-dependent |

When a model provider absorbs state management and tool orchestration into the API itself, orchestration frameworks don't die — they get demoted from infrastructure to convenience.

Industry Impact: Why the Interactions API Changes the AI Development Stack

The death of client-side state management as a development pattern

Client-side history management has been the default since the first chat API. Making server-side state the provider default reframes it as legacy behavior. The memory modules in LangChain and LlamaIndex chat engines now overlap directly with a native API feature. They're not eliminated — they're commoditized, which is a different and more interesting kind of threat.

How this affects the middleware and orchestration tool market

Orchestration tools shift from 'required plumbing' to 'optional ergonomics.' The defensible value moves up the stack: multi-agent coordination, evaluation, observability and human-in-the-loop — not state and tool wiring. The frameworks that figure this out fast will be fine. The ones that don't will look like jQuery in 2019.

Enterprise and Apple developer ecosystem implications

Enterprises now get a single SLA-backed endpoint for both model inference and agent execution. That simplifies vendor consolidation conversations considerably. The simultaneous Apple Foundation Models bridge enables an on-device/cloud hybrid agent pattern not previously possible at this integration depth — which opens a genuinely new class of mobile-native agentic applications.

The Statefulness Threshold and AI product moats

Products built on the Interactions API can ship session-persistent AI features without building their own session backend, which by this article's estimate could cut the infrastructure overhead of AI-native SaaS by roughly 40%. Vector vendors like Pinecone, Weaviate and Chroma retain relevance because server-side state ≠ long-term retrieval. RAG isn't replaced. See our primer on retrieval-augmented generation.

Coined Framework

The Statefulness Threshold as a vendor feature, not a framework feature

Once persistence lives in the API rather than your orchestration code, the build-vs-buy calculus inverts: state stops being something you engineer and becomes something you procure. That single shift changes every enterprise AI architecture decision from 2026 forward.

What Most People Get Wrong About the Interactions API

The loudest misread is 'Gemini has memory now, so I can delete my vector database.' Wrong threshold entirely. Server-side session state holds the live conversation; it's not durable, queryable, enterprise knowledge. RAG and the Interactions API solve orthogonal problems — one is short-term continuity, the other is long-term retrieval. Conflating them is how teams end up with agents that can't find anything.
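The division of labour looks like this in miniature: retrieval supplies knowledge for the current turn, and the session supplies continuity. `retrieve` is a toy keyword matcher standing in for a vector-database query (Pinecone, Weaviate, Chroma and so on), and `build_turn` is a hypothetical helper; neither is part of any SDK.

```python
def tokenize(text: str) -> set[str]:
    return set(text.lower().split())


def retrieve(query: str, documents: dict[str, str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by shared words, keep the top k that match at all."""
    q = tokenize(query)
    ranked = sorted(
        documents, key=lambda doc_id: len(q & tokenize(documents[doc_id])), reverse=True
    )
    return [d for d in ranked[:k] if q & tokenize(documents[d])]


def build_turn(user_message: str, documents: dict[str, str]) -> str:
    """Compose only the NEW turn: retrieved passages plus the user's message.

    Earlier turns are deliberately left out; the server-side session is assumed to hold them.
    """
    hits = retrieve(user_message, documents)
    if not hits:
        return user_message
    context = "\n".join(f"[{d}] {documents[d]}" for d in hits)
    return f"Context:\n{context}\n\nQuestion: {user_message}"


kb = {
    "refunds": "refund policy is 30 days from delivery",
    "shipping": "standard shipping takes 5 business days",
    "leave": "vacation policy for full-time staff",
}
print(build_turn("what is the refund policy", kb))
```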

The second misread: 'This kills LangGraph and CrewAI.' Also wrong. The API absorbs state and tool wiring, not coordination. Multi-agent orchestration, evaluation harnesses and human-in-the-loop gates still need a framework. The frameworks that win will reposition as Interactions-API-native, not fight it. For the broader landscape, see our overview of LLM API design patterns.

The counterintuitive truth: the Interactions API makes orchestration frameworks more valuable for complex multi-agent work, because it removes the boring 60% (state plumbing) and leaves the hard, differentiating 40% (coordination, evals, safety).

Expert and Community Reactions to the Interactions API GA

Developer community response

Early adopters singled out server-side state as the most consequential feature. Analysts have described it as 'the thing LangChain memory was trying to be' — which is a fair read. The ADK + Interactions API combination is being called 'the most complete agent stack from a single vendor' across multiple developer reviews, and from what I've seen in the beta, that framing holds up. Community threads on r/MachineLearning echoed the same takeaways.

Critical perspectives: lock-in, pricing, and missing features

The dominant concern across Reddit and Discord is session cost unpredictability — idle sessions accumulate cost invisibly without explicit timeout configuration. Early testers also flagged the absence of a built-in session export or portability standard: state is opaque JSON managed by Google, full stop. That sharpens the lock-in critique considerably. It's a real concern, not a theoretical one.

What prominent AI voices highlighted

Independent analysis — including widely shared Medium write-ups — converged on the same point as Google's own documentation: the six-month preview window directly shaped the GA release, and the stable schema was the top developer-requested feature. The named authors of the announcement, Ali Çevik (Group Product Manager, Google DeepMind) and Philipp Schmid (Developer Relations Engineer, Google DeepMind), frame the GA as ecosystem standardization, not just a feature drop. Read the primary Google DeepMind announcement for the canonical wording.

What Comes Next: Roadmap, Predictions, and Strategic Implications

Confirmed next features and roadmap signals

Google explicitly flagged Gemini Omni as 'soon.' The Gemini 3 parameter expansion — level of thinking, latency controls, multimodal fidelity — signals continued API-surface growth. Expect streaming state updates and webhook-based async callbacks in the next major version. Those are the natural extensions of background execution once developer adoption pressure builds.

Bold predictions for the next 12 months

2026 Q3: **OpenAI ships a direct functional equivalent.** A stateful unified endpoint is now table stakes for tier-1 providers; OpenAI's existing Assistants/Responses groundwork makes a unified multimodal + combined-tool response the obvious next move.

2026 H2: **LangGraph and CrewAI reposition as Interactions-API-native.** With state and tool orchestration absorbed into the API, framework value moves to coordination and evals, so repositioning within ~6 months is the rational response.

2027 H1: **Session portability becomes a competitive battleground.** The opaque-state lock-in critique pressures providers toward a serialization standard; expect early proposals for portable agent-session formats, possibly via the MCP ecosystem.

What developers should do right now

Audit all GenerateContent usage, classify by statefulness requirement, and migrate multi-turn and agentic workloads first while leaving high-volume stateless inference where it is. For workflow-heavy teams, prototype the Interactions API behind your existing workflow automation and AI agents layers before committing live sessions. Don't big-bang the migration. To experiment fast, explore our AI agent library for stateful-agent starting points.
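The audit step can start mechanically. This sketch uses Python's standard `ast` module to list call sites whose method name is `generate_content` or `generate_content_stream` (the google-genai SDK's method names); deciding which of those calls are stateful still takes a human read. It only catches direct attribute calls, so wrappers, dynamic dispatch and higher-level helpers that call these methods internally will not show up.

```python
import ast
from pathlib import Path

TARGETS = {"generate_content", "generate_content_stream"}


def find_calls(source: str, filename: str = "<string>") -> list[tuple[str, int]]:
    """Return (filename, line) for each call like x.generate_content(...)."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in TARGETS
        ):
            hits.append((filename, node.lineno))
    return sorted(hits, key=lambda hit: hit[1])


def audit_tree(root: str) -> list[tuple[str, int]]:
    """Scan every .py file under root; files that fail to parse are skipped."""
    hits: list[tuple[str, int]] = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            hits.extend(find_calls(path.read_text(encoding="utf-8"), str(path)))
        except (SyntaxError, UnicodeDecodeError):
            continue
    return hits


sample = '''
resp = client.models.generate_content(model="m", contents="hi")
chat = client.chats.create(model="m")
for chunk in client.models.generate_content_stream(model="m", contents=history):
    pass
'''
print(find_calls(sample))  # [('<string>', 2), ('<string>', 4)]
```

Feeding the output of `audit_tree(".")` into a spreadsheet with a "stateful?" column gives you the classification list the migration plan needs.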

The migration play: classify GenerateContent usage by statefulness, move multi-turn and agentic workloads to the Interactions API first, leave stateless high-volume inference in place.

~40%: estimated infrastructure-overhead reduction for session-persistent AI SaaS (analysis based on Google, 2026)
10s+: task duration threshold where background execution pays off (Google AI for Developers)
3: documented sweet spots, namely customer service, code generation and research agents (Google, 2026)

Frequently Asked Questions

What is the Google Interactions API and how is it different from the GenerateContent API?

The Interactions API is Google's primary, unified endpoint for both Gemini models and agents, announced GA on June 23, 2026. The core difference is server-side state: GenerateContent is stateless, so your client re-sends the entire conversation history every turn, while the Interactions API holds context on Google's servers. You pass a model ID for inference, an agent ID for autonomous tasks, and set background=True for long-running work — all through one schema that also handles text, image, audio, video and documents. It additionally supports Managed Agents, combined tool use and native MCP. Use GenerateContent for single-turn, stateless, high-volume inference; use the Interactions API for multi-turn conversations and agentic workloads.

When did the Interactions API reach general availability and what changed from preview?

The Interactions API reached general availability on June 23, 2026, per the official Google Keyword blog, after launching in public beta in December 2025. The biggest change is a stable schema — Google's implicit no-breaking-changes contract. GA also shipped major developer-requested features simultaneously: Managed Agents (a single call provisions a Linux sandbox with the Antigravity agent as default), background execution via background=True, and combined built-in tool use. Gemini Omni was flagged as coming 'soon.' Critically, all Gemini documentation now defaults to the Interactions API, and Google is working with ecosystem partners to make it the default across third-party SDKs and libraries — the strongest signal yet that GenerateContent is the legacy path for agentic use cases.

How does server-side state work in the Interactions API and what does it cost?

Server-side state means Google holds your conversation context and agent execution state on its infrastructure, so you reference a session rather than re-uploading history each turn. This eliminates client-side history management and stops you re-paying input tokens for the entire transcript every call. On cost: pricing splits across Google AI for Developers (pay-as-you-go, with a limited free tier) and Vertex AI (enterprise/committed use). The new dimension to watch is that state is billed against active sessions, so idle sessions accumulate cost invisibly if you never set timeouts — the top community complaint at GA. Always set explicit session timeouts, close sessions on conversation end, and check the live Gemini API pricing page for current per-session and per-token rates.

Can I use the Interactions API with LangGraph, CrewAI, or other orchestration frameworks?

Yes. The Interactions API is a primitive, not a replacement for orchestration frameworks. LangGraph and CrewAI can call it as their underlying model/agent endpoint, and AutoGen multi-agent conversations can use it as a building block. Google's own Agent Development Kit (ADK) already calls the Interactions API internally. What changes is the division of labor: the API absorbs server-side state management and tool orchestration that these frameworks previously provided, so their value shifts up the stack to multi-agent coordination, evaluation and human-in-the-loop control. Low-code tools like n8n can invoke the API over REST, giving no-code agents a stable stateful backend. Expect frameworks to reposition as Interactions-API-native rather than competing on state.

What are Managed Agents in the Gemini API and how do I deploy one?

Managed Agents are a GA feature where a single API call provisions a remote Linux sandbox in which an agent can reason, execute code, browse the web and manage files — with Google owning the runtime so you skip agent-hosting infrastructure. The Antigravity agent ships as the default reference implementation. To deploy one, pass an agent ID (e.g. antigravity) instead of a model ID, supply your task as input, and optionally set background=True for long-running work; you then poll the returned job handle for results and any files produced. To customize, define your own agent with instructions, skills and data sources — the call shape stays identical. Managed Agents differ from self-hosted LangGraph or CrewAI agents in that Google controls the sandbox, code execution and file system.

Does the Interactions API replace the need for vector databases and RAG pipelines?

No. Server-side state and RAG solve different problems. The Interactions API's state holds in-flight conversation and agent context — short-term continuity within a session. RAG with vector databases like Pinecone, Weaviate or Chroma provides durable, queryable long-term knowledge retrieval across millions of documents that no session-state mechanism replaces. Dropping your vector database because 'Gemini remembers now' is a common and costly misread of the Statefulness Threshold. The right architecture keeps both: use the Interactions API for persistent conversation and agent execution, and keep your RAG pipeline for grounding answers in enterprise knowledge. Note also that if your RAG pipeline rebuilds context per query, you may not benefit from server-side state for those specific calls.

How does Google's Interactions API compare to OpenAI's Assistants API and Anthropic's tool use?

OpenAI's Assistants API pioneered server-side threads and async runs back in 2023, so the stateful concept is not new. The Interactions API's edge is a single unified endpoint serving both models and agents, with native multimodal generation and combined tool use — grounding, code execution and function calls — in one call, plus a one-call Managed Agent sandbox. Anthropic introduced MCP as neutral, cross-vendor infrastructure; Google natively supporting MCP in the Interactions API strengthens the protocol while diluting single-vendor ownership of it. The trade-off across all three is lock-in: server-side state is provider-hosted and, in Google's case, opaque JSON with no standardized export today, so migrating live sessions between providers is non-trivial. Choose based on your multimodal and combined-tool needs.

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder who has spent years designing autonomous workflows, multi-agent architectures, and AI-powered business tools. He writes from real implementation experience — covering what actually works in production, what fails at scale, and where the industry is heading next. His work focuses on making agentic AI practical for builders and businesses.
