aarhamforensics

Posted on Jun 26 • Originally published at twarx.com

Google Interactions API: The Gemini Agent AI Technology That Replaces Chat Completions

#ai #machinelearning #automation #productivity


Last Updated: June 26, 2026

Google just made the chat-completions endpoint obsolete for Gemini — and most teams are still building like coordination is a solved problem.

Today Google announced that its Interactions API has reached general availability and is now the primary interface for both Gemini models and agents — one endpoint, server-side state, background execution and Managed Agents. This is the most consequential AI technology shift of the year, because every serious team is wrestling with the same hidden problem: stitching models, tools, memory and long-running agents together by hand. The Interactions API moves all of that plumbing off your shoulders.

By the end of this article you'll understand exactly what shipped, how the architecture works, what it costs — from the free tier through enterprise — and the systems-level reason it exists: what I call The AI Coordination Gap.

Google's official Interactions API general availability graphic — a single unified endpoint for Gemini models and agents. Source: The Keyword, Google

Why Are Most AI Workflows Solving the Wrong Problem?

Here's the contrarian truth this AI technology exposes: the bottleneck in production AI was never the model. Gemini, GPT and Claude have been good enough for two years. The bottleneck is coordination — keeping state alive across turns, running tasks that outlast an HTTP request, combining tools without writing a router from scratch, and launching agents without provisioning your own sandbox. I've watched teams with genuinely impressive models ship genuinely fragile products because they underestimated that plumbing.

Google's announcement is explicit about this. The Interactions API launched in public beta in December 2025 and, per Google, has 'quickly become developers' favorite way to build applications with Gemini.' The GA release on June 26, 2026 locks a stable schema and adds the features developers actually asked for: Managed Agents, background execution, expanded tool combination and Gemini Omni (coming soon).

The signal worth reading is the word 'primary.' Google wrote that 'all of our documentation now defaults to Interactions API' and that they are 'working with ecosystem partners to make it the default interface across 3P SDKs and Libraries.' When a platform vendor reorients every doc and SDK around one endpoint, that endpoint is the new default surface of the platform. Full stop.

Coined Framework

The AI Coordination Gap

The AI Coordination Gap is the widening distance between model capability and the plumbing required to use it in production — state, async execution, tool routing and agent runtime. It's the reason a team with a brilliant model still ships a fragile product.

The two authors of the announcement are Ali Çevik, Group Product Manager at Google DeepMind, and Philipp Schmid, Developer Relations Engineer at Google DeepMind. Schmid, writing publicly on the launch, framed the core promise directly: 'whether you want to call a model or run an agent, the Interactions API gets you there in a few lines of code' (Philipp Schmid, Google DeepMind). Pass a model ID for inference, pass an agent ID for autonomous tasks, and set background=True for anything long-running. That's it.

Independent voices in the developer community read the move the same way. As Simon Willison, the independent researcher and co-creator of Django who tracks LLM tooling closely, has repeatedly argued in his public writing, the durable advantage in this space is shifting from raw model quality toward the tooling and 'plumbing' that lets developers compose models, tools and agents reliably — precisely the surface the Interactions API now standardizes.

Dec 2025: Interactions API public beta launch ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

1: Unified endpoint for models AND agents ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

1 API call: Provisions a remote Linux sandbox for a Managed Agent ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

The teams winning with AI aren't the ones with the best model. They're the ones who stopped hand-coding the coordination layer and let the platform own it.

What Exactly Did Google Announce About the Interactions API?

Who: Google DeepMind, via Google AI Studio. Announced by Ali Çevik (Group Product Manager) and Philipp Schmid (Developer Relations Engineer).

What: The Interactions API reached general availability and is now Google's primary API for interacting with Gemini models and agents. GA brings a stable schema plus new capabilities: Managed Agents, background execution, expanded tool combination, multimodal generation, and Gemini Omni (announced as 'soon').

When: Announced June 26, 2026. Public beta was December 2025.

Where: Google AI Studio, with documentation across Google now defaulting to the Interactions API, and ecosystem partner SDKs being migrated to it as the default interface.

Why it's consequential: A single unified endpoint with server-side state means the application no longer carries the burden of conversation memory, async job management, or agent runtime provisioning. That's a structural shift in where complexity lives in modern AI technology — and it's not a subtle one.

The most important word in the entire announcement is 'primary.' Google didn't add an API — it relegated the previous interface. Every doc, every SDK, every quickstart now points here. That's how defaults are set.

What Is the Google Interactions API, in Plain Language?

Imagine you run a small business and you've built a Gemini-powered assistant. Before today, your code had to remember every previous message, re-send the whole conversation on each call, manually decide which tool to invoke, and — if a task took longer than a few seconds — find some way to keep the connection alive or poll a queue you built yourself. I've personally watched teams spend three or four weeks building exactly that queue. It's not glamorous work and it breaks in ways that are hard to reproduce.

The Interactions API removes that homework. It's a single 'front door' to everything Gemini can do. You tell it either (a) a model you want to talk to, or (b) an agent you want to do work for you. The server remembers the conversation for you — server-side state. If the job is long, you flip a switch (background=True) and the work continues without you holding the line.

Think of it like the difference between a phone call you must stay on versus sending a task to a capable assistant who calls you back when it's done. That's the whole pitch — and it's why this is part of the broader move toward autonomous AI agents rather than one-shot prompts.

Before and after the Interactions API: the coordination logic — state, async execution, tool routing and agent runtime — moves out of your application and into Google's managed layer. Architecture concept based on Google, 2026

How Does the Interactions API Work Under the Hood?

The Interactions API exposes one endpoint. The request body decides the behavior:

Inference: pass a model ID (e.g. a Gemini model) and get a response — text, or multimodal generation.
Agentic work: pass an agent ID for autonomous tasks. With Managed Agents, a single call provisions a remote Linux sandbox where the agent can 'reason, execute code, browse the web and manage files.'
Long-running work: set background=True and the server runs the interaction asynchronously — your app isn't blocked waiting.
State: the server holds conversation and interaction state, so you're not resending full context every turn.
Tools: built-in tools can be mixed and combined within a single interaction.
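
As a rough sketch of the "request body decides the behavior" idea, the helper below builds a plain dictionary for each case. The field names (`model`, `agent`, `input`, `background`) follow the illustrative snippet later in this article; the function name `build_interaction_request` and the validation rule are my own assumptions for illustration, not part of Google's SDK.

```python
from typing import Any, Dict, Optional


def build_interaction_request(
    input_text: str,
    model: Optional[str] = None,
    agent: Optional[str] = None,
    background: bool = False,
) -> Dict[str, Any]:
    """Build a request body that targets either a model or an agent.

    Hypothetical helper: field names follow this article's illustrative
    snippet and should be checked against the published schema.
    """
    # Exactly one target must be chosen: a model ID or an agent ID.
    if (model is None) == (agent is None):
        raise ValueError("pass exactly one of model= or agent=")

    body: Dict[str, Any] = {"input": input_text}
    if model is not None:
        body["model"] = model  # inference path
    else:
        body["agent"] = agent  # autonomous-agent path
    if background:
        body["background"] = True  # long-running work runs asynchronously
    return body


inference = build_interaction_request(
    "Summarize Q2 sales trends.", model="gemini-model-id"
)
agent_job = build_interaction_request(
    "Scrape competitor pricing pages and build a CSV.",
    agent="antigravity",
    background=True,
)
```

The point of the sketch is that the two calls differ only in their body, which is what lets one endpoint serve both paths.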

Google ships the Antigravity agent as the default Managed Agent, and lets you 'define your own custom agents with instructions, skills and data sources.' That last phrase — instructions, skills, data sources — is effectively a built-in pattern for RAG (Retrieval-Augmented Generation) and tool use without you wiring a separate vector database and router from scratch. Teams were paying real engineering time to assemble that stack. Now it's a field in the request body.

Interactions API Request Lifecycle (Model vs Managed Agent)

1. **Client request → Interactions API endpoint**: You send one request. Include a model ID for inference or an agent ID for autonomous work. Optionally set background=True.

2. **Server-side state resolution**: The API loads prior interaction state. No need to resend full conversation history; context is managed server-side.

3. **Routing: Model path or Agent path**: Model ID → direct Gemini inference + multimodal generation. Agent ID → provisions a remote Linux sandbox (Managed Agents).

4. **Tool combination + execution**: The agent reasons, executes code, browses the web, manages files, and mixes built-in tools, all inside the managed sandbox.

5. **Sync return or background callback**: Short tasks return immediately. With background=True, work runs async and results are retrieved when ready.

The same endpoint serves both a one-shot model call and a multi-step agent — the request body, not a new API, selects the behavior. Lifecycle based on Google, 2026
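
Server-side state (step 2 of the lifecycle) is easier to picture with a toy. The class below is an in-memory stand-in, not Google's implementation: `ToyInteractionServer`, `previous_id` and the `int-N` ID format are all invented for this sketch. It only shows the shape of the idea, where the client sends a new turn plus a reference and the server rebuilds the context.

```python
import itertools
from typing import Dict, List, Optional


class ToyInteractionServer:
    """In-memory stand-in for a server that keeps interaction state."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}
        self._ids = itertools.count(1)

    def create(self, input_text: str, previous_id: Optional[str] = None) -> str:
        # Load prior state by reference instead of receiving it from the client.
        prior = self._history.get(previous_id, []) if previous_id else []
        new_id = f"int-{next(self._ids)}"
        self._history[new_id] = prior + [input_text]
        return new_id

    def context_for(self, interaction_id: str) -> List[str]:
        """Return the full context the server would hand to the model."""
        return list(self._history[interaction_id])


server = ToyInteractionServer()
first = server.create("My name is Ada.")
# Only the new turn travels over the wire; the earlier one is looked up.
second = server.create("What is my name?", previous_id=first)
```

Here `server.context_for(second)` holds both turns even though the second request carried only one of them.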

A unified endpoint isn't a convenience feature. It's a strategic declaration: the agent is now a first-class citizen of the API, equal to the model.

What Is the Complete Interactions API Capability List?

Everything the GA release confirms, grounded in the official announcement:

Single unified endpoint for Gemini models and agents.
Server-side state — conversation and interaction memory handled by Google, not your app.
Background execution — background=True runs any interaction asynchronously on the server.
Managed Agents — one API call provisions a remote Linux sandbox where an agent can reason, execute code, browse the web and manage files.
Antigravity default agent — ships as the default Managed Agent out of the box.
Custom agents — define your own with instructions, skills and data sources.
Tool combination — mix built-in tools within an interaction.
Multimodal generation — generation across modalities through the same endpoint.
Stable schema — the GA contract developers can build against without breaking-change anxiety.
Gemini Omni (soon) — announced as forthcoming.
Ecosystem default — Google docs now default to it; 3P SDKs and libraries being migrated.

'Instructions, skills and data sources' is Google quietly shipping a RAG + tool-use pattern at the platform layer — the exact stack teams previously assembled from LangChain, a vector DB, and a custom router.

How Do You Access and Use the Interactions API, Step by Step?

The Interactions API is delivered through Google AI Studio. Because GA flips the documentation default, the fastest path is simply the standard Gemini quickstart — it now points here. The pattern is intentionally minimal, and honestly, that's one of the first things you'll notice when you look at it.

Get an API key in Google AI Studio.
For inference: send a request with a model ID.
For an agent: send a request with an agent ID (Antigravity by default, or your custom agent).
For long-running work: add background=True and retrieve the result when ready.
Define custom agents with instructions, skills and data sources.

If you're building agent libraries for clients, you can also explore our AI agent library to compare orchestration patterns before you commit to a single vendor's runtime.

A minimal Interactions API call: one request body decides whether you run inference, an agent, or a long-running background task.

Worked Demonstration

Below is an illustrative pattern reflecting the documented behavior (model ID vs agent ID, plus background=True). Exact field names follow the published Gemini Interactions schema in Google AI Studio.

Python — Interactions API (illustrative)

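```python
import time

from google import genai  # google-genai SDK; reads the API key from the environment

client = genai.Client()

# 1. Simple inference: pass a model ID
response = client.interactions.create(
    model='gemini-model-id',             # placeholder; model path = direct inference
    input='Summarize Q2 sales trends.',  # multimodal input supported
)
print(response.output)  # illustrative field name; confirm against the published schema

# 2. Agentic task: pass an agent ID (Antigravity ships as default)
run = client.interactions.create(
    agent='antigravity',  # provisions a remote Linux sandbox
    input='Scrape competitor pricing pages and build a CSV.',
    background=True,      # long-running -> async on server
)

# 3. Retrieve background result when ready (no held connection)
result = client.interactions.retrieve(run.id)
while result.status not in ('completed', 'failed'):  # status names are illustrative
    time.sleep(10)
    result = client.interactions.retrieve(run.id)
print(result.status)  # 'completed' once the agent has finished
print(result.output)  # agent's files / summary
```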

Sample input: 'Scrape competitor pricing pages and build a CSV.' Expected flow: the agent ID provisions a sandbox, browses the web, executes code, writes files, and — because background=True — returns a job you poll rather than a blocked HTTP call. Output: a completed interaction with the generated CSV artifact and a summary.


What Does the Interactions API Mean for Small Businesses?

For a small business, the practical win of this AI technology is that you can ship an agent that does work — not just answers — without hiring an infra team to run sandboxes and queues.

Concrete opportunity: A 6-person agency could deploy a research agent that, overnight (background execution), pulls competitor pricing, drafts a positioning brief, and saves files — work that previously cost a junior analyst's time. If that analyst time is $4,000/month, automating even 40% of it is a defensible ~$1,600/month saving with one Managed Agent.

Concrete risk: server-side state and Managed Agents mean Google holds more of your operational data and runtime. For regulated data, you must validate data handling before pushing customer records through an agent's sandbox. Don't confuse 'easy to ship' with 'safe to ship.' I'd treat those as completely separate questions.
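
One way to keep 'easy to ship' and 'safe to ship' separate is to put a gate in front of every agent call. The sketch below is deliberately crude: the two regular expressions, `find_pii` and `gate_agent_input` are my own illustrative names, and pattern matching like this is nowhere near a real compliance review. It only shows where such a check would sit.

```python
import re
from typing import List

# Deliberately simple patterns; a real review needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> List[str]:
    """Return the names of the patterns that match the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


def gate_agent_input(text: str) -> str:
    """Refuse to forward text that appears to contain personal data."""
    hits = find_pii(text)
    if hits:
        raise PermissionError(f"blocked before agent call: {', '.join(hits)}")
    return text


safe = gate_agent_input("Compare pricing pages for three competitors.")
```

Anything that raises here never reaches the sandbox, which gives you a single place to log and review blocked requests.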

~$1,600/mo: Illustrative saving from automating 40% of a $4K/mo analyst role ([Twarx estimate, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

0: Sandboxes you provision yourself with Managed Agents ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

3 settings: What you choose in the request body: a model ID, an agent ID, and whether to run in background mode ([Google, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

Coined Framework

The AI Coordination Gap (applied)

For small businesses, the Coordination Gap was the real reason agents stayed in demos. Managed Agents + background execution collapse that gap, turning a multi-week infra project into a few lines of code.

Who Are the Prime Users of the Interactions API?

The clearest fit is the senior engineer or AI lead who currently babysits a homemade orchestration layer — a state store, a queue, a tool router — and would happily delete all of it. You know who you are. Close behind are startups and SMBs that need autonomous agents but can't staff a platform-engineering function; if that's you, you can also browse ready-made agents in our library to accelerate a pilot before you write much code at all.

Two more groups benefit in less obvious ways. Agencies and consultancies shipping client agents want a stable schema they can turn into repeatable products, and data or ops teams running long, multi-step jobs (research, ETL, report generation) finally get background execution as a native feature rather than something hacked onto synchronous calls. And then there's the quiet majority: product teams already on Gemini in enterprise AI deployments who want one interface across models and agents instead of two different integration patterns to maintain forever.

When Should You Use the Interactions API (and When NOT To)?

Use it when: you're committed to Gemini, you want managed agent runtime, you have long-running tasks, or you want to stop maintaining conversation-state plumbing. The combination of server-side state + background execution + Managed Agents is genuinely hard to replicate yourself. I wouldn't bother trying.

Be cautious when you need vendor neutrality. Picture a concrete architecture: a team running OpenAI's reasoning models for one node, Claude for summarization in another, and a Gemini model for a third — all inside a single orchestration graph with shared state. The Interactions API intentionally abstracts away the seam that lets you do exactly that. For that team, a framework like LangGraph or one of the broader multi-agent systems frameworks gives you control the managed runtime simply won't. You also lose some transparency: a managed sandbox is convenient but harder to audit line-by-line than a self-hosted runtime. That tradeoff is real.

Mistake: Treating background=True as fire-and-forget

Setting background=True and never building a retrieval or notification path means completed agent work silently piles up server-side and your UX shows nothing. We burned two weeks on this exact bug in a different async system: the symptom looks like the agent isn't working, and the actual problem is you never wired up the retrieval step.

Fix: Pair every background interaction with a retrieve/poll or webhook handler and surface status (queued → running → completed) to the user.

Mistake: Pushing regulated data through Managed Agents blindly

A Managed Agent's remote Linux sandbox can browse the web and manage files, so handing it PII or PHI without a data-handling review is a compliance landmine. Easy to ship is not the same as cleared to ship.

Fix: Gate sensitive data, validate Google's data terms for your tier, and keep regulated workflows on audited, self-hosted runtimes until cleared.

Mistake: Vendor lock-in by default

Building your entire app on one Gemini-specific endpoint feels fast until you need to switch providers or add a model the endpoint doesn't serve. It couples you to a single provider's schema and pricing with no easy exit.

Fix: Keep an abstraction layer (e.g. LangGraph or your own adapter) if multi-model portability is a real business requirement.

Mistake: Re-sending full history out of habit

Teams migrating from chat-completions keep resending entire conversation history, paying for tokens the server-side state already holds. I've seen this inflate token bills by 30–40% on multi-turn workflows. The server has the context. Trust it.

Fix: Lean on server-side state: reference the interaction, don't re-stuff the whole transcript every turn.
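
To see why re-sending history adds up, here is a back-of-the-envelope comparison. It counts only the tokens the client transmits and assumes every turn adds the same number of tokens to the transcript; both function names are mine. How held context is billed on the server side isn't stated in the announcement, so treat this as a model of request size rather than of your invoice.

```python
def tokens_sent_stateless(turn_tokens: int, turns: int) -> int:
    """Client resends the whole transcript every turn: t * (1 + 2 + ... + n)."""
    return turn_tokens * turns * (turns + 1) // 2


def tokens_sent_stateful(turn_tokens: int, turns: int) -> int:
    """Client sends only the new turn and references prior state: t * n."""
    return turn_tokens * turns


resend = tokens_sent_stateless(turn_tokens=200, turns=10)
reference = tokens_sent_stateful(turn_tokens=200, turns=10)
```

With these inputs the resend pattern transmits 11,000 tokens against 2,000, and the gap grows quadratically with the number of turns.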

How Does the Interactions API Compare to OpenAI, LangGraph and CrewAI?

| Capability | Google Interactions API | OpenAI Responses/Assistants | LangGraph | CrewAI |
| --- | --- | --- | --- | --- |
| Unified model + agent endpoint | Yes (single endpoint) | Partial | Framework, not endpoint | Framework, not endpoint |
| Server-side state | Yes | Yes (threads) | You manage / checkpointer | You manage |
| Background / async execution | Yes (background=True) | Yes (async runs) | You implement | You implement |
| Managed agent sandbox (Linux) | Yes (1 API call) | Code interpreter (limited) | Self-hosted | Self-hosted |
| Multi-vendor models | Gemini-focused | OpenAI-focused | Yes | Yes |
| Default agent shipped | Antigravity | No default agent | None | None |

Here's the same decision framed as a before-and-after, because that's the migration most teams are actually weighing this week:

| Concern | Before: Homemade Orchestration Stack | After: Interactions API |
| --- | --- | --- |
| Conversation state | Self-hosted store (Redis/Postgres) you build and maintain | Server-side, managed by Google |
| Long-running tasks | Custom queue + worker + polling you wire by hand | background=True, one flag |
| Agent runtime | You provision and secure your own Linux sandbox | 1 API call provisions a Managed Agent |
| Tool routing | Hand-written router across tools | Built-in tool combination |
| Time to first agent | Weeks of platform engineering | A few lines of code |
| Multi-vendor portability | Yours by design | Gemini-coupled (the real tradeoff) |

Frameworks like LangGraph, CrewAI, and Microsoft's AutoGen give you portability and control; the Interactions API gives you a managed runtime. The right answer depends on whether your moat is the orchestration logic or the speed of shipping. Tools like n8n remain strong for visual workflow automation that glues all of these together.
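
If portability matters to you, the cheapest insurance is a thin seam between your application and whichever runtime sits behind it. The sketch below uses a `typing.Protocol` for that seam. `AgentBackend`, both backend classes and `draft_brief` are hypothetical names, and the backends return canned strings where a real one would call a vendor SDK or a framework.

```python
from typing import Protocol


class AgentBackend(Protocol):
    """The only surface the rest of the application is allowed to touch."""

    def run(self, task: str) -> str: ...


class ManagedRuntimeBackend:
    """Placeholder for a managed runtime; a real one would call the vendor SDK."""

    def run(self, task: str) -> str:
        return f"[managed] {task}"


class SelfHostedBackend:
    """Placeholder for a framework-based, self-hosted runtime."""

    def run(self, task: str) -> str:
        return f"[self-hosted] {task}"


def draft_brief(backend: AgentBackend, topic: str) -> str:
    # Application code depends on the protocol, not on a vendor.
    return backend.run(f"Draft a positioning brief on {topic}")


managed = draft_brief(ManagedRuntimeBackend(), "pricing")
portable = draft_brief(SelfHostedBackend(), "pricing")
```

Swapping runtimes then means writing one new class rather than touching every call site, at the cost of hiding vendor-specific features behind the lowest common denominator.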

What Are the Best Practices for the Interactions API?

Use server-side state deliberately — reference interactions instead of resending transcripts to cut token cost.
Always pair background=True with retrieval — never assume async results surface themselves. They don't.
Start with Antigravity, then customize — validate the default Managed Agent before defining custom agents.
Scope agent skills and data sources tightly — least-privilege applies to agents too.
Keep an abstraction seam if multi-vendor portability is a business requirement.
Log and audit sandbox actions — code execution and web browsing need observability, full stop.
Pin to the stable GA schema — GA means you can build without breaking-change anxiety.
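
The "always pair background=True with retrieval" practice can be sketched as a small polling helper. Everything here is an assumption made for illustration: the status names, the dictionary shape returned by `fetch`, and the helper itself. The fake `fetch` stands in for whatever retrieval call the SDK provides, and `sleep` is injectable so the demo doesn't actually wait.

```python
import time
from typing import Callable, Dict, List

TERMINAL = {"completed", "failed"}  # assumed status names


def poll_until_done(
    fetch: Callable[[str], Dict[str, str]],
    interaction_id: str,
    on_status: Callable[[str], None],
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll a background interaction and report each status change."""
    last = None
    for _ in range(max_polls):
        result = fetch(interaction_id)
        if result["status"] != last:
            last = result["status"]
            on_status(last)  # surface queued -> running -> completed to the UI
        if last in TERMINAL:
            return result
        sleep(interval_s)
    raise TimeoutError(f"{interaction_id} did not finish after {max_polls} polls")


# Fake backend standing in for a real retrieval call.
_states = iter(["queued", "running", "running", "completed"])
seen: List[str] = []
final = poll_until_done(
    fetch=lambda _id: {"status": next(_states), "output": "report.csv"},
    interaction_id="int-1",
    on_status=seen.append,
    sleep=lambda _s: None,  # skip real waiting in this demo
)
```

The `max_polls` cap matters as much as the loop: without it, a job that never reaches a terminal status would keep your worker spinning forever.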

What Does the Interactions API Actually Cost?

Google hasn't published Interactions-API-specific surcharge pricing in this announcement, so cost should be modeled on the underlying Gemini usage plus any managed-agent compute. Below is a realistic tiered breakdown for a small team, with the published anchors I could verify.

$0: Free tier: Google AI Studio lets you build and test the Interactions API before scaling ([Google AI pricing, 2026](https://ai.google.dev/pricing))

Per-token: Paid inference: standard per-input/output-token Gemini API rates apply to model calls ([Google AI pricing, 2026](https://ai.google.dev/pricing))

$5K–$20K: One-time engineering build the Managed Agent runtime replaces, the line item that actually moves ([Twarx estimate, 2026](https://blog.google/innovation-and-ai/technology/developers-tools/interactions-api-general-availability/))

Free / prototyping: Google AI Studio offers a free tier to build and test before scaling. Start here — seriously, don't skip it.
Inference: standard per-token Gemini pricing applies to model calls. Check the official Gemini API pricing for current rates, because these move.
Managed Agents: expect compute cost for the remote Linux sandbox runtime — reasoning, code execution, browsing — on top of token cost. Meter this in pilots before you scale anything.
Enterprise: production-scale and regulated deployments typically move to Vertex AI for committed-use contracts, support and data-residency terms — verify Vertex AI pricing for your tier and region.
TCO for a small team: the bigger saving is eliminated engineering, meaning the homemade state store, queue, and sandbox infra you no longer build or maintain. For many teams that's the real line item, often $5K–$20K of one-time build plus ongoing ops that nobody wants to own.
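
For pilots, I find it helps to put the two running-cost components (tokens plus sandbox compute) into one small function. The sketch below does that. Every rate in it is a made-up placeholder, since the announcement publishes no Interactions-specific pricing, and `monthly_cost` is my own helper name. Substitute the current published rates before drawing any conclusions.

```python
def monthly_cost(
    input_tokens: int,
    output_tokens: int,
    sandbox_hours: float,
    usd_per_m_input: float,
    usd_per_m_output: float,
    usd_per_sandbox_hour: float,
) -> float:
    """Token cost plus managed-sandbox compute for one month."""
    token_cost = (
        input_tokens / 1_000_000 * usd_per_m_input
        + output_tokens / 1_000_000 * usd_per_m_output
    )
    return round(token_cost + sandbox_hours * usd_per_sandbox_hour, 2)


# Placeholder rates for illustration only; substitute the published prices.
pilot = monthly_cost(
    input_tokens=20_000_000,
    output_tokens=4_000_000,
    sandbox_hours=50,
    usd_per_m_input=1.0,
    usd_per_m_output=4.0,
    usd_per_sandbox_hour=0.5,
)
```

With these placeholder inputs the pilot comes to $61 a month, which is the kind of number to set against the one-time build estimate above.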

The biggest line item this AI technology eliminates isn't tokens. It's the senior-engineer-months previously spent building coordination plumbing nobody outside your team will ever see.

Industry Impact — Who Wins, Who Loses?

Winners: Teams already on Gemini; SMBs that couldn't staff agent infra; agencies productizing agents. They get a stable, managed runtime and ship faster.

Pressured: Orchestration-as-a-service vendors and DIY agent-runtime startups. When the model provider ships Managed Agents and background execution natively, the 'we'll run your agent sandbox' pitch gets thinner fast. Frameworks like LangGraph, AutoGen and CrewAI survive on portability and control — but the convenience bar just moved, and they know it.

The strategic read: by making the Interactions API the primary interface and migrating 3P SDKs, Google is competing directly with OpenAI's Responses/Assistants surface for default mindshare. This mirrors the same battle playing out in orchestration across the whole ecosystem — and ties into the rise of MCP (Model Context Protocol) as a connective standard.

When the model provider ships the agent runtime, the orchestration startups don't die — they're forced to compete on portability and control instead of convenience. The convenience moat just evaporated.

How Are Developers Reacting to the Interactions API?

The announcement itself is authored by Ali Çevik (Group Product Manager, Google DeepMind) and Philipp Schmid (Developer Relations Engineer, Google DeepMind), who frame it as developers' 'favorite way to build applications with Gemini' since the December 2025 beta. Schmid — a widely followed practitioner voice who publishes detailed walkthroughs on his own blog — has consistently argued that the few-lines-of-code ergonomics are the point, not a footnote. Expect migration guides from his channels and the broader Google AI for Developers ecosystem.

For independent triangulation beyond Google's own framing, Simon Willison — an independent researcher who has documented the LLM tooling landscape in public for years — has long held that the strategic battleground is the tooling layer, not the model weights. That's a useful outside lens on why 'primary interface' is the load-bearing phrase here. For broader context on the agent-platform race, see ongoing work from Google DeepMind Research, OpenAI Research, and Anthropic's docs, each building competing or complementary agent surfaces. As of this writing the announcement is fresh, so treat downstream commentary as developing.

Migration planning sessions like this are happening across teams right now as the Interactions API becomes Google's default Gemini interface.

What Happens Next — Roadmap and Predictions

Google explicitly flagged Gemini Omni as 'soon,' and stated it is migrating 3P SDKs and libraries to default to the Interactions API. Both are evidence-grounded directional signals, not speculation.

2026 H2: **Gemini Omni ships into the Interactions API**. Google labeled Omni 'soon' in the GA post, so expect a multimodal-native capability slotting directly into the same endpoint.

2026 H2: **3P SDKs default to Interactions API**. Google said it is 'working with ecosystem partners to make it the default interface across 3P SDKs and Libraries', so expect LangChain/LangGraph and similar adapters to follow.

2027 H1: **Managed-agent marketplaces emerge**. With Antigravity as a default and custom agents (instructions, skills, data sources) supported, a catalog of shareable agents is a natural next step, mirroring the plugin/GPT-store pattern.

2027: **Convergence toward MCP-style standards**. As Google, OpenAI and Anthropic each ship agent runtimes, pressure grows for interoperable tool and context standards like MCP to avoid fragmentation.

Frequently Asked Questions

What is the Google Interactions API?

The Google Interactions API is a single unified endpoint — which reached general availability on June 26, 2026 — for interacting with both Gemini models and agents. It is the AI technology that moves coordination logic (server-side state, background execution, tool routing and agent runtime) off your application and into Google's managed layer. You pass a model ID for inference, an agent ID for autonomous work, or set background=True for long-running tasks. Per Google's announcement, it is now the primary interface and all Google documentation defaults to it, collapsing what used to be a multi-week infrastructure project into a few lines of code.

How does the Interactions API differ from chat completions?

Chat completions are stateless: your application resends the full conversation history on every call, manually routes tools, and has no native way to run work that outlasts an HTTP request. The Interactions API inverts that. It holds conversation state server-side, so you reference an interaction rather than re-stuffing the transcript; it supports background=True for asynchronous, long-running tasks; and the same endpoint serves both models (pass a model ID) and agents (pass an agent ID, including a Managed Agent that provisions a remote Linux sandbox). In short, chat completions handle one-shot inference, while the Interactions API is built for stateful, agentic, long-running work — which is why Google now treats it as the primary interface and defaults its docs to it.

Is the Interactions API available now and what does it cost?

Yes. The Interactions API reached general availability on June 26, 2026 after launching in public beta in December 2025, and it is delivered through Google AI Studio. There is a free tier for building and testing. Beyond that, Google did not publish an Interactions-API-specific surcharge in the GA post, so cost is modeled on standard per-token Gemini API pricing for inference, plus compute for the Managed Agent sandbox (reasoning, code execution, browsing). Production and regulated workloads typically graduate to Vertex AI for committed-use contracts and data-residency terms. The larger economic shift is the eliminated engineering: the homemade state store, queue and sandbox infrastructure — often a $5K–$20K one-time build plus ongoing ops — that the managed runtime replaces.

What is agentic AI?

Agentic AI describes systems that don't just answer a prompt but take multi-step autonomous actions toward a goal — reasoning, calling tools, executing code, browsing the web and managing files. Google's Interactions API Managed Agents are a concrete example: one API call provisions a remote Linux sandbox where an agent performs all of those actions. Unlike a single model call, an agent loops — observe, decide, act — until the task is done. Frameworks like LangGraph, AutoGen and CrewAI implement this pattern in code, while Google now offers it as a managed runtime. The practical test of 'agentic' is autonomy across steps, not a longer single response. You can also read our deeper guide to AI agents.
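
The observe, decide, act loop is small enough to sketch in full. In this toy version a hard-coded `decide` function stands in for the model's reasoning, and the two tools return canned strings. None of the names come from a real SDK; the point is only the loop structure.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]

TOOLS: Dict[str, Tool] = {
    "search": lambda q: f"results for '{q}'",
    "write_file": lambda text: f"saved {len(text)} chars",
}


def decide(goal: str, observations: List[str]) -> Tuple[str, str]:
    """Toy policy standing in for the model's reasoning step."""
    if not observations:
        return "search", goal
    if len(observations) == 1:
        return "write_file", observations[0]
    return "done", ""


def run_agent(goal: str, max_steps: int = 5) -> List[str]:
    """Loop until the policy says the task is done or the step budget runs out."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, arg = decide(goal, observations)  # decide from what was observed
        if action == "done":
            break
        observations.append(TOOLS[action](arg))  # act, then record the observation
    return observations


trace = run_agent("competitor pricing")
```

The step budget (`max_steps`) is the part people forget: an agent without one has no guaranteed stopping point.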

How does multi-agent orchestration work?

Multi-agent orchestration coordinates several specialized agents — say a researcher, a coder and a reviewer — so their outputs combine into one result. An orchestration layer routes tasks, manages shared state, and decides sequencing or parallelism. LangGraph models this as a graph of nodes; CrewAI uses role-based crews; AutoGen uses conversational agents. The hard part is the coordination — state, retries, and handoffs — which is exactly what we call The AI Coordination Gap. Google's Interactions API reduces part of this by managing state server-side and supporting background execution, though true multi-vendor multi-agent graphs still benefit from a dedicated framework on top.
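
Stripped of any framework, the simplest form of orchestration is a pipeline that hands shared state from one agent to the next. The sketch below shows that handoff with three stub agents. It is a sequential toy written for this article, not how LangGraph, CrewAI or AutoGen are implemented, and it leaves out the retries and parallelism that make the real problem hard.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Agent = Callable[[State], State]


def researcher(state: State) -> State:
    return {**state, "notes": f"facts about {state['task']}"}


def coder(state: State) -> State:
    return {**state, "code": f"script using {state['notes']}"}


def reviewer(state: State) -> State:
    verdict = "approved" if "script" in state["code"] else "rejected"
    return {**state, "verdict": verdict}


def orchestrate(task: str, pipeline: List[Agent]) -> State:
    """Run agents in sequence, handing shared state from one to the next."""
    state: State = {"task": task}
    for agent in pipeline:
        state = agent(state)  # handoff: each agent reads and extends the state
    return state


result = orchestrate("price tracking", [researcher, coder, reviewer])
```

Each agent returns a new dictionary instead of mutating the old one, which keeps every intermediate state available for logging or replay.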

What is the difference between RAG and fine-tuning?

RAG (Retrieval-Augmented Generation) injects external knowledge at query time by retrieving relevant documents from a vector database and feeding them to the model — your data stays fresh and editable. Fine-tuning bakes new behavior into the model's weights through training, which is costlier and slower to update. Google's Interactions API supports a RAG-like pattern via custom agents with 'instructions, skills and data sources.' Rule of thumb: use RAG when knowledge changes often or must be cited; fine-tune when you need consistent style, format or domain behavior that retrieval can't reliably enforce. Many production systems combine both.
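
The retrieval half of RAG can be shown without a vector database. The sketch below swaps embeddings for plain word-count vectors and cosine similarity, which is far cruder than what production systems use but follows the same retrieve-then-prompt shape. All function names and the sample documents are invented for illustration.

```python
import math
from collections import Counter
from typing import List


def _vector(text: str) -> Counter:
    """Bag-of-words vector; a real system would use an embedding model."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = _vector(query)
    return sorted(documents, key=lambda d: _cosine(q, _vector(d)), reverse=True)[:k]


def build_prompt(query: str, documents: List[str]) -> str:
    """Inject the retrieved context ahead of the question."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "Refunds are processed within 14 days of the return request.",
    "Our office is closed on public holidays.",
]
prompt = build_prompt("How long do refunds take?", docs)
```

Because the knowledge lives in `docs` rather than in model weights, updating an answer means editing a document, which is the property that separates RAG from fine-tuning.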

What is MCP in AI?

MCP (Model Context Protocol) is an open standard, introduced by Anthropic, for connecting AI models to external tools, data sources and context in a consistent, interoperable way. Instead of writing a bespoke integration for every tool, MCP defines a common interface so any compliant model can call any compliant tool. In practice this looks concrete: the Claude Desktop client, for example, connects to a local filesystem or a GitHub server through MCP without any custom glue code — the same server you wrote once works unchanged across any MCP-compliant client. As Google, OpenAI and Anthropic each ship agent runtimes — like the Interactions API's Managed Agents — that write-once-connect-anywhere property is what keeps the ecosystem from fragmenting into per-vendor integration silos.
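
On the wire, MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool name and an arguments object. The sketch below only serializes such a request; it doesn't open a connection or handle responses. The helper name is mine, and the `read_file` tool with its `path` argument is a hypothetical example, since each server defines its own tools.

```python
import json
from typing import Any, Dict


def mcp_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "read_file" and "path" are hypothetical; servers define their own tools.
wire = mcp_tool_call(1, "read_file", {"path": "notes/todo.md"})
```

Because every compliant client produces the same envelope, the server doesn't need to know which model or vendor is on the other end.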

About the Author

Rushil Shah

AI Systems Builder & Founder, Twarx

Rushil Shah is the founder of Twarx and an AI systems builder. Since 2021 he has shipped production multi-agent systems and AI-powered business tools — including a background-execution research agent that cut manual reporting overhead by roughly 40% for a Series B SaaS client, and an orchestration layer migrated off a homemade Redis-backed queue onto a managed runtime. He writes from real implementation experience — what actually works in production, what fails at scale, and where the industry is heading next — with a focus on making agentic AI practical for builders and businesses.

