Balaka Biswas

Posted on May 25

Google I/O 2026’s Smartest Developer Release Wasn’t a Model, It Was the Runtime - Managed Agents in Gemini API

#devchallenge #googleiochallenge #ai #programming

This is a submission for the Google I/O Writing Challenge

Google I/O 2026’s Smartest Developer Release Wasn’t a Model. It Was the Runtime.

Every Google I/O has its headline magnet. A faster model, a shinier demo, a new capability that makes developers excited, nervous, or both. Google I/O 2026 had plenty of those moments. Gemini 3.5 Flash came with serious benchmark energy. WebMCP gave the open web crowd something ambitious to debate. AI Studio, Chrome, Search, and Gemini all moved deeper into agentic territory.

But the most important developer announcement was not the loudest one.

It was Managed Agents in the Gemini API.

That may sound less glamorous than a new model, but that is exactly why it matters. Models are the engines; Managed Agents is the chassis, gearbox, dashboard, pit crew, and emergency brake. It is the layer that turns “the model can reason and use tools” into “my application can ask an agent to do useful work, observe what it did, preserve state, collect artifacts, and continue from there.”

That is a very different product. And for developers, it may be the more important one.

The Real Bottleneck Was Never Intelligence

For the last couple of years, agent demos have followed a familiar script:

A model receives a task.
It calls the required tools.
It plans and writes code.
It runs the code and inspects the result.
It fixes its own mistakes.

Everyone nods. But then, a developer tries to build the same thing in production and immediately runs into the actual problem.

The hard part is not only getting the model to think. The hard part is giving it a place to work. A serious agent needs a runtime. It needs a sandbox, files, tool boundaries, and memory or state. It needs observable intermediate steps and controls for network access, credentials, cost, and cleanup, not to mention developer ergonomics that spare every team from rebuilding the same orchestration layer from scratch.

This is the gap Managed Agents tries to close.

Google’s announcement is not simply “Gemini can use tools.” The more interesting claim is this: Google is packaging the agent loop itself as a managed developer primitive.

With Managed Agents, the Antigravity managed agent can run inside a Google-hosted Linux environment, execute code, manage files, use web access, preserve environment state, and return observable execution traces through the Interactions API. That shifts the developer’s job.

Instead of building the whole runtime yourself, you can start from a hosted agent environment and focus on the product boundary around it.

That boundary is where the real engineering begins.

What Google Shipped

At I/O 2026, Google introduced Managed Agents in the Gemini API, with the Antigravity agent available as a public preview. The agent is powered by Gemini 3.5 Flash and exposed through the Interactions API and Google AI Studio.

The product has a few key parts:

Component	What it does
Gemini API	The developer API surface for Google’s models and agents
Interactions API	The API layer built for stateful, agentic, multi-turn workflows
Antigravity managed agent	Google’s hosted general-purpose agent harness
Remote Linux environment	A sandbox where the agent can execute code and manage files
Environment ID	A handle that lets later calls continue in the same workspace
Interaction ID	A handle for continuing conversational state
AI Studio Agents Playground	A visual way to prototype agent behavior
Custom agents	Reusable agent configurations with instructions, sources, and environment settings

The important part is that this is not a single stateless prompt-response API. A stateless call is good for generation, classification, extraction, summarization, and one-shot reasoning. A managed agent is better suited for work that requires state, tools, files, and iteration.

Think data analysis, repository auditing, research synthesis, report generation, benchmark runs, documentation updates, or internal workflow automation. This is why Managed Agents is more than another AI feature. It is closer to an execution substrate.

The Architecture: Prompts Go In, Work Comes Out

A simplified Managed Agents flow looks like this:

The developer sends a task through the Interactions API. The managed agent receives it, reasons through the task, uses available tools, reads or writes files inside the remote environment, and returns both the final output and structured information about execution.

The key is state.

Google gives developers two major handles:

Handle	Purpose
`previous_interaction_id`	Continue the conversation
`environment_id`	Continue working in the same sandbox

That second handle is especially important. Without environment persistence, every agent task becomes a one-shot performance. With environment persistence, the agent can build on previous files and results.

Turn one can create an analysis. Turn two can improve the chart. Turn three can package the output. Turn four can audit the final files.

That feels less like prompting a chatbot and more like supervising a remote worker with a shell.

A Minimal API Pattern

The cleanest mental model is:

Concept	Mental model
Interaction	The conversation and reasoning state
Environment	The working directory and execution state
Agent	The policy and tool-using worker
Artifact	The files created by the work
Step trace	The observable record of what happened

A basic Python workflow could look like this:

```python
from google import genai

client = genai.Client()

# First turn: start a fresh remote sandbox and ask for a report.
first_run = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input=(
        "Read revenue.csv, identify the top three trends, "
        "and save a short report as report.md."
    ),
    environment="remote",
)

print(first_run.output_text)
print(first_run.environment_id)

# Second turn: continue the conversation and reuse the same sandbox,
# so report.md and any intermediate files are still available.
second_run = client.interactions.create(
    agent="antigravity-preview-05-2026",
    previous_interaction_id=first_run.id,
    environment=first_run.environment_id,
    input=(
        "Now create a chart for the strongest trend "
        "and save it as chart.png."
    ),
)

print(second_run.output_text)
```

The developer did not manually create a container, pass files between steps, write a tool router, manage the execution loop, or build a step logger. The managed runtime absorbs much of that scaffolding.
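If you expect many turns, it may help to keep the two handles in one small object instead of threading them through your code by hand. The sketch below is an illustration only: `AgentSession`, `next_request`, and `record` are hypothetical names, and it assumes the response exposes `id` and `environment_id` as in the example above. It only builds keyword arguments, so it runs offline with a stand-in response.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class AgentSession:
    """Tracks the two handles that tie separate calls into one session."""

    agent: str
    interaction_id: Optional[str] = None
    environment_id: Optional[str] = None

    def next_request(self, task: str) -> dict[str, Any]:
        """Build keyword arguments for the next call in this session."""
        kwargs: dict[str, Any] = {"agent": self.agent, "input": task}
        if self.interaction_id is None:
            # First turn: ask for a fresh remote sandbox.
            kwargs["environment"] = "remote"
        else:
            # Later turns: continue the conversation and reuse the workspace.
            kwargs["previous_interaction_id"] = self.interaction_id
            kwargs["environment"] = self.environment_id
        return kwargs

    def record(self, response: Any) -> None:
        """Remember the handles returned by the most recent call."""
        self.interaction_id = response.id
        self.environment_id = response.environment_id


# Stand-in for a real response so the sketch runs without network access.
fake_response = SimpleNamespace(id="int-1", environment_id="env-1")

session = AgentSession(agent="antigravity-preview-05-2026")
first_kwargs = session.next_request("Save a short report as report.md.")
session.record(fake_response)
second_kwargs = session.next_request("Now save a chart as chart.png.")
```

In a real integration you would presumably pass these dictionaries along as `client.interactions.create(**session.next_request(task))` and call `session.record(...)` on each response.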

That is the product insight. Google is not only offering model intelligence. It is offering the operating context around that intelligence.

Why the Interactions API Matters

The Interactions API is one of the most important parts of this launch because it signals how Google expects developers to build with Gemini going forward.

Older model APIs are shaped around a single call: send content, get content. That works for many use cases. But agentic workflows need more structure. They need server-side state, tool traces, intermediate events, resumability, and file continuity.

Consider a data workflow. A user uploads three CSV files and asks for a short analysis. The agent writes a script, runs it, creates a plot, and writes a markdown summary. Then the user says, “Actually, split this by region and add a table.”

In a stateless setup, you either replay everything into context or manually store and reload outputs.

With Managed Agents, you continue from the prior interaction and reuse the same environment. The files are already there. The agent can inspect them again. The workflow becomes less like prompt engineering and more like a remote analytical session.

Custom Agents: From Prompt to Reusable Worker

Managed Agents are useful as one-off calls, but the more production-relevant pattern is creating reusable agents with stable instructions, sources, and environment controls. A repo auditing agent, for example, should not need a giant prompt every time. It should have a defined role, defined workspace, and clear output expectations.

A simplified setup might look like this:

```python
from google import genai

client = genai.Client()

agent = client.agents.create(
    id="repo-auditor",
    base_agent="antigravity-preview-05-2026",
    # Stable role and output path, so callers do not repeat a long prompt.
    system_instruction=(
        "Audit the repository for test failures, dependency issues, "
        "and risky code patterns. Write findings to "
        "/workspace/output/report.md."
    ),
    base_environment={
        "type": "remote",
        # Mount the repository to audit inside the sandbox.
        "sources": [
            {
                "type": "repository",
                "source": "https://github.com/your-org/your-repo",
                "target": "/workspace/repo",
            }
        ],
        # Restrict outbound traffic to the domains the audit needs.
        "network": {
            "allowlist": [
                {"domain": "api.github.com"},
                {"domain": "pypi.org"},
            ]
        },
    },
)
```

This is where Managed Agents start to look less like “chat with tools” and more like infrastructure.

You can imagine teams defining internal agents such as:

Agent	Purpose
`data-report-agent`	Turn CSVs into charts and summaries
`repo-auditor`	Review a codebase and write findings
`release-note-agent`	Compare commits and draft release notes
`benchmark-agent`	Run evaluation scripts and summarize metric changes
`doc-update-agent`	Propose documentation changes from source updates

The important engineering move is repeatability.

A useful agent should not depend on a perfect prompt typed by a tired developer at 1:13 AM. It should have persistent instructions, restricted access, stable output paths, and behavior that can be reviewed.

That is the difference between a demo and a product.
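One way to make “behavior that can be reviewed” concrete is to lint the environment configuration before an agent is ever created. The sketch below is an assumption-heavy illustration: `review_environment` is a hypothetical helper, the dictionary shape simply mirrors the `base_environment` example above, and the rules (targets under `/workspace`, a non-empty allowlist, no wildcards) are my own choices rather than anything Google prescribes.

```python
from pathlib import PurePosixPath

WORKSPACE_ROOT = PurePosixPath("/workspace")


def review_environment(config: dict) -> list[str]:
    """Return human-readable problems found in a base_environment dict."""
    problems = []

    # Every mounted source should land inside the workspace.
    for source in config.get("sources", []):
        target = PurePosixPath(source.get("target", ""))
        inside = target == WORKSPACE_ROOT or WORKSPACE_ROOT in target.parents
        if ".." in target.parts or not inside:
            problems.append(f"source target outside /workspace: {target}")

    # A missing or empty allowlist is treated as unreviewed network access.
    allowlist = config.get("network", {}).get("allowlist", [])
    if not allowlist:
        problems.append("no network allowlist defined")
    for entry in allowlist:
        domain = entry.get("domain", "")
        if not domain or "*" in domain:
            problems.append(f"overly broad or empty domain: {domain!r}")

    return problems


# Same shape as the repo-auditor example above.
article_config = {
    "type": "remote",
    "sources": [
        {
            "type": "repository",
            "source": "https://github.com/your-org/your-repo",
            "target": "/workspace/repo",
        }
    ],
    "network": {
        "allowlist": [{"domain": "api.github.com"}, {"domain": "pypi.org"}]
    },
}
article_problems = review_environment(article_config)
```

A check like this could run in CI next to the agent definition, so a widened allowlist shows up in a pull request instead of in production.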

Filesystem-Native Configuration Is a Bigger Deal Than It Sounds

One detail that I like is the support for instruction files such as AGENTS.md and skill files such as SKILL.md.

Why is that a big deal?

Developers already understand files. Repositories already have conventions. Teams already review documentation, configuration, and scripts in pull requests. Putting agent behavior into files makes that behavior easier to inspect, diff, review, and version.

A repository might look like this:

```
repo/
  AGENTS.md
  .agents/
    skills/
      audit-tests/
        SKILL.md
      summarize-changes/
        SKILL.md
  src/
  tests/
  package.json
```

That is a smart direction because it makes agent behavior part of the software project, not an invisible prompt hidden inside a product dashboard.
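Because the configuration lives in ordinary files, ordinary tooling can inventory it. Here is a minimal sketch, assuming the layout shown above (`AGENTS.md` plus `.agents/skills/<name>/SKILL.md`); `list_agent_files` is a hypothetical helper, not part of any SDK, and the demo builds the layout in a temporary directory so it runs anywhere.

```python
import tempfile
from pathlib import Path


def list_agent_files(repo_root: Path) -> dict[str, list[str]]:
    """Collect the files that shape agent behavior in a repository."""
    # Instruction files, as paths relative to the repository root.
    instructions = [
        path.relative_to(repo_root).as_posix()
        for path in sorted(repo_root.rglob("AGENTS.md"))
    ]

    # Skill names are taken from the directory that holds each SKILL.md.
    skills_dir = repo_root / ".agents" / "skills"
    skills = []
    if skills_dir.is_dir():
        skills = [p.parent.name for p in sorted(skills_dir.glob("*/SKILL.md"))]

    return {"instructions": instructions, "skills": skills}


# Build the example layout in a temporary directory and inventory it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "AGENTS.md").write_text("Run tests before proposing changes.\n")
    for skill in ("audit-tests", "summarize-changes"):
        skill_dir = root / ".agents" / "skills" / skill
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text(f"# {skill}\n")
    found = list_agent_files(root)
```

A listing like this is a cheap thing to print in a pull request check, so reviewers can see at a glance when a skill was added or removed.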

A team can review:

Question	Why it matters
What is the agent allowed to do?	Defines operational boundaries
What files can it inspect?	Controls scope
What outputs should it produce?	Improves repeatability
What external domains can it access?	Reduces leakage risk
What skills does it use?	Makes behavior easier to audit

This is the kind of technical detail that decides whether agents become production tools or remain conference magic tricks.

Observability: Because “Trust Me Bro” Is Not a Log Format

An agent that runs code, reads files, searches the web, and creates artifacts cannot be a black box. Developers need to know what happened. Not in a vague “the agent analyzed your data” way. They need step traces. They need to inspect tool calls. They need to see what files were touched, what commands ran, what sources were consulted, and where the process failed.

Agent observability has three jobs:

Job	Why it matters
Debugging	You need to know where the process went wrong
Trust	Users are more likely to accept output when they can inspect the path
Governance	Teams need records for security, compliance, and review

This is another reason the Interactions API matters. Agentic applications are not only about final text; they are about the work behind the text. A good platform needs to expose that work.
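To show what “inspect the path” could mean in practice, here is a small sketch that condenses a step trace into the three things a reviewer usually asks for first: what kinds of steps ran, which files were written, and where the first failure happened. The step shape used here (dictionaries with `type`, `ok`, and `path` keys) is invented for illustration; the real trace format returned by the Interactions API may look quite different.

```python
from collections import Counter
from typing import Optional


def summarize_trace(steps: list[dict]) -> dict:
    """Condense a list of step records into a short review summary."""
    # How many steps of each kind ran.
    counts = Counter(step["type"] for step in steps)

    # Unique files the agent wrote, in a stable order.
    files = sorted({s["path"] for s in steps if s["type"] == "file_write"})

    # Index of the first step that reported a failure, if any.
    first_failure: Optional[int] = next(
        (i for i, s in enumerate(steps) if not s.get("ok", True)), None
    )

    return {
        "counts": dict(counts),
        "files_written": files,
        "first_failure": first_failure,
    }


# Hypothetical trace: the first run fails, the agent patches the script, then succeeds.
trace = [
    {"type": "command", "command": "python analyze.py", "ok": False},
    {"type": "file_write", "path": "analyze.py", "ok": True},
    {"type": "command", "command": "python analyze.py", "ok": True},
    {"type": "file_write", "path": "report.md", "ok": True},
]
summary = summarize_trace(trace)
```

Even a summary this small covers all three jobs in the table: it points at the failing step, shows users what was touched, and leaves a record worth storing.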

Pricing and Control

Managed Agents is useful, but agentic workflows can spend tokens quickly. A normal chat call is usually bounded by input and output. An agent run may include planning, tool calls, file inspection, code execution, error recovery, generated artifacts, and multiple rounds of iteration.

That means cost control is a product requirement, not an accounting afterthought.

A real integration should include:

Control	Why it helps
Narrow task scopes	Prevents sprawling behavior
Budget limits	Stops runaway usage
Streaming visibility	Lets users cancel bad runs early
Clear stop conditions	Reduces unnecessary iteration
Human approval gates	Protects sensitive actions
Environment cleanup	Avoids stale or risky artifacts
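Budget limits and stop conditions from the table above can live in your own wrapper, independent of whatever controls the platform offers. The following is a minimal sketch: `RunBudget` and `BudgetExceeded` are hypothetical names, and it assumes your application can observe a token count per step (for example from streamed usage data), which is an assumption about your integration rather than a documented guarantee.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run goes past its token or step budget."""


@dataclass
class RunBudget:
    max_tokens: int
    max_steps: int
    tokens_used: int = 0
    steps_used: int = 0

    def charge(self, tokens: int) -> None:
        """Record one step and stop the run if a limit is crossed."""
        self.tokens_used += tokens
        self.steps_used += 1
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(
                f"token budget exceeded: {self.tokens_used} > {self.max_tokens}"
            )
        if self.steps_used > self.max_steps:
            raise BudgetExceeded(
                f"step budget exceeded: {self.steps_used} > {self.max_steps}"
            )


# Simulated run: three steps of 4,000 tokens against a 10,000 token budget.
budget = RunBudget(max_tokens=10_000, max_steps=5)
stopped_reason = None
for step_tokens in (4_000, 4_000, 4_000):
    try:
        budget.charge(step_tokens)
    except BudgetExceeded as exc:
        stopped_reason = str(exc)
        break
```

The point of the design is that the guard fails loudly on the step that crosses the line, which is where you would cancel the run and tell the user why.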

Security: A Useful Power Still Needs a Fence

Managed Agents is exciting because it gives the model a place to act. That is also why it deserves caution. An agent that can read private files, process untrusted content, browse the web, and call tools has a real attack surface. The risky combination is:

Access to private data
Exposure to untrusted instructions or content
Ability to communicate externally or take actions

That combination can create prompt injection, data exfiltration, and tool misuse risks. A safer architecture should wrap the managed agent in policy checks, scoped files, network allowlists, human review, and audit logs.
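Two of those policy checks are simple enough to sketch. The first flags the risky combination listed above when all three conditions hold at once; the second applies an exact-match domain allowlist to an outbound URL. Both function names are hypothetical, and exact host matching is a deliberately strict choice of mine: it rejects lookalike hosts at the cost of also rejecting legitimate subdomains.

```python
from urllib.parse import urlsplit


def has_risky_combination(
    reads_private_data: bool,
    sees_untrusted_content: bool,
    can_act_externally: bool,
) -> bool:
    """True only when all three risk conditions are present together."""
    return reads_private_data and sees_untrusted_content and can_act_externally


def is_allowed(url: str, allowlist: set[str]) -> bool:
    """Allow a URL only if its host exactly matches an allowlisted domain."""
    host = urlsplit(url).hostname or ""
    return host in allowlist


ALLOWLIST = {"api.github.com", "pypi.org"}

ok_request = is_allowed("https://api.github.com/repos/your-org/your-repo", ALLOWLIST)
# A lookalike host that merely starts with an allowed domain is rejected.
spoofed_request = is_allowed("https://api.github.com.evil.example/steal", ALLOWLIST)
needs_review = has_risky_combination(True, True, True)
```

Removing any one of the three conditions, for example by cutting network access for a task that reads private files, is often easier than trying to make all three safe together.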

A practical rule: treat the agent like a junior engineer with shell access.

Useful? Absolutely. Unsupervised in production? Please do not make your incident report write itself.

What Needs To Improve

Managed Agents is still a preview product, and the limitations matter.

Current constraints include preview API stability, limited tool support in some areas, no structured outputs for the Antigravity agent, no MCP support for this agent yet, no background execution for Antigravity, and limited multimodal input coverage.

The biggest gap for many developers is structured output. If an agent produces artifacts for humans, markdown is fine. If it feeds another system, developers often need strict schemas.
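Until strict schemas arrive, one possible stopgap is to ask the agent to write JSON and then validate it on your side before anything downstream consumes it. The sketch below is only an illustration of that idea: the findings format, `REQUIRED_FIELDS`, and `parse_findings` are all invented here, and nothing guarantees the agent will follow the requested format, which is exactly why the check rejects anything unexpected.

```python
import json

# Hypothetical schema for one audit finding: field name -> expected type.
REQUIRED_FIELDS = {"title": str, "severity": str, "file": str}


def parse_findings(raw: str) -> list[dict]:
    """Parse agent-written JSON and reject anything off-schema."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of findings")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"finding {index}: expected an object")
        for name, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(item.get(name), expected_type):
                raise ValueError(
                    f"finding {index}: field {name!r} missing "
                    f"or not {expected_type.__name__}"
                )
    return data


# Example payload, as the agent might write it to a file in the workspace.
raw_output = json.dumps(
    [{"title": "Unpinned dependency", "severity": "medium", "file": "package.json"}]
)
findings = parse_findings(raw_output)
```

Validation failures can then be fed back as a follow-up turn in the same interaction, asking the agent to repair its own output.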

A more mature production version should improve:

Feature	Why it matters
Structured outputs	Safer system-to-system integration
Job controls	Better cancellation, retries, and background runs
Policy controls	Stronger file, tool, and network permissions
MCP support	Better tool ecosystem interoperability
Evaluation hooks	Easier testing before deployment

That would move Managed Agents from promising preview to serious default runtime.

Final Verdict

Gemini 3.5 Flash gives Google a stronger engine, and WebMCP hints at a more agent-readable web. But Managed Agents gives developers the layer they actually need to turn model intelligence into product behavior: a runtime.

That runtime can execute code, handle files, preserve state, expose steps, and produce artifacts. It also forces serious questions about security, cost, observability, and control. That is exactly why it is interesting. The future of agentic software will not be won only by the smartest model; it will be won by the platform that makes smart models useful, inspectable, constrained, and economically sane.

So yes, enjoy the flashy demos. Watch the model benchmarks. Argue about whether the web needs WebMCP. But if you are a developer deciding what to build after I/O, pay close attention to the less sparkly runtime announcement.

That is usually where the future hides.