Sreejit Pradhan

Posted on May 21

93 Agents. 2.6 Billion Tokens. One Working OS. And a Bill Under $1,000.



This is a submission for the Google I/O Writing Challenge.

Google I/O 2026 had a lot of shiny objects. Gemini Omni simulating intuitive physics. Smart glasses built with Warby Parker. Gemini Spark acting as a 24/7 background concierge that handles your grocery shopping and your inbox while you sleep.

The consumer tech press is going to spend the next month writing about the glasses. I get it.

But if you actually build software for a living, the moment that should have stopped you cold had nothing to do with hardware. It happened during the Antigravity 2.0 demo, when Google gave an agent a single prompt:

"Build a working operating system from scratch."

Here's what happened over the next 12 hours:

93 sub-agents worked in parallel
15,000+ model requests fired
2.6 billion tokens processed
The agents wrote a scheduler, memory management, and a file system
They wrote their own tests and audited their own code
It booted. It ran Doom.

And the total API cost?

Under $1,000.

That number is the real announcement of I/O 2026. Not the model name. Not the demo. The economics.

Why the $1,000 Number Changes Everything

We've spent the last three years treating token costs as the fundamental constraint on autonomous agents. The math was always the problem. An agent that gets stuck in a debugging loop burns through your monthly budget by Tuesday. That cost ceiling was why production agentic systems were conservative — short context, minimal tool use, human-in-the-loop at every expensive step.

2.6 billion tokens for an OS. Under $1,000.

Google says Gemini 3.5 Flash is 4x faster than comparable frontier models, and that inside the Antigravity harness it co-optimized the model to run at 12x speed. When the price of intelligence drops by this much, brute-force asynchronous generation stops being a party trick and becomes a legitimate engineering strategy.

The ceiling that shaped how we designed agentic systems just moved.

Model (list input price)	Approx. cost for 2.6B tokens
GPT-4o (~$2.50/M input)	~$6,500
Claude 3.5 Sonnet (~$3/M input)	~$7,800
Gemini 3.5 Flash (Antigravity demo, reported total)	< $1,000

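That's a ballpark, not exact accounting. The GPT-4o and Claude rows bill all 2.6B tokens at the input rate; output tokens are priced higher on both, so their real totals would be higher still. Google hasn't published the precise per-token rate for the Antigravity harness. But the roughly order-of-magnitude difference is real, and it's structural, not a promo credit.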
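To make the ballpark reproducible, here is the same arithmetic as a small Python sketch. It uses only the figures quoted above (2.6B tokens, ~15,000 requests, 93 sub-agents, and the input list prices in the table), bills every token at the input rate, and backs out the effective per-million rate that a sub-$1,000 total implies. The helper names are mine.

```python
# Back-of-envelope cost math for the 93-agent OS demo.
# Assumption: every token is billed at the listed input rate (output tokens
# cost more, so real totals for GPT-4o and Claude would be higher).

TOTAL_TOKENS = 2_600_000_000   # 2.6 billion tokens, as reported
TOTAL_REQUESTS = 15_000        # "15,000+" model requests (a lower bound)
SUB_AGENTS = 93

INPUT_PRICE_PER_MILLION = {
    "GPT-4o": 2.50,
    "Claude 3.5 Sonnet": 3.00,
}


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


def implied_rate(total_cost: float, tokens: int) -> float:
    """Effective $/M tokens implied by a total bill."""
    return total_cost / (tokens / 1_000_000)


if __name__ == "__main__":
    for model, price in INPUT_PRICE_PER_MILLION.items():
        print(f"{model}: ${cost_usd(TOTAL_TOKENS, price):,.0f}")
    # A sub-$1,000 bill means an effective rate below this ceiling.
    print(f"Gemini ceiling: ${implied_rate(1_000, TOTAL_TOKENS):.3f}/M tokens")
    print(f"Avg tokens/request: {TOTAL_TOKENS / TOTAL_REQUESTS:,.0f}")
    print(f"Avg tokens/agent:   {TOTAL_TOKENS / SUB_AGENTS:,.0f}")
```

A sub-$1,000 total works out to under about $0.39 per million tokens. Because 15,000 is a lower bound on requests, the ~173K tokens-per-request figure is an upper bound on the average.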

The Actual Primitive: What Managed Agents Gives You

Before I/O, building a production-grade agent meant assembling your own stack. Docker for the execution environment. Networking constraints so the agent doesn't call APIs it shouldn't. State serialization so a multi-hour task survives a process restart. Secret management so your API keys don't end up in the agent's context window. YAML for everything.

That was 2–3 weeks of DevOps before you wrote a single line of agent logic.

Google just killed that entire category of toil with Managed Agents in the Gemini API.

A single API call now gives you:

An agent that reasons, uses tools, and executes code
An isolated, ephemeral Linux sandbox hosted by Google
File management, web browsing, and shell access
Session continuity across calls via interaction_id and environment_id

Here's what that API call actually looks like:

from google import genai

client = genai.Client(api_key="YOUR_API_KEY_HERE")

interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    environment={"type": "remote"},
    input=(
        "Analyze the attached codebase for security vulnerabilities, "
        "write a detailed report, and output it as security-report.md"
    )
)

# The agent's final response
print(interaction.output_text)

# Every file the agent created in the sandbox
for file in interaction.files:
    client.files.download(file.id, destination=f"./{file.name}")

# Every step the agent took — reasoning, tool calls, code execution
for step in interaction.steps:
    print(f"[{step.type}] {step.content[:200]}")

I ran this on a standard Google AI account. Here's what actually happened:

UserWarning: Interactions usage is experimental and may change in future versions.

google.genai._interactions.RateLimitError: Error code: 429
{'error': {'message': 'You do not have enough quota to make this request.'}}

Two things are worth noting. First, the UserWarning is honest: this is preview software, and Google tells you so directly in the SDK. Second, the 429 confirms the pushback I was planning to make anyway: the Managed Agents API is quota-gated. On a standard account, you hit the wall immediately. AI Ultra ($100/month) is the actual production gate for this feature, not just a suggestion buried in the pricing page.
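Once you do have quota, transient 429s are still likely during a preview, and the usual answer is capped exponential backoff with jitter. The sketch below is generic: `RateLimited` is a stand-in exception I defined, because I'd rather not depend on the private `google.genai._interactions` module path from the traceback; in real code you would catch whatever rate-limit error your SDK version raises. Backoff will not help when the account has no quota at all, as in my run.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an SDK's HTTP 429 error (not a google-genai class)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on RateLimited with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")
```

Usage would look like `call_with_backoff(lambda: client.interactions.create(...))`, with the `except` clause pointed at your SDK's real rate-limit exception. Injecting `sleep` keeps the helper testable without actually waiting.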

As far as I can tell, the code is correct: the request was rejected for quota, not for a malformed call. The API is real. The ceiling is just lower than the keynote demo implies.

How the Architecture Actually Works

This is the part that's easy to miss in the keynote framing. The Managed Agents API isn't a hosted LangChain wrapper; there's a specific design here worth understanding.

Your API call
      │
      ▼
┌──────────────────────────────────────┐
│  Interactions API                    │
│  (session management + routing)      │
└───────────────┬──────────────────────┘
                │
                ▼
┌──────────────────────────────────────┐
│  Antigravity Agent Harness           │
│  (reasoning loop + tool dispatch)    │
│                                      │
│  ┌──────────────────────────────┐    │
│  │  Gemini 3.5 Flash            │    │
│  │  (the reasoning engine)      │    │
│  └──────────────────────────────┘    │
└───────────────┬──────────────────────┘
                │
                ▼
┌──────────────────────────────────────┐
│  gVisor-Isolated Linux Sandbox       │
│  ├── Code execution                  │
│  ├── File system (ephemeral)         │
│  ├── Shell access                    │
│  └── Controlled web browsing         │
└──────────────────────────────────────┘
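To make the "reasoning loop + tool dispatch" box concrete, here is a toy version of that middle layer. It is not Google's harness: the "model" is a scripted stub, the tools are plain Python functions, and names like `Action`, `TOOLS`, and `run_agent` are hypothetical. The shape is the point: the model proposes an action, the harness dispatches it to a tool, and the observation goes back into the transcript until the model says it's done.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    tool: str  # which tool to call, or "finish"
    arg: str   # tool argument, or the final answer when tool == "finish"


# Hypothetical tools; in a real harness these would run inside the sandbox.
FILES: Dict[str, str] = {}


def write_file(arg: str) -> str:
    name, _, body = arg.partition("\n")  # first line is the filename
    FILES[name] = body
    return f"wrote {len(body)} bytes to {name}"


def read_file(arg: str) -> str:
    return FILES.get(arg, f"error: {arg} not found")


TOOLS: Dict[str, Callable[[str], str]] = {"write_file": write_file, "read_file": read_file}


def run_agent(model: Callable[[List[str]], Action], task: str, max_steps: int = 10) -> str:
    """Loop: ask the model for an action, dispatch it, feed the observation back."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        action = model(transcript)
        if action.tool == "finish":
            return action.arg
        tool = TOOLS.get(action.tool)
        observation = tool(action.arg) if tool else f"error: unknown tool {action.tool}"
        transcript.append(f"{action.tool} -> {observation}")
    return "stopped: step budget exhausted"


def scripted_model(transcript: List[str]) -> Action:
    """Stand-in for Gemini: picks the next action based on how many steps have run."""
    script = [
        Action("write_file", "report.md\n# Findings\nnone"),
        Action("read_file", "report.md"),
        Action("finish", "report.md written"),
    ]
    return script[len(transcript) - 1]
```

The `max_steps` cap is the toy equivalent of the budget guardrail any real harness needs so a stuck loop can't run forever.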

A month ago at Google Cloud NEXT '26, I wrote that the real story wasn't the Gemini Enterprise Agent Platform — it was GKE Agent Sandbox. Sub-second provisioning. gVisor isolation. 300 sandboxes per second. That was the infrastructure primitive being quietly shipped while everyone was focused on the rebrand.

Yesterday at I/O, we saw the developer API built directly on top of that primitive.

The GKE Agent Sandbox is what makes "single API call, isolated Linux environment" possible without a 30-second cold start. The Managed Agents API is the abstraction layer over it. Google has been shipping this stack in layers for months. I/O 2026 is when the top layer arrived.

The AGENTS.md and SKILL.md Pattern

Here's the detail from the Managed Agents announcement that's getting almost no coverage: agent behavior is defined in versionable Markdown files.

# AGENTS.md — defines agent identity and constraints

## Role
You are a security auditing agent. Your job is to analyze codebases
for vulnerabilities and produce structured reports.

## Constraints
- Never modify files in the /prod directory
- Only read from sources listed in environment.sources
- Always write output to /output/

## Tools available
- code_execution: run Python, Node, Bash
- web_search: look up CVE databases, documentation
- file_management: read/write within sandbox

# SKILL.md — reusable procedures the agent can invoke

## security_scan
1. Run static analysis with bandit (Python) or semgrep (multi-language)
2. Check dependencies against known CVE databases
3. Generate structured findings with severity levels
4. Output to /output/security-report.md

## dependency_audit
1. Parse package.json / requirements.txt / go.mod
2. Check each dependency against OSV database
3. Flag transitive vulnerabilities
4. Output upgrade recommendations
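Part of why Markdown works here is that it's trivially machine-readable. As an illustration only (I haven't seen documentation of how Antigravity actually loads skills), this sketch parses a SKILL.md shaped like the one above into a mapping from skill name to its ordered steps. The `parse_skills` name and the `## name` / `1. step` conventions are my assumptions based on the example.

```python
import re
from typing import Dict, List, Optional

SKILL_HEADING = re.compile(r"^##\s+(\S+)\s*$")    # "## security_scan"
NUMBERED_STEP = re.compile(r"^\d+\.\s+(.*\S)\s*$")  # "1. Run static analysis ..."


def parse_skills(markdown: str) -> Dict[str, List[str]]:
    """Map each '## skill_name' section to its numbered steps, in order."""
    skills: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in markdown.splitlines():
        heading = SKILL_HEADING.match(line)
        if heading:
            current = heading.group(1)
            skills[current] = []
            continue
        step = NUMBERED_STEP.match(line)
        if step and current is not None:
            skills[current].append(step.group(1))
    return skills


SKILL_MD = """# SKILL.md - reusable procedures the agent can invoke

## security_scan
1. Run static analysis with bandit (Python) or semgrep (multi-language)
2. Check dependencies against known CVE databases

## dependency_audit
1. Parse package.json / requirements.txt / go.mod
2. Check each dependency against OSV database
"""

if __name__ == "__main__":
    for name, steps in parse_skills(SKILL_MD).items():
        print(name, len(steps))
```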

Logan's line from the Developer Keynote sums it up: "Honestly, it feels like the hottest new programming language is Markdown."

The benefit isn't syntax. It's version control, code review, diffing, and auditability — applied to agent behavior for the first time. Your agent's skill set lives in your git repo. PRs exist for it. Rollbacks exist for it. Your agent's behavior has a commit history.
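"Your agent's behavior has a commit history" is literal: a constraint change is a text diff a reviewer can read. `git diff` already does this for you; the standard-library sketch below just shows what a reviewer would see if someone dropped a guardrail from the example AGENTS.md, plus a hypothetical `removed_bullets` helper a CI job could use to flag deleted constraints. The "after" version is invented for illustration.

```python
import difflib
from typing import List

BEFORE = """## Constraints
- Never modify files in the /prod directory
- Only read from sources listed in environment.sources
- Always write output to /output/
"""

# Hypothetical change: someone drops the /prod guardrail.
AFTER = """## Constraints
- Only read from sources listed in environment.sources
- Always write output to /output/
"""


def constraint_diff(before: str, after: str) -> List[str]:
    """Unified diff of two AGENTS.md versions, as a reviewer would see it."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="a/AGENTS.md",
            tofile="b/AGENTS.md",
            lineterm="",
        )
    )


def removed_bullets(before: str, after: str) -> List[str]:
    """Bullet lines deleted between versions; a CI job could fail the PR if any appear."""
    return [line[1:] for line in constraint_diff(before, after) if line.startswith("-- ")]


if __name__ == "__main__":
    print("\n".join(constraint_diff(BEFORE, AFTER)))
```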

That's what shifts this from a clever demo to a production primitive.

Where I'd Actually Push Back

I want to be honest about what the 93-agent OS demo doesn't tell you.

Orchestrating 93 parallel agents introduces a terrifying class of state management problems. If Agent 42 is refactoring the memory scheduler while Agent 17 is updating the file system architecture and they introduce a race condition — how do you debug that? Chrome DevTools for Agents helps agents audit their own code. It doesn't help you trace the interactions between dozens of asynchronous actors. That monitoring tooling doesn't fully exist yet.
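Until that tooling exists, one stopgap is to log every write along with the agent that made it and look for overlaps after the fact. The sketch below assumes a hypothetical event format (`agent`, `resource`, `start`, `end` in seconds), not anything I know Antigravity to emit; it flags pairs of different agents whose writes to the same resource overlapped in time, which is exactly the Agent 42 vs. Agent 17 situation.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class WriteEvent:
    agent: int
    resource: str
    start: float  # seconds since run start
    end: float


def overlapping_writes(events: List[WriteEvent]) -> List[Tuple[WriteEvent, WriteEvent]]:
    """Pairs of writes by different agents to the same resource with overlapping time windows."""
    by_resource: Dict[str, List[WriteEvent]] = {}
    for event in events:
        by_resource.setdefault(event.resource, []).append(event)
    conflicts = []
    for group in by_resource.values():
        for a, b in combinations(sorted(group, key=lambda e: e.start), 2):
            # Half-open intervals overlap when each starts before the other ends.
            if a.agent != b.agent and a.start < b.end and b.start < a.end:
                conflicts.append((a, b))
    return conflicts


if __name__ == "__main__":
    log = [
        WriteEvent(42, "kernel/mm.c", 10.0, 14.0),
        WriteEvent(17, "kernel/mm.c", 12.5, 13.0),
        WriteEvent(17, "fs/inode.c", 11.0, 20.0),
    ]
    for a, b in overlapping_writes(log):
        print(f"agent {a.agent} and agent {b.agent} both wrote {a.resource}")
```

Time overlap is only a heuristic: a read-modify-write race can happen without two writes ever overlapping. The pairwise comparison is also quadratic per resource, which is fine for a post-mortem over one run's logs.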

The sandbox security model has the same gap I flagged at NEXT. gVisor isolates syscalls. It doesn't isolate intent. An agent with valid credentials that makes a destructive call to a production endpoint isn't stopped by the execution sandbox. The answer is fine-grained IAM and network egress controls, but the integration story between sandbox-level networking and agent-level permission scoping is still evolving.
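One complementary layer you can own today is a policy check in your own tool code, in front of any HTTP call the agent can make. This is a minimal sketch, assuming you control the tool that issues requests: the allowlisted hosts (apart from the OSV API) and the "no mutating methods against production" rule are placeholders, and it supplements rather than replaces IAM scoping and network-level egress controls.

```python
from urllib.parse import urlsplit

# Placeholder policy: hosts the agent may reach at all, and which of them are production.
ALLOWED_HOSTS = {"api.osv.dev", "staging.internal.example.com", "api.internal.example.com"}
PRODUCTION_HOSTS = {"api.internal.example.com"}
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


class EgressDenied(Exception):
    """Raised when an agent-initiated request violates the egress policy."""


def check_egress(method: str, url: str) -> None:
    """Allow the request only over HTTPS, to an allowlisted host, and read-only for production."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "https":
        raise EgressDenied(f"non-HTTPS request blocked: {url}")
    if host not in ALLOWED_HOSTS:
        raise EgressDenied(f"host not allowlisted: {host or url}")
    if host in PRODUCTION_HOSTS and method.upper() not in SAFE_METHODS:
        raise EgressDenied(f"{method.upper()} to production host {host} blocked")
```

This only sees requests that go through your tool. It can't stop a valid credential from being misused some other way, which is why IAM scoping remains the primary control.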

Gemini 3.5 Pro still isn't here. The model designed to act as orchestrator for Flash subagents was delayed to next month. That's the piece that makes the 93-parallel-agent pattern safe to build on. Until Pro ships, you're orchestrating with Flash, which works — but isn't the intended architecture.
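For reference, the fan-out half of the orchestrator/sub-agent pattern is ordinary bounded concurrency. In this sketch `run_subagent` is a hypothetical async stand-in for one Flash sub-agent call (it just sleeps), and a semaphore caps how many run at once, which also helps keep request rates inside quota. Nothing here is Antigravity's API.

```python
import asyncio
from typing import List


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for one sub-agent call; a real one would hit the API."""
    await asyncio.sleep(0.01)
    return f"done: {task}"


async def orchestrate(tasks: List[str], max_concurrent: int = 8) -> List[str]:
    """Fan tasks out to sub-agents with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(task: str) -> str:
        async with semaphore:  # waits here once max_concurrent calls are running
            return await run_subagent(task)

    # gather returns results in the same order as `tasks`.
    return list(await asyncio.gather(*(bounded(t) for t in tasks)))


if __name__ == "__main__":
    subtasks = [f"subsystem-{i}" for i in range(93)]
    results = asyncio.run(orchestrate(subtasks, max_concurrent=8))
    print(len(results), results[0])
```

The hard parts, deciding how to split the work so sub-agents don't collide and reconciling what comes back, live outside this function. That's the orchestration judgment the missing Pro model is supposed to supply.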

And as I found out firsthand — the quota wall is real. Standard accounts hit 429 errors immediately. AI Ultra is the actual entry ticket for anything beyond a toy run.

None of these kill the announcement. They're what you need to know before you plan a sprint around it.

What To Actually Do This Week

Get API access through Google AI Studio. Even if you hit the quota wall like I did, you'll see the SDK structure, understand the step output format, and be ready the moment quota opens up.

When you do get through, start with something concrete — a task that would take you 30 minutes manually:

from google import genai

client = genai.Client(api_key="YOUR_API_KEY_HERE")

interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    environment={"type": "remote"},
    input=(
        "Research the top 5 open-source vector databases by GitHub stars. "
        "Compare them on: license, language, query performance benchmarks, "
        "managed cloud options, and community activity. "
        "Output a comparison table and a one-paragraph recommendation "
        "as research-report.md"
    )
)

print(interaction.output_text)

# This is the part actually worth reading on your first run
for i, step in enumerate(interaction.steps):
    print(f"\n[Step {i+1}] {step.type}")
    print(step.content[:300])

The agent should search the web, compare, reason, write, and hand you a file. The point isn't speed; it's that you defined a task in plain English, provided zero infrastructure, and got a deliverable.

Read through interaction.steps carefully on your first few runs. Understanding how the agent breaks down tasks — and where it over-reasons — is how you learn to write better AGENTS.md constraints. The step output is where the actual engineering signal lives.
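A quick way to start is to tally step types and flag long runs of consecutive reasoning steps. The sketch assumes each step exposes `.type` and `.content`, as in the loop above; the `Step` dataclass, the `"thought"` type label, and the threshold of 3 are my stand-ins, since I couldn't see real step output past the 429.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Step:
    """Hypothetical stand-in for an interaction step; real objects come from interaction.steps."""
    type: str
    content: str


def summarize_steps(
    steps: Iterable[Step], reasoning_type: str = "thought", max_run: int = 3
) -> Tuple[Counter, List[int]]:
    """Count step types and return start indexes of reasoning runs longer than max_run."""
    counts: Counter = Counter()
    long_runs: List[int] = []
    run_start, run_len = 0, 0
    for i, step in enumerate(steps):
        counts[step.type] += 1
        if step.type == reasoning_type:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == max_run + 1:  # report each long run once
                long_runs.append(run_start)
        else:
            run_len = 0
    return counts, long_runs
```

If the real step types map onto this shape, `counts, runs = summarize_steps(interaction.steps)` points you at the exact places to read first.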

The Takeaway

I/O 2026 was the conference where the industry stopped building AI wrappers and started building AI infrastructure.

The consumer tech press will write about the glasses. Fine. But the developers who spend this week understanding how to call the Managed Agents API, define SKILL.md files for their domain, and instrument the step output are the ones who ship production agentic systems in June — while everyone else is still configuring Docker execution environments.

The barrier to deploying a stateful, sandboxed, multi-hour autonomous agent went from three weeks of DevOps to one API call and a Markdown file.

The quota wall is real. The preview label is honest. The primitive underneath it is not going away.

That's the real headline.

Have you tried the Managed Agents API yet? And for anyone who's shipped real agentic systems before — what's the debugging story you're most nervous about at 93-agent scale? Drop it in the comments.

Written by me after watching the I/O developer keynote and reading the official Managed Agents quickstart docs at [ai.google.dev](https://ai.google.dev/gemini-api/docs/managed-agents-quickstart). Huh! It took a hell of a time, but I enjoyed it nonetheless. Code samples derived from official Google documentation. I ran the API myself, and that 429 error is real.