Pratay Karali

Posted on May 21

The Day AI Became Its Own CTO: Antigravity 2.0 and the 12-Hour OS

#googleiochallenge #antigravity #ai #devops

Google I/O Writing Challenge Submission

What happens when you stop giving AI a task — and give it a company?

There's a moment in every science fiction film where the machine stops waiting for instructions.

At Google I/O 2026, that moment happened live on stage — and it didn't feel like science fiction. It felt like watching the future quietly clock in for work.

Antigravity 2.0 was given a single directive: build an operating system. No team. No standups. No Jira tickets. Just one primary agent, 93 subagents it spun up itself, 15,000+ model requests, 2.6 billion tokens processed, and 12 hours on the clock.

The total bill? Under $1,000.

The result? A working OS — that, when it failed to run Doom due to missing keyboard and video drivers, diagnosed the problem and wrote the drivers live on stage.

I've been staring at that moment for two days. Let me tell you what I think it actually means.

The Jarvis Architecture: Corporate Hierarchy Without the Politics

Here's the frame that won't leave my head: this isn't just "AI coding." This is AI operating like a corporation — and it's a corporation unlike any that has ever existed.

The primary Antigravity agent functions as a CTO. It doesn't write every line of code. It understands the system, breaks the goal into domains, and spawns specialized subagents — one for the database layer, one for the frontend, one for testing, one for drivers. Each subagent works in an isolated workspace, reports summarized results back, and dissolves when its job is done.

         +--------------------------+
         |      Primary Agent       |
         |  (Context Coordinator)   |
         +----+------+----------+---+
              |      |          |
  +-----------+      |          +-----------+
  | Spawns           | Spawns               | Spawns
  v                  v                      v
+----------+    +----------+          +----------+
| Subagent |    | Subagent |          | Subagent |
| Database |    | Frontend |          |  Testing |
+----------+    +----------+          +----------+

This is the part that breaks my brain a little: nobody fights.

In every human organization I've ever encountered, the frontend team argues with the backend team. The testing team is chronically ignored. The DevOps engineer is always the last person anyone calls and the first person everyone blames. There are ego collisions, misaligned incentives, communication overhead, documentation that's three sprints out of date.

In Antigravity's architecture, the subagents operate in clean isolation. No competing priorities. No meetings about meetings. The primary agent synthesizes their outputs and steers. The goal is the only agenda item.
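To make the fan-out pattern concrete, here is a minimal sketch in plain Python. It is not Antigravity's implementation: `run_subagent`, `run_primary`, and `SubagentReport` are hypothetical names, a thread pool stands in for parallel subagents, and a temporary directory stands in for each isolated workspace.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SubagentReport:
    domain: str
    summary: str


def run_subagent(domain: str) -> SubagentReport:
    """Do one domain's work in a private scratch directory and return only a summary."""
    # The temporary directory plays the role of the isolated workspace. It is
    # deleted when the block exits, like a subagent dissolving after its job.
    with tempfile.TemporaryDirectory(prefix=f"{domain}-") as workspace:
        notes = Path(workspace) / "notes.txt"
        notes.write_text(f"detailed working notes for {domain}\n")
        size = len(notes.read_text())
    return SubagentReport(domain, f"{domain}: done ({size} chars of notes discarded)")


def run_primary(goal: str, domains: list[str]) -> str:
    """Fan the goal out to one subagent per domain and merge the summaries."""
    with ThreadPoolExecutor(max_workers=len(domains)) as pool:
        reports = list(pool.map(run_subagent, domains))  # map keeps input order
    return "\n".join([f"Goal: {goal}"] + [f"- {r.summary}" for r in reports])


if __name__ == "__main__":
    print(run_primary("build an operating system", ["database", "frontend", "testing"]))
```

The point of the sketch is the shape of the data flow: the primary only ever sees the short summaries, never the working notes inside each workspace.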

It's the corporate hierarchy that every management book has been trying to describe for 50 years — and it turns out the only way to actually build it is to use agents that have no sense of self-preservation.

The Sandbox Is the Secret Weapon

What made the 12-hour OS build possible — beyond the model itself — is the infrastructure underneath it: the Managed Agents API and its ephemeral sandbox architecture.

Every Antigravity agent runs inside a Google-hosted Ubuntu Linux container. You don't provision it. You don't configure it. One API call to the Interactions API spins it up — Python 3.12, Node.js 22, a full shell, Google Search, URL context, all ready.

The architecture separates control from execution cleanly:

+------------------------------------------------------------+
|                       CONTROL PLANE                        |
|                        (Agents API)                        |
|  - Register Agent Identity & Constraints                   |
|  - Mount GCS Buckets, Define Network Allowlists            |
+-----------------------------+------------------------------+
                              | Configures
                              v
+-----------------------------+------------------------------+
|                        DATA PLANE                          |
|                    (Interactions API)                      |
|                                                            |
|  +------------------------------------------------------+  |
|  |          Google-Hosted Ubuntu VM Sandbox             |  |
|  |  - Ephemeral Linux Environment                       |  |
|  |  - Python 3.12 & Node.js 22 Runtimes                 |  |
|  |  +----------+  +----------------+  +--------+        |  |
|  |  |   Bash   |  | Google Search  |  |  URL   |        |  |
|  |  | Executor |  |     Tool       |  | Context|        |  |
|  |  +----------+  +----------------+  +--------+        |  |
|  +------------------------------+-----------------------+  |
+---------------------------------|--------------------------+
                                  | Persists across turns
                                  v
+----------------------------------------------------------+
|               PERSISTENT CONTEXT STORAGE                 |
|  - Filesystem & Installed Packages                       |
|  - Conversation History (previous_interaction_id)        |
+----------------------------------------------------------+

The key design insight is state persistence across turns. When you pass previous_interaction_id back into a new Interactions API call, the sandbox doesn't reset. The files the agent created last turn are still there. The packages it installed are still there. The 500k tokens of planning context it built up are still there.

This is what enables long-horizon tasks. A single agent interaction can consume between 300,000 and 3,000,000 tokens — but the platform caches 50–70% of input tokens, making the economics manageable.

Here's how that multi-turn persistence looks in practice:

from google import genai

client = genai.Client()

# Turn 1: Give the agent its first major task
first_interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="Retrieve the top 10 trends from Hacker News, write them to trends.csv, and generate a matplotlib visualization.",
    environment="remote",
    tools=[
        {"type": "code_execution"},
        {"type": "google_search"},
        {"type": "url_context"}
    ]
)

print(f"Sandbox ID: {first_interaction.environment_id}")

# Turn 2: Continue in the SAME container — trends.csv is still there
second_interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    previous_interaction_id=first_interaction.id,
    environment=first_interaction.environment_id,  # Same container, state preserved
    input="Convert trends.csv into a responsive HTML dashboard."
)

No re-explaining context. No re-uploading files. The agent remembers because the environment remembers.
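If you chain more than two turns, threading both IDs by hand gets repetitive. The helper below is a small sketch of one way to wrap that bookkeeping. `AgentSession` and `make_stub_create` are hypothetical names, and the keyword arguments simply mirror the calls shown above; the stub exists so the sketch runs without credentials or a real client.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable, Optional


@dataclass
class AgentSession:
    """Carries interaction and environment IDs from one turn into the next."""
    create: Callable[..., Any]      # e.g. client.interactions.create
    agent: str
    environment: str = "remote"     # the first turn asks for a fresh sandbox
    previous_interaction_id: Optional[str] = None

    def send(self, prompt: str, **extra: Any) -> Any:
        kwargs = dict(agent=self.agent, input=prompt,
                      environment=self.environment, **extra)
        if self.previous_interaction_id is not None:
            kwargs["previous_interaction_id"] = self.previous_interaction_id
        interaction = self.create(**kwargs)
        # Remember both IDs so the next turn lands in the same sandbox.
        self.previous_interaction_id = interaction.id
        self.environment = interaction.environment_id
        return interaction


def make_stub_create():
    """Stand-in for the real API call: records kwargs and returns fake IDs."""
    calls = []

    def create(**kwargs):
        calls.append(kwargs)
        return SimpleNamespace(id=f"int-{len(calls)}", environment_id="env-1")

    return create, calls


if __name__ == "__main__":
    create, calls = make_stub_create()
    session = AgentSession(create=create, agent="antigravity-preview-05-2026")
    session.send("Write the top 10 trends to trends.csv")
    session.send("Convert trends.csv into a responsive HTML dashboard.")
    print(calls[1]["previous_interaction_id"], calls[1]["environment"])  # int-1 env-1
```

To use it against the real service you would pass `client.interactions.create` as `create`, assuming the response exposes `id` and `environment_id` as in the example above.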

The CLI: A Deep-Sea Probe for Your Codebase

If Antigravity 2.0 is the CTO, the Antigravity CLI is the remotely operated vehicle (ROV) it sends into the trench.

It's built in Go — lightweight, fast, low overhead — and it can run background tasks asynchronously while you sleep. It doesn't need a visual IDE. It doesn't need a human watching every step. Like an unmanned probe sent into deep water where no one has ever looked, it explores, documents, and surfaces what it finds.

# Install the CLI
curl -fsSL https://antigravity.google/cli/install.sh | bash

# Drop into an interactive agent shell
antigravity-cli

# The commands that make it feel like you hired someone
/goal          # "Complete this without asking me every 5 minutes"
/schedule      # Cron-like automation for recurring tasks
/browser       # Spawn a visual subagent to crawl and test web apps
/rewind        # Undo the last conversation turn and branch differently
/permissions   # Tune autonomy: request-review, always-proceed, strict

The /goal command is the one I keep thinking about. You give the CLI an objective, and it executes — without prompting for step-by-step approvals — until it's done or until it genuinely needs you. This is what "autonomous" actually means in practice. Not just suggesting the next action, but doing the work while you're away from your desk.
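The control flow behind "run until done or until it genuinely needs you" can be sketched in a few lines. This is an illustration of the loop, not the CLI's actual code: `pursue_goal`, `Status`, and `demo_step` are hypothetical, and the step budget is an assumption added so the loop always terminates.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    CONTINUE = "continue"
    DONE = "done"
    NEEDS_HUMAN = "needs_human"


@dataclass
class StepResult:
    status: Status
    note: str


def pursue_goal(step: Callable[[int], StepResult], max_steps: int = 50) -> list[StepResult]:
    """Run steps with no approval prompts; stop on DONE, NEEDS_HUMAN, or the budget."""
    history: list[StepResult] = []
    for i in range(max_steps):
        result = step(i)
        history.append(result)
        if result.status is not Status.CONTINUE:
            break
    return history


def demo_step(i: int) -> StepResult:
    """Stand-in for one agent action: two routine steps, then an escalation."""
    if i < 2:
        return StepResult(Status.CONTINUE, f"step {i} ok")
    return StepResult(Status.NEEDS_HUMAN, "database migration requires approval")


if __name__ == "__main__":
    for r in pursue_goal(demo_step):
        print(r.status.value, "-", r.note)
```

The escalation in `demo_step` echoes the kind of rule you would put in AGENTS.md: routine work proceeds silently, and only a constrained action hands control back to a person.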

The /schedule command extends this further — periodic automated checks, nightly refactor scripts, weekly report generation. This isn't a coding assistant. It's a background process that thinks.

And the deep-sea metaphor holds technically too: unlike browser-based tools, the CLI agents can reach across system boundaries, navigate unknown package ecosystems, probe APIs with no documentation, and surface their findings in structured logs. There are large parts of most codebases that no human fully understands anymore. The CLI goes there.

Behavior as Configuration: AGENTS.md and SKILL.md

One of the underrated announcements from I/O 2026 is how Antigravity handles behavioral customization — not through complex API parameters, but through versioned markdown files in your repository.

Two files you should know:

AGENTS.md lives at the project root. It defines the agent's operating constraints, persona, and global rules. Developed collaboratively by OpenAI, Google, and others under the Linux Foundation's Agentic AI Foundation, it's becoming a universal standard across tens of thousands of repositories — a Dockerfile for agent behavior.

SKILL.md lives at .agents/skills/<skill-name>/SKILL.md. It packages specific capabilities: step-by-step procedures, tool dependencies, reference schemas. Originating from Anthropic, it's now supported across major platforms. The design philosophy is progressive disclosure — the agent reads high-level summaries from AGENTS.md first, then loads specific SKILL.md files only when a matching task appears.

A minimal example:

# AGENTS.md (project root)

## Role
You are a senior DevOps automation agent.

## Non-Negotiable Constraints
- Never run database migrations without human approval.
- All infrastructure changes must test against staging first.
- Validate all generated scripts with linters before execution.

# .agents/skills/docker-builder/SKILL.md
---
name: docker-builder
description: Automates multi-stage Docker builds and security scans.
tools: [code_execution]
---

## Build Procedure
1. Locate package.json or requirements.txt in the repository root.
2. Generate an optimized, multi-stage Dockerfile using distroless base images.
3. Execute: docker build -t app-image:latest .
4. Run vulnerability scan with Trivy.

This is elegant because it solves a real problem: how do you give an autonomous agent enough context to be useful without flooding its context window with everything all at once? The answer is the same answer good software architecture has always given — load what you need, when you need it.
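Here is a rough sketch of what "load what you need, when you need it" could look like in code. It assumes the directory layout described above and a header of simple `key: value` lines. The function names and the keyword-matching rule are hypothetical and deliberately naive; a real agent would decide relevance with the model, not a substring test.

```python
import tempfile
from pathlib import Path


def read_skill_header(skill_file: Path) -> dict[str, str]:
    """Read only the leading 'key: value' lines, stopping at the first heading."""
    header: dict[str, str] = {}
    with skill_file.open() as fh:
        for raw in fh:
            line = raw.strip()
            if line.startswith("#"):
                break                       # body starts here; leave it unread
            if line == "---" or not line:
                continue                    # skip delimiters and blank lines
            if ":" in line:
                key, _, value = line.partition(":")
                header[key.strip()] = value.strip()
    return header


def index_skills(root: Path) -> dict[str, tuple[str, Path]]:
    """Map skill name -> (description, path) without loading any skill body."""
    index = {}
    for skill_file in sorted(root.glob(".agents/skills/*/SKILL.md")):
        header = read_skill_header(skill_file)
        name = header.get("name", skill_file.parent.name)
        index[name] = (header.get("description", ""), skill_file)
    return index


def load_matching_skills(task: str, index: dict[str, tuple[str, Path]]) -> dict[str, str]:
    """Load full text only for skills whose name has a word that appears in the task."""
    task_lower = task.lower()
    return {
        name: path.read_text()
        for name, (_, path) in index.items()
        if any(part in task_lower for part in name.lower().split("-"))
    }


def make_demo_tree(root: Path) -> None:
    """Create two tiny example skills under root for the demo."""
    for name, desc in [("docker-builder", "Automates multi-stage Docker builds."),
                       ("report-writer", "Generates weekly reports.")]:
        skill_dir = root / ".agents" / "skills" / name
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text(
            f"---\nname: {name}\ndescription: {desc}\n---\n\n## Procedure\n1. ...\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_tree(Path(tmp))
        index = index_skills(Path(tmp))
        loaded = load_matching_skills("Build a Docker image for the API", index)
        print(sorted(index), "->", sorted(loaded))
```

Both skills are indexed by name and description, but only the `docker-builder` body is read into memory for this task, which is the whole trick.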

Your AGENTS.md is the onboarding doc for an employee who never forgets what they read.

The Cost Architecture: What Running 93 Agents Actually Looks Like

Let's talk numbers, because "under $1,000 for an OS" deserves to be unpacked.

Gemini 3.5 Flash, the model powering Antigravity's agent infrastructure, costs $3 to $9 per million output tokens. At 289 output tokens per second, it generates text roughly four times faster than its nearest benchmarked competitors. The platform cached 50–70% of input tokens across the multi-turn interactions, which is the key cost lever for workloads that process millions of tokens.
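To see why the cache hit rate matters so much, here is a small sketch of the input-side arithmetic. The two prices in the demo are placeholders chosen for round numbers, not figures from the keynote or any price list, and `input_cost_usd` is a hypothetical helper.

```python
def input_cost_usd(input_tokens: int, cache_hit_rate: float,
                   price_per_m: float, cached_price_per_m: float) -> float:
    """Cost of input tokens when a fraction of them is served from cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    return (fresh * price_per_m + cached * cached_price_per_m) / 1_000_000


if __name__ == "__main__":
    # PLACEHOLDER prices in USD per million tokens, for illustration only.
    PRICE, CACHED_PRICE = 1.00, 0.10
    for rate in (0.0, 0.5, 0.7):
        cost = input_cost_usd(3_000_000, rate, PRICE, CACHED_PRICE)
        print(f"{rate:.0%} cached: ${cost:.2f}")
```

Whatever the real prices are, the structure is the same: every point of cache hit rate moves tokens from the expensive bucket to the cheap one.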

For the OS build specifically: 2.6 billion tokens total for under $1,000. That works out to at most about $0.38 per million tokens ($1,000 divided by 2,600 million) in effective cost after caching, across 93 subagents working in parallel over 12 hours.
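The arithmetic is worth checking by hand. The sketch below uses only figures quoted in this post: the $1,000 ceiling, the 2.6 billion token total, and the $3 low-end output price. The function names are mine.

```python
def cost_per_million(total_usd: float, total_tokens: int) -> float:
    """Blended dollars per million tokens."""
    return total_usd / total_tokens * 1_000_000


def cost_at_list_price(tokens: int, usd_per_million: float) -> float:
    """What the same token count would cost at a flat per-million price."""
    return tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    TOTAL_TOKENS = 2_600_000_000   # quoted total for the OS build
    BUDGET_USD = 1_000             # quoted upper bound on the bill
    OUTPUT_PRICE_LOW = 3.0         # low end of the quoted output price

    print(f"blended: ${cost_per_million(BUDGET_USD, TOTAL_TOKENS):.3f} per million tokens")
    print(f"if all tokens were output at $3/M: "
          f"${cost_at_list_price(TOTAL_TOKENS, OUTPUT_PRICE_LOW):,.0f}")
```

The second number comes out several times larger than the actual bill, which suggests that most of the 2.6 billion tokens were input, much of it cached, and not freshly generated output.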

For context on what different agent task types typically cost:

Task Type                     Input Tokens   Output Tokens   Session Cost
Research & Synthesis          100k–500k      10k–40k         $0.30–$1.00
Code & Doc Generation         100k–500k      15k–50k         $0.30–$1.30
Architecture Design           100k–400k      10k–30k         $0.25–$0.80
Large-Scale Data Processing   300k–3M        30k–150k        $0.70–$3.25

The OS build was off the scale of normal tasks — but it demonstrates that the upper limit of what you can accomplish in a single agentic run has expanded dramatically.

What This Actually Changes

I want to be careful here, because a lot of commentary around events like this slides into either hype or dismissal — and neither is honest.

What I saw at I/O 2026 wasn't "AI replacing developers." What I saw was a fundamental shift in the grain of software development.

The analogy that feels right to me: the invention of the compiler didn't eliminate programmers. It changed what programming meant. Before compilers, you managed memory addresses by hand. After, you reasoned about logic and let the tool handle the translation. The skill didn't disappear — it moved up a level of abstraction.

Antigravity 2.0 is another step up that abstraction ladder. You're not writing every function anymore. You're writing AGENTS.md. You're designing the constraint system. You're defining what "done" means and what "never do this" means. You're the architect now — not because coding became less important, but because the agents need someone to tell them what the building is for.

The developers who thrive in this environment won't be the ones who can type the fastest. They'll be the ones who can think most clearly about systems, constraints, and goals — and who can write them down in a way an autonomous agent can actually follow.

That's a different skill. But it's still unmistakably a craft.

The Moment I'll Remember

When Antigravity 2.0's OS failed to boot Doom because it was missing keyboard and video drivers — and then the primary agent, live on stage, diagnosed the gap, spawned new subagents, wrote the missing drivers, and injected them — the audience reaction wasn't the typical polite I/O applause.

There was a moment of genuine silence first.

I think that silence was people recalibrating. Because the agent didn't just complete the task. It encountered unexpected failure in a domain it hadn't been explicitly prepared for, reasoned about the gap, and solved it.

That's not a demo trick. That's the capability.

You can build a demo that looks impressive. You can't fake a system that reasons its way through failure it didn't anticipate.

Where to Start

If you want to explore the Managed Agents API yourself, the entry points are:

1. Start with a single-turn sandbox: spin up a remote environment via the Interactions API, give it a research and visualization task, and observe the execution loop via SSE streaming.
2. Add state persistence: on your second interaction, pass previous_interaction_id and set environment to the first turn's environment_id. Watch the agent pick up exactly where it left off.
3. Write your first AGENTS.md: define three constraints that matter for your project domain. Watch how it changes agent behavior on subsequent runs.
4. Try a parallel task: give the primary agent a goal complex enough that it spawns subagents. Monitor them via /agents in the CLI.

An idle sandbox persists for 7 days before teardown. That's 7 days to run a project you've been putting off.

93 agents, $1,000, one OS. What would you build?

Written for the Google I/O 2026 Writing Challenge on DEV.to. Technical data sourced from the Google I/O 2026 developer keynote and platform documentation.