Romin Irani for Google AI

Posted on Jun 9 • Originally published at Medium on Jun 9

Antigravity Managed Agents Tutorial: Ship Production AI Agents

#agentsasanapi #managedagent #gemini #managedagentsapi

If you’ve tried building AI applications, you’ve probably hit a familiar engineering wall.

It goes like this: You ask a Large Language Model to help you launch a website. It replies with beautiful code. But then what? The AI can’t actually open a text editor, save the file, or click “Run” to see if it works. You need to do all the heavy lifting.

You might point out that today’s AI agents already do more than that: they start the server, run a few tests, and so on. They do, but if you had to build that capability yourself, there are several things to consider:

Provide a sandbox so the AI can run code safely without accidentally wiping your files.
Stitch together the logic that passes data back and forth between modules.
Configure tools so the AI can invoke them, talk to your data, or run a script.

There’s a lot more to think about, and it all points toward more operational and management overhead to keep things running safely.

Here’s what this looks like in practice. Say you want an AI to write and test a Python script:

from google import genai
import subprocess

client = genai.Client()

# Step 1: Ask the model for code
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Write a Python script that checks password strength and saves a report."
)

# Step 2: YOU must manually extract the code block from the text response
code = extract_code_from_markdown(response.text) # You wrote this function

# Step 3: YOU must save the file and run it locally
with open("password_checker.py", "w") as f:
    f.write(code)

# ⚠️ Danger: Running untrusted AI-generated code on YOUR machine!
result = subprocess.run(["python3", "password_checker.py"], capture_output=True, text=True)
print(result.stdout)

You’re doing all the work. The extraction. The file I/O. The risky local execution. The error loop when it inevitably fails.
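For reference, the `extract_code_from_markdown` helper in Step 2 is not part of any SDK; it is something you would have to write yourself. A minimal sketch, assuming the model wraps its code in a triple-backtick fence (the regex and the fallback to the first block are illustrative choices, not a canonical implementation):

```python
import re

# Build the fence marker programmatically so this snippet can itself live inside a fence.
FENCE = "`" * 3
# Fence, optional language tag, newline, body (lazy), closing fence.
_FENCE_RE = re.compile(FENCE + r"[ \t]*([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_from_markdown(text: str, language: str = "python") -> str:
    """Return the first fenced block tagged `language`, else the first block of any kind."""
    blocks = _FENCE_RE.findall(text)
    if not blocks:
        raise ValueError("no fenced code block found in model response")
    for tag, body in blocks:
        if tag.lower() == language:
            return body.strip() + "\n"
    return blocks[0][1].strip() + "\n"


if __name__ == "__main__":
    sample = f"Here you go:\n{FENCE}python\nprint('hi')\n{FENCE}\nDone."
    print(extract_code_from_markdown(sample), end="")
```

Even this small helper has edge cases (several blocks, no language tag, no fence at all), which is part of the overhead the manual approach carries.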

What if you could replace all of this with a single API call?

What Are Managed Agents?

Google’s Managed Agents framework, featuring the flagship Antigravity agent, aims to help you get past the engineering wall we just described.

Think of it as Agent-as-a-Service. Instead of just giving you a text-based chatbot, Google instantly hands your AI its own secure, fully managed cloud computer (a Linux workspace) with the keys to the live internet.


The Three Levels of AI Systems

Let’s look at the following table that highlights the three levels of AI Systems:

The Core Agentic Loop

When you give an Antigravity Agent a task, it doesn’t just guess an answer. It operates inside a continuous, hardware-backed loop:

┌────────────────────────────────────────┐
│            1. PLAN & REASON            │
│ (Analyzes objective & breaks it down)  │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│                 2. ACT                 │
│  (Executes a tool: Bash, Python, Web)  │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│               3. OBSERVE               │
│ (Reads execution output, errors, data) │
└───────────────────┬────────────────────┘
                    │
                    └─── Loop back until task is complete

This isn’t a one-shot prompt/response. The agent iterates. It writes a script, runs it, reads a SyntaxError from the terminal, fixes the bug, re-runs, and keeps going until the task is done.
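The control flow of that loop can be sketched locally as a toy. This only illustrates the plan/act/observe cycle and is not how Antigravity is implemented: `fake_model` is a hypothetical stand-in that returns a buggy script first and a fixed one once it has seen an error, and `exec` runs in the current process rather than in a sandbox.

```python
from __future__ import annotations

import contextlib
import io
import traceback


def fake_model(task: str, last_error: str | None) -> str:
    """Hypothetical stand-in for the model: buggy first draft, fixed after feedback."""
    if last_error is None:
        return "print('total:', sum([1, 2, 3])"  # missing closing parenthesis
    return "print('total:', sum([1, 2, 3]))"


def act(code: str) -> tuple[bool, str]:
    """Run the code and capture stdout or the error text (toy only: no isolation)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(compile(code, "<agent>", "exec"), {})
        return True, buffer.getvalue()
    except Exception:
        return False, traceback.format_exc(limit=0)


def run_agent(task: str, max_steps: int = 5) -> tuple[str, int]:
    """Plan -> act -> observe until the code runs cleanly or the step budget is spent."""
    last_error = None
    for step in range(1, max_steps + 1):
        code = fake_model(task, last_error)  # 1. plan & reason
        ok, observation = act(code)          # 2. act
        if ok:                               # 3. observe
            return observation, step
        last_error = observation
    raise RuntimeError(f"gave up after {max_steps} steps: {last_error}")


if __name__ == "__main__":
    output, steps = run_agent("sum a list")
    print(f"finished in {steps} steps -> {output.strip()}")
```

The step budget (`max_steps`) is an assumption of this sketch; it shows why a loop like this needs some stopping condition other than success.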

The Remote Sandbox

Every time you invoke an Antigravity Agent, Google boots up a dedicated workspace:

What you get is a real Linux computer in the cloud that your agent owns for the duration of its task.

Architecture Deep Dive: How It All Works

When working with Managed Agents, you interact with two separate planes via the Gemini API:

                  ┌─────────────────────────────────┐
                  │            YOUR APP             │
                  └────────┬───────────────┬────────┘
                           │               │
      1. Define/Configure  │               │  2. Run Task
       (System rules, ID)  │               │  (Send Prompt)
                           ▼               ▼
 ┌───────────────────────────────────┐ ┌───────────────────────────────────┐
 │         THE CONTROL PLANE         │ │         THE RUNTIME PLANE         │
 │           (Agents API)            │ │        (Interactions API)         │
 ├───────────────────────────────────┤ ├───────────────────────────────────┤
 │ Saves persistent identity, system │ │ Spawns the Ubuntu sandbox, logs   │
 │ instructions, and data mounts.    │ │ live traces, and processes loops. │
 └───────────────────────────────────┘ └───────────────────────────────────┘

The Control Plane (Agents API) is where you configure and register your agent. Think of it as creating an employee profile: you define its name, system instructions, custom skills, and code repositories to clone. It gives you back a static Agent ID.

The Runtime Plane (Interactions API) is where actual work happens. You send a task to your Agent ID via an “Interaction.” This plane handles sandbox provisioning, tool execution, reasoning traces, and the agentic loop.
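To make the split concrete, here is a toy in-memory model of the two planes. It is purely illustrative: `AgentConfig`, `ControlPlane`, and `RuntimePlane` are hypothetical names invented for this sketch and do not correspond to SDK classes or to the real Agents API surface.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentConfig:
    """What the control plane stores: identity and instructions, no running state."""
    name: str
    system_instructions: str


@dataclass
class ControlPlane:
    """Toy registry: configure once, get back a stable agent ID."""
    _agents: dict[str, AgentConfig] = field(default_factory=dict)

    def register(self, config: AgentConfig) -> str:
        agent_id = f"agents/{config.name}"
        self._agents[agent_id] = config
        return agent_id

    def get(self, agent_id: str) -> AgentConfig:
        return self._agents[agent_id]


@dataclass
class RuntimePlane:
    """Toy runtime: each interaction looks up the config and gets its own sandbox."""
    control: ControlPlane

    def interact(self, agent_id: str, prompt: str) -> dict[str, str]:
        config = self.control.get(agent_id)
        return {
            "agent": config.name,
            "sandbox_id": uuid.uuid4().hex,  # fresh workspace per interaction
            "prompt": prompt,
        }


if __name__ == "__main__":
    control = ControlPlane()
    agent_id = control.register(AgentConfig("reviewer", "Review pull requests."))
    runtime = RuntimePlane(control)
    print(runtime.interact(agent_id, "Review the latest change"))
```

The point of the separation is that the ID returned by `register` stays the same across many interactions, while everything created by `interact` is per-task.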

Security: The Egress Proxy

Allowing an autonomous agent to run unverified scripts is a security risk. Google isolates the environment using an Egress Proxy Layer:

 ┌─────────────────────────┐
 │  REMOTE UBUNTU SANDBOX  │
 │  (No internal secrets)  │
 └───────────┬─────────────┘
             │ Agent attempts outbound API call
             ▼
 ┌─────────────────────────┐
 │   EGRESS PROXY LAYER    │ ◄── Checks Domain Allowlist
 ├─────────────────────────┤
 │ Intercepts connection,  │
 │ injects secrets securely│
 └───────────┬─────────────┘
             │ Safe, Authenticated Request Sent
             ▼
    [External Target API]

Three key guarantees:

Zero Secrets Inside the Sandbox. You never save API keys or database passwords in the agent’s Linux workspace.
Domain Allowlists. You explicitly define which external domains the agent can contact.
Header Transformations. When the agent needs to call an external API, the Egress Proxy intercepts the request, confirms the domain is allowed, and injects the authentication token automatically. The agent gets the data, but never sees your private key.
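The allowlist check and header injection can be sketched in a few lines. This is a conceptual model only, not Google's proxy: the `ALLOWLIST` and `SECRETS` tables, the `api.example.com` host, and the `proxy_request` function are all hypothetical, and the sketch only computes the headers that would be forwarded instead of making a network call.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical configuration: allowed hosts and the secret the proxy holds for each.
ALLOWLIST = {"api.example.com"}
SECRETS = {"api.example.com": "Bearer proxy-held-token"}


def proxy_request(url: str, headers: dict[str, str]) -> dict[str, str]:
    """Return the headers that would be forwarded, or raise if the host is not allowed."""
    host = (urlsplit(url).hostname or "").lower()
    if host not in ALLOWLIST:
        raise PermissionError(f"egress to {host!r} is not on the allowlist")
    forwarded = dict(headers)                   # copy: never mutate what the sandbox sent
    forwarded["Authorization"] = SECRETS[host]  # injected outside the sandbox
    return forwarded


if __name__ == "__main__":
    sandbox_headers = {"Accept": "application/json"}  # note: no credentials here
    print(proxy_request("https://api.example.com/v1/data", sandbox_headers))
    try:
        proxy_request("https://evil.test/upload", sandbox_headers)
    except PermissionError as err:
        print("blocked:", err)
```

Because the token is added on a copy of the headers outside the sandbox, code running inside the sandbox has nothing to leak even if it is compromised.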

Getting Started: Your First Agent in 5 Minutes

Let’s create our first agent using the Managed Agents framework.

Prerequisites

Before writing any code, make sure your environment meets these requirements:

Python: 3.10+ (3.12 recommended; it matches the sandbox runtime).
google-genai SDK: >=2.0.0. The latest at the time of writing is 2.8.0.
API key: free tier available at aistudio.google.com/apikey.
OS: any (macOS, Linux, Windows, or WSL). Your code runs locally; the agent runs in Google’s cloud.

python3 -m venv managed-agents-env
source managed-agents-env/bin/activate # On Windows: managed-agents-env\Scripts\activate
# Install (or upgrade) the Google GenAI SDK
pip install -U google-genai
# Verify the version (it should meet the minimum SDK version listed in the prerequisites)
python3 -c "import google.genai; print(google.genai.__version__)"
# Set your API key (get one at https://aistudio.google.com/apikey)
export GEMINI_API_KEY="your-api-key-here"
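If you prefer to check the setup from Python instead of the shell, a small preflight script can catch the common mistakes before the first API call. This is a sketch under stated assumptions: `preflight` is a hypothetical helper, it only checks that the `google-genai` distribution is installed (comparing against the minimum SDK version in the prerequisites is left to you), and it assumes the key lives in `GEMINI_API_KEY` as in the commands above.

```python
from __future__ import annotations

import os
import sys
from importlib import metadata


def preflight(min_python: tuple[int, int] = (3, 10)) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready to go."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    try:
        metadata.version("google-genai")  # raises if the package is not installed
    except metadata.PackageNotFoundError:
        problems.append("google-genai is not installed (pip install -U google-genai)")
    if not os.environ.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("OK" if not issues else "\n".join(issues))
```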

Prefer to skip local setup? You can also try Managed Agents directly in Google AI Studio without writing any code. Look for the “Managed Agents” template section. It gives you an interactive playground to test agent interactions in your browser.

Running the Examples

Every code block in this tutorial is a complete, self-contained Python script. To run any example:

# 1. Save the code to a file (the example name used here is hello_agent.py)
vim hello_agent.py # or use any editor

# 2. Run it
python3 hello_agent.py

How long does it take? Each interaction takes 30 seconds to 3 minutes depending on task complexity. The agent is doing real work — booting a sandbox, writing code, executing it, reading errors, and iterating. Simple tasks finish in under a minute; complex multi-turn chains may take 2–3 minutes per turn.
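Since a single call can block for minutes, it helps to measure how long your own tasks actually take. A minimal sketch of a timing wrapper; `timed` is a hypothetical helper, and the demo uses a local stand-in workload so it runs without an API key:

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(call: Callable[[], T]) -> Tuple[T, float]:
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload; in practice the lambda would wrap the interaction call.
    result, seconds = timed(lambda: sum(range(1_000_000)))
    print(f"result={result} took {seconds:.3f}s")
```

In practice you would pass a lambda that wraps the interaction call and log the elapsed seconds next to the prompt, which makes slow tasks easy to spot.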

Your First Interaction: Hello Managed Agents

Remember that painful multi-step code from earlier? Here’s the Antigravity Managed Agents equivalent:

from google import genai

client = genai.Client()

# One API call. One autonomous agent. One remote Linux sandbox.
interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="Write a Python script that generates 100 random exam scores between 0 and 100, "
          "calculates the mean, median, and standard deviation, prints a summary, "
          "and saves a grade distribution histogram as grade_report.png using matplotlib.",
    environment="remote" # This provisions a secure Linux sandbox hosted by Google
)

print(interaction.output_text)

Run it:

python3 hello_agent.py

Expected output

For the first interaction, it produced the following output (your numbers will vary since the data is random):

I have successfully written and executed a Python script that generates 100 random exam scores, computes their key statistics, and plots a grade distribution histogram.

### 1. Generated Python Script (`generate_report.py`)

Here is the Python script, which is saved in the environment as `generate_report.py`. It uses `numpy` for data generation, `statistics` conventions for calculation (such as sample standard deviation), and `matplotlib` to generate a high-quality visualization.

python
import numpy as np
import matplotlib.pyplot as plt
import statistics

def generate_and_analyze_grades():
    # Set a random seed for reproducibility
    np.random.seed(42)

    # Generate 100 random exam scores between 0 and 100 (inclusive)
    scores = np.random.randint(0, 101, size=100)

    # Calculate statistics using Python's standard statistics module and NumPy
    # (Using ddof=1 for sample standard deviation, which is standard for student grade samples)
    mean_val = np.mean(scores)
    median_val = np.median(scores)
    std_val = np.std(scores, ddof=1)

    # Print summary to console
    print("=========================================")
    print(" EXAM GRADE REPORT SUMMARY ")
    print("=========================================")
    print(f"Number of Students : {len(scores)}")
    print(f"Minimum Score : {np.min(scores)}")
    print(f"Maximum Score : {np.max(scores)}")
    print(f"Mean Score : {mean_val:.2f}")
    print(f"Median Score : {median_val:.2f}")
    print(f"Standard Deviation : {std_val:.2f}")
    print("=========================================")

    # Create the histogram plot
    plt.figure(figsize=(10, 6))

    # Customize the grid and style
    plt.grid(axis='y', linestyle='--', alpha=0.7, zorder=0)

    # Draw the histogram with bins of size 10 (0-10, 10-20, ..., 90-100)
    bins = range(0, 111, 10)
    plt.hist(scores, bins=bins, color='#4A90E2', edgecolor='black', rwidth=0.85, zorder=3)

    # Add labels and title
    plt.title('Distribution of Exam Scores', fontsize=16, fontweight='bold', pad=15)
    plt.xlabel('Exam Scores', fontsize=12, labelpad=10)
    plt.ylabel('Number of Students', fontsize=12, labelpad=10)

    # Mark Mean and Median with vertical lines
    plt.axvline(mean_val, color='#D0021B', linestyle='dashed', linewidth=2, zorder=4, label=f'Mean: {mean_val:.2f}')
    plt.axvline(median_val, color='#F5A623', linestyle='dotted', linewidth=2, zorder=4, label=f'Median: {median_val:.2f}')

    # Set x-ticks to align with bins
    plt.xticks(range(0, 101, 10))

    # Add a legend
    plt.legend(fontsize=11, loc='upper left')

    # Add details text box on the plot
    textstr = '\n'.join((
        f'Total Students: {len(scores)}',
        f'Mean: {mean_val:.2f}',
        f'Median: {median_val:.2f}',
        f'Std Dev: {std_val:.2f}'
    ))
    props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
    plt.gca().text(0.05, 0.75, textstr, transform=plt.gca().transAxes, fontsize=10,
                   verticalalignment='top', bbox=props)

    # Adjust layout and save the figure
    plt.tight_layout()
    output_filename = 'grade_report.png'
    plt.savefig(output_filename, dpi=300)
    print(f"\nHistogram successfully saved to '{output_filename}'")


if __name__ == '__main__':
    generate_and_analyze_grades()


---

### 2. Output Statistics Summary

Upon running the script in the sandbox environment, it produced the following output:

text

=========================================
     EXAM GRADE REPORT SUMMARY
=========================================
Number of Students : 100
Minimum Score : 1
Maximum Score : 100
Mean Score : 51.31
Median Score : 53.50
Standard Deviation : 29.70
=========================================

Histogram successfully saved to 'grade_report.png'


---

### 3. Generated Histogram Details (`grade_report.png`)

The script successfully outputted `grade_report.png` in high-resolution format (300 DPI) containing:
- **Clean Grid & Styling**: Standard light-gray horizontal grids to easily identify column heights.
- **Grade Bins**: Defined score ranges of size 10 (e.g., 0–10, 10–20, ..., 90–100) using custom spacing to give the columns a professional look.
- **Key Metrics Highlighted**:
  - A red dashed line representing the **Mean** (`51.31`).
  - A yellow dotted line representing the **Median** (`53.50`).
- **An Info Legend and Box**: In the top-left area summarizing all descriptive statistics for easy offline reference.

### How to Run it Yourself
If you want to run this script in another environment, ensure you have the required packages installed:

bash
pip install matplotlib numpy
python generate_report.py


📝 Note: The agent’s response is a natural-language summary, not raw program output. It describes what it did: the code it wrote, the packages it installed, and the results. The actual Python script and chart file exist inside the remote sandbox.

That’s it. 3–4 lines of meaningful code that we can actually understand.

Behind the scenes, the agent:

Booted up an Ubuntu Linux sandbox
Wrote a Python script that generates random data and computes statistics
Auto-installed matplotlib (it wasn't asked to — it just knew it was needed)
Executed the script, rendering a histogram to grade_report.png
Returned a summary with the statistics and a confirmation that the chart was saved
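One detail in the agent's script is worth a second look: it passed `ddof=1` to `np.std`, which gives the sample standard deviation instead of NumPy's default population value. A stdlib-only sketch of the difference, using made-up scores (not the agent's data):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, not the agent's scores

population_sd = statistics.pstdev(scores)  # divides by n      (NumPy default, ddof=0)
sample_sd = statistics.stdev(scores)       # divides by n - 1  (NumPy ddof=1)

print(f"population: {population_sd:.3f}")
print(f"sample:     {sample_sd:.3f}")
```

The prompt never said which of the two to use, so this was a choice the agent made on its own; if the distinction matters for your task, state it in the prompt.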

Let’s try something more ambitious — a task that requires multiple tools:

from google import genai
client = genai.Client()

interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="""
    Research the top 5 most popular Python web frameworks in 2026.
    For each framework:
    1. Find the latest version number and release date
    2. Count the GitHub stars
    3. Write a comparison table in Markdown
    """,
    environment="remote"
)

print(interaction.output_text)


The agent will use web browsing to search Google, code execution to structure the data, and file management to save the results, all autonomously and all within one API call.

Here is the output:

The landscape of Python web development in **2026** is marked by a fascinating split [1, 2]. On one side, classic **backend-first** and **API-focused** frameworks (Django, FastAPI, Flask) remain the dominant standards for enterprise applications and microservices [3, 5]. On the other side, **pure-Python UI and full-stack** frameworks (Streamlit, Reflex) have seen explosive growth—driven by the need to build interactive web apps, data dashboards, and AI/ML interfaces without writing JavaScript or TypeScript [1, 2].

Below is an in-depth research report on the **top 5 most popular Python web frameworks in 2026**, ranked by GitHub stars and overall developer adoption.

---

### Detailed Breakdown of the Top 5 Python Web Frameworks

#### 1. FastAPI (99,034 GitHub Stars)
* **Latest Version:** `0.136.3` (Released: **May 23, 2026**)
* **Architecture Type:** ASGI (Asynchronous Server Gateway Interface)
* **Primary Focus:** High-Performance APIs and Microservices [4]
* **Core Philosophy:** FastAPI is designed to build APIs quickly with standard Python type hints. It leverages **Pydantic v2** for extremely fast data validation and serialization, and **Starlette** for high-concurrency async handling [3, 4]. It natively auto-generates interactive Swagger UI and ReDoc documentation.
* **Ideal Use Case:** AI/ML model deployment, backend APIs for single-page applications (React/Vue/Svelte), and low-latency microservices [5].

#### 2. Django (87,811 GitHub Stars)
* **Latest Version:** `6.0.6` (Released: **June 3, 2026**)
* **Architecture Type:** WSGI & ASGI (Fully async-supported)
* **Primary Focus:** "Batteries-Included" Monolithic Web Applications [4]
* **Core Philosophy:** As an 18-year industry standard, Django remains the premier full-featured web framework [2]. It handles everything—database routing (via its powerful ORM), administration panels, user authentication, form validation, and robust security defaults (SQLi, CSRF, XSS protection) [1]. In 2026, Django 6.x is highly optimized with full native asynchronous views and type hints [4].
* **Ideal Use Case:** Large-scale enterprise applications, content management systems, e-commerce platforms, and fast SaaS MVP development where you need a complete, secure architecture out of the box [2, 5].

#### 3. Flask (71,638 GitHub Stars)
* **Latest Version:** `3.1.3` (Released: **February 19, 2026**)
* **Architecture Type:** WSGI
* **Primary Focus:** Lightweight, Flexible Microframework [4]
* **Core Philosophy:** Flask provides a minimalist core, leaving the choice of database ORM, form handling, and security components completely up to the developer [1, 2]. Paired with modern 2026 extensions (SQLAlchemy 2.x, Pydantic), it represents a simple, un-opinionated foundation that never gets in the developer's way [3].
* **Ideal Use Case:** Small utility applications, lightweight APIs, microservices, and custom projects where developers want total control over their system design [1, 5].

#### 4. Streamlit (44,895 GitHub Stars)
* **Latest Version:** `1.58.0` (Released: **May 28, 2026**)
* **Architecture Type:** UI-First Scripting Paradigm
* **Primary Focus:** Rapid Data and Machine Learning Applications [1]
* **Core Philosophy:** Streamlit turns standard Python scripts into interactive, beautiful web apps in a matter of minutes. It handles the entire frontend and backend flow by re-running the script from top to bottom whenever a user interacts with a widget.
* **Ideal Use Case:** Quick data dashboards, machine learning model prototypes, and internal analytical tools for teams without frontend resources [1, 2].

#### 5. Reflex (28,467 GitHub Stars)
* **Latest Version:** `0.9.4` (Released: **June 4, 2026**)
* **Architecture Type:** Full-Stack Async React-Compiled
* **Primary Focus:** Interactive, Pure-Python Full-Stack Apps [2]
* **Core Philosophy:** Formerly known as Pynecone, Reflex compiles Python code into a high-performance **React/Next.js frontend** and a **FastAPI backend**, using real-time WebSockets to synchronize states between them [1, 2]. It bypasses JavaScript entirely, offering 60+ pre-built Radix UI components with native Tailwind integration [1, 2].
* **Ideal Use Case:** Complex, interactive web applications, real-time streaming dashboards (e.g., AI chat applications, financial trackers), and user-facing SaaS applications built entirely in Python [2].

---

### Other Honorable Mentions in 2026
While they didn't make the top 5 by GitHub stars, these frameworks are heavily utilized:
* **Tornado** (22,182 stars, v6.5.7, Released June 8, 2026): A mature, asynchronous networking framework ideal for long-lived WebSocket connections [4].
* **Sanic** (18,629 stars, v25.12.1, Released May 31, 2026): An ASGI web framework built for extreme speed and Flask-like simplicity, running on its own high-performance web server.
* **Litestar** (8,269 stars, v2.23.0, Released May 29, 2026): A highly structured, strict, and enterprise-grade ASGI alternative to FastAPI [3, 4].

---

### 2026 Python Web Frameworks Comparison Table

| Framework | GitHub Stars | Latest Version | Release Date | Architecture | Best For | Key Strength | Learning Curve |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| **FastAPI** | ~99,034 | `0.136.3` | May 23, 2026 | ASGI (Async) | High-performance API development & AI model deployments [4, 5] | Autogenerated Swagger docs, native Pydantic v2 validation [4] | **Easy / Moderate** |
| **Django** | ~87,811 | `6.0.6` | Jun 03, 2026 | WSGI / ASGI | Enterprise apps, content-heavy websites, secure SQL backends [1, 2] | "Batteries-included" (built-in admin, ORM, security, auth) [1, 2] | **Moderate / Hard** |
| **Flask** | ~71,638 | `3.1.3` | Feb 19, 2026 | WSGI (Sync-first) | Microservices, custom MVPs, lightweight apps [1, 5] | Unrivaled design flexibility, minimal core overhead [1] | **Very Easy** |
| **Streamlit** | ~44,895 | `1.58.0` | May 28, 2026 | Scripted Reactive | Data visualization dashboards & ML interactive tools [1] | Zero frontend experience needed; fast prototyping [1] | **Very Easy** |
| **Reflex** | ~28,467 | `0.9.4` | Jun 04, 2026 | Async React-compiled | Interactive, real-time SaaS & full-stack apps in pure Python [2] | True multi-page routing, state management without JavaScript [2] | **Moderate** |

---

### Guidance: Which Framework Should You Choose in 2026?

1. **Choose FastAPI** if you are building modern REST/GraphQL APIs, backend microservices, or deploying AI/ML models with high concurrency and automatic documentation [3, 5].
2. **Choose Django** if you are building database-driven business platforms or SaaS architectures where user authentication, administrative interfaces, and built-in security are paramount [1, 2].
3. **Choose Flask** if you need to build simple backend services or you want to hand-select every tool in your stack (e.g., combining custom database layers with external identity providers) [1].
4. **Choose Streamlit** if your core team consists of data scientists or business analysts who need to quickly present internal analytics, charts, and interactive widgets without building a complex web architecture [1, 2].
5. **Choose Reflex** if you want to build a fully interactive, production-ready full-stack application (with complex states, WebSocket interactions, and beautiful custom styles) but do not want to divide your team between Python and JavaScript [2].

---

### Sources & References
* [1] [Best Python Web Frameworks 2026 Compared - Reflex](https://reflex.dev/blog/top-python-web-frameworks/)
* [2] [Django vs Flask vs Reflex (April 2026) - Reflex](https://reflex.dev/blog/django-vs-flask-vs-reflex-comparison/)
* [3] [12 Modern Python Frameworks to Try in 2026 - Medium](https://medium.com/the-pythonworld/12-modern-python-frameworks-to-try-in-2026-e7089305bb19)
* [4] [5 top Python web frameworks of 2026 - Educative.io](https://www.educative.io/blog/top-python-web-frameworks)
* [5] [The Python Backend Framework Decision Guide for 2026 - Rollbar](https://rollbar.com/blog/python-backend-frameworks/)
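Because the agent returns its comparison table as Markdown text, any downstream code has to parse it. A minimal sketch, assuming a simple pipe table with a header row and a `:---` separator row and no `|` characters inside cells; `parse_markdown_table` is a hypothetical helper (not part of the SDK), and the sample rows are abbreviated from the output above:

```python
from __future__ import annotations


def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Parse the pipe-table lines found in `markdown` into a list of row dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in markdown.splitlines()
        if line.strip().startswith("|")
    ]
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the :--- separator row
    return [dict(zip(header, row)) for row in body]


SAMPLE = """
| Framework | GitHub Stars | Latest Version |
| :--- | :--- | :--- |
| **FastAPI** | ~99,034 | `0.136.3` |
| **Django** | ~87,811 | `6.0.6` |
"""

if __name__ == "__main__":
    for row in parse_markdown_table(SAMPLE):
        name = row["Framework"].strip("*")
        stars = int(row["GitHub Stars"].lstrip("~").replace(",", ""))
        print(name, stars)
```

If you need structured data reliably, asking the agent to save a JSON or CSV file alongside the Markdown is likely to be more dependable than parsing prose.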


Multi-Turn Conversations: Persistent Sandbox State

What if you want the agent to build on its previous work? By default, each interactions.create() call spins up a brand-new, empty sandbox. To continue in the same environment, you pass two IDs back:

from google import genai
client = genai.Client()

# Turn 1: Research and create a report
interaction_1 = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="Research the current state of carbon capture technology in 2026. "
          "Write a comprehensive 500-word report and save it as report.md",
    environment="remote"
)
print(f"Environment ID: {interaction_1.environment_id}")
print(f"Interaction ID: {interaction_1.id}")
print(interaction_1.output_text)

# Turn 2: Build on the previous work - same sandbox, same files
interaction_2 = client.interactions.create(
    agent="antigravity-preview-05-2026",
    environment=interaction_1.environment_id, # ← Re-attaches to same Ubuntu sandbox
    previous_interaction_id=interaction_1.id, # ← Preserves conversation memory
    input="Convert that report.md file into a clean index.html webpage "
          "with modern inline CSS styling and responsive design."
          " Please output the contents of the index.html file"
)
print(interaction_2.output_text)
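If you chain many turns, the two-ID pattern above can be wrapped in a small helper so it is hard to forget one of the IDs. A sketch that uses only the parameters shown in this article (`agent`, `environment`, `previous_interaction_id`, `input`); `continue_interaction` is a hypothetical name, and the demo uses an offline stand-in client so the wiring can be checked without an API key:

```python
from types import SimpleNamespace


def continue_interaction(client, previous, prompt, agent="antigravity-preview-05-2026"):
    """Start a follow-up turn in the same sandbox and conversation as `previous`."""
    return client.interactions.create(
        agent=agent,
        environment=previous.environment_id,  # same sandbox, same files
        previous_interaction_id=previous.id,  # same conversation memory
        input=prompt,
    )


if __name__ == "__main__":
    # Offline stand-in for the real client: it just records the keyword arguments.
    calls = []
    fake_client = SimpleNamespace(
        interactions=SimpleNamespace(create=lambda **kw: calls.append(kw) or kw)
    )
    first = SimpleNamespace(id="interaction-1", environment_id="env-1")
    continue_interaction(fake_client, first, "Convert report.md to index.html")
    print(calls[0]["environment"], calls[0]["previous_interaction_id"])
```

With a real client you would pass `interaction_1` as `previous` and get back the next interaction object, which can itself be passed in for a third turn.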


The first interaction produced the report.md file.

Environment ID: 5eb9b4b8-c02f-4a8f-b7a0-c90a796c01a6
Interaction ID: v1_ChdidXNuYXJLQkF1eWlqdU1QODRmUjJRZxIXYnVzbmFyS0JBdXlpanVNUDg0ZlIyUWc
I have researched the state of carbon capture, utilization, and storage (CCUS)
technology in 2026 and compiled a comprehensive 494-word report. 
The report has been successfully saved to your environment as **`report.md`**.

Below is the full text of the report, 
highlighting how the CCUS sector has transitioned from pilot-stage 
demonstrations to industrial infrastructure in 2026.

***

# The State of Carbon Capture Technology in 2026: From Pilot to Infrastructure

## Executive Summary
....

## Point-Source Capture: The Current Backbone
...

## Scaling Direct Air Capture (DAC)
... 

## Infrastructure, Transport, and Policy
...

## Key Challenges
...

***

### Sources Cited

* **[1] S&P Global:** [2026 CCUS: Navigating the tides of the great realignment](https://www.spglobal.com/energy/en/news-research/blog/energy-transition/041426-2026-ccus-navigating-the-tides-of-the-great-realignment) (April 2026)
* **[2] Carbon Herald:** [What's Next For Carbon Capture, Utilization & Storage (CCUS) In 2026](https://carbonherald.com/whats-next-for-carbon-capture-utilization-storage-ccus-in-2026/) (January 2026)
* **[3] Nature Tech Memos:** [Top 10 Carbon Capture Startups for Corporate Partnerships in 2026](https://www.naturetechmemos.com/p/top-10-carbon-capture-startups-for-corporate-partnerships-in-2026) (April 2026)
.....


The second interaction reused the environment and conversation from the first and asked the agent to produce an HTML version of the report, index.html. The prompt explicitly asked it to print the HTML contents.

I have successfully converted your `report.md` file into a clean, modern, and highly responsive **`index.html`** webpage! 

### What was done:
1. **Designed a cohesive palette:** Implemented a modern climate-tech inspired aesthetic using Slate (`#1e293b`) and a range of deep and light teals (`#0f766e`, `#14b8a6`, `#115e59`).
2. **Added an Interactive Stat Dashboard:** Created a high-level statistical summary card grid displaying key 2026 data at a glance (e.g., global Mtpa capacity, Stratos targets, US Section 45Q tax credits).
3. **Structured visual highlights:** Embedded two distinct grid comparison cards for the **Climeworks Mammoth** and **1PointFive Stratos** facilities, as well as a custom callout block highlighting the trans-European **Northern Lights** shipping route.
4. **Professionalized the typography & chemistry:** Used system-ui fonts for excellent loading speeds and correctly formatted all chemical formulas to standard subscripts (e.g., **CO<sub>2</sub>**).
5. **Citations & Interactivity:** Mapped the original references into clickable, superscript citation numbers (e.g., `[1]`) that link smoothly down to a beautifully bulleted "Sources Cited" section with custom index bullet styling.

The full HTML contents of the newly created `index.html` are shown below:

html
<!DOCTYPE html>

<title>The State of Carbon Capture Technology in 2026</title>
<style>
    :root {
        --primary: #0f766e;
        --primary-light: #14b8a6;
        --primary-dark: #115e59;
        --bg: #f8fafc;
        --card-bg: #ffffff;
        --text-main: #1e293b;
        --text-muted: #475569;
        --accent: #0284c7;
        --border: #e2e8f0;
        --shadow: 0 4px 6px -1px rgba(15, 118, 110, 0.05), 0 2px 4px -2px rgba(15, 118, 110, 0.05);
        --shadow-md: 0 10px 15px -3px rgba(15, 118, 110, 0.1), 0 4px 6px -4px rgba(15, 118, 110, 0.1);
    }
    * { box-sizing: border-box; }
    body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif; background-color: var(--bg); color: var(--text-main); line-height: 1.8; margin: 0; padding: 0; -webkit-font-smoothing: antialiased; }
    header { background: linear-gradient(135deg, #0d5c56 0%, #0f766e 50%, #115e59 100%); color: white; padding: 5rem 2rem; text-align: center; position: relative; }
    header::after { content: ''; position: absolute; bottom: 0; left: 0; right: 0; height: 6px; background: linear-gradient(90deg, var(--primary-light), var(--accent)); }
    .header-content { max-width: 800px; margin: 0 auto; }
    header h1 { font-size: 2.75rem; font-weight: 800; margin: 0 0 1rem 0; letter-spacing: -0.025em; line-height: 1.2; }
    header p { font-size: 1.25rem; color: #ccfbf1; margin: 0 0 1.5rem 0; font-weight: 300; }
    .meta { font-size: 0.875rem; color: #99f6e4; text-transform: uppercase; letter-spacing: 0.125em; font-weight: 600; }
    main { max-width: 900px; margin: -3rem auto 4rem auto; padding: 0 1.5rem; position: relative; z-index: 10; }
    .stats-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 1.5rem; margin-bottom: 3rem; }
    .stat-card { background: var(--card-bg); padding: 1.75rem 1.5rem; border-radius: 14px; box-shadow: var(--shadow); border: 1px solid var(--border); text-align: center; transition: transform 0.25s ease, box-shadow 0.25s ease; }
    .stat-card:hover { transform: translateY(-5px); box-shadow: var(--shadow-md); }
    .stat-num { font-size: 2.25rem; font-weight: 800; color: var(--primary); margin-bottom: 0.5rem; line-height: 1; }
    .stat-label { font-size: 0.875rem; color: var(--text-muted); font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; }
    .content-wrapper { background: var(--card-bg); padding: 4rem 3.5rem; border-radius: 20px; box-shadow: var(--shadow-md); border: 1px solid var(--border); }
    section { margin-bottom: 3.5rem; }
    section:last-of-type { margin-bottom: 0; }
    h2 { color: var(--primary-dark); font-size: 1.85rem; font-weight: 700; margin-top: 0; margin-bottom: 1.5rem; border-bottom: 2px solid #f1f5f9; padding-bottom: 0.75rem; }
    p { margin-top: 0; margin-bottom: 1.5rem; font-size: 1.1rem; color: #334155; text-align: justify; }
    .highlight-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 1.5rem; margin: 2rem 0; }
    @media (max-width: 768px) {
        header h1 { font-size: 2.25rem; }
        .content-wrapper { padding: 2.5rem 1.5rem; }
        .highlight-grid { grid-template-columns: 1fr; }
    }
    .highlight-box { background: #f0fdfa; border: 1px solid #ccfbf1; padding: 1.5rem; border-radius: 12px; border-left: 5px solid var(--primary); }
    .highlight-box h4 { margin: 0 0 0.75rem 0; color: var(--primary-dark); font-size: 1.15rem; font-weight: 700; }
    .highlight-box p { margin: 0; font-size: 0.95rem; color: #0f766e; text-align: left; line-height: 1.6; }
    .callout { background: #f8fafc; border: 1px solid var(--border); border-left: 5px solid var(--accent); padding: 1.5rem; border-radius: 4px 12px 12px 4px; margin: 2rem 0; }
    .callout-title { font-weight: 700; color: #0369a1; margin-bottom: 0.5rem; font-size: 1.1rem; }
    .callout p { margin: 0; font-size: 1rem; color: #475569; text-align: left; }
    .citation-link { color: var(--primary); text-decoration: none; font-weight: 700; font-size: 0.85rem; padding: 0 2px; vertical-align: super; }
    .citation-link:hover { color: var(--accent); text-decoration: underline; }
    .sources-section { background: #f8fafc; border: 1px solid var(--border); margin-top: 4rem; padding: 2.5rem; border-radius: 14px; }
    .sources-section h3 { margin-top: 0; color: var(--primary-dark); font-size: 1.35rem; margin-bottom: 1.5rem; font-weight: 700; }
    .sources-list { list-style-type: none; padding: 0; margin: 0; }
    .sources-list li { margin-bottom: 1rem; font-size: 0.95rem; padding-left: 2rem; position: relative; line-height: 1.6; color: #475569; }
    .sources-list li::before { content: attr(data-index); color: var(--primary); position: absolute; left: 0; font-weight: 700; background: #ccfbf1; width: 24px; height: 24px; border-radius: 50%; display: flex; align-items: center; justify-content: center; font-size: 0.75rem; top: 2px; }
    .sources-list a { color: var(--primary-dark); text-decoration: none; font-weight: 600; transition: color 0.2s ease; }
    .sources-list a:hover { color: var(--accent); text-decoration: underline; }
    footer { text-align: center; color: var(--text-muted); font-size: 0.875rem; margin-top: 4rem; padding-top: 2rem; border-top: 1px solid var(--border); }
</style>

<header>
    <div class="header-content">
        <span class="meta">Special Intelligence Report</span>
        <h1>The State of Carbon Capture Technology in 2026</h1>
        <p>From Pilot to Infrastructure</p>
    </div>
</header>

<main>
    <!-- Key Metrics Dashboard -->
    <div class="stats-grid">
        <div class="stat-card">
            <div class="stat-num">~73 Mtpa</div>
            <div class="stat-label">Global Operational Capacity</div>
        </div>
        <div class="stat-card">
            <div class="stat-num">500k Tons</div>
            <div class="stat-label">Stratos DAC Annual Target</div>
        </div>
        <div class="stat-card">
            <div class="stat-num">Up to $180</div>
            <div class="stat-label">US 45Q Subsidy per Ton</div>
        </div>
        <div class="stat-card">
            <div class="stat-num">36,000 T</div>
            <div class="stat-label">Mammoth Iceland Capacity</div>
        </div>
    </div>

    <div class="content-wrapper">
        <!-- Executive Summary -->
        <section id="executive-summary">
            <h2>Executive Summary</h2>
            <p>
                In 2026, the Carbon Capture, Utilization, and Storage (CCUS) industry is undergoing an "industrial hardening" phase, transitioning decisively from demonstration-stage pilots to commercial-scale infrastructure<sup><a href="#ref-1" class="citation-link">[1]</a></sup>. Global operational capture capacity has reached approximately 73 million metric tons per annum (Mtpa) as of mid-2026, up from 50 Mtpa in early 2025<sup><a href="#ref-1" class="citation-link">[1]</a></sup><sup><a href="#ref-2" class="citation-link">[2]</a></sup>. This momentum is propelled by robust climate policies, corporate carbon-removal commitments, and the commissioning of megaton-scale facilities.
            </p>
        </section>

        <!-- Point-Source Capture -->
        <section id="point-source">
            <h2>Point-Source Capture: The Current Backbone</h2>
            <p>
                Point-source carbon capture remains the commercial backbone of CCUS<sup><a href="#ref-2" class="citation-link">[2]</a></sup>. Technologies deployed at industrial and energy facilities—such as cement, steel, chemicals, and refining—now represent the vast majority of active capture capacity. Post-combustion chemical absorption using amine-based solvents is the most widely deployed technology<sup><a href="#ref-2" class="citation-link">[2]</a></sup>. Companies like Carbon Upcycling Technologies are successfully integrating capture systems with utilization, converting captured carbon dioxide (CO<sub>2</sub>) into high-quality construction materials, turning emissions from cement manufacturers into a low-carbon concrete feedstock<sup><a href="#ref-3" class="citation-link">[3]</a></sup>.
            </p>
        </section>

        <!-- Scaling Direct Air Capture -->
        <section id="dac">
            <h2>Scaling Direct Air Capture (DAC)</h2>
            <p>
                Direct Air Capture is experiencing a dramatic scale-up<sup><a href="#ref-4" class="citation-link">[4]</a></sup>. Climeworks' "Mammoth" facility in Iceland, operational since May 2024, captures up to 36,000 tons of CO<sub>2</sub> annually, storing it permanently underground via Carbfix mineralization<sup><a href="#ref-4" class="citation-link">[4]</a></sup><sup><a href="#ref-5" class="citation-link">[5]</a></sup>. Meanwhile, 1PointFive's (a subsidiary of Occidental Petroleum) "Stratos" facility in Ector County, Texas, is entering active operation in 2026<sup><a href="#ref-4" class="citation-link">[4]</a></sup>. Designed to capture up to 500,000 tons of atmospheric CO<sub>2</sub> annually using liquid solvent technology licensed from Carbon Engineering, Stratos is currently the world’s largest DAC plant<sup><a href="#ref-4" class="citation-link">[4]</a></sup><sup><a href="#ref-6" class="citation-link">[6]</a></sup>. 
            </p>

            <div class="highlight-grid">
                <div class="highlight-box">
                    <h4>Climeworks: Mammoth (Iceland)</h4>
                    <p>Nameplate capacity of 36,000 tons/year. Relies on solid-sorbent collectors powered by clean geothermal energy with deep Carbfix basaltic storage.</p>
                </div>
                <div class="highlight-box">
                    <h4>1PointFive: Stratos (Texas)</h4>
                    <p>World's largest facility with 500,000 tons/year target. Employs liquid-solvent infrastructure designed for rapid regional scalability.</p>
                </div>
            </div>

            <p>
                Major technology firms (including Microsoft, Google, Meta, and Amazon) have signed multi-year offtake agreements for high-quality, durable carbon credits, paying between $200 and $300 per ton, though current baseline DAC capture costs remain high, between $400 and $1,000 per ton<sup><a href="#ref-3" class="citation-link">[3]</a></sup><sup><a href="#ref-7" class="citation-link">[7]</a></sup>.
            </p>
        </section>

        <!-- Infrastructure, Transport, and Policy -->
        <section id="infrastructure">
            <h2>Infrastructure, Transport, and Policy</h2>
            <p>
                The commercial viability of carbon capture relies heavily on dedicated transportation and storage networks<sup><a href="#ref-2" class="citation-link">[2]</a></sup>. In Europe, 2026 marks the active operation of Norway’s "Northern Lights" project, the world's first open-source CO<sub>2</sub> transport and storage network<sup><a href="#ref-8" class="citation-link">[8]</a></sup>. For example, Yara's flagship Sluiskil ammonia plant in the Netherlands is liquefying up to 800,000 tons of CO<sub>2</sub> annually to be shipped by Northern Lights for permanent undersea storage<sup><a href="#ref-8" class="citation-link">[8]</a></sup>. Concurrently, Denmark’s "Greensand" offshore storage initiative is beginning operations<sup><a href="#ref-2" class="citation-link">[2]</a></sup>. 
            </p>

            <div class="callout">
                <div class="callout-title">The Trans-European Shipping Pathway</div>
                <p>Northern Lights bridges emission-heavy inland industrial sites like Yara Sluiskil in the Netherlands directly with permanent injection storage wells beneath the North Sea seabed.</p>
            </div>

            <p>
                On the policy front, the sector is heavily anchored by government subsidies. In the United States, the Inflation Reduction Act’s (IRA) modified Section 45Q tax credit provides up to $180 per metric ton for DAC and $85 per metric ton for point-source capture<sup><a href="#ref-9" class="citation-link">[9]</a></sup>. In the European Union, the Net Zero Industry Act has accelerated cross-border transport approvals, providing regulatory certainty.
            </p>
        </section>

        <!-- Key Challenges -->
        <section id="challenges">
            <h2>Key Challenges</h2>
            <p>
                Despite rapid progress, critical bottlenecks persist. Chief among these is the high energy intensity of DAC, which requires 1.5 to 2.5 megawatt-hours (MWh) of zero-carbon energy per ton of captured CO<sub>2</sub><sup><a href="#ref-7" class="citation-link">[7]</a></sup>. Permitting delays for geologic injection wells—specifically the EPA’s rigorous Class VI permits in the United States—also restrict how fast captured carbon can be sequestered<sup><a href="#ref-4" class="citation-link">[4]</a></sup>. Overcoming these economic and infrastructure hurdles remains essential to achieving megaton targets by 2030.
            </p>
        </section>

        <!-- Sources Cited -->
        <div class="sources-section">
            <h3>Sources Cited</h3>
            <ul class="sources-list">
                <li id="ref-1" data-index="1"><strong>S&P Global:</strong> <a href="https://www.spglobal.com/energy/en/news-research/blog/energy-transition/041426-2026-ccus-navigating-the-tides-of-the-great-realignment" target="_blank">2026 CCUS: Navigating the tides of the great realignment</a> (April 2026)</li>
                <li id="ref-2" data-index="2"><strong>Carbon Herald:</strong> <a href="https://carbonherald.com/whats-next-for-carbon-capture-utilization-storage-ccus-in-2026/" target="_blank">What's Next For Carbon Capture, Utilization & Storage (CCUS) In 2026</a> (January 2026)</li>
                <li id="ref-3" data-index="3"><strong>Nature Tech Memos:</strong> <a href="https://www.naturetechmemos.com/p/top-10-carbon-capture-startups-for-corporate-partnerships-in-2026" target="_blank">Top 10 Carbon Capture Startups for Corporate Partnerships in 2026</a> (April 2026)</li>
                <li id="ref-4" data-index="4"><strong>Senken:</strong> <a href="https://www.senken.io/blog/top-direct-air-capture-carbon-removal-projects-buyers-guide" target="_blank">The Top 3 Direct Air Capture Carbon Removal Projects</a> (February 2026)</li>
                <li id="ref-5" data-index="5"><strong>Climeworks:</strong> <a href="https://climeworks.com/plant-mammoth" target="_blank">Mammoth: our newest direct air capture and storage facility</a> (May 2024)</li>
                <li id="ref-6" data-index="6"><strong>Carbon Credits:</strong> <a href="https://carboncredits.com/top-3-carbon-capture-leaders-to-drive-the-net-zero-race-in-2026/" target="_blank">Top 3 Carbon Capture Leaders to Drive the Net-Zero Race in 2026</a> (January 2026)</li>
                <li id="ref-7" data-index="7"><strong>Energy Solutions Intelligence:</strong> <a href="https://energy-solutions.co/articles/sub/carbon-capture-direct-air-dac-cost-analysis" target="_blank">Direct Air Capture in 2026: Cost, Scale, and Path to $200/tCO2</a> (January 2026)</li>
                <li id="ref-8" data-index="8"><strong>World Economic Forum:</strong> <a href="https://www.weforum.org/stories/2026/01/scale-carbon-capture-storage-climate-action/" target="_blank">How to scale carbon capture and storage for climate action</a> (January 2026)</li>
                <li id="ref-9" data-index="9"><strong>International Energy Agency (IEA):</strong> <a href="https://www.iea.org/policies/16255-inflation-reduction-act-2022-sec-13104-extension-and-modification-of-credit-for-carbon-oxide-sequestration" target="_blank">Inflation Reduction Act 2022: Sec. 13104 Extension and Modification of Credit for Carbon Oxide Sequestration</a> (February 2026)</li>
            </ul>
        </div>

        <footer>
            <p style="text-align: center; color: var(--text-muted); font-size: 0.875rem; margin: 0;">&copy; 2026 Carbon Capture Intelligence. Compiled June 2026.</p>
        </footer>
    </div>
</main>


The rendered form of the report is shown here:

Understanding the Two IDs

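The two IDs in the code above do different jobs: environment_id re-attaches to the same sandbox (its files and installed packages), while previous_interaction_id carries the conversation memory forward. Here is a table that breaks it down: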

Tip: You can pass environment_id without previous_interaction_id to reuse the files and installed packages, but start a fresh conversation. This is useful when you want to "fork" the workspace with a new task.
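As a rough sketch of that "fork" pattern, the helper below assembles the keyword arguments for a follow-up call that reuses the sandbox but deliberately leaves out previous_interaction_id. The helper name fork_kwargs and the sample environment ID are hypothetical; the parameter names mirror the client.interactions.create calls used in this article.

```python
def fork_kwargs(environment_id, task, agent="antigravity-preview-05-2026"):
    """Build arguments for a fresh conversation in an existing sandbox.

    Hypothetical helper: it only assembles a dict, so it can be inspected
    without calling the API.
    """
    return {
        "agent": agent,
        "environment": environment_id,  # reuse files and installed packages
        "input": task,                  # new task, no conversation memory
        # previous_interaction_id is intentionally left out
    }


# "env-123" is a made-up ID; in practice use interaction.environment_id.
kwargs = fork_kwargs("env-123", "Summarize report.md in three bullet points.")
print(sorted(kwargs))

# With a configured client, the call would then look like:
# interaction = client.interactions.create(**kwargs)
```

Keeping the argument-building separate from the API call makes it easy to confirm that no conversation ID sneaks into a forked task.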

Downloading files

The previous section got the file content back by asking the agent, in the instruction itself, to print what it had generated. That works, but you will usually want to download the files directly from the sandbox instead. Let’s do that.

Shown below is the full listing from the previous multi-turn interaction, which creates a Markdown report and then an HTML version of it. The only addition is some code at the end that downloads the entire environment snapshot as a tar file and extracts it into a local folder. A whole-environment snapshot is what the Managed Agents Environment currently supports. Read the documentation here.

import os
import requests
import tarfile
from google import genai
client = genai.Client()

# Turn 1: Research and create a report
interaction_1 = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="Research the current state of carbon capture technology in 2026. "
          "Write a comprehensive 500-word report and save it as report.md",
    environment="remote"
)
print(f"Environment ID: {interaction_1.environment_id}")
print(f"Interaction ID: {interaction_1.id}")
print(interaction_1.output_text)

# Turn 2: Build on the previous work - same sandbox, same files
interaction_2 = client.interactions.create(
    agent="antigravity-preview-05-2026",
    environment=interaction_1.environment_id, # ← Re-attaches to same Ubuntu sandbox
    previous_interaction_id=interaction_1.id, # ← Preserves conversation memory
    input="Convert that report.md file into a clean index.html webpage "
          "with modern inline CSS styling and responsive design."
          " Please output the contents of the index.html file"
)
print(interaction_2.output_text)

# Download the environment snapshot (a tar archive) through the REST endpoint
env_id = interaction_2.environment_id
api_key = os.environ.get("GEMINI_API_KEY")

response = requests.get(
    f"https://generativelanguage.googleapis.com/v1beta/files/environment-{env_id}:download",
    params={"alt": "media"},
    headers={"x-goog-api-key": api_key},
    allow_redirects=True,
)
response.raise_for_status()  # Stop here if the download failed

# Save the archive, then unpack it into a local folder
with open("snapshot_env.tar", "wb") as f:
    f.write(response.content)

os.makedirs("extracted_env_snapshot", exist_ok=True)
with tarfile.open("snapshot_env.tar") as tar:
    tar.extractall(path="extracted_env_snapshot")


At the time of writing, there is no SDK method for this, so the environment snapshot is fetched with a direct REST call.
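A side note on the extraction step: the archive's contents were produced inside a sandbox by generated code, so you may prefer to check member paths before unpacking. The sketch below is one possible approach, not part of the Managed Agents API. The helper name safe_extract is hypothetical, and it only rejects members whose path resolves outside the destination folder (it does not inspect symlinks). The demo builds a tiny stand-in archive so it runs without a real snapshot.

```python
import io
import os
import tarfile
import tempfile


def safe_extract(tar_path, dest_dir):
    """Extract a tar archive, refusing members that would land outside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(path=dest_root)
    return dest_root


# Demo with a stand-in archive containing a single report.md file.
with tempfile.TemporaryDirectory() as tmp:
    tar_path = os.path.join(tmp, "snapshot_env.tar")
    payload = b"# Report\n"
    with tarfile.open(tar_path, "w") as tar:
        info = tarfile.TarInfo(name="report.md")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    out_dir = safe_extract(tar_path, os.path.join(tmp, "extracted_env_snapshot"))
    extracted = sorted(os.listdir(out_dir))

print(extracted)
```

In the download script, the same helper could stand in for the final tar.extractall call.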

Here is the output on my local machine after I run the above program:

Customizing Agents: Skills, Personas, and Configuration

Instead of cramming everything into a single prompt string, Antigravity uses a filesystem-native configuration system. When the sandbox boots up, it automatically looks for a hidden .agents/ directory:

📁 Your-Project-Directory/
└── 📁 .agents/
    ├── 📄 AGENTS.md ← Global system instructions (persona, rules, standards)
    └── 📁 skills/
        └── 📁 data-cleaner/
            └── 📄 SKILL.md ← Specific skill with step-by-step instructions


AGENTS.md — The Agent’s Persona

This file contains the foundational rules, behavioral guardrails, and project-specific context for your agent. Think of it as a README that the agent reads before starting any task.

SKILL.md — Modular Expertise

Skills package domain-specific expertise into reusable, modular units. They use YAML frontmatter for metadata and Markdown for instructions:

---
name: data-cleaner
description: Use when the user needs to clean, normalize, or validate tabular data files.
---
# Data Cleaner
## When to Use
- User provides CSV, Excel, or JSON files that need cleaning
- Data has missing values, inconsistent formatting, or duplicate rows
## Steps
1. Load the data file using pandas
2. Profile the data: count nulls, duplicates, and type mismatches
3. Apply cleaning rules (fill nulls, normalize strings, deduplicate)
4. Save the cleaned output and generate a summary report


The key insight is progressive disclosure: the agent only loads the full content of a SKILL.md when it determines the skill is relevant to the current task. At startup, it reads only the name and description from the frontmatter, keeping the context window clean.

Tip: If a skill isn’t triggering when you expect it to, the issue is almost always the description in the YAML frontmatter. Make it more specific about when the skill should activate.
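To make the progressive-disclosure idea concrete, here is a small illustration of reading only the frontmatter of a SKILL.md and stopping before the Markdown body. This is not Antigravity's actual loader. The function read_frontmatter is hypothetical, and it only handles flat "key: value" lines rather than full YAML.

```python
import tempfile
from pathlib import Path

SKILL_TEXT = """---
name: data-cleaner
description: Use when the user needs to clean, normalize, or validate tabular data files.
---
# Data Cleaner
## Steps
1. Load the data file using pandas
"""


def read_frontmatter(skill_path):
    """Return the frontmatter key/value pairs without reading the skill body."""
    meta = {}
    with open(skill_path, encoding="utf-8") as f:
        if f.readline().strip() != "---":
            return meta  # no frontmatter block at the top of the file
        for line in f:
            if line.strip() == "---":
                break  # stop before the Markdown instructions
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


# Write the skill into a throwaway project that follows the .agents/ layout.
with tempfile.TemporaryDirectory() as project:
    skill_file = Path(project) / ".agents" / "skills" / "data-cleaner" / "SKILL.md"
    skill_file.parent.mkdir(parents=True)
    skill_file.write_text(SKILL_TEXT, encoding="utf-8")
    index = read_frontmatter(skill_file)

print(index["name"])
print(index["description"])
```

Only the two metadata fields end up in the index, which is why a vague description makes a skill hard to match against a task.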

Persistent Agent Creation

For production use, you can register a persistent agent configuration so you don’t repeat yourself. In other words, you can create a Named Agent with the Agents API.

Let’s create a persistent agent using the same data-cleaner skill that we introduced in the previous section.

from google import genai

client = genai.Client()

# Register a reusable, named agent with the data-cleaner skill baked in
agent = client.agents.create(
    id="my-csv-cleaner",
    base_agent="antigravity-preview-05-2026",
    system_instruction="You are a data quality engineer. Always use pandas for data manipulation. "
                       "Always generate a before/after summary showing what changed.",
    base_environment={
        "type": "remote",
        "sources": [
            {
                "type": "inline",
                "target": ".agents/AGENTS.md",
                "content": (
                    "# Data Quality Agent\n\n"
                    "## Standards\n"
                    "- Never drop rows silently — log every removal with a reason\n"
                    "- Normalize all string columns to lowercase, stripped of whitespace\n"
                    "- Output cleaned files in UTF-8 CSV format\n"
                    "- Always print a summary table at the end\n"
                )
            },
            {
                "type": "inline",
                "target": ".agents/skills/data-cleaner/SKILL.md",
                "content": (
                    "---\n"
                    "name: data-cleaner\n"
                    "description: Use when the user needs to clean, normalize, or validate tabular data files.\n"
                    "---\n"
                    "# Data Cleaner\n\n"
                    "## When to Use\n"
                    "- User provides CSV, Excel, or JSON files that need cleaning\n"
                    "- Data has missing values, inconsistent formatting, or duplicate rows\n\n"
                    "## Steps\n"
                    "1. Load the data file using pandas\n"
                    "2. Profile the data: count nulls, duplicates, and type mismatches\n"
                    "3. Apply cleaning rules (fill nulls, normalize strings, deduplicate)\n"
                    "4. Save the cleaned output and generate a summary report\n"
                )
            }
        ]
    }
)

print(f"Agent created with ID: {agent.id}")


Create a file named create_csv_cleaner_agent.py with the above code and run it. This creates a persistent agent that you can refer to by its registered agent ID, my-csv-cleaner.

Let’s put the agent to use. First, create a CSV file named messy_customers.csv containing a few messy customer records:

name,email,phone,country
John Smith,john@example.com,+14155551234,US
jane doe, JANE@EXAMPLE.COM ,4155559999,
John Smith,john@example.com,+14155551234,US
Bob Wilson,bob@@invalid,+442071234567,
María García,maria@test.com,+34612345678,Spain
  alice brown ,alice@example.com,,US
Charlie Lee,charlie@test.com,+61412345678,AU
jane doe,jane@example.com,+14155559999,


Now you can invoke this agent by ID anywhere in your app. Create a file named run_csv_cleaner_agent.py with the content given below.

from google import genai

client = genai.Client()

# Read your local CSV file
with open("messy_customers.csv", "r") as f:
    csv_content = f.read()

# Inject the file into the sandbox and tell the agent to clean it
interaction = client.interactions.create(
    agent="my-csv-cleaner",
    input="Clean the file at data/customers.csv. "
          "Remove duplicate rows by email (case-insensitive), "
          "standardize phone numbers to E.164 format, "
          "fill missing 'country' fields by looking up the phone country code, "
          "and flag any rows where the email format is invalid. "
          "Generate a before/after summary showing what changed.",
    environment={
        "type": "remote",
        "sources": [
            {
                "type": "inline",
                "target": "data/customers.csv", # ← Path inside the sandbox
                "content": csv_content # ← Your local file contents
            }
        ]
    }
)

print(interaction.output_text)


What’s happening in the above code: The sources array pre-loads files into the sandbox before the agent starts working. The agent sees data/customers.csv as a real file in its Linux filesystem — it doesn't know or care that you injected it from your local machine. This works for any file type: CSV, JSON, XML, Python scripts, config files, etc.
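If you need to inject several local files, a small helper can assemble the environment dictionary for you. The sketch below is an assumption-laden convenience, not an SDK feature: inline_source and remote_environment are hypothetical names, the dictionary shape is copied from the example above, and it reads files as UTF-8 text only.

```python
import tempfile
from pathlib import Path


def inline_source(local_path, target):
    """Build one `sources` entry that copies a local text file into the sandbox."""
    return {
        "type": "inline",
        "target": target,  # path inside the sandbox
        "content": Path(local_path).read_text(encoding="utf-8"),
    }


def remote_environment(file_map):
    """Turn {local_path: sandbox_path} into an environment dict with inline sources."""
    return {
        "type": "remote",
        "sources": [inline_source(src, dst) for src, dst in file_map.items()],
    }


# Demo with a throwaway file standing in for messy_customers.csv.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "messy_customers.csv"
    sample.write_text("name,email\nJohn Smith,john@example.com\n", encoding="utf-8")
    environment = remote_environment({sample: "data/customers.csv"})

print(environment["sources"][0]["target"])

# The result can be passed as environment=environment to client.interactions.create.
```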

Notice how the prompt doesn’t mention pandas, UTF-8, or logging rules — all of that is already encoded in the AGENTS.md and SKILL.md that the named agent carries with it. The data-cleaner skill activates automatically because the task description matches the skill's description field.

A sample run of the above file is shown below:

I have successfully cleaned the customer dataset at `data/customers.csv` using Python and `pandas`. 

Below is a detailed report and before/after summary of the transformations applied.

---

### Summary of Transformations

1. **Whitespace Trimming:** Stripped all leading and trailing whitespace from both column names and text values (e.g., ` alice brown ` became `alice brown`).
2. **Email Normalization & Validation:** 
   - Converted all email addresses to lowercase and trimmed whitespace.
   - Added a new column `email_invalid` (boolean) to flag emails with an invalid format.
   - Identified `bob@@invalid` as an invalid format (due to a duplicate `@` symbol and lack of a top-level domain).
3. **Phone Number Standardization:**
   - Standardized all valid phone numbers to the international **E.164** format using the `phonenumbers` library.
   - Converted `4155559999` (with a default US region fallback) to `+14155559999`.
   - Ignored the missing phone number for `alice brown` while leaving it empty.
4. **Country Code Lookup:**
   - Looked up and populated missing `country` values based on the telephone country code prefix.
   - Filled `jane doe`'s country as `US` (derived from prefix `+1`).
   - Filled `Bob Wilson`'s country as `GB` (derived from prefix `+44`).
   - Kept existing valid country entries (e.g., `Spain`, `US`, `AU`) intact.
5. **Deduplication:**
   - Removed duplicate rows by performing a case-insensitive match on the cleaned `email` column, keeping the first occurrence.
   - Safely removed the duplicate row for `john@example.com` (Row 3).
   - Safely removed the duplicate row for `jane@example.com` (Row 8).

---

### Before/After Summary

#### 1. Data Quality Metrics

| Metric | Before Clean | After Clean | Change Description |
| :--- | :---: | :---: | :--- |
| **Total Rows** | 8 | 6 | 2 duplicate rows were removed. |
| **Duplicates Removed** | 0 | 2 | Dropped rows 3 (`john@example.com`) and 8 (`jane@example.com`). |
| **Invalid Emails Flagged** | 0 | 1 | Flagged `bob@@invalid` as invalid (`email_invalid = True`). |
| **Standardized Phone Numbers** | 1 | 5 | All numbers formatted to E.164 (e.g., `4155559999` -> `+14155559999`). |
| **Filled Country Fields** | 5 | 6 | Filled 2 missing country values (`US` and `GB`) via phone prefix lookups. |

#### 2. Row-by-Row Comparison

Below is the row-by-row evolution from the raw file to the cleaned output:

| Row # | Name (Before) | Name (After) | Email (Before) | Email (After) | Phone (Before) | Phone (After) | Country (Before) | Country (After) | Email Invalid Flag | Action Taken / Status |
| :---: | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :---: | :--- |
| **1** | `John Smith` | `John Smith` | `john@example.com` | `john@example.com` | `+14155551234` | `+14155551234` | `US` | `US` | `False` | Retained as first unique instance. |
| **2** | `jane doe` | `jane doe` | ` JANE@EXAMPLE.COM ` | `jane@example.com` | `4155559999` | `+14155559999` | *Missing* | `US` | `False` | Email normalized; Phone formatted to E.164; Country populated from `+1` prefix. |
| **3** | `John Smith` | — | `john@example.com` | — | `+14155551234` | — | `US` | — | — | **Removed** (Duplicate email). |
| **4** | `Bob Wilson` | `Bob Wilson` | `bob@@invalid` | `bob@@invalid` | `+442071234567` | `+442071234567` | *Missing* | `GB` | `True` | Country populated from `+44` prefix; Email flagged as invalid. |
| **5** | `María García` | `María García` | `maria@test.com` | `maria@test.com` | `+34612345678` | `+34612345678` | `Spain` | `Spain` | `False` | Retained; Email valid; Preserved existing country. |
| **6** | ` alice brown ` | `alice brown` | `alice@example.com` | `alice@example.com` | *Missing* | *Missing* | `US` | `US` | `False` | Name whitespace stripped; Phone left empty; Preserved existing country. |
| **7** | `Charlie Lee` | `Charlie Lee` | `charlie@test.com` | `charlie@test.com` | `+61412345678` | `+61412345678` | `AU` | `AU` | `False` | Retained; Email valid. |
| **8** | `jane doe` | — | `jane@example.com` | — | `+14155559999` | — | *Missing* | — | — | **Removed** (Duplicate email). |

---

### Cleaned Dataset Output

The cleaned data has been written back to `data/customers.csv` and contains the following records:

csv
name,email,phone,country,email_invalid
John Smith,john@example.com,+14155551234,US,False
jane doe,jane@example.com,+14155559999,US,False
Bob Wilson,bob@@invalid,+442071234567,GB,True
María García,maria@test.com,+34612345678,Spain,False
alice brown,alice@example.com,,US,False
Charlie Lee,charlie@test.com,+61412345678,AU,False
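
Because the summary above is generated text, it can be worth re-checking the cleaned file locally before trusting it. The sketch below is one way to do that with the standard library. The two regular expressions are simplified assumptions about what a plausible email and an E.164 number look like, not full validators, and the CSV text is copied from the sample run above.

```python
import csv
import io
import re

CLEANED = """name,email,phone,country,email_invalid
John Smith,john@example.com,+14155551234,US,False
jane doe,jane@example.com,+14155559999,US,False
Bob Wilson,bob@@invalid,+442071234567,GB,True
María García,maria@test.com,+34612345678,Spain,False
alice brown,alice@example.com,,US,False
Charlie Lee,charlie@test.com,+61412345678,AU,False
"""

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # simplified format check
E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")            # "+" followed by up to 15 digits

rows = list(csv.DictReader(io.StringIO(CLEANED)))
problems = []

# Deduplication claim: emails should be unique, ignoring case.
emails = [row["email"].lower() for row in rows]
if len(emails) != len(set(emails)):
    problems.append("duplicate emails remain")

for row in rows:
    flagged = row["email_invalid"] == "True"
    looks_valid = bool(EMAIL_RE.match(row["email"]))
    if flagged == looks_valid:
        problems.append(f"email flag mismatch: {row['email']}")
    if row["phone"] and not E164_RE.match(row["phone"]):
        problems.append(f"phone not in E.164 form: {row['phone']}")
    if not row["country"]:
        problems.append(f"missing country: {row['name']}")

print(problems or "all checks passed")
```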


Building Real-World Agents: A Few Use Cases

Now that you understand the foundations, let’s build a few agents. Each one introduces new capabilities and patterns.

Use Case 1: Automated Code Refactoring & Test Fixing

The Problem: You have a Python project with outdated code and failing tests. A standard LLM gives you code snippets, but you still have to manually copy, paste, run tests, debug, and iterate.

The Antigravity Solution: The agent clones your code, runs the test suite, reads terminal errors, patches the code, and re-runs tests — looping until everything passes.

We are providing a repository here with buggy code. Take a look:

GitHub - rominirani/uc1-legacy-app: Managed Agents Sample App Repository with Bugs

Known Bugs for the Agent to Fix

The repository (take a look at it yourself, too) has 3 root-cause bugs across the codebase, which together cause 6 test failures:

utils.py — calculate_total() does arithmetic on a string tax_rate without converting to float.
utils.py — normalize_name() crashes on None input (no null check).
models.py — Invoice.summary() references self.total (method object) instead of self.total() (method call).
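
For readers who want to see these three bug patterns in isolation, here is a minimal stand-in with the fixes already applied. It is not the repository's code: the function bodies and the Invoice fields are simplified assumptions that only mirror the bug descriptions above.

```python
def calculate_total(subtotal, tax_rate):
    # Pattern 1: tax_rate may arrive as a string such as "8.5",
    # so convert both inputs before doing arithmetic.
    subtotal = float(subtotal)
    tax_rate = float(tax_rate)
    return round(subtotal + subtotal * tax_rate / 100, 2)


def normalize_name(name):
    # Pattern 2: guard against None before calling string methods.
    if name is None:
        return ""
    return name.strip().title()


class Invoice:
    def __init__(self, customer_name, subtotal, tax_rate):
        self.customer_name = customer_name
        self.subtotal = subtotal
        self.tax_rate = tax_rate

    def total(self):
        return calculate_total(self.subtotal, self.tax_rate)

    def summary(self):
        # Pattern 3: call self.total(); passing self.total would hand over
        # the bound method object instead of a number.
        return f"{normalize_name(self.customer_name)} - {self.total():.2f}"


invoice = Invoice("  jane doe ", 100, "8.5")
print(invoice.summary())
print(repr(normalize_name(None)))
```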

The Code

Let’s write an agent that can fix these bugs for us. Read the instructions below in detail:

from google import genai
client = genai.Client()

system_instructions = """
You are an expert QA and Refactoring Engineer. Your workflow is:
1. Clone the target repository into the workspace
2. Install all dependencies from requirements.txt
3. Run the full pytest suite and capture all output
4. For each failing test:
   a. Read the error traceback carefully
   b. Open the source file causing the failure
   c. Diagnose the root cause
   d. Apply the minimal fix
5. Re-run pytest after each fix to confirm resolution
6. Continue until ALL tests pass (0 failures)
7. Generate a summary of every change you made
IMPORTANT: Make the minimal possible change to fix each test.
Do not refactor working code unnecessarily.
"""
interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    system_instruction=system_instructions,
    input="""
    Clone this repository: https://github.com/rominirani/uc1-legacy-app
    Run the pytest suite to check for environment or code failures.
    Isolate the bugs causing test failures and fix them one by one.
    Keep running the tests until you achieve a 100% pass rate.
    When done, provide:
    1. A list of all files modified with a diff summary
    2. The final pytest output showing all tests passing
    3. Any dependency changes you had to make
    """,
    environment="remote"
)
print(interaction.output_text)


Run it: Save the above as uc1_refactor.py and run:

python3 uc1_refactor.py


Expected output:

I have successfully diagnosed and resolved all the issues causing test failures in the `uc1-legacy-app` repository. Below is a detailed summary of the findings, modifications, and the final test execution output.

---

### 1. Modified Files & Diff Summary

#### File: `src/models.py`
- **Issue**: In the `Invoice.summary()` method, the `self.total` method itself was passed to `format_currency()` as an object, instead of being invoked (`self.total()`). This led to a `TypeError` during format evaluation.
- **Fix**: Changed `self.total` to `self.total()`.
- **Diff Summary**:

diff
@@ -23,7 +23,7 @@ class Invoice:
         """Return a one-line summary of the invoice."""
         from src.utils import format_currency, normalize_name
         name = normalize_name(self.customer_name)
-        return f"Invoice {self.invoice_id}: {name} - {format_currency(self.total)}"
+        return f"Invoice {self.invoice_id}: {name} - {format_currency(self.total())}"


#### File: `src/utils.py`
- **Issues**:
   1. **`calculate_total()`**: When `tax_rate` is received as a string, Python fails to perform multiplication of float/string (`TypeError: can't multiply sequence by non-int of type 'float'`).
   2. **`normalize_name()`**: When passed a `None` value (common in CSV imports), the code raised `AttributeError: 'NoneType' object has no attribute 'strip'`.
- **Fixes**:
   1. Explicitly cast `subtotal` and `tax_rate` to `float` prior to calculating the tax.
   2. Add a safeguard to return an empty string (`""`) when `name` is `None`.
- **Diff Summary**:

diff
@@ -11,8 +11,10 @@ def calculate_total(subtotal, tax_rate):
     Returns:
         The total as a float rounded to 2 decimal places.
     """
-    tax = subtotal * tax_rate / 100
-    return round(subtotal + tax, 2)
+    subtotal_val = float(subtotal)
+    tax_rate_val = float(tax_rate)
+    tax = subtotal_val * tax_rate_val / 100
+    return round(subtotal_val + tax, 2)

 def format_currency(amount):
@@ -36,6 +38,8 @@ def normalize_name(name):
     Returns:
         Cleaned, title-cased name.
     """
+    if name is None:
+        return ""
     return name.strip().title()


---

### 2. Dependency Changes
No additional packages or dependency changes were made. All required libraries from `requirements.txt` (consisting of `pytest>=7.0.0`) were successfully installed and utilized without modification.

---

### 3. Final Pytest Output
Running the full suite confirms that all 10 tests now pass:

plaintext
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-9.0.3, pluggy-1.6.0
rootdir: /uc1-legacy-app
plugins: anyio-4.13.0
collected 10 items

tests/test_models.py .... [40%]
tests/test_utils.py ...... [100%]

============================== 10 passed in 0.02s ==============================


What Happens Behind the Scenes

The agent autonomously:

Runs git clone in the sandbox terminal
Runs pip install -r requirements.txt (installing into the sandbox's Python)
Runs pytest -v and reads the full output, including tracebacks
Opens the failing source files, identifies buggy logic
Writes patches, saves files, re-runs pytest
Loops until the output shows 0 failures

Key Concept Introduced: The Plan → Act → Observe loop in action. The agent doesn’t just generate code — it executes, reads real terminal output, and self-corrects.
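
The shape of that loop can be sketched in a few lines. This is a conceptual illustration only, not how Antigravity is implemented: run_until_green, run_tests, and apply_fix are hypothetical names, and the demo uses toy stand-ins instead of a real pytest run.

```python
def run_until_green(run_tests, apply_fix, max_rounds=5):
    """Repeat observe -> plan/act until the test run reports no failures.

    run_tests() returns a list of failure messages (empty means all passed).
    apply_fix(failure) attempts a patch for one failure.
    """
    history = []
    for _ in range(max_rounds):
        failures = run_tests()        # Observe: read the real test output
        history.append(len(failures))
        if not failures:
            return history            # Done: 0 failures
        apply_fix(failures[0])        # Plan + Act: patch one root cause
    raise RuntimeError(f"Still failing after {max_rounds} rounds: {history}")


# Toy stand-ins so the loop can be exercised without a real test suite.
open_bugs = ["string tax_rate", "None name", "method not called"]
history = run_until_green(
    run_tests=lambda: list(open_bugs),
    apply_fix=lambda failure: open_bugs.remove(failure),
)
print(history)
```

The max_rounds cap is a design choice worth keeping in any real loop: without it, a fix that never lands would keep the loop running indefinitely.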

Use Case 2: Live Competitive Intelligence Engine

The Problem: You need a deep competitive analysis of rival products. Manual web scraping is tedious, and static scrapers break constantly.

The Antigravity Solution: The agent uses its web browsing tools to search Google, crawls competitor websites, dynamically adapts to different page structures, writes a custom pandas script to structure the data, and exports a polished spreadsheet.
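
To picture the final export step, here is a small local sketch that turns a list of records into both a CSV string and a Markdown comparison table. It uses the standard library instead of pandas to stay dependency-free, and the rows are placeholders rather than real pricing data; the column names are assumptions for illustration.

```python
import csv
import io

# Placeholder records: no real vendor data is shown here.
rows = [
    {"platform": "Platform A", "free_tier": "placeholder", "paid_tier": "Contact Sales"},
    {"platform": "Platform B", "free_tier": "placeholder", "paid_tier": "Contact Sales"},
]


def to_csv(records):
    """Serialize records to CSV text, using the first record's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_markdown(records):
    """Render records as a Markdown table with one row per record."""
    headers = list(records[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for record in records:
        lines.append("| " + " | ".join(str(record[h]) for h in headers) + " |")
    return "\n".join(lines)


csv_text = to_csv(rows)
md_text = to_markdown(rows)
print(csv_text)
print(md_text)
```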

The Code

from google import genai
client = genai.Client()

system_instructions = """
You are a precise Business Intelligence Analyst. Your workflow:
1. Use web search to find current, real-time information
2. Navigate to official product/pricing pages for each competitor
3. Extract: pricing tiers, key features, target audience, notable limitations
4. Structure ALL findings into a pandas DataFrame
5. Export to both CSV and a formatted Markdown comparison table
6. Include the date/time of research and source URLs for every data point
Rules:
- Only report data you can verify from official sources
- If pricing isn't publicly available, note "Contact Sales" - don't guess
- Use USD for all pricing normalization
"""
interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    system_instruction=system_instructions,
    input="""
    Research the pricing models and core feature matrices of the top 3 alternative
    platforms to Vercel for frontend deployment:
    1. Netlify
    2. Cloudflare Pages
    3. Cloud Run
    For each, extract:
    - Free tier limits (bandwidth, builds, sites)
    - Pro/paid tier pricing and what it unlocks
    - Key differentiating features
    - Notable limitations or complaints from developer communities
    Create:
    1. A competitive_matrix.csv spreadsheet with all data
    2. A competitive_analysis.md report with a formatted comparison table
       and a "Recommendation" section at the bottom
    """,
    environment="remote"
)
print(interaction.output_text)


This produces the following output (a lovely report).

As a Business Intelligence Analyst, I have conducted a precise competitive analysis of the top three frontend deployment alternatives to Vercel: **Netlify**, **Cloudflare Pages**, and **Google Cloud Run**.

All findings have been structured using `pandas` and exported to `/` (the current working directory) as:
1. `competitive_matrix.csv` — The raw, structured dataset.
2. `competitive_analysis.md` — A comprehensive, reader-ready report including a detailed markdown comparison matrix, architectural breakdowns, limitation analyses, and a tailored recommendation framework.

All data has been verified against official vendor pricing, product, and developer documentation as of **June 9, 2026**. All currency values are normalized in **USD**.

---

### Core Finding Highlights

#### 1. Netlify
* **The Architectural Shift (April 2026):** Netlify made a massive strategic update to its Pro plans, moving from a seat-based model ($19/member/month) to a **flat $20/month per organization** [10]. This includes unlimited team members (Owners, Developers, Reviewers, and Git Contributors) [10], which is highly disruptive compared to Vercel's strict per-seat billing.
* **The Credits Trap:** Usage is now managed entirely via a unified credit-based model (300 credits/mo free, 3,000 on Pro) [2, 10]. However, credits are consumed fast: bandwidth costs 20 credits per GB [2, 10] (meaning Free tier is only **~15 GB max** and Pro is only **~150 GB max** if solely used for traffic). If Free tier credits run out, **all site traffic pauses immediately** (no auto-recharge is supported on Free) [2, 11]. 
* **Capabilities:** Highly integrated features like Netlify Database (managed Postgres via Neon) [4], Blob storage, Forms, and Auth [10]. Serverless timeout is 10s on Free/Personal [2] and 26s on Pro [2].

#### 2. Cloudflare Pages
* **Unmetered Freedom:** Genuinely **unlimited and unmetered static bandwidth and requests** across all tiers (including the $0 Free tier) [24, 25, 27].
* **The Pro Tier ($20/mo billed annually / $25/mo monthly):** Tied directly to Cloudflare's Workers Paid plan [25, 27]. It increases build limits from 500/mo to 5,000/mo, concurrent builds from 1 to 5, and the file-count ceiling per project from 20,000 to 100,000 assets [31] (unlocked via the `PAGES_WRANGLER_MAJOR_VERSION=4` env variable) [31].
* **Edge Isolates Constraints:** While running serverless code on Cloudflare's global network of 310+ cities is blazingly fast with zero cold starts [24, 33], it uses a **strict V8 edge runtime** [24]. Standard Node.js library modules (`fs`, `path`, native `crypto`) do not run natively [24], forcing developers to compile meta-frameworks like Next.js via edge adapters (e.g., `@cloudflare/next-on-pages` or OpenNext) [24] which restricts certain features like Incremental Static Regeneration (ISR).

#### 3. Google Cloud Run
* **Pure Container Portability:** Cloud Run runs any application, language, or compiled binary (Node.js, Python, Go, Rust) packaged in a standard Docker container [37, 39]. It runs full Node.js natively [39], ensuring 100% Next.js feature parity (ISR, Server Actions, dynamic image optimization) with zero edge-runtime workarounds [39].
* **No Monthly Base Fee:** Operates on a pure pay-as-you-go serverless model (billed in granular vCPU-seconds and GiB-seconds) [41], backed by a generous Always Free tier (2 million requests/mo, 180,000 vCPU-seconds, 360,000 GiB-seconds, and 120 daily Cloud Build minutes free) [37, 38, 41]. 
* **The Infrastructure Overhead:** Not a standard "push-to-deploy" platform. Developers must write Dockerfiles, configure Artifact Registry, and manage IAM and networking [37, 39]. It is regionally hosted; to get a global CDN with SSL mapping, you must set up Cloud CDN and a Global HTTP Load Balancer [37] (~$20-$30/mo base fee) and navigate container cold starts (2 to 10+ seconds) [37] and expensive internet egress rates ($0.085 - $0.12 per GB) [37].

---

### Structure of Generated Files

#### 1. `competitive_matrix.csv`
A raw, tabular CSV structured for business intelligence pipelines, featuring the following schema:
- `Platform Name`
- `Free Tier Pricing (USD)`
- `Free Tier Bandwidth Limit`
- `Free Tier Build Limit`
- `Free Tier Sites/Projects Limit`
- `Free Tier Key Specs & Features`
- `Paid/Pro Tier Price (USD)`
- `Paid/Pro Tier Features Unlocked`
- `Key Differentiating Features`
- `Notable Limitations & Developer Complaints`
- `Date of Research`
- `Source URLs`

#### 2. `competitive_analysis.md`
A professionally formatted report including:
* **Executive Summary:** Contextualizing why frontend teams are exploring alternatives to Vercel in 2026.
* **Structured Comparison Table:** Transposed for high-density reading and side-by-side comparison.
* **Deep-Dive Section:** Dedicated analyses of the Free limits, paid features, unique differentiators, and community complaints for Netlify, Cloudflare Pages, and Google Cloud Run.
* **Architectural Recommendation Flowchart & Matrix:** Tailored decision pathways based on project profile:
  * *Static / JAMstack / Zero-Budget* $\rightarrow$ **Cloudflare Pages**
  * *Collaborative Teams / Rapid Prototyping / Seat-Free DX* $\rightarrow$ **Netlify**
  * *Complex Server-Side Rendering / Backend Pipelines / Custom Docker* $\rightarrow$ **Google Cloud Run**

---

### Source Verification & Citations

1. **Netlify:**
   - [Official Pricing & Plans](https://www.netlify.com/pricing/)
   - [Introducing Seat-Free Collaborative Billing (April 14, 2026)](https://www.netlify.com/blog/pricing-netlify-for-3-billion-builders/)
   - [Credit-Based Pricing Documentation](https://docs.netlify.com/manage/accounts-and-billing/billing/billing-for-credit-based-plans/credit-based-pricing-plans/)
   - [Netlify Free Tier Credit Analysis (Temps)](https://temps.sh/compare/vs-netlify)

2. **Cloudflare Pages:**
   - [Workers & Pages Pricing Matrix](https://www.cloudflare.com/plans/developer-platform/)
   - [Official Pages Limits Documentation](https://developers.cloudflare.com/pages/platform/limits/)
   - [Cloudflare Pages Platform Features](https://pages.cloudflare.com/)
   - [Cloudflare Pages Edge Limitations (Temps)](https://temps.sh/compare/vs-cloudflare-pages)

3. **Google Cloud Run:**
   - [Official Cloud Run Pricing Breakdown](https://cloud.google.com/run/pricing)
   - [Google Cloud Free Tier Inclusions](https://cloud.google.com/free)
   - [Cloud Run Quotas & Limits](https://docs.cloud.google.com/run/quotas)
   - [Cloud Run Cost Optimization & Egress Guide (Cloudchipr)](https://cloudchipr.com/blog/cloud-run-pricing)

The generated files are saved directly in your working directory and are immediately available for download or integration into your reporting pipelines. Let me know if you would like me to modify the analysis parameters or explore a specific platform further!


Building on the Analysis (Multi-Turn)

You can also follow up by reusing the previous environment and interaction: take the generated competitive_matrix.csv and build visualizations from it. The code below retrieves only the text output; to get the generated files themselves, download the environment sandbox and extract them.

# Follow-up: generate a visual chart from the data we just collected
interaction_2 = client.interactions.create(
    agent="antigravity-preview-05-2026",
    environment=interaction.environment_id,
    previous_interaction_id=interaction.id,
    input="""
    Using the competitive_matrix.csv you just created:
    1. Create a grouped bar chart comparing the free tier limits across all 3 platforms
    2. Create a pricing comparison chart for the paid tiers
    3. Save both charts as PNG files with clear labels and a professional color scheme
    4. Add the charts as embedded images in the competitive_analysis.md report
    """
)
print(interaction_2.output_text)


Key Concepts Introduced: Web browsing + code execution in the same task. The agent searches the live internet, then writes Python code to structure and visualize the data — all in one flow.
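For a sense of what the "structure the data" step involves, here is a small sketch that turns findings into a CSV and a Markdown table. It is an illustration only: the agent in this use case was asked to use pandas, while this version swaps in Python's standard `csv` module so it runs without extra dependencies. The two columns and their values are taken from the report above; the helper names `to_csv` and `to_markdown_table` are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Two columns from the schema above, filled with values reported in the output.
rows: List[Dict[str, str]] = [
    {"Platform Name": "Netlify",
     "Paid/Pro Tier Price (USD)": "20/month per organization"},
    {"Platform Name": "Cloudflare Pages",
     "Paid/Pro Tier Price (USD)": "20/month billed annually"},
    {"Platform Name": "Google Cloud Run",
     "Paid/Pro Tier Price (USD)": "pay-as-you-go, no base fee"},
]
columns = list(rows[0].keys())


def to_csv(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Serialize rows to CSV text; fields containing commas are quoted."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_markdown_table(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Render rows as a pipe-delimited Markdown table."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(row[c] for c in columns) + " |" for row in rows]
    return "\n".join([header, divider, *body])


csv_text = to_csv(rows, columns)
markdown_table = to_markdown_table(rows, columns)
print(markdown_table)
```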

More Use Cases

I plan to keep adding to this document with more use cases, as this solution matures with more features. Stay tuned for updates.

Best Practices

Here are the architectural patterns that will serve you well in production:

Build to Delete

Model performance improves rapidly. Design your agents with the expectation that they’ll be rebuilt with newer models soon. Keep configurations modular — AGENTS.md + SKILL.md files are easy to swap. Don't over-engineer the orchestration layer.

Treat Agents as Microservices

Don’t build one massive agent with a 2,000-word system prompt. Decompose complex problems into specialized sub-agents:
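One way to picture this decomposition is a small pipeline in which each sub-agent has a short, focused instruction and hands its output to the next. This is a sketch under assumptions: `SubAgent`, `SUB_AGENTS`, `run_pipeline`, and `fake_call` are hypothetical names, and the `call_agent` parameter stands in for whatever actually invokes the model (for example, a wrapper around `client.interactions.create`).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class SubAgent:
    name: str
    system_instruction: str  # Short and focused, not a 2,000-word prompt


# Hypothetical split of one large task into three narrow roles.
SUB_AGENTS: Dict[str, SubAgent] = {
    "research": SubAgent("research", "Find and cite official sources only."),
    "analysis": SubAgent("analysis", "Structure findings into tables."),
    "report": SubAgent("report", "Write a concise Markdown report."),
}


def run_pipeline(task: str, call_agent: Callable[[SubAgent, str], str]) -> str:
    """Pass the task through each sub-agent in order, feeding outputs forward."""
    payload = task
    for key in ("research", "analysis", "report"):
        payload = call_agent(SUB_AGENTS[key], payload)
    return payload


def fake_call(agent: SubAgent, text: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return f"{text} -> {agent.name}"


final = run_pipeline("compare hosts", fake_call)
print(final)
```

Because each role is a small data record, swapping one sub-agent's instruction or model does not touch the others.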

Errors as Inputs, Not Crashes

In robust agentic architectures, errors are data: inputs the agent uses to self-reflect and correct course. The agent reads a TypeError, reasons about the cause, and fixes it. Don't wrap everything in try/except blocks that swallow errors. Let the agent see them.
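If you build your own tools or harness around an agent, the same idea can be sketched as follows: catch the exception only to convert it into an observation, never to hide it. The `run_step` helper is hypothetical, and `exec` is used here purely for illustration; in the managed setup, execution happens inside the sandbox.

```python
import traceback
from typing import Dict, Union


def run_step(code: str) -> Dict[str, Union[bool, str]]:
    """Execute a snippet and return an observation instead of raising."""
    try:
        exec(code, {})  # Illustration only; real execution belongs in a sandbox
        return {"ok": True, "observation": "completed without errors"}
    except Exception:
        # Surface the full traceback as text the agent can reason about.
        return {"ok": False, "observation": traceback.format_exc()}


# Adding a string and an integer raises a TypeError inside the snippet.
result = run_step("total = 'items: ' + 3")
print(result["observation"])
```

The traceback string is what goes back into the next turn, so the agent sees the same error a developer would see in a terminal.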

Evals Over Unit Tests

Agent behavior is non-deterministic. Testing focuses on evaluation metrics rather than exact output matching:


# ❌ Don't: Assert exact output
assert agent_output == "The answer is 42"

# ✅ Do: Evaluate behavioral success rate
results = [run_agent(task) for _ in range(20)]
success_rate = sum(1 for r in results if r.meets_criteria) / len(results)
assert success_rate >= 0.85 # Agent succeeds at least 85% of the time
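The snippet above leaves `run_agent` and `task` undefined. A self-contained version might look like the sketch below, where `run_agent` is a deterministic stand-in (it fails on every tenth attempt) rather than a real model call, and `RunResult` and `success_rate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    output: str
    meets_criteria: bool


def run_agent(task: str, attempt: int) -> RunResult:
    """Hypothetical stand-in for an agent call: fails on every 10th attempt."""
    answer = "unknown" if attempt % 10 == 9 else "42"
    output = f"The answer is {answer}."
    # Criteria check: the correct value appears, whatever the exact wording.
    return RunResult(output=output, meets_criteria="42" in output)


def success_rate(task: str, trials: int = 20) -> float:
    """Run the task several times and return the fraction that met the criteria."""
    results = [run_agent(task, attempt) for attempt in range(trials)]
    return sum(r.meets_criteria for r in results) / len(results)


rate = success_rate("What is 6 * 7?")
assert rate >= 0.85  # Threshold from the example above
print(f"success rate: {rate:.2f}")
```

With a real agent, the criteria check is usually the hard part: it might be a regex, a schema validation, or a judge model, depending on the task.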

Token Cost Awareness

Because Antigravity operates an autonomous reasoning loop, a single prompt can trigger many internal operations. Expect something in this ballpark:

Implement strict timeouts and monitor trace lengths during development. Use system_instruction to tell the agent to be concise when you don't need verbose reasoning.
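As a rough sketch of such a guard on the client side, assuming you can count steps in your own harness (for example, one per tool call or streamed event), a small budget object can stop a run that goes on too long. `RunBudget` and `BudgetExceeded` are hypothetical names, not part of any SDK.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run uses more steps or time than allowed."""


@dataclass
class RunBudget:
    max_steps: int
    max_seconds: float
    steps: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def record_step(self) -> None:
        """Count one step and raise if either limit has been passed."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} exceeded")
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded(f"time limit of {self.max_seconds}s exceeded")


budget = RunBudget(max_steps=3, max_seconds=60.0)
stopped_reason = None
try:
    for _ in range(10):  # Pretend each iteration is one agent step
        budget.record_step()
except BudgetExceeded as exc:
    stopped_reason = str(exc)

print(stopped_reason)
```

Logging `budget.steps` per run during development gives a simple view of how trace length varies across prompts.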

Error Handling Patterns

Resources

Official Documentation: ai.google.dev/gemini-api/docs/custom-agents
Gemini Managed Agents Developer Guide: https://www.philschmid.de/gemini-managed-agents-developer-guide
Video: Managed Agents Deep Dive: youtube.com/watch?v=Psa8mLikdag
Video: Getting Started Tutorial: youtube.com/watch?v=0YXe7u-i1qU
Google AI Studio: aistudio.google.com

Ideas for Your Next Agent

CI/CD Pipeline Agent — monitors your GitHub repo, runs tests on every push, opens fix PRs for failures
Documentation Generator — reads your codebase and generates comprehensive API docs
Data Pipeline Orchestrator — connects to your warehouse, runs transformations, validates output
Customer Support Triage Agent — reads tickets, categorizes severity, drafts responses, escalates critical issues
Research Assistant — takes a topic, searches papers and articles, writes a literature review with citations

Final Thought

Managed Agents represent a fundamental shift in how we build with AI. You’re no longer building the infrastructure around the AI; you’re simply giving the AI a goal and a sandbox and letting it work. The sandbox is the abstraction. The agentic loop is the engine. And the Interactions API is your single point of contact with all of it.

Most importantly, start small and set limits on what the agent can do autonomously. Then scale up.

Happy building.
