DEV Community: Patricio Navarro

Agent-CLI: What is it, and how to use it from Prompt to Production

Patricio Navarro — Fri, 08 May 2026 10:43:43 +0000

How to use the new Agents CLI to bridge the gap between AI POCs and production-ready systems on Google Cloud.

The Problem

As a Cloud GDE, I was recently granted early access to an unreleased Google tool to test before its official launch. I decided to spend a couple of hours this Sunday having some fun putting it through its paces, specifically looking at how it handles the most painful bottleneck in AI engineering today: graduating from a prototype to production.

If you've been working with agentic AI recently, you already know the story. Building a simple standalone Python script that calls an LLM is easy. But what happens when you need to take that Proof of Concept (POC) into a production environment?

Suddenly, you are wrestling with Terraform scripts, spinning up CI/CD pipelines, configuring Secret Management, and writing evaluation harnesses so you don't push hallucinating agents into production. You end up spending more time writing boilerplate infrastructure than focusing on the actual solution design.

To solve this, Google presents Agents CLI.

What is the Agents CLI?

The Agents CLI is the definitive, machine-readable programmatic interface for the Agent Development Lifecycle (ADLC). Think of it as a unified "assembly line" for building, evaluating, and deploying AI agents to the full Google Cloud Agent Platform (Vertex AI, Cloud Run, etc.).

More importantly, it is built to empower both Human developers and AI Coding Agents. Whether you are using Gemini CLI, Cursor, or Antigravity, the Agents CLI provides your AI assistant with a direct, predictable interface to the entire cloud platform.

My Test Drive: The Finance Agent

To really see how the CLI performs under pressure, I started to build a simple finance-agent, designed to assess user risk tolerance and fetch mock stock data. I wanted to test the balance between developer control and AI autonomy.

For the test, I split my workflow 50/50:

Human Mode: Half the time, I acted as the architect, directly running deterministic commands in my console in a declarative way (e.g., executing agents-cli init or agents-cli test).
AI Autonomy: The other half, I relied entirely on the specific skills provided in Antigravity, letting the AI coding agent invoke the CLI programmatically to write the internal logic and run local servers.

This hybrid approach felt like a superpower. You can lean on the AI to generate the heavy lifting, but the CLI gives you the deterministic "Human Mode" execution whenever you need to step in and tighten the pipeline.

Core Benefits & Features

1. Zero-to-One Boilerplate (`init` & `enhance`)

With a traditional build, setting up an Agent Development Kit (ADK) repository takes hours. With the Agents CLI, it is instantaneous. By running the init command, you generate a fully formed repository structure.

# Scaffold the standard ADK template using automatic defaults
$ agents-cli init finance-agent -y --agent adk --deployment-target cloud_run

This generates a clean, scalable architecture ready for development:

finance-agent/
├── app/               # Core agent code (agent.py, fast_api_app.py)
├── tests/             # Unit and integration tests
├── GEMINI.md          # AI-assisted development context
└── pyproject.toml     # Dependencies managed strictly by uv

However, a raw application is not production-ready. Once your base logic is scaffolded, you can run agents-cli enhance to bolt on deployment configurations and CI/CD pipelines without rewriting the core application.

$ agents-cli enhance

This immediately injects the necessary enterprise structures into your project:

finance-agent/
├── .cloudbuild/       # CI/CD pipeline configurations for Cloud Build
├── deployment/        # Terraform infrastructure scripts

2. The Evaluation Harness (`eval` & `compare`)

You cannot run AI in production without strict quality control. The Agents CLI comes with a built-in evaluation methodology using LLM-as-a-judge and trajectory scoring.

You simply populate your tests/eval/evalsets/ folder with specific scenarios and ground-truth datasets (e.g., verifying the 50/30/20 budgeting rule or emergency fund logic), and run the pipeline:

$ agents-cli eval --evalset tests/eval/evalsets/finance.evalset.json

Eval Run Summary
finance_eval:
  Tests passed: 5
  Tests failed: 0

This ensures a data-driven approach by tracking the exact regressions and improvements between runs, completely avoiding the "it works on my machine" problem when interacting with non-deterministic LLMs.

3. Deployments & Infrastructure (`infra prod` & `deploy`)

Instead of writing custom YAML for Google Cloud Build from scratch or fighting with manual IAM permissions, you simply tweak the Terraform configuration files generated earlier by agents-cli enhance, and then apply them directly to Google Cloud:

# Provision analytical BigQuery datasets, artifact registries, and service accounts
$ agents-cli infra prod

# Containerize the codebase and push to a live Cloud Run service
$ agents-cli deploy

These two commands seamlessly handle Dev/Prod environment separation, Secret Management, and service endpoints, pushing your local codebase securely to Cloud Run or the Agent Engine without the headache.

Integration via Skills: Arming Your Coding AI

Before we wrap up, it's crucial to understand how deeply this integrates with modern AI development environments. Google also developed these skills I had the chance to test during this early preview, intentionally built to inject specific technical context natively into IDEs like Cursor, Gemini CLI, and Antigravity.

Instead of searching the web for missing agent-cli documentation (or worse, hallucinating unsupported Python code), your coding agent gains exact contextual awareness directly through these bundled skills:

/agents-cli-workflow: Provides the agent with the definitive development lifecycle, code preservation rules, and model selection.
/agents-cli-adk-code: The master Python API reference containing exactly how to write tools, orchestration, state, and callbacks within the environment.
/agents-cli-scaffold: Teaches the AI how to use the CLI to scaffold a project (init).
/agents-cli-eval: Injects evaluation methodology (metrics, evalsets, LLM-as-judge).
/agents-cli-deploy: Gives the AI instructions on CI/CD rules, secrets management, and pushing to Cloud Run via the CLI.
/agents-cli-publish: How to register the finalized agent to Gemini Enterprise.
/agents-cli-observability: Guides the AI to set up Cloud Trace, BigQuery exports, and Cloud Logging integrations natively.

Real-World Gotchas & Teardown

As smooth as the workflow is, it's still cloud engineering. During my test drive, I hit a classic GCP security hurdle:

The 403 Forbidden Error: Immediately post-deployment, my endpoint returned a 403 error. This isn't a bug; Cloud Run secures endpoints against unauthenticated access by default. I had to manually apply a gcloud IAM policy binding (roles/run.invoker to allUsers) to expose the agent's endpoint properly.
Full Teardown (terraform destroy): When you are done experimenting, don't leave your bill running. Because agents-cli organizes infrastructure cleanly under deployment/, I was able to completely deprovision my active Cloud footprint (BigQuery, Storage, Service Accounts) by executing a standard terraform destroy. (Note: you will still need to manually delete your generated GitHub repository!).

Conclusion

The new Agents CLI completely redefines the "zero-to-one" experience for AI data products. It strips away the friction of setting up infrastructure, enforces standard data engineering practices, and bridges the gap between unreliable proof-of-concepts and enterprise-grade systems.

Whether you are using the CLI manually in your terminal or letting Antigravity drive it dynamically, it ensures your code is clean, evaluated, and ready for production before it even leaves your local machine.

Building a Production-Ready Serverless App on Google Cloud (Part 2: The Data Contract)

Patricio Navarro — Sun, 05 Apr 2026 16:46:41 +0000

This is Part 2 of a 3-part series on building production-ready, data-intensive applications on Google Cloud. If you haven't read it yet, check out Part 1: Architecture to understand the foundational serverless components we are connecting today.

The Danger of Decoupling

In Part 1 of this series, we praised the decoupled architecture. By splitting our compute (Cloud Run) from our analytics (BigQuery) using a buffer (Pub/Sub), we created a system that scales infinitely and costs nothing when idle.

But decoupling introduces a massive architectural danger: The Data Swamp.

If your web application can throw any random JSON payload into a Pub/Sub topic, and that topic blindly dumps it into a data warehouse, your analytics team will spend 80% of their time cleaning malformed strings and fixing broken dashboards.

To prevent this, we must establish a strict Data Contract at the very edge of our ingestion layer.

The Bouncer: Enforcing the Pub/Sub Schema

A professional data pipeline does not rely on the application code to "hopefully" send the right data types. It enforces rules at the infrastructure level.

For the Dog Finder app, we attached a strict Apache Avro schema to our Pub/Sub topic. This acts as the "bouncer" for our data warehouse. If Cloud Run attempts to publish a sighting with a missing field or the wrong data type, Pub/Sub rejects it immediately.

By inspecting pubsub_schema.json, you can see standard Data Engineering practices enforced natively:

Precision Typing: We explicitly defined latitude and longitude as double precision. This prevents the backend from accidentally sending coordinates as strings, which would break spatial queries later.
Consistent Naming: We enforced snake_case for all fields, such as sighting_date and image_url.

The Vault: Designing the BigQuery Schema

BigQuery is where our data lives permanently. The schema here needs to mirror our Pub/Sub contract, but also provide the metadata necessary for reliable analytics.

If you look at bigquery_schema.json, we didn't just copy the business fields. We intentionally included metadata fields like message_id and publish_time. Because Pub/Sub guarantees "at-least-once" delivery, duplicate messages can occasionally occur. Capturing the message_id is essential for the analytics team to efficiently deduplicate records.

More importantly, we didn't just create a basic table. In our setup_resources.sh script, we enforced a partitioning strategy directly at creation:

bq mk --table \
    --time_partitioning_field sighting_date \
    --time_partitioning_type DAY \
    "${GOOGLE_CLOUD_PROJECT}:${BIGQUERY_DATASET}.${BIGQUERY_TABLE}" \
    "$PROJECT_ROOT/schemas/bigquery_schema.json"

By partitioning the table by sighting_date, we ensure that when a Looker Studio dashboard queries for "lost dogs this week" or an analyst performs research, BigQuery scans only the relevant daily partitions. This single command is the difference between a query that costs $1 and a query that costs $1,000 as your dataset grows.

The Serverless Bridge: Zero-Code Ingestion

Now for the architectural magic trick. We have a secure Pub/Sub topic and a partitioned BigQuery table. How do we move data between them?

Traditionally, developers write a Cloud Function or spin up a Dataflow job to consume from Pub/Sub, transform the payload, and insert it into BigQuery. That means writing code, managing deployments, and paying for intermediate compute.

Instead, we used a native BigQuery Subscription. This is a powerful serverless pattern that requires zero code. Here is the exact command from our setup script:

gcloud pubsub subscriptions create "$SUBSCRIPTION_ID" \
    --topic="$TOPIC_ID" \
    --bigquery-table="${GOOGLE_CLOUD_PROJECT}:${BIGQUERY_DATASET}.${BIGQUERY_TABLE}" \
    --use-topic-schema \
    --write-metadata \
    --project="$GOOGLE_CLOUD_PROJECT"

Notice the two critical flags:

--use-topic-schema: This tells the subscription to natively map the fields from our Avro schema directly to the BigQuery columns.
--write-metadata: This automatically populates those message_id and publish_time fields we added to our BigQuery schema for auditing.

Designing for Failure: The Dead Letter Topic (DLT)

But an architect must always design for failure. What happens if a schema evolution causes a mismatch, or BigQuery temporarily rejects an insert? By default, Pub/Sub will continually retry the delivery, but once the retention period or retry limit is exhausted, that message is dropped forever. Data loss in a production pipeline is unacceptable.

To prevent this, we must configure a Dead Letter Topic (DLT) alongside our subscription. This is a core defensive engineering practice.

By adding the --dead-letter-topic and --max-delivery-attempts flags to your subscription configuration, you create a safety net. If a message fails to write to BigQuery after, say, 5 attempts (perhaps due to an unforeseen schema mismatch), Pub/Sub automatically routes that specific message to the DLT and continues processing the rest of the queue.

Instead of losing the sighting, the malformed data is safely quarantined. You can set up an alert on the DLT, inspect the failing payload, patch your schema or application code, and then easily replay the dead-lettered message back into the main pipeline. Zero dropped records, zero panic.

With this configuration, GCP handles all the plumbing. As soon as the Cloud Run backend publishes a validated event to Pub/Sub, the infrastructure automatically streams it into BigQuery - securely and resiliently - with absolutely zero intermediate compute costs.

Conclusion

By enforcing a Data Contract via an Avro schema and utilizing native BigQuery subscriptions, we eliminated the "glue code" that normally plagues data pipelines. Our analytics team gets perfectly structured, partitioned data, and our application developers don't have to manage a single ingestion worker.

Building a Production-Ready Serverless App on Google Cloud (Part 1: Architecture)

Patricio Navarro — Tue, 31 Mar 2026 20:52:09 +0000

The Problem

In my previous post, I shared how I used an AI agent framework during a train ride to build a Proof of Concept (POC) for a project called the Dog Finder App. The response was great, but the experiment raised a technical question: How do you build a POC quickly without creating a messy monolith that you'll have to rewrite later?

When building a data-intensive application, engineers usually face a harsh trade-off. You can either build it fast to prove the concept (and inherit massive technical debt), or you can build it "right" (and spend weeks provisioning infrastructure and writing boilerplate).

By leveraging serverless services on Google Cloud Platform (GCP), we can break that trade-off.

This is the first in a three-part series where I will show you how to architect, automate, and deploy a complete, decoupled data application. We will look at how combining serverless tools with strict Data Engineering practices allows you to spin up a solution that is both incredibly fast to build and ready for production traffic.

The Architecture: Decoupling by Default

In traditional POCs, it is common to see a tightly coupled monolith: a single backend service receiving HTTP requests, saving images to a local disk, writing state to a database, and running heavy analytical queries. If one component bottlenecks, the entire application crashes.

For the Dog Finder app—a system designed to ingest real-time sightings of lost dogs and route them for geographical analysis—we needed a system that scales instantly under load but costs absolutely nothing when there is no traffic.

To achieve this, we default to a decoupled architecture. We split the ingestion, state, and analytics across specialized, managed serverless components:

Google Cloud Run (Compute): Hosts our stateless Flask web application and API. It handles incoming user traffic, scales up automatically on demand, and drops to zero when idle.
Google Cloud Storage (Blob Storage): Handles the heavy payloads. User-uploaded images of dogs go straight here, keeping our databases lean and performant.
Firestore (Operational Database): Our OLTP layer. This NoSQL database stores the real-time state of the application, allowing the frontend to read and display current sightings with millisecond latency.
Cloud Pub/Sub (Ingestion Buffer): The shock absorber of our system. When a sighting occurs, the backend publishes an event here and immediately responds to the user, completely decoupling the web app from the analytics pipeline.
BigQuery (Data Warehouse): Our OLAP layer. The final destination where all structured sightings land for historical storage, regional partitioning, and complex analytical querying.

The Compute Layer: Scaling to Zero with Cloud Run

At the core of the Dog Finder app is a Python Flask backend. In a traditional setup, you would provision a Virtual Machine (Compute Engine) to run this application. You’d pay for that VM 24/7, even at 3:00 AM when no one is reporting lost dogs.

Instead, we containerized the application using a standard Dockerfile and deployed it to Google Cloud Run.

Cloud Run is a fully managed compute platform that automatically scales stateless containers. As an architect, enforcing statelessness is critical here. The Flask app does not store any session data or images on its local filesystem. Its only job is to act as a highly efficient traffic cop:

Receive the HTTP POST request (the sighting payload and image).
Validate the data payload.
Offload the heavy lifting to our specialized data services.
Return a success response to the user.

If there is a massive spike in lost dog reports, Cloud Run spins up multiple container instances instantly. When traffic drops, it scales down to zero. We only pay for the exact number of milliseconds the CPU spends processing a request.

The Data Split: OLTP vs. OLAP in a Serverless World

This is where many rapid POCs turn into unmaintainable monoliths. A common mistake is throwing all your data—images, real-time app state, and analytical history—into a single relational database like PostgreSQL. As the application grows, database locks increase, queries slow down, and storage costs skyrocket.

To prevent this, we split the data path into three specialized lanes:

1. Cloud Storage (The Payload)

Databases are expensive places to store binary files. When a user uploads a photo of a dog, our Cloud Run app sends that file directly to a Google Cloud Storage bucket. The app then grabs the resulting image_url and uses that string for the rest of the data pipeline. This keeps our databases incredibly lean and fast.

2. Firestore (Operational / OLTP)

Users expect a snappy UI. When they open the app, they want to see the latest dog sightings immediately. We use Firestore (a NoSQL document database) as our operational layer. After saving the image, Cloud Run writes the sighting record to Firestore. This provides low-latency reads and writes, ensuring the web frontend feels instantaneous without running complex SQL joins.

3. BigQuery (Analytical / OLAP)

While Firestore is great for the UI, it is not designed for heavy aggregations (e.g., "How many Golden Retrievers were lost in the northern region last month compared to last year?").

For this, we route the data to BigQuery. We explicitly partitioned the BigQuery table by sighting_date. This is a crucial Data Engineering standard: when analysts query the table for recent trends, BigQuery only scans the relevant partitions, drastically reducing query costs and execution time.

The Payoff: Visualizing the Data (Looker Studio)

An architecture is only as good as the insights it delivers. The real payoff of this decoupled, partitioned setup became obvious when I wanted to add a visualization layer.

Because we cleanly separated our operational state (Firestore) from our analytical history (BigQuery), I was able to connect Looker Studio directly to the BigQuery table in minutes. I didn't have to worry about complex API integrations or degrading the performance of the live web app.

I created a real-time dashboard that plots the sightings by region. As new records flow through the serverless pipeline, the dashboard updates automatically, providing a live heat map of lost dog hotspots. This transforms the POC from a simple "data entry" app into a complete, end-to-end data product.

Conclusion & What’s Next

In this first part, we laid out the "boxes" of our architecture. By leveraging Cloud Run, Cloud Storage, Firestore, and BigQuery, we designed a system that scales instantly, costs nothing when idle, and handles both operational and analytical workloads perfectly.

But having the right boxes is only half the battle. How do we connect them reliably?

In Part 2, we will dive into the lines connecting the boxes. I will show you how to use Pub/Sub to fully decouple ingestion, how to set up a direct serverless subscription from Pub/Sub to BigQuery (no code required), and how to enforce strict Data Contracts so your beautiful data warehouse doesn’t turn into a data swamp.

Testing Antigravity: Building a Data-Intensive POC at 300km/h

Patricio Navarro — Sun, 22 Mar 2026 19:18:05 +0000

Introduction

Last week, I spent a few hours on a Frecciarossa train from Rome to Calabria. Usually, this is time spent catching up on emails, but I decided to use the journey to stress-test Antigravity for code development.

As a Google GDE and Data Engineer, I’m always looking for ways to streamline the "zero-to-one" phase of a project. My objective was specific: Build a functional, data-intensive Proof of Concept (POC) that I could eventually use in a GDE workshop or technical presentation.

The Smoke Test

Before trusting an AI framework with my GCP environment, I started by running through some of the more complex Antigravity examples. I wanted to see if the agent could handle intricate logic and performance-sensitive code without "hallucinating" or breaking under pressure. Once it proved it could handle high-level orchestration and optimization in these isolated tests, I knew it was ready for a real-world Data Engineering pipeline.

The Objective

The project I set out to build is an intensive data application called "Dog Finder". The goal was to create a system that could handle real-time sightings of lost dogs, process them through a reliable pipeline, and land them in a data warehouse for analysis.

The final architecture consists of:

Frontend/Backend: A Flask application deployed on Google Cloud Run.
Ingestion: A Pub/Sub topic with a strict schema to ensure data quality at the entry point.
Storage/Analytics: A BigQuery dataset with a table partitioned by sighting_date for cost-effective querying.
Automation: Fully idempotent shell scripts for resource provisioning and cleanup.

The Workflow: From Coder to Conductor

Working with Antigravity felt less like traditional coding and more like leading a team of mid-level developers. I was the Architect, and the AI was the execution arm.

The Proactive "Wins"

One of the most interesting aspects of the experience was the AI’s "propropositive" nature. Sometimes it suggested paths I hadn't explicitly asked for, but that added immediate value. For instance, while we were building the documentation, it suggested generating a Mermaid architecture graph directly in the README. It was a "nice-to-have" that I ended up keeping because it made the repo much more professional for a workshop setting.

The "Experience" Corrections

However, "AI-driven" doesn't mean "autopilot." I frequently had to use my experience to correct the course. In the initial infrastructure scripts, the AI took some "happy path" shortcuts that wouldn't fly in a real environment. I had to explicitly step in to enforce Data Engineering standards:

Idempotency: I guided the agent to ensure setup_resources.sh wouldn't crash if a bucket or topic already existed.
Schema Integrity: I enforced snake_case and double precision for coordinates to prevent downstream data issues in BigQuery.
Refactoring: I instructed the AI to reorganize the project—moving scripts to /scripts and schemas to /schemas. Once the instruction was clear, the AI executed the refactor across the entire project flawlessly.

From POC to "Almost Prod-Ready"

The most impressive part of this experience was the velocity. What I initially planned as a simple POC evolved so quickly that I spent some time at home after my trip hardening it into an almost production-ready state.

Funny Fact: I was doing all of this while traveling through the tunnels of the Italian countryside. I was constantly losing my 5G connection as the train sped along. If I managed to build and deploy a full GCP data pipeline while dealing with intermittent connectivity, imagine what you can achieve with a stable fiber connection.

The Verdict

If you are an experienced developer, Antigravity is a superpower. It allows you to focus 100% of your energy on solution design and architectural tuning. You can move fast because you already know what "good" looks like and can spot the shortcuts the AI might try to take.

For junior developers, my advice is to go easy. It allows you to arrive at a working result very quickly, but "working" isn't always "ideal." Use it to learn, but always question the architectural choices it makes for you.

You can check out the full project and the result of this high-speed experiment here:

patricio-navarro / dog_finder_app

Anlytics POC

🐶 Dog Finder Analytics POC

📋 Overview

This application allows users to report lost dog sightings. It captures the location (via Google Maps), date, and a photo. The data is processed by a Flask backend, authenticating users via Google OAuth, storing images in GCS, persisting user and sighting data in Firestore, and publishing event data to Pub/Sub for analytics in BigQuery.

✨ Features

Frontend: Responsive, premium-styled UI with Google Maps integration.
Authentication: Secure Google OAuth 2.0 Login with session management.
Data Persistence: Firestore database for Users and Sightings.
Cloud Integration: Google Cloud Storage (Images) and Pub/Sub (Events).
Deployment: Dockerized and ready for Cloud Run.

🏗️ System Architecture

flowchart TD
    User([User]) <--> Client["Frontend (Flask/Jinja)"]
    Client -- "OAuth 2.0" --> Auth["Google Identity Services"]
    Client -- "Submit Sighting (POST)" --> Backend["Flask Backend"]
    subgraph "Google Cloud Platform"
        Backend -- "Store Image" --> GCS["Cloud Storage"]
        Backend -- "Persist Data" -->

…

View on GitHub

DEV Community: Patricio Navarro

Agent-CLI: What is it, and how to use it from Prompt to Production

The Problem

What is the Agents CLI?

My Test Drive: The Finance Agent

Core Benefits & Features

1. Zero-to-One Boilerplate (init & enhance)

2. The Evaluation Harness (eval & compare)

3. Deployments & Infrastructure (infra prod & deploy)

Integration via Skills: Arming Your Coding AI

Real-World Gotchas & Teardown

Conclusion

Building a Production-Ready Serverless App on Google Cloud (Part 2: The Data Contract)

The Danger of Decoupling

The Bouncer: Enforcing the Pub/Sub Schema

The Vault: Designing the BigQuery Schema

The Serverless Bridge: Zero-Code Ingestion

Designing for Failure: The Dead Letter Topic (DLT)

Conclusion

Building a Production-Ready Serverless App on Google Cloud (Part 1: Architecture)

The Problem

The Architecture: Decoupling by Default

The Compute Layer: Scaling to Zero with Cloud Run

The Data Split: OLTP vs. OLAP in a Serverless World

1. Cloud Storage (The Payload)

2. Firestore (Operational / OLTP)

3. BigQuery (Analytical / OLAP)

The Payoff: Visualizing the Data (Looker Studio)

Conclusion & What’s Next

Testing Antigravity: Building a Data-Intensive POC at 300km/h

Introduction

The Smoke Test

The Objective

The Workflow: From Coder to Conductor

The Proactive "Wins"

The "Experience" Corrections

From POC to "Almost Prod-Ready"

The Verdict

patricio-navarro / dog_finder_app

Anlytics POC

🐶 Dog Finder Analytics POC

📋 Overview

✨ Features

🏗️ System Architecture

1. Zero-to-One Boilerplate (`init` & `enhance`)

2. The Evaluation Harness (`eval` & `compare`)

3. Deployments & Infrastructure (`infra prod` & `deploy`)