
Sonika Janagill for Google Developer Experts

Posted on • Originally published at sonikajanagill.com

Google Cloud’s Agent Ops Stack: Why Deployment Is No Longer the Hard Part

Google Cloud NEXT '26 Challenge Submission

This is a submission for the Google Cloud NEXT Writing Challenge


The Gemini Enterprise Agent Platform slide that opened Google Cloud Next '26 has four layers: Build, Scale, Govern, Optimise. Look at what is missing: Deploy.

That omission is not an oversight. It is the point. Deploy has not disappeared. In the platform's lifecycle it is handled as an automated background step via Agent CLI and Agent Runtime, part of Build and Scale. Google has made it a standardised process precisely so it stops being the primary engineering challenge. The hard questions are now upstream and downstream of it.

A year ago, the conversation in every enterprise AI session was "how do we run an agent?" Today, Thomas Kurian opened the Next '26 keynote by declaring the agentic enterprise "real — and deployed at a scale the world has never before seen," and announcing a platform designed to answer an entirely different question: how do we govern a fleet of thousands of them?

That shift, from deployment to governance, from experiment to operations, is what it actually means for agents to become first-class citizens on Google Cloud. It is a change in the platform's fundamental assumptions. We are leaving the era of the Request/Response cycle and entering the era of the Long-Lived Agentic Session. Infrastructure built for humans processing HTTP requests is being rebuilt for agents processing week-long workflows, with identity, memory, security, and observability treated as primitives rather than afterthoughts.

Here is what that looks like in practice.


From Vertex AI to an agent operations platform

The Gemini Enterprise Agent Platform is not a rebrand of Vertex AI. It is the evolution of it, and the distinction matters. Vertex AI gave engineers a trusted surface to build and tune models. The Agent Platform gives engineering teams a surface to manage agents as operational entities.

For the past two years, the industry has been consumed by the Dev Stack for agents: which LLM to use, how to write the perfect prompt, which RAG framework to pick. Google's announcement effectively says: the Dev Stack is largely solved. Let's talk about the Ops Stack.

The four pillars — Build, Scale, Govern, Optimise — are worth reading in the order Google chose, because that order tells you where the work is.

The four pillars of the Gemini Enterprise Agent Platform

Build covers what most developers already expect: a graph-based Agent Development Kit (ADK) supporting Python, TypeScript, Java, and Go; a low-code Agent Studio; Agent Garden templates; and multimodal streaming. Google reports that over six trillion tokens are processed monthly through ADK alone. The model backbone for this platform is the Gemini 3 family: Gemini 3 Pro for complex workflow orchestration, Gemini 3 Flash for the high-frequency, lower-latency tasks that agent loops demand. The tooling here is mature. The interesting announcements are in the next three layers.
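To make the "graph-based" part of that concrete, here is a toy sketch of what an agent defined as a graph of nodes looks like in plain Python. This is illustrative only — the names (`AgentNode`, `AgentGraph`) are invented and this is not the real ADK API — but it captures the shape: discrete steps wired into a graph, with state flowing between them.

```python
from dataclasses import dataclass, field

@dataclass
class AgentNode:
    name: str
    handler: callable  # takes a state dict, returns an updated state dict

@dataclass
class AgentGraph:
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)  # node name -> next node name (or None)

    def add(self, node, next_node=None):
        self.nodes[node.name] = node
        self.edges[node.name] = next_node

    def run(self, start, state):
        # Walk the graph from the start node, threading state through handlers.
        current = start
        while current is not None:
            state = self.nodes[current].handler(state)
            current = self.edges[current]
        return state

graph = AgentGraph()
graph.add(AgentNode("plan", lambda s: {**s, "plan": f"lookup {s['query']}"}), "act")
graph.add(AgentNode("act", lambda s: {**s, "result": s["plan"].upper()}))
state = graph.run("plan", {"query": "order status"})
print(state["result"])  # LOOKUP ORDER STATUS
```

A real ADK agent adds LLM-backed nodes, tool calls, and streaming on top of this basic plan-then-act structure.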

Scale is where the runtime gets serious. Agent Runtime now delivers sub-second cold starts. Long-running agents can maintain state for up to seven days. Agent Sandbox provides hardened execution environments for model-generated code and computer-use tasks. The key addition is Memory Bank with Memory Profiles: agents can now retain long-term, high-accuracy context across sessions, mapped to internal CRM and database records via Custom Session IDs. Stateful agents are not an edge case anymore; they are the runtime's default assumption.
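As a rough mental model for Memory Bank with Memory Profiles — not the real API, just a sketch with invented names — think of a store keyed by the Custom Session ID that maps to your CRM record, surviving across sessions:

```python
# Hypothetical sketch of a Memory-Bank-style profile store. The class and
# method names are invented for illustration.
class MemoryProfileStore:
    def __init__(self):
        self._profiles = {}  # custom_session_id -> {key: value}

    def remember(self, session_id, key, value):
        self._profiles.setdefault(session_id, {})[key] = value

    def recall(self, session_id):
        # Long-term context to inject when a new session starts.
        return dict(self._profiles.get(session_id, {}))

store = MemoryProfileStore()
store.remember("crm-4821", "dietary_preference", "vegetarian")
store.remember("crm-4821", "preferred_city", "Osaka")

# Days later, a new session arrives carrying the same Custom Session ID:
context = store.recall("crm-4821")
print(context["preferred_city"])  # Osaka
```

The point of the managed version is that this store, its durability, and its mapping to internal records are platform infrastructure rather than application code you maintain yourself.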

Govern is the layer that signals the platform shift most clearly. Three new capabilities: Agent Identity, Agent Registry, and Agent Gateway.

Think of Agent Identity, Agent Registry, and Agent Gateway together as Active Directory for the AI era: the system that manages who your non-human workforce is, what it can access, and what it did. Agent Identity gives every agent a unique cryptographic ID with an auditable trail mapped to authorisation policies. If an agent takes an action, you know which agent, under which policy, at what time. This is not prompt engineering; it is IAM for non-human principals.
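The mechanics of "which agent, under which policy, at what time" can be sketched in a few lines. This is a deliberately simplified illustration — a real system would use asymmetric keys and managed IAM rather than a shared-secret HMAC, and all names here are invented:

```python
import hashlib
import hmac
import json

# Per-agent signing keys and an append-only audit log (both hypothetical).
AGENT_KEYS = {"invoice-agent-01": b"demo-secret-key"}
AUDIT_LOG = []

def record_action(agent_id, action, timestamp):
    # Every action is serialised deterministically and signed with the
    # agent's own key, binding the action to one specific principal.
    payload = json.dumps(
        {"agent": agent_id, "action": action, "ts": timestamp},
        sort_keys=True,
    ).encode()
    signature = hmac.new(AGENT_KEYS[agent_id], payload, hashlib.sha256).hexdigest()
    AUDIT_LOG.append({"payload": payload, "sig": signature})

def verify(entry, agent_id):
    expected = hmac.new(AGENT_KEYS[agent_id], entry["payload"],
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["sig"])

record_action("invoice-agent-01", "read:bigquery.invoices", timestamp=1700000000)
print(verify(AUDIT_LOG[0], "invoice-agent-01"))  # True
```

Tamper with the payload and verification fails — which is what makes the trail auditable rather than merely logged.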

Agent Registry is a central catalogue of every agent and approved tool across your organisation — the equivalent of a container registry, but for agents. Whether the agent was built internally on ADK or sourced from the partner marketplace (Atlassian, Box, Salesforce, ServiceNow, Workday all launched agents at Next), it has one identity and one index.
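Conceptually, the registry is a uniqueness-enforcing index over agent identities and their approved tools. A minimal sketch (field names invented, not the real schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentRecord:
    agent_id: str
    source: str            # e.g. "internal-adk" or a partner marketplace name
    approved_tools: tuple  # the tools this agent is allowed to call

class AgentRegistry:
    def __init__(self):
        self._index = {}

    def register(self, record):
        # One identity per agent, regardless of where it came from.
        if record.agent_id in self._index:
            raise ValueError(f"duplicate identity: {record.agent_id}")
        self._index[record.agent_id] = record

    def lookup(self, agent_id):
        return self._index[agent_id]

registry = AgentRegistry()
registry.register(AgentRecord("expense-agent", "internal-adk",
                              ("ocr.read", "erp.submit")))
registry.register(AgentRecord("ticket-agent", "partner-marketplace",
                              ("crm.read",)))
print(registry.lookup("expense-agent").approved_tools)  # ('ocr.read', 'erp.submit')
```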

Agent Gateway is described by Kurian as "air traffic control for your agent ecosystem." It routes all agent traffic, speaks both MCP and A2A natively, and applies Model Armor inline: prompt injection scanning and tool poisoning detection happen at the network layer before any agent action executes. Critically, it also surfaces Agent Anomaly Detection, monitoring for tool misuse, unauthorised data access, and reasoning drift in production.
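To see why an inline chokepoint matters, here is a toy version of the two checks described above — a pattern scan and a tool-policy check — applied before any tool call executes. The pattern list and policy table are invented for illustration; real scanning (Model Armor) is far more sophisticated than substring matching:

```python
# Invented examples of injection markers and a per-agent tool policy.
SUSPICIOUS = ("ignore previous instructions", "exfiltrate")
POLICY = {"commerce-agent": {"catalog.search", "cart.add"}}

def gateway_route(agent_id, tool, arguments):
    # 1. Inline content scan: inspect the arguments before execution.
    text = " ".join(str(v).lower() for v in arguments.values())
    if any(marker in text for marker in SUSPICIOUS):
        return ("blocked", "prompt-injection pattern")
    # 2. Policy check: is this tool approved for this agent?
    if tool not in POLICY.get(agent_id, set()):
        return ("blocked", "tool not in policy")
    return ("allowed", None)

print(gateway_route("commerce-agent", "cart.add", {"item": "sku-123"}))
print(gateway_route("commerce-agent", "payments.refund", {"amount": 500}))
# -> ('allowed', None) then ('blocked', 'tool not in policy')
```

Because every call passes through the same chokepoint, the policy holds no matter which agent, framework, or marketplace the traffic came from.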

The Govern layer: Agent Identity, Registry, and Gateway

Optimise closes the loop with Agent Simulation (generate thousands of synthetic interactions to surface edge cases before your users do), Agent Evaluation (multi-turn autoraters scoring live traffic), and OTel-compliant Agent Observability: automatic tracing, Agent Topology visualisation (a live map of how your agents interact with one another and with tools), and turn-key dashboards that surface the full reasoning chain behind every agent decision. If an agent chose the wrong tool or misread a user's intent, you can see exactly which step in the chain caused it — not just that something went wrong, but why. These are the SRE tools for agent fleets.
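The "full reasoning chain behind every decision" translates, mechanically, into nested spans. A minimal OTel-flavoured sketch in plain Python (not the OpenTelemetry SDK) shows how each reasoning step becomes an inspectable record:

```python
import time
from contextlib import contextmanager

SPANS = []  # in a real system these would be exported to a tracing backend

@contextmanager
def span(name, **attributes):
    start = time.monotonic()
    record = {"name": name, "attributes": attributes, "status": "ok"}
    try:
        yield record
    except Exception as exc:
        record["status"] = f"error: {exc}"
        raise
    finally:
        record["duration_s"] = time.monotonic() - start
        SPANS.append(record)

# One agent turn: tool selection, then the tool call, inside a parent span.
with span("agent.turn", session="crm-4821"):
    with span("tool.select", candidates=["search", "checkout"]) as s:
        s["attributes"]["chosen"] = "search"
    with span("tool.call", tool="search"):
        pass

print([s["name"] for s in SPANS])  # ['tool.select', 'tool.call', 'agent.turn']
```

When a step fails, the error lands on exactly one span — which is what lets you say "the agent misread intent at tool selection" instead of "something went wrong in the turn".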

Taken together, this is not a developer stack. It is an ops stack.


Five platform changes that make agents genuinely first-class

It is easy to claim that agents are "first-class." The evidence is in whether the platform treats them as principals with rights and identities, not just processes with permissions.

On that test, five concrete things changed today.

Five platform changes that make agents first-class citizens

First, agents now have cryptographic identities. Agent Identity means IAM, audit, and compliance can treat an agent as a principal rather than an extension of a human user. When an agent in your supply chain pipeline calls a Spanner instance or reads from BigQuery, that action is traceable to a specific agent with a specific policy scope. That is a meaningful governance primitive, not a feature flag.

Second, they route through a dedicated control plane. Agent Gateway is effectively an API gateway for agent traffic. Architecturally, this mirrors what happened when enterprises standardised on API gateways a decade ago: a chokepoint that enforces policy, provides observability, and decouples caller from callee. The fact that it speaks MCP and A2A natively means the gateway understands agent semantics, not just HTTP verbs. A Google Cloud engineering post published this month makes the underlying technical case: in agentic protocols, policy attributes live inside message bodies rather than headers, so any governance layer that does not parse MCP and A2A payloads is operating blind. Envoy, the proxy underpinning Agent Gateway, is built precisely for this.
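That body-versus-header distinction is easy to demonstrate. MCP tool invocations are JSON-RPC messages using the `tools/call` method, with the tool name inside `params` — so a header-only proxy never sees it. A sketch (the policy set is invented):

```python
import json

ALLOWED_TOOLS = {"catalog.search"}  # hypothetical tool-level policy

def enforce_on_mcp(raw_body: bytes) -> bool:
    """Return True if the request may proceed under the tool policy."""
    msg = json.loads(raw_body)
    if msg.get("method") == "tools/call":
        # The policy-relevant attribute lives in the body, not a header.
        return msg.get("params", {}).get("name") in ALLOWED_TOOLS
    return True  # non-tool-call traffic passes through here

body = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "payments.refund", "arguments": {}},
}).encode()
print(enforce_on_mcp(body))  # False: blocked at the gateway
```

A gateway that only inspects HTTP verbs and headers would wave this request through; one that parses the MCP payload can refuse it.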

Third, they have persistent managed memory. Memory Bank and Memory Profiles are now managed infrastructure, not application state you build yourself. The Gurunavi case study at Next described eliminating manual searches entirely by having agents recall past preferences across sessions. Payhawk's Financial Controller Agent reduced expense submission time by over 50% by remembering user-specific constraints. Stateful behaviour is no longer something you bolt on; it is something the platform provides.

Fourth, they have dedicated runtime economics. Sub-second cold starts and 300 sandboxes per second on GKE reflect a runtime optimised for agent workload patterns: bursty, parallel, potentially long-running, and needing isolation. The TPU 8i chip (Zebrafish), announced separately today, goes further: designed explicitly for the low-latency, chain-of-thought MoE inference that agent reasoning demands, with roughly 80% better performance-per-dollar than Ironwood on that workload.

Fifth, they have a dedicated observability and evaluation stack. OTel-compliant traces, simulation, and live autorater evaluation give engineers the same observability primitives for agents that SRE tooling gave them for services. You can now run a stress test against your agent fleet before deploying to production, score live traffic, and trace a failed reasoning chain end-to-end. That is the maturity signal.


What this means if you are building today

The most immediate implication: the boundary between model development and agent operations has moved. A year ago, you deployed a model and called it via an API. Today, you deploy an agent with an identity, a memory profile, a registered set of approved tools, and a gateway policy. The deployment step is the beginning of the operational lifecycle, not the end of the development one.

The deployment step is the beginning of the operational lifecycle, not the end of the development one.

The architectural mental model shift is significant: stop thinking about agents as wrappers around LLM APIs and start thinking about them as microservices — discrete, composable, independently deployable, and governed by the same infrastructure controls as the rest of your stack. ADK is the framework that makes that model practical.

For engineering leads, the Agent Registry changes the conversation about shadow AI. If every agent your organisation uses — internal or sourced from a partner marketplace — needs to be registered and assigned an identity, you have a forcing function for agent governance that does not depend on policy documentation or developer discipline. The infrastructure enforces it.

For platform teams, Agent Gateway as an MCP-and-A2A-aware control plane means you can start enforcing tool-level access control at the network layer. Restricting which tools a customer-facing commerce agent can call is now an infrastructure configuration, not a prompt constraint.


The commerce signal

One customer story from the keynote is worth isolating for what it signals about the direction.

Macy's unveiled "Ask Macy's," a Gemini-powered shopping agent built in four weeks using Gemini Enterprise for Customer Experience. Reliance demonstrated an agent planning a birthday party, processing millions of product images in minutes via Gemini catalogue enrichment. PayPal's Principal Engineer specifically called out Memory Bank and AP2 (Agent Payments Protocol) as the foundation enabling trusted, agentic commerce experiences on their platform.

The pattern across all three is the same: agents handling not just product discovery but multi-step, stateful, transactional workflows. An agent that can remember what you bought last month, understand your current budget, recommend products, and initiate a UCP checkout — that requires identity, memory, a governed tool set, and a payment layer that can verify authorisation cryptographically.

The Gemini Enterprise Agent Platform, announced today, provides the first three. AP2, which Google announced earlier this year and reaffirmed today via the PayPal integration, provides the fourth.

Commerce is not just a use case for this platform. It is the stress test. If agents can handle a stateful, multi-party, financially consequential transaction with full auditability, they can handle most enterprise workflows.


What comes next

If you are building agents on Google Cloud today, the practical advice is simple: register them in Agent Registry, assign them identities, route them through Agent Gateway, and instrument them with OTel traces. The platform now supports that workflow end-to-end. The question is not whether to govern your agents. At this point, the infrastructure assumes you will.

Tomorrow's developer keynote may add further detail on tool-level governance and Cloud Run specifics for long-running agent workloads. I will update this post as those details are confirmed.


