DEV Community

Cover image for Vibes Don't Scale: Why "Agentic Plumbing" is the Future of Production AI
Md Azad
Md Azad

Posted on

Vibes Don't Scale: Why "Agentic Plumbing" is the Future of Production AI

Google Cloud NEXT '26 Challenge Submission

This is a submission for the Google Cloud NEXT Writing Challenge


Read enough keynote recaps and the shape of them becomes familiar: model names, benchmark numbers, and a CEO quote about whatever "era" we are in. You close the tab, write a Jira ticket, and wonder if any of it was actually about your job.

Today's Google Cloud NEXT '26 opening keynote had all of that. Thomas Kurian in Las Vegas. Sundar Pichai on video. Apple's logo unexpectedly behind the Google CEO's head. The big reveal of the Gemini Enterprise Agent Platform .

But amidst the applause for Gemini 3.1 Pro and its new "Deep Think" mode, I kept coming back to the announcements that weren't flashy.

The ones that will actually affect what we ship this year are the boring ones. The plumbing.


What I Mean By Plumbing

The boring infrastructure almost always decides whether a technology ships at scale. HTTP wasn't exciting. TCP/IP wasn't a keynote moment. Nobody clapped for DNS. But that's the layer where things either work reliably or don't work at all.

AI agents are at exactly that point right now. Everyone agrees on what they want agents to do. The part that has quietly killed hundreds of enterprise AI projects is different: getting agents to talk to each other across systems, hold context between sessions, and do it without becoming a security nightmare your team has to clean up later.

That's what Google actually shipped today. Dressed up in model demos and stage lighting, but the substance is infrastructure.

Thomas Kurian titled the keynote "The Agentic Cloud" and drew a deliberate contrast with competitors: other vendors, he said, are "handing you the pieces, not the platform," leaving teams to integrate components themselves .


Hardware for the AI Age: The TPU v8 Family

Agentic workflows are computationally expensive. When an agent enters a reasoning loop, it performs thousands of operations in the background. Standard hardware can't keep up with that demand without massive latency.

Enter the TPU v8 family. For the first time, Google has bifurcated its chips into two distinct versions :

TPU 8t (Training — codenamed "Sunfish")

  • 216 GB HBM memory | 128 MB on-chip SRAM | 12.6 FP4 petaflops
  • Scales to 9,600-chip superpods via Virgo fabric
  • 3x the processing of seventh-generation Ironwood TPU
  • 2x the performance per watt
  • New SparseCore accelerator handles "irregular memory access patterns" like embedding lookups
  • Built with Broadcom on TSMC's 2nm process

TPU 8i (Inference — codenamed "Zebrafish")

This is the more important chip for most developers because it reflects where cloud margins and customer retention will actually be decided: not in rare pretraining runs, but in the relentless economics of serving reasoning models and agents under latency SLAs .

  • 288 GB HBM memory | 384 MB on-chip SRAM (3x more than previous generations!)
  • 10.1 FP4 petaflops of computing capacity
  • 80% better performance per dollar than Ironwood
  • New Collectives Acceleration Engine (CAE) reduces latency by 5x for chain-of-thought reasoning
  • Boardfly ICI topology cuts network hops from 16 to 7 (50% latency improvement for communication-intensive workloads)

For my project, CrowdCommand — a crowd-safety platform for stadiums — the TPU 8i is vital. If an agent takes seconds to process a camera feed during a potential crush, it's useless. The 8i's on-chip KV cache and reduced latency make Gemini 3.1 Flash responses feel instantaneous.

Virgo Network: The Fabric That Ties It Together

Google also announced Virgo, a new high-bandwidth, low-latency interconnect fabric for its AI Hypercomputer :

  • Connects up to 134,000 TPU 8t processors
  • 47 petabits per second of bi-directional bandwidth
  • 1.6 million ExaFlops of capacity with "near-linear" scaling
  • 40% lower unloaded fabric latency for TPUs
  • Uses a flat, two-layer non-blocking topology with high-radix switches

This isn't just a network upgrade — it's a complete reimagining of how AI chips communicate. For multi-agent systems where agents need to stay in constant sync, Virgo prevents the lag that breaks distributed reasoning.


The A2A Protocol v1.0: Breaking the Vendor Lock-in

The Agent-to-Agent (A2A) protocol reaching v1.2 and moving to the Linux Foundation's Agentic AI Foundation is a landmark moment . It solves the "Multi-Agent Discovery" problem that has made distributed agent architectures genuinely painful.

How does an agent on Platform A discover, trust, and delegate to an agent on Platform B?

The answer is Signed Agent Cards.

What Agent Cards Actually Are

Every A2A-compliant agent publishes an Agent Card: a JSON manifest served at /.well-known/agent.json that declares what the agent can do, what inputs it accepts, what auth schemes it supports, and how to reach it :

{
  "name": "Procurement Agent",
  "version": "1.2.0",
  "capabilities": ["create_purchase_order", "check_vendor_status", "approve_spend"],
  "input_schema": {
    "type": "object",
    "properties": {
      "vendor_id": { "type": "string" },
      "amount_usd": { "type": "number" }
    }
  },
  "auth": { "schemes": ["oauth2", "api_key"] },
  "endpoint": "https://procurement-agent.acme.com/a2a"
}
Enter fullscreen mode Exit fullscreen mode

This is discovery. This is what lets a general-purpose orchestrator find and call a specialist agent without prior integration work. It's DNS + OpenAPI, applied to agents.

The Production Signal That Matters

The number that should get your attention: 150 organizations in production, not pilot .

Google announced A2A at Google I/O 2025 with 50 partners on paper. Twelve months later:

  • 150 organizations running real workloads between agents built on different vendors' stacks
  • Microsoft, AWS, Salesforce, SAP, and ServiceNow are running A2A in production environments
  • The Linux Foundation now governs it — removing the "what if Google gets bored?" question

Native A2A support now ships in ADK v1.0, LangGraph, CrewAI, LlamaIndex Agents, Semantic Kernel, and AutoGen . That's not a Google-curated list of close partners. That's where developers are actually building agent systems.

A2A vs. MCP: The Two Layers

A2A is designed to complement rather than compete with Anthropic's Model Context Protocol (MCP) :

Protocol Layer What it does
MCP Tool/Data Access How an agent connects to tools and data sources
A2A Agent Orchestration How agents communicate with each other across platforms

You need both. MCP connects agents to your databases and APIs. A2A lets agents talk to each other. Google now supports both natively.

Code Example: Building an A2A Agent

Here's a minimal A2A agent server using the new ADK v1.0 (Python, stable release) :

from google.adk.agents import LlmAgent
from google.adk.a2a import A2AServer, AgentCard, Capability

# Define what this agent can do
card = AgentCard(
    name="inventory-checker",
    version="1.0.0",
    capabilities=[
        Capability(
            name="check_stock",
            description="Returns current inventory level for a given SKU",
            input_schema={"sku": "string"},
            output_schema={"quantity": "integer", "warehouse": "string"}
        )
    ]
)

# Create the agent
agent = LlmAgent(
    model="gemini-3-flash",
    system_prompt="You check inventory. Be precise and fast.",
    tools=[check_inventory_db]
)

# Start the server
server = A2AServer(agent=agent, card=card, port=8080)
server.start()
# → GET  /.well-known/agent.json   (Agent Card discovery)
# → POST /a2a/tasks/send           (Task endpoint)
Enter fullscreen mode Exit fullscreen mode

And calling it from an orchestrator:

from google.adk.a2a import A2AClient

client = A2AClient()

# Auto-fetches the Agent Card
inventory_agent = await client.discover("https://inventory.acme.com")

# Send a task
task = await inventory_agent.send_task({
    "capability": "check_stock",
    "input": {"sku": "WIDGET-42"}
})

# Stream progress in real time
async for update in task.stream():
    print(f"Status: {update.status} | {update.message}")

result = await task.result()
print(f"Stock: {result['quantity']} units at {result['warehouse']}")
Enter fullscreen mode Exit fullscreen mode

This runs cross-platform. The inventory agent could be on Agent Engine. The orchestrator could be LangGraph, CrewAI, or AutoGen. A2A bridges them without custom serialization or SDK lock-in.


ADK v1.0: What "Stable" Actually Buys You

The Agent Development Kit (ADK) hit stable v1.0 releases today across Python, Go, and Java, with TypeScript also available .

The 0.x releases were experimentally useful — people shipped real things with them. But "production-ready" means something specific when your agents take autonomous actions: stable APIs you can actually depend on, predictable versioning, and a security model you can explain to your CISO.

Model Armor: Security at the Protocol Level

The standout security feature is Model Armor . It defends against:

Attack Type Description
Prompt Injection Malicious commands hidden in user input
Jailbreak Attempts Instructions to bypass safety restrictions
Session Poisoning Injecting harmful content into conversation history
Tool Output Poisoning External tools return malicious instructions
Sensitive Data Leakage Unintended exposure of PII or secrets

Model Armor works by applying pre-trained classifiers to every agent interaction :

User Input → Model Armor → Clean Input → Agent → Model Armor → Safe Output
              ↓                              ↓
         Block/Flag                    Block/Flag
Enter fullscreen mode Exit fullscreen mode

Performance characteristics vs. LLM-as-a-Judge :

Feature Model Armor LLM-as-a-Judge
Latency 100-300ms 500-1000ms
Cost Lower (optimized classifiers) Higher (LLM inference)
Setup Requires Cloud config Easy (SDK only)
Context Awareness Good Excellent

Best practice: Use both — Model Armor for fast baseline filtering, LLM-as-a-Judge for context-aware validation on critical operations.


MCP Servers: The Announcement Nobody's Talking About

While everyone focused on Gemini and A2A, Google quietly launched managed MCP servers running natively inside Google Cloud .

Before this: every time you wanted an AI agent to talk to an external service — a database, security dashboard, calendar — you had to build the bridge yourself. Custom API calls. Auth tokens stored somewhere sketchy. Error handling that breaks at 2 AM.

MCP is like USB-C for AI agents. It's the standardized port that lets AI agents plug into data sources without custom wiring every time.

Google's managed MCP servers now cover :

Service What it enables
Google Security Operations Agents query threat data without custom auth
Google Workspace Agents read docs, calendar, email securely
BigQuery Agents run analytics queries as natural conversation
Cloud Storage Read/write/analyze data using MCP
Maps, Compute Engine, Kubernetes Engine Fully managed remote MCP servers

The old way — days of work:

# 1. Build OAuth flow for Google Workspace
# 2. Set up token refresh logic
# 3. Write endpoint wrappers for Docs, Calendar, Gmail
# 4. Handle errors, retries, rate limits
# 5. Deploy and monitor forever
# -- 3 days minimum. Ongoing maintenance. --
Enter fullscreen mode Exit fullscreen mode

The new way — one line:

agent.connect(mcp_server="google-workspace")
agent.ask("Summarize all unread emails from the last 48 hours and add any deadlines to my calendar")
# Done. In production. Today.
Enter fullscreen mode Exit fullscreen mode

That is the removal of an entire category of work.


Governance: Active Directory for the AI Era

The Gemini Enterprise Agent Platform is organized around four pillars: Build, Scale, Govern, Optimize . The most important for enterprises is Govern.

Agent Identity

Every agent gets a unique cryptographic ID with an auditable trail mapped to authorization policies. If an agent takes an action, you know which agent, under which policy, at what time .

Agent Registry

A central catalog of every agent and approved tool across your organization — the equivalent of a container registry, but for agents. Whether the agent was built internally on ADK or sourced from the partner marketplace (Atlassian, Box, Salesforce, ServiceNow, Workday all launched agents at Next), it has one identity and one index.

Agent Gateway

Described by Kurian as "air traffic control for your agent ecosystem" :

  • Routes all agent traffic
  • Speaks both MCP and A2A natively
  • Applies Model Armor inline — prompt injection scanning happens at the network layer
  • Surfaces Agent Anomaly Detection — monitoring for tool misuse, unauthorized data access, and reasoning drift in production

Memory Bank

Persistent state for up to seven days, allowing agents to maintain high-accuracy context across sessions mapped to internal CRM and database records via Custom Session IDs . Stateful agents are no longer an edge case — they're the runtime's default assumption.


The Agentic Data Cloud: Grounding Agents in Reality

Agents are only as good as the data they can access. Google announced new capabilities for the Agentic Data Cloud :

Knowledge Catalog (formerly Dataplex)

A unified map of your data landscape across AlloyDB, BigQuery, Bigtable, Cloud SQL, and Spanner. Provides a single, governed source of truth needed to build and scale reliable agents.

Reverse ETL for BigQuery (Preview)

One-click solution to push analytical insights from BigQuery back into AlloyDB, Bigtable, or Spanner, enabling agents to serve them with sub-millisecond latency .

Spanner Columnar Engine (GA)

Analytical queries run up to 200x faster with zero impact on production transactional workloads.

For CrowdCommand, this means my safety agents can query live stadium sensor data while also accessing historical crowd flow patterns — all at conversational speed.


Where I'm Skeptical

The word "open" appears a lot. A2A is an open protocol. ADK is open source. The Model Garden includes 200+ models from multiple vendors, including Anthropic Claude .

All true.

And also: the smoothest path through every one of these tools runs directly through Google Cloud — Agent Engine for managed hosting, Apigee as the API-to-agent gateway, Vertex AI as the deployment target.

The protocol is portable. The operational infrastructure is not.

This isn't necessarily a problem — Google's runtime is genuinely good. But developers should be clear with themselves about what "open" covers here. The code you write on ADK travels with you. The observability tooling, managed hosting, and audit trail — those are Google Cloud products. That's a real dependency. Know what you're choosing.

Also worth watching:

  • MCP governance — Who controls access? Where are the logs? For regulated industries (healthcare, finance, legal), these aren't minor concerns .
  • Pricing — "Managed" usually means "metered." Unknown yet if this becomes expensive at scale.

The Developer Keynote: Code, Not Slides (April 23)

Today's Developer Keynote (10:30 AM PT) took a different approach. No polished slides. No rehearsed demos. Live coding. Real terminals. Real bugs.

Who Spoke

Stephanie Wong hosted, joined by:

  • Michele Catasta (President & Head of AI at Replit) — live-building agentic workflows
  • Harrison Chase (LangChain) — discussing multi-agent orchestration
  • Ankur Kotwal & Salman Ladha (Wiz) — security deep dive on agent isolation
  • Kevin Moore & Ines Envid — ADK v1.0 live demo
  • Sarah Kennedy & Ricky Robinett — "hot off the press" breakdown

The most valuable moment: watching them hit a production bug live, debug it, fix it, and redeploy. That's transparency documentation can't give you.

What They Covered

Topic Key takeaway
ADK v1.0 live Building agents with Python, streaming responses
MCP integration Connecting agents to BigQuery in 3 lines
Agent Gateway preview Real-time traffic management
Security Model Armor + Wiz integration
LangGraph + A2A Cross-framework agent communication

New Codelabs Released

55+ new codelabs. Start here: Codelab 9 — Developer Keynote: Building Agents with Skills

  • Build Rich Agent Experiences (ADK + A2UI)
  • Building a Multi-Agent System
  • Building Secure Agents (Model Armor + IAM)
  • Deploy and Scale Agents on Agent Engine

Google Workspace Studio: No-Code Agents

Also announced in Day 2: Google Workspace Studio lets business users build agents without code [citation:3].

Type: "Every Friday, ping me to update my tracker" → Gemini creates the automation.

Connects to Asana, Jira, Mailchimp, Salesforce via webhooks or Apps Script. Rolling out to Workspace business and enterprise customers.


Project Mariner: Web-Browsing Agents

Project Mariner scores 83.5% on WebVoyager — better than most human benchmarks [citation:3].

Handles 10 concurrent tasks on cloud VMs: shopping, research, form-filling. Available now to Google AI Ultra subscribers in the US.

Roadmap:

  • Q2 2026: Mariner Studio (visual builder)
  • Q3 2026: Cross-device sync
  • Q4 2026: Agent marketplace

The "Open" Question (Addressed Day 2)

During the keynote panel, Harrison Chase asked directly: "How open is A2A really?"

Google's response: The protocol is governed by the Linux Foundation. Microsoft, AWS, and Salesforce are all running it in production. The spec is public. The code is portable.

But: The smoothest path — Agent Engine, Apigee, Vertex AI — runs through Google Cloud. That's not lock-in. That's differentiation. Know the difference.

What to Actually Do With This

If you're building agents right now:

  • Read the A2A spec before the SDK docs. Understanding Agent Cards — what goes into them, how signing works, what a well-defined skill description looks like — shapes how you design agents from the start .
  • Try MCP first — Pick one workflow that involves pulling data from somewhere and summarizing it. That's your first MCP experiment .

If you're choosing a multi-agent framework:

  • A2A v1.0 in production at 150 organizations, across every major framework, is a meaningful signal about where multi-agent interoperability is actually converging.

If you're speccing an enterprise AI project:

  • Look at Memory Bank and Agent Identity before you finalize the architecture. Persistent agent state and proper credential management are the two things that most demo architectures quietly skip.

The Part That's Easy to Miss

The keynote demo that got the biggest reaction showed a Gemini agent pulling data from thousands of PDFs, catching a buried allergen hidden in one of them, then calling research agents to build a full market projection — autonomously, while the presenter talked.

That's a real capability and it's impressive. But it works because of things that weren't in the demo :

  • Agents that can find each other by capability (Agent Registry)
  • Agents that verify each other's identity (Agent Identity)
  • Agents that maintain context between calls (Memory Bank)
  • Agents operating inside an auditable security boundary (Agent Gateway)

That's the plumbing. And it's what makes the magic possible.


Conclusion: From Prompter to Orchestrator

Thomas Kurian's framing was bold: "You have moved beyond the pilot. The experimental phase is behind us" .

For developers, this means our job is shifting. We aren't just "prompting" anymore — we are orchestrating systems of intelligence.

The "Agentic Cloud" is finally providing:

  • A protocol standard (A2A) under neutral governance
  • Power-efficient hardware (TPU v8i) for inference at scale
  • Managed MCP servers that remove an entire category of integration work
  • A security story (Model Armor, Agent Identity) you can defend to your CISO

Infrastructure doesn't announce itself. It just works — until the day you need it and it's not there. Thankfully, the plumbing for the future of AI is finally being laid.


What are you building with the new Agentic Stack? Drop a comment below — I'd love to hear your take on A2A vs. MCP and what integrations you're most excited about.


Resources to Dive Deeper


Posted as part of the Google Cloud NEXT '26 Writing Challenge on DEV. The developer keynote is available on the DEV homepage — worth catching for how the ADK and A2A story gets told to a technical audience.

Top comments (0)