DEV Community

Haley


Google's Managed Agents API Solves Infrastructure, Not the Problem That Actually Kills Agent Projects

Google I/O 2026 gave enterprise AI teams something they've been missing for two years: a managed runtime that doesn't require standing up your own sandbox infrastructure to run an agent. Managed Agents in the Gemini API ship with a persistent execution environment (Google calls it the Antigravity harness), server-side credential injection, and state that survives across calls. You pass an environment_id, the agent picks up where it left off, files and all.

That's a real unlock. It's also, I'd argue, the easy 20% of the problem, and most of the breathless takes I've seen since the keynote stop right there.

Here's my honestly biased take after going through Google's docs and a few vendor breakdowns of what "production-ready agent" actually requires: the infra story is basically solved now, and that means the next 12 months of enterprise AI are going to be decided entirely by who gets governance right, not by who has the slickest agent demo. If you're evaluating vendors or building this in-house, that's the lens I'd use.

Why the chatbot era hit a wall

Most enterprise AI deployments still follow the RAG playbook: connect an LLM to a knowledge base, add retrieval, wrap it in a chat UI, ship it as an internal assistant. Great for "what's our refund policy." Useless the moment the workflow needs to do something.

A support resolution workflow isn't done when the model answers a question. It's done when the ticket is updated, the refund is issued, the customer is notified, and the case is closed, across ServiceNow, Salesforce, a billing API, and email. A RAG chatbot can't touch any of that, for three structural reasons:

  1. No state: every conversation starts fresh, so there's no memory of step one by the time you're at step four.
  2. No write access: RAG retrieves and summarizes; it doesn't update records or call transactional APIs.
  3. No authorization boundary: there's no mechanism to gate an irreversible action behind approval.

This is why so many pilots stall. Not because the model isn't smart enough, but because the surrounding architecture was never built to let it act.

What Managed Agents actually fix

To be fair to Google here, this part is genuinely well done. Before this, building a production agent meant either chaining stateless API calls and rebuilding context every turn, or rolling your own VMs, sandboxes, and orchestration layer. The Managed Agents API replaces that with:

  • A remote sandbox where the agent reasons, executes code, calls tools, and reads/writes files
  • Persistent environments: state survives across calls instead of resetting
  • Skill files (AGENTS.md, SKILL.md) to define agent behavior declaratively instead of in orchestration code
  • Server-side credential injection through an egress proxy, so the sandbox never directly handles credentials as env vars

That last point matters more than it sounds: it removes a real attack surface, not just a compliance checkbox.

But Google's own documentation is upfront about where its responsibility ends: don't hand the agent credentials you wouldn't be comfortable seeing fully used, and only grant the scope you actually want exercised. Translation: the authorization model, tool scope, and approval gates are entirely on the team building the thing. Google built the engine. Nobody's shipped you the brakes.
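To make "only grant the scope you actually want exercised" concrete, here's a minimal deny-by-default sketch of the kind of check a team might put in front of every tool call. Everything in it (`ToolGrant`, `GRANTS`, `check_scope`, the scope strings) is a hypothetical name I made up for illustration, not part of Google's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolGrant:
    """One tool the agent may call, with the narrowest scopes it needs."""
    tool: str
    scopes: frozenset


class ScopeError(PermissionError):
    """Raised when the agent asks for something outside its grant."""


# Hypothetical grants for a support-resolution agent (assumed names).
GRANTS = {
    "ticketing": ToolGrant("ticketing", frozenset({"ticket:read", "ticket:update"})),
    "billing": ToolGrant("billing", frozenset({"invoice:read"})),
}


def check_scope(tool: str, scope: str) -> None:
    """Deny by default: unknown tools and ungranted scopes both fail."""
    grant = GRANTS.get(tool)
    if grant is None or scope not in grant.scopes:
        raise ScopeError(f"{tool}:{scope} is not granted")


check_scope("ticketing", "ticket:update")  # allowed, returns None
```

With these grants, `check_scope("billing", "refund:create")` raises `ScopeError`, because the billing tool was only ever given read access to invoices.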

The seven layers nobody skips for free

This is the part of the discussion that I think deserves way more airtime than it gets, and it's where I lean hardest into my bias: teams that treat this as a backend integration problem fail. Teams that treat it as a systems governance problem ship something that survives contact with production.

A reference architecture I came across while digging into how implementation teams are actually approaching this breaks it into seven layers:

  1. Interface: chat UI, webhook, scheduled trigger, message queue event. No business logic here.
  2. Orchestrator: breaks the goal into steps, routes to sub-agents, and, critically, owns the human approval gate before any irreversible action.
  3. Model: the actual reasoning inside the sandbox. Teams don't manage this directly; the harness handles model selection.
  4. Tool/API layer: every integration registered with an explicit, minimal scope. Enforced at the sandbox config level, not the app level.
  5. Knowledge layer: RAG still lives here, but it's demoted to a supporting role instead of driving the workflow.
  6. Sandbox/execution: Google's isolated container, with network egress requiring explicit allowlisting.
  7. Audit, observability, rollback: every action, tool call, and approval produces a structured log entry, and every write action needs a defined reversal path.
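Layer 7 is the easiest one to hand-wave, so here's a rough sketch of what "structured log entry plus a defined reversal path" could look like if you enforce it in code. The `AuditEntry` class and its field names are my own assumptions, not a schema from Google or from the reference architecture.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditEntry:
    """One structured record per action (hypothetical schema)."""
    action: str
    tool: str
    is_write: bool
    approved_by: Optional[str] = None
    reversal: Optional[str] = None  # name of the action that undoes this one
    timestamp: str = ""

    def __post_init__(self) -> None:
        # A write without a reversal path is rejected before it can be logged.
        if self.is_write and not self.reversal:
            raise ValueError(f"write action {self.action!r} has no reversal path")
        if not self.timestamp:
            self.timestamp = datetime.now(timezone.utc).isoformat()

    def to_json(self) -> str:
        """Serialize to one JSON line suitable for a log pipeline."""
        return json.dumps(asdict(self), sort_keys=True)


entry = AuditEntry("issue_refund", "billing", is_write=True,
                   approved_by="ops-lead", reversal="void_refund")
print(entry.to_json())
```

The design choice worth copying is that the constructor refuses a write action with no reversal, so a missing rollback story shows up as an error during development rather than as a gap discovered during an incident.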

Skip any one of these and you don't get a slightly worse agent. You get a pilot that can't graduate to production, or worse, one that does and causes an incident.

My opinionated take on who's actually building this right

I've looked at a handful of teams writing publicly about agentic implementation work recently, including Vercel, LangChain, and various systems-integrator shops doing enterprise AI rollouts. Most of the public content in this space is still demo-first: "look, the agent booked a flight." Cool trick, but it doesn't tell you anything about whether it'll survive an audit.

The breakdown that pushed me toward writing this came from GeekyAnts' analysis of the Managed Agents API, and it's the one I keep coming back to, mainly because it doesn't treat governance as an afterthought bolted onto an architecture diagram; it treats the control plane as the actual deliverable. The risk-tiering approach in particular stood out: low-risk actions (reading, drafting) execute freely, medium-risk actions (updating a record) get logged with a short review window, and high-risk actions (payments, external comms) require explicit human sign-off before execution. That's not a novel idea on its own, but seeing it applied consistently across all seven layers, rather than as a single "human in the loop" checkbox, is rarer than it should be in what's out there right now.
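The three tiers are simple enough to sketch as a single decision function. This is my own minimal illustration under stated assumptions, not code from the GeekyAnts analysis: the action names, the `ACTION_TIERS` mapping, and the returned strings are all hypothetical, and I've assumed that anything unclassified should be treated as high risk.

```python
from enum import Enum
from typing import Optional


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical action-to-tier mapping for a support workflow.
ACTION_TIERS = {
    "read_ticket": Risk.LOW,
    "draft_reply": Risk.LOW,
    "update_record": Risk.MEDIUM,
    "issue_refund": Risk.HIGH,
    "send_customer_email": Risk.HIGH,
}


def decide(action: str, approver: Optional[str] = None) -> str:
    """Return what the orchestrator should do with a proposed action."""
    # Assumption: unknown actions default to the strictest tier.
    tier = ACTION_TIERS.get(action, Risk.HIGH)
    if tier is Risk.LOW:
        return "execute"
    if tier is Risk.MEDIUM:
        return "log_and_open_review_window"
    # High risk: only proceed once a named human has signed off.
    return "execute" if approver else "hold_for_approval"


print(decide("draft_reply"))
print(decide("issue_refund"))
print(decide("issue_refund", approver="ops-lead"))
```

Note that the tier is looked up per action, not per workflow, so one support workflow can pass through all three branches on its way from reading a ticket to issuing a refund.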

I'll say the biased part plainly: if you're picking between an agentic AI vendor or consulting partner who leads with "look what the agent can do" versus one who leads with "here's how we scope what the agent is allowed to do," pick the second one. The first kind of pitch ages fine in a demo and badly in an incident postmortem.

A migration framework worth stealing regardless of who's building it

Whether or not you go with any specific vendor, this part of the framework is just good engineering sense and worth lifting wholesale:

  • Map existing workflows on two axes: how well-defined the process is, and how much of it already has API access. Ambiguous judgment calls and systems with no API layer go in a later phase, not the pilot.
  • Build the thin API wrapper first. Most agent projects that stall after the proof-of-concept die here: legacy systems with no REST layer and no structured responses.
  • Assign risk tiers per action, not per workflow. A single workflow can mix low-, medium-, and high-risk steps.
  • Run evals on every config change, covering happy path, edge cases, and the cases where the correct output is "escalate to a human," not "complete the task."
  • Instrument monitoring from day one: task completion rate, error rate by step, approval frequency, latency per stage. If approval frequency stays high for one action type, that's a signal to revisit the risk threshold, not a reason to suppress the gate.
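The approval-frequency signal from the last bullet is cheap to compute once actions are logged. Here's a small sketch, assuming each log event is an `(action, hit_approval_gate)` pair and reading "approval frequency" as the share of calls that hit the gate. That event shape, the function names, and the 0.8 cutoff are my assumptions for illustration, not numbers from any framework.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def approval_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of calls per action type that hit the human approval gate."""
    total: Counter = Counter()
    gated: Counter = Counter()
    for action, hit_gate in events:
        total[action] += 1
        if hit_gate:
            gated[action] += 1
    return {action: gated[action] / total[action] for action in total}


def flag_for_review(rates: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Action types whose gate fires often enough to revisit the threshold."""
    return sorted(action for action, rate in rates.items() if rate >= threshold)


# Hypothetical log: refunds almost always hit the gate, record updates rarely do.
events = (
    [("issue_refund", True)] * 9 + [("issue_refund", False)]
    + [("update_record", False)] * 4 + [("update_record", True)]
)
rates = approval_rates(events)
print(flag_for_review(rates))
```

The output is a list of action types to discuss, not an automatic change: a flagged action means someone should look at the risk threshold, while the gate itself stays in place.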

Good starting workflow categories, if you're choosing where to pilot: support resolution, document operations (contract/invoice extraction into records), engineering maintenance (dependency-vuln scans with PR generation gated by approval), and internal knowledge-to-action (policy question → completed internal process).

The uncomfortable part for governance skeptics

I know there's a counter-position here worth naming honestly, since I'm not going to pretend this is uncontested: some teams will argue that heavy governance scaffolding (risk tiers, audit logs on every call, a mandatory 30-day human-gated rollout) just reintroduces the friction that agents were supposed to remove, and that for low-stakes internal tools it's overkill that slows shipping for no real benefit. That's a fair point for genuinely low-stakes, reversible workflows. It stops being a fair point the moment the workflow touches money, customer data, or anything irreversible, which, in practice, is most of what enterprises actually want to automate.

Where this leaves enterprise teams

The infrastructure argument is over. Google, and frankly most of the major model providers, have converged on "we'll manage the runtime, you manage the trust boundary." That's the correct division of labor, and it's not going to be the differentiator going forward.

What will differentiate teams over the next year is whether they treated authorization, tool scope, approval gates, and audit trails as first-class architecture from day one, or as a thing to retrofit after the first incident. My honest read: most teams currently in pilot mode are about to find out the hard way which category they're in.
