DEV Community

Cover image for Anthropic and Google Just Shipped the Same Product. 2 Weeks Apart. Different Logos.
Phil Rentier Digital
Phil Rentier Digital

Posted on • Originally published at rentierdigital.xyz

Anthropic and Google Just Shipped the Same Product. 2 Weeks Apart. Different Logos.

In April 2026, Anthropic launched Claude Managed Agents in public beta. 3 weeks later at Google I/O, Google shipped Managed Agents inside the Gemini API, wrapped in a developer suite called Project Antigravity. If you've been following where vibe-coding is headed (beyond the demo, into production), this is the signal you were waiting for.

Both keynotes used different words. Both pricing pages look unrelated. The press covered them as competing visions.

They're not competing visions. Same architecture, 2 logos on the container.

I spent a few days reading both sets of docs side by side. The level of convergence is too flagrant to treat as coincidence or a curiosity. It's a direct signal about where real value capture in AI infrastructure is actually landing in 2026.

What struck me reading both doc sets: both teams know exactly what they're doing. They didn't converge by accident. They solved the same problem the same way because there's really only 1 way that works. And that way is not the model. It's the runtime.

TLDR: Anthropic and Google shipped managed agent platforms 2 weeks apart. Strip the branding and you get 1 identical architecture: same sandbox, same MCP wire, same 3-axis billing meter. I'm not sure which part should worry you more: how obviously the same it is, or what that sameness is actually designed to own.

The Diagram Behind Both Keynotes

Both products implement the same physical separation: the model "brain" runs on the vendor's cloud. The tool-execution "hands" run in an ephemeral Linux sandbox the vendor also owns. A persistent session connects the 2 for hours, days, or weeks.

Anthropic calls it "separation of brain and hands." Google doesn't bother naming it. It's just how the Managed Agents API works. Different wording. Identical diagram.

In both systems, you make a single API call to provision an agent. The vendor spins up an isolated Linux container in its own data center. The model decides what to do. When it wants to run shell commands, edit files, browse the web, or call an external service, those actions execute inside that container (not on your servers, not in a Docker image you maintain, not in a Lambda you provisioned). The vendor's container, on the vendor's network, writing to the vendor's audit logs.

You get back a stream of events: the model's reasoning, the tool calls, the outputs. You never touch the machine. You can't. That's the product.

For anyone who's gone through the vibe-coding-to-production journey, this should register as the exact infrastructure decision you were already making manually. Vibe Coding, For Real is where I got into exactly this tradeoff, the one where "who owns the runtime" stops being theoretical.

The 5 Pieces Nobody Renamed

TITLE "The Managed Agent Stack" + subtitle "Anthropic vs Google: 5 components, 1 architecture". Metaphor: two parallel assembly lines running left to right, both producing identical boxes at the end, labeled BRAIN and HANDS. Style: engineer blueprint on navy background, white ink, grid paper texture, technical font. Palette: navy #1B2A4A, white #FFFFFF, amber #F5A623, slate #4A5568, red-orange #E05252. Content: two rows (Anthropic top, Google bottom), each with 5 stations labeled SANDBOX, SESSIONS, MCP VAULT, ORCHESTRATION, EVAL/MEMORY. Corresponding stations connected by vertical dotted lines showing "identical". Highlight: MCP VAULT station glows amber on both rows, with a padlock icon. Legend: sticky note bottom-left "dotted line = functionally identical / solid line = vendor-specific naming only". Footer: copyright rentierdigital.xyz. NOT flat corporate SaaS vector, NOT minimalist startup aesthetic.


Anthropic vs Google: Identical Agent Architecture Components

Once you accept the brain-and-hands frame, the rest of the resemblance falls out of it. Both platforms ship 5 components, often with names that are barely disguised translations of each other.

Ephemeral sandbox. Anthropic provisions a fresh Linux container per session, mounts files into /workspace, and exposes bash, file operations, and web browsing as native tools. Google does the same: a Linux sandbox, a code_execution tool, a google_search tool, a url_context tool. Both default to deny-all networking. Both let you mount files from cloud storage or clone a Git repo at startup.

Stateful sessions with checkpointing. Anthropic preserves the container's file system, installed packages, and conversation history across disconnections, retaining checkpoints for 30 days of inactivity. Google does the same via the Interactions API: pass a previous_interaction_id and the server reconstructs the entire prior context. Both call this "stateful" execution. Both abandoned their original stateless APIs as the agentic primitive.

The wire protocol: MCP. Both products converged on the Model Context Protocol (the standard Anthropic open-sourced in late 2024) as the way for agents to talk to external systems. Both expose a Vault for credentials, so API keys never appear in your agent prompts or your code. Both inject credentials at the network egress boundary. Google added "MCP Tunnels" for outbound-only private network access in May 2026. Anthropic shipped the same capability under the same name the same month. If you've been thinking about why CLIs still outrun managed MCP for certain workloads, this convergence directly changes that calculation.

The orchestration layer. Multi-agent orchestration moved from "third-party framework you bolt on" to "native API feature" in both ecosystems at almost exactly the same time. Anthropic's version: a lead_agent declares a roster of sub-agents and synthesizes their parallel outputs. Google's version: Antigravity's Shared Agent Harness instantiates parallel sub-agents from a single project. Both cannibalize LangGraph, CrewAI, and LlamaIndex's reason for existing.

Evaluator and memory layers. Anthropic ships "Outcomes": a second LLM that scores agent output against a rubric and loops until it passes. Google ships lifecycle hooks (post_tool_call, post_turn) that play essentially the same role, deterministic checkpoints inside a probabilistic loop. Anthropic ships "Dreaming" for long-term memory consolidation between sessions. Google ships automatic context compaction at roughly 135K tokens with retention of "critical state variables." Different mechanisms, same job: prevent the agent from forgetting itself and from confidently submitting garbage. (Opus performs noticeably better than Sonnet on the evaluator loops, by the way. Factor that into your model routing before committing to a tier.)

The 3-Axis Billing Trap

Most builders skip straight to the sandbox docs. The billing page is where the architecture reveals itself.

Before: both platforms used simple per-million-token pricing. You could estimate your monthly bill on a napkin. Your FinOps team was comfortable, relatively.

After: both companies abandoned token-only billing at exactly the same moment, and both replaced it with 3 simultaneous meters:

  • Tokens consumed (input and output, with aggressive cache discounts on repeated context)
  • Active runtime seconds: Anthropic bills around $0.08 per hour of container time while status = running. Google bills container hours essentially the same way via Vertex/AI Studio, with the meter pausing during idle waits for user input.
  • Specific tool calls: web search billed at fractions of a cent per query, on top of everything else.

Idle time is free in both systems. Waiting on a human confirmation is free. The clock only runs when the agent is actively reasoning or executing. That's a meaningful improvement over older container-hour models.

There's a perverse second-order effect both pricing pages reveal once you do the math: cheaper models can cost more. A Haiku or Flash agent that takes 5 iterations to solve what Opus or Pro solves in 1 ends up burning 5x the runtime seconds, 5x the tool calls, and a non-trivial fraction of the token savings. The optimization problem is no longer "pick the cheapest model." It's "find the model whose failure rate per task is low enough that the runtime meter doesn't eat your savings." Neither vendor has a dashboard that tells you which that is for your workload. You'll find out on the invoice. πŸ˜…

Where They Actually Diverge

3 real differences. These are the ones that should affect your choice.

Compliance posture. Anthropic's Managed Agents are explicitly not eligible for Zero Data Retention or HIPAA BAA coverage in their current form, because persistent checkpoints have to live somewhere, and that somewhere is Anthropic's storage. Google's offering routes through Vertex AI's existing compliance envelope, which is broader by virtue of GCP's enterprise track record. If you're in healthcare, defense, or regulated finance, this gap is the only thing that matters this quarter.

The self-hosted escape hatch. Anthropic recognized the compliance problem and shipped self-hosted sandboxes in May 2026: the brain stays in their cloud, the hands move to your infrastructure via partners like Cloudflare, Modal, Vercel, and Daytona. Google has not shipped an equivalent. If keeping execution inside your network perimeter is non-negotiable, Anthropic currently wins this.

The developer surface. Google bundled a desktop app (Antigravity 2.0), the agy CLI, and a Python SDK into a single ecosystem with bidirectional sync. Anthropic ships an API and lets the ecosystem build the surfaces. Whether that's a feature or a bug depends entirely on whether you want a polished vendor-owned IDE or a flexible API you wrap in your own tooling.

Everything else: sandbox model, billing axes, MCP, multi-agent, evals, memory, vault, checkpoints. Same product.

Why They Converged

This isn't coincidence. It's the natural shape of the problem, and understanding that shape makes you a less naive buyer.

If you're a foundation-model vendor in 2026, the model itself is the most easily commoditized layer of your stack. Customers route Haiku for cheap tasks, Sonnet for medium, Opus for hard, Gemini for multimodal, GPT for whatever's left. They benchmark quarterly and switch routes monthly.

The model, you rent. The runtime accrues lock-in with every tool call and checkpoint. So both vendors did the obvious thing and reached for the runtime at the same time, with the same architectural answer, because there's really only 1 architectural answer that works.

Actually, let me put it differently. The runtime dynamic is the thing builders are slowest to internalize, and it's worth sitting with for a second. Once your customer has wired their MCP servers into your Vault, stored their credentials in your key management system, written their agent prompts against your sandbox semantics, set up their audit pipeline against your event schema, and accumulated 30 days of checkpoint state in your storage layer, switching vendors stops looking like an afternoon of config changes and starts looking like a cloud migration: multi-month project, cross-functional sign-off, CFO involvement, the works.

The collateral damage is the third-party orchestration ecosystem. LangChain, LangGraph, LlamaIndex, CrewAI: those frameworks existed to glue a stateless model API with a stateful execution environment you provisioned yourself. Both vendors just absorbed both halves. The glue layer has nothing left to glue.

Working with both ecosystems the past 6 months, the pattern was already visible in how both companies were quietly expanding their APIs to absorb more of the stack. The managed runtime announcement was less a surprise than a formalization of something already underway. The 2-week gap between launches was a coincidence of release calendars. The underlying decision was made months earlier by both teams, probably independently, after staring at the same whiteboard problem.

(Small digression that has nothing to do with agents: a bakery near my apartment changed owners twice in 3 years. Each new owner, independently, brought in the same espresso machine and the same croissant recipe. They had never spoken. Sometimes there is only 1 right answer, and convergence is just what it looks like when 2 separate teams find it.)

4 Things to Do Before You Commit

Treat the agent runtime as the migration risk it is. Architect your prompts, tools, and MCP servers to be portable. The minute you start using vendor-specific event types or orchestration primitives without analogs on the other side, you've doubled your exit cost.

Don't put your only copy of important state inside a vendor checkpoint. Persist what you actually care about (outputs, audit trails, intermediate artifacts) to storage you control. Treat the vendor's session state as cache, not source of truth.

Measure runtime cost per task, not per token. The 3-axis billing makes token cost the least informative number on your invoice. Tag every agent run with a task type and a model choice, then watch the runtime-seconds and tool-call columns more than the token column.

Pick the side with the compliance posture you need now, not the one with the better demo. The demos are interchangeable. The compliance docs are not.

If you're still figuring out how to structure your agent prompts before committing to a managed runtime, the prompt contracts framework I built is worth reading first. Going in with a structured approach to prompt architecture makes the evaluation cleaner: you'll know exactly what you're handing over.


Managed Agents, both versions, is a genuinely good product. The productivity gain is real and large. Before: provisioning a secure sandbox, wiring credential rotation, building a debuggable event pipeline. Weeks of plumbing. Now it's 1 API call. Hard to argue with that.

The price: your runtime is no longer yours. The sandbox is theirs, and so is everything downstream of that first API call (credentials, logs, your agent state accumulating in their checkpoints). A few years from now you'll look back and think this felt exactly like Lambda in 2019: you made the same calculation, and it was probably the right one.

So pick your vendor. But pick with lock-in already priced in. The model, you can swap any Monday morning. The runtime is a cloud migration, and you don't do that between 2 glasses of Saint-Γ‰milion.

Sources

  • Anthropic Managed Agents public beta documentation, April 2026
  • Google I/O 2026: Project Antigravity and Gemini Managed Agents announcement
  • Anthropic self-hosted sandboxes announcement, May 2026 (Cloudflare, Modal, Vercel, Daytona partnerships)

This post may contain affiliate links. If you click them, I might earn a small commission β€” costs you nothing, and helps me keep shipping quality articles every day for your reading pleasure.

Top comments (0)