Gerardo Arroyo for AWS Community Builders

Posted on Jul 3 • Originally published at gerardo.dev

Lambda MicroVMs vs AgentCore Runtime: When to Use Each for Production Agents

#aws #lambda #agentcore #runtime

On June 22, 2026, AWS launched something that is not a feature or an update: it's a new compute primitive. Lambda MicroVMs lets you give each user or session its own isolated Firecracker VM, with state that persists up to 8 hours, snapshot boot in seconds, and no infrastructure to manage.

If you've been working with AgentCore Runtime like I have, your instinctive reaction to the announcement was probably the same one the technical press had: this is an obvious fit for AI agents, but AWS already offers AgentCore Runtime, which looks a lot like MicroVMs.

Spotting the overlap is correct. Concluding that the two services are interchangeable is a mistake.

I spent the last few days reviewing the official documentation, the hands-on notes from Aidan Steele (who explored the service on launch day), and Yan Cui's analysis, then mapped everything against what I already know about AgentCore Runtime from covering it in depth in this series. This article is the result of that work.

🎯 ProTip #1: The right question isn't "Lambda MicroVMs or AgentCore Runtime?". The right question is "who runs the code — my agent, or my users?" That distinction determines everything else.

What They Have in Common: The Same Engine

Before talking about differences, it's worth understanding why the confusion is reasonable.

Both services run on Firecracker, the open-source VMM AWS developed internally that already powers more than 15 trillion Lambda Function invocations monthly. Both provide real VM-level isolation: no shared kernel, no shared resources between sessions. Both allow sessions of up to 8 hours. And both added interactive shell support in June 2026: Lambda MicroVMs with the CreateMicrovmShellAuthToken API, and AgentCore Runtime with InvokeAgentRuntimeCommandShell.

That's enough similarity for the confusion to be understandable. But the analogy that best captures the difference was written by Yan Cui:

"AgentCore Runtime is to Lambda MicroVMs what Fargate is to EC2."

Fargate runs your container in a microVM but controls what you can do inside. EC2 gives you the full machine. AgentCore gives you managed container hosting inside a microVM. Lambda MicroVMs gives you the VM itself.

The isolation boundary is the same — Firecracker. What you can do inside is not.

AgentCore Runtime: The Managed Platform

AgentCore Runtime is what AWS calls a "managed agent platform." You deploy your agent code — LangGraph, Strands, CrewAI, or whatever framework you use — and AWS manages everything else. Fleet management. Routing sessions to VMs. Scaling and teardown of idle sessions. Inbound and outbound auth. Native MCP and A2A protocol support. Versioning. Endpoints.

When your user talks to your agent, you don't think about VMs. The runtime provisions the microVM, routes the session, and when the session ends it destroys the environment and clears memory.

What runs inside the AgentCore Runtime container is your agent logic: the reasoning loop, the tools, the conversational state. The LLM itself lives in Bedrock, on the Anthropic side, or wherever you've configured it. AgentCore puts your agent in a secure, scalable environment with built-in observability: CloudTrail for auditing, CloudWatch for logs.

The constraints are those of any managed platform: you run inside a container that AWS configures and operates, and you don't have the full VM at your disposal. The environment is deliberately more restricted than a VM with full OS capabilities, because running an agent doesn't require that level of control. AgentCore does give you an interactive shell, but through a managed API (InvokeAgentRuntimeCommandShell), not raw access to the pseudoterminal device. If your use case needs things like direct /dev/ptmx access, running your own Docker daemon, or low-level network stack manipulation, that's not AgentCore Runtime's territory. It's exactly what Lambda MicroVMs enables, as we'll see.

🔍 ProTip #2: AgentCore is available in 15 regions according to the official AWS FAQ — including us-east-1, us-west-2, eu-central-1 (Frankfurt), ap-northeast-1 (Tokyo), Mumbai, Sydney, Singapore, and several in Europe. Lambda MicroVMs at launch is only in 5 regions: us-east-1, us-east-2, us-west-2, eu-west-1 (Ireland), and ap-northeast-1 (Tokyo). Keep in mind that certain specific AgentCore Runtime features (interactive shell, stateful MCP, AG-UI) reach a subset of those regions, so check the official supported regions list for your specific feature. If your workload needs a region outside MicroVMs' 5, AgentCore Runtime is still the only managed option for now.

Lambda MicroVMs: The Low-Level Primitive

Lambda MicroVMs is a completely new API namespace within Lambda — aws lambda-microvms — with its own command surface. It's not a configuration option for Lambda Functions. It's a distinct resource.

The flow is different from the start. Instead of a handler that responds to events, you package your application (anything that runs on Linux) and a Dockerfile in a zip, upload it to S3, and call create-microvm-image. Lambda runs the Dockerfile, initializes the application, and captures a Firecracker snapshot of the in-memory and disk state. That's your image.

When you need an isolated environment — for a user, a job, a sandbox session — you call run-microvm. Lambda launches the VM from the snapshot with near-instant startup and returns a dedicated HTTPS endpoint with a unique ID for that VM. Clients connect over HTTP/2, gRPC, or WebSockets. Each connection requires a bearer token you generate with CreateMicrovmAuthToken.
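To make the connection step concrete, here is a minimal sketch of how a client might prepare an authenticated request to a MicroVM endpoint. It uses only the Python standard library and sends nothing. The endpoint, token value, and path are placeholders, and carrying the token in a standard Authorization header is my assumption; the source only states that a bearer token is required.

```python
from urllib.request import Request


def build_microvm_request(endpoint: str, token: str, path: str = "/") -> Request:
    """Prepare (without sending) a request to a MicroVM's dedicated endpoint."""
    if not endpoint.startswith("https://"):
        raise ValueError("expected an HTTPS endpoint")
    url = endpoint.rstrip("/") + "/" + path.lstrip("/")
    # Assumption: the bearer token travels in a standard Authorization header.
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder values; a real endpoint and token come from the service.
request = build_microvm_request(
    "https://vm-123.example.invalid", "token-abc", "/health"
)
```

Passing the resulting object to urllib.request.urlopen would perform the call; building it separately keeps token handling in one place and makes it easy to test.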

What you can do inside that environment is completely different from AgentCore Runtime:

# Inside a MicroVM you have a full Linux system
# Docker inside the MicroVM: works
# /dev/ptmx for pseudoterminals: available
# iptables, eBPF, network namespaces: available
# Up to 16 vCPUs and 32 GB of memory
# 32 GB of disk

The most important gotcha that Aidan Steele documented on launch day: all outbound UDP is blocked by default. The default DNS resolver is a local stub. If you run Docker containers inside your MicroVM, DNS inside the containers will try to fall back to 8.8.8.8 — which fails because UDP is blocked. The fix is to force Lambda's DNS: docker run --dns 169.254.169.253.
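If your MicroVM launches containers programmatically, it may help to centralize that DNS flag so no call site forgets it. A minimal sketch, assuming you start containers by shelling out to the Docker CLI; the helper name is hypothetical, and the resolver address is the one quoted above.

```python
from typing import List, Sequence

# Resolver address quoted in the article for use inside a MicroVM.
LAMBDA_DNS = "169.254.169.253"


def docker_run_argv(
    image: str, command: Sequence[str] = (), dns: str = LAMBDA_DNS
) -> List[str]:
    """Build a `docker run` argument list that always pins the DNS server."""
    if not image:
        raise ValueError("image name is required")
    # --dns is placed before the image so Docker parses it as its own option.
    return ["docker", "run", "--dns", dns, image, *command]


argv = docker_run_argv("alpine", ["nslookup", "example.com"])
```

The list can be handed to subprocess.run(argv, check=True) inside the MicroVM.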

⚠️ ProTip #3: MicroVM images are built exclusively from a Dockerfile in a ZIP uploaded to S3, not directly from an ECR image. The official documentation suggests ECR is supported, but Aidan Steele verified on launch day that it isn't. Your Dockerfile can do FROM some-ecr-image, but the process always starts from a Dockerfile in S3. Another detail: Lambda builds two copies of the image, one for Graviton 3 and one for Graviton 4, which is why they ask for the source Dockerfile instead of the compiled image.

Also, horizontal scaling is yours. Lambda MicroVMs auto-scales vertically within each VM (up to 4x the configured baseline), but if you need more isolated environments, your application creates them, tracks them, routes users to the right environment, and cleans them up. There's no automatic fleet management. No built-in routing.

That's not a limitation — it's a deliberate design decision. Lambda MicroVMs is a primitive. The fleet management plumbing is what you build on top.
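To give a sense of what that plumbing looks like, here is a minimal in-memory sketch of the tracking and routing part. Everything in it is hypothetical: the class, the handle fields, and the launch and terminate callables, which stand in for whatever code wraps run-microvm and terminate-microvm in your application. A production version would need durable storage and concurrency control.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class VmHandle:
    vm_id: str
    endpoint: str


class SessionRegistry:
    """Maps each user to one VM; creates on first use, cleans up on release."""

    def __init__(
        self,
        launch: Callable[[str], VmHandle],
        terminate: Callable[[str], None],
    ) -> None:
        self._launch = launch          # hypothetical wrapper around run-microvm
        self._terminate = terminate    # hypothetical wrapper around terminate-microvm
        self._by_user: Dict[str, VmHandle] = {}

    def get_or_create(self, user_id: str) -> VmHandle:
        handle = self._by_user.get(user_id)
        if handle is None:
            handle = self._launch(user_id)
            self._by_user[user_id] = handle
        return handle

    def release(self, user_id: str) -> bool:
        handle = self._by_user.pop(user_id, None)
        if handle is None:
            return False
        self._terminate(handle.vm_id)
        return True


# Fakes standing in for real API calls, so the sketch runs anywhere.
terminated = []
registry = SessionRegistry(
    launch=lambda user: VmHandle(f"vm-{user}", f"https://{user}.example.invalid"),
    terminate=terminated.append,
)
first = registry.get_or_create("alice")
again = registry.get_or_create("alice")   # same VM, no second launch
registry.release("alice")
```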

The Right Mental Model

The clearest way I found to think about this:

AgentCore Runtime: "run my agent so my users can talk to it."

Lambda MicroVMs: "give each user their own isolated VM to run code in."

In AgentCore Runtime, your agent executes code. The code it executes is the LLM's reasoning, the tools you defined, the logic you programmed. Users interact with the agent — they don't execute arbitrary code in the environment.

In Lambda MicroVMs, your users execute code. More precisely, the code running in the isolated environment is code you didn't write and can't vet in advance: LLM output, user scripts, analysis queries. When an agent is involved, it is the origin of that code, not the place where it executes.

This distinction defines the use cases for each service.

🎓 ProTip #4: There's a use case where Lambda MicroVMs and AgentCore Runtime directly complement each other: coding agents with execution sandboxes. AgentCore Runtime runs your agent (the reasoning). Each time the agent generates code to execute, it launches a Lambda MicroVM to run it in isolation. The result comes back to the agent. It's the architecture AWS documents in the official guide for Claude Managed Agents with self-hosted sandboxes.

When to Use AgentCore Runtime

AgentCore Runtime is the right primitive when:

You're building an agent that provides a service to users. A technical support assistant. A financial analysis agent. A coding bot that helps debug the user's code, but where the agent has its own state, tools, and reasoning. The user talks to the agent; they don't execute code in the agent's environment.

You need the AgentCore managed ecosystem. If you use AgentCore Memory for episodic persistence, AgentCore Policy for deterministic guardrails, AgentCore Payments for micropayments, or the Agent Registry for governance — all of that is built on top of the Runtime and integrates without additional code.

You need availability in more regions. Fifteen regions including Mumbai, Sydney, Singapore, and several in Europe that Lambda MicroVMs doesn't cover yet.

The operational complexity of fleet management is not part of your product. You don't want to write the code that decides how many VMs to keep running, which ones to assign to which user, and when to clean them up. AgentCore handles that for you.

Your agent spends a lot of time waiting on I/O. AgentCore Runtime only charges CPU for active consumption — if your agent isn't consuming CPU while it waits for an LLM response, a tool call, or a database query, that time isn't billed (official AgentCore Runtime pricing). For agentic workloads, which typically spend 30-70% of their time in I/O wait, this is a real cost advantage over a model that bills the entire session duration regardless of whether CPU is active or idle.
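The arithmetic behind that claim is simple enough to sketch. The function below compares billed CPU time under the two models for a single session. It deliberately ignores memory charges and actual prices, and the 50% I/O wait figure is just an illustrative value inside the range mentioned above.

```python
def billed_cpu_seconds(
    session_seconds: float, io_wait_fraction: float, bills_idle_time: bool
) -> float:
    """CPU seconds billed for one session under a given billing model."""
    if session_seconds < 0:
        raise ValueError("session_seconds must be non-negative")
    if not 0.0 <= io_wait_fraction <= 1.0:
        raise ValueError("io_wait_fraction must be between 0 and 1")
    if bills_idle_time:
        # Whole-session billing: waiting on I/O costs the same as computing.
        return float(session_seconds)
    # Active-only billing: time spent waiting on I/O is not charged.
    return session_seconds * (1.0 - io_wait_fraction)


# A 10-minute session that spends half its time waiting on the LLM and tools.
active_only = billed_cpu_seconds(600, 0.5, bills_idle_time=False)   # 300.0
whole_session = billed_cpu_seconds(600, 0.5, bills_idle_time=True)  # 600.0
```

Multiply each result by the relevant vCPU price from the official pricing pages to get a real comparison for your workload.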

💡 ProTip #5: If you already have AgentCore Runtime in production and your agent executes code that users send, that's the only case where you should consider migrating the execution of that code to Lambda MicroVMs as a sandbox. Not migrating the agent — migrating the layer where untrusted code runs.

When to Use Lambda MicroVMs

Lambda MicroVMs is the right primitive when:

You're building a multi-tenant platform where each user needs their own execution environment. An interactive data analysis notebook. A coding assistant that executes the code the LLM generates. A CI/CD platform that runs pipelines in per-tenant isolation. A vulnerability scanner that inspects packages. In all these cases, the product is the isolated execution environment — not the agent that reasons.

You need full OS access. Running Docker inside the VM. eBPF. Mounting filesystems. Installing OS-level packages. AgentCore Runtime isn't designed for this, because your code runs inside a managed container rather than owning the full VM.

You're building something that isn't an AI agent. A game server that runs user scripts. A CI environment with strong per-tenant isolation. A security research tool. Lambda MicroVMs isn't tied to the agent ecosystem.

You want granular lifecycle control. Lambda MicroVMs exposes direct APIs for suspend-microvm, resume-microvm, and terminate-microvm. You can build exactly the lifecycle logic your product needs.

🚨 ProTip #6: Lambda MicroVMs' pricing model is per instance-second, closer to Fargate than to Lambda Functions. You're billed per vCPU-second and per GB-second of memory, plus snapshot reads/writes and data transfer, with one-second granularity rather than the per-millisecond granularity of Lambda Functions. While the VM is suspended, compute charges cease and you only pay for snapshot storage. Suspend/resume is the primary lever for controlling costs in sessions with long idle periods. Verify current prices on the official AWS Lambda pricing page before budgeting any workload.
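To see how much the suspend lever matters for a given session shape, you can total the billable compute seconds from a timeline of states. This is a rough sketch under two assumptions of mine: that only running time incurs compute charges, and that the total is rounded up to a whole second. Snapshot storage and data transfer are left out.

```python
import math
from typing import Iterable, Tuple


def billable_compute_seconds(timeline: Iterable[Tuple[str, float]]) -> int:
    """Sum running time from (state, seconds) pairs, rounded up to a second."""
    running = 0.0
    for state, seconds in timeline:
        if state not in ("running", "suspended"):
            raise ValueError(f"unknown state: {state!r}")
        if seconds < 0:
            raise ValueError("durations must be non-negative")
        if state == "running":
            running += seconds  # suspended intervals add no compute time
    return math.ceil(running)


# Two short bursts of work around a 50-minute idle period.
session = [("running", 120), ("suspended", 3000), ("running", 90.5)]
with_suspend = billable_compute_seconds(session)
# The same session if the VM had stayed running the whole time.
always_on = billable_compute_seconds([("running", s) for _, s in session])
```

Here with_suspend is 211 seconds against 3211 for always_on, which is the gap you would multiply by the per-second rates.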

The Honest Decision Table

To bring all of this down to something actionable:

Dimension	AgentCore Runtime	Lambda MicroVMs
Who executes code	Your agent (your logic)	Your users or the code the LLM generates
Abstraction level	Managed platform	Compute primitive
Fleet management	AWS manages it	Your application manages it
Built-in auth	Yes — inbound and outbound	Bearer token per VM, routing is yours
MCP/A2A protocols	Built-in	Doesn't exist, you build it
OS access inside	Container (restricted)	Full VM (Docker, ptmx, eBPF)
Suspend/resume	Not supported	Yes — explicit API
Regions (Jun 2026)	15 regions (specific features in subset)	5 regions
Architecture	ARM64	ARM64 only
Billing	Per active vCPU-second + memory-second (I/O wait isn't billed)	Per vCPU-second + memory-second
Interactive shell	`InvokeAgentRuntimeCommandShell` (PTY over WebSocket, up to 10 concurrent)	`CreateMicrovmShellAuthToken` (direct PTY)
AgentCore ecosystem	Memory, Policy, Payments, Registry	No direct integration
Max duration	8 hours	8 hours

The Coding Agent Case: Both Together

There's a case where both services are the answer — not one or the other. It's the one AWS formalized with the official guide for Claude Managed Agents with self-hosted sandboxes.

The architecture is:

1. AgentCore Runtime runs your agent: the reasoning loop, the model, conversational session management.
2. When the agent generates code to execute (bash, Python, Node.js), your control plane launches a Lambda MicroVM per session.
3. The agent's code executes in that isolated sandbox.
4. The result comes back to the agent as tool output.

The concrete flow according to official documentation:

1. Anthropic sends webhook session.status_run_started
2. A Lambda Function verifies the signature and calls run-microvm
3. The MicroVM boots, executes tool calls in /workspace
4. Results return to Anthropic
5. The MicroVM suspends or terminates when the session closes

Your Anthropic API key never passes through the launcher in this flow. The launcher only passes a reference to Secrets Manager, and the MicroVM reads the key at runtime with temporary credentials via IMDSv2.
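Step 2 of that flow, the launcher, can be sketched in a few lines. Treat this as an illustration rather than the documented contract: the HMAC-SHA256 hex signature scheme and the payload field names ("type", "session_id") are assumptions of mine, and launch is a hypothetical callable wrapping run-microvm. Only the event name comes from the flow above.

```python
import hashlib
import hmac
import json
from typing import Callable, Dict


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of an HMAC-SHA256 hex signature (assumed scheme)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def handle_webhook(
    secret: bytes,
    body: bytes,
    signature_hex: str,
    launch: Callable[[str], str],
) -> Dict[str, object]:
    # Reject anything that is not signed with the shared secret.
    if not verify_signature(secret, body, signature_hex):
        return {"status": 401}
    event = json.loads(body)
    # Only the run-started event should create a sandbox.
    if event.get("type") != "session.status_run_started":
        return {"status": 204}
    vm_id = launch(event["session_id"])  # hypothetical run-microvm wrapper
    return {"status": 202, "vm_id": vm_id}


# Local demonstration with a fake launcher and a self-signed payload.
demo_secret = b"demo-secret"
demo_body = json.dumps(
    {"type": "session.status_run_started", "session_id": "s-1"}
).encode()
demo_sig = hmac.new(demo_secret, demo_body, hashlib.sha256).hexdigest()
result = handle_webhook(demo_secret, demo_body, demo_sig, lambda s: f"vm-{s}")
```

Verifying before parsing means an unsigned payload never reaches json.loads or the launcher.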

🔧 ProTip #7: If you implement this pattern, configure your MicroVM's idle-policy with suspendedDurationSeconds: 0 and autoResumeEnabled: false for per-user sessions. You want MicroVMs to terminate when the session closes, not suspend waiting for traffic that may never come back. Use maximumDurationInSeconds as a safety ceiling for stuck sessions — the maximum is 28,800 seconds (8 hours).
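A small helper can keep those settings consistent across launches. The three field names are the ones mentioned in the tip; how they nest in the actual request, including the "idlePolicy" key used here, is my guess and should be checked against the API reference.

```python
from typing import Dict

MAX_DURATION_SECONDS = 8 * 60 * 60  # 28,800 seconds, the stated ceiling


def per_user_session_policy(
    max_duration_seconds: int = MAX_DURATION_SECONDS,
) -> Dict[str, object]:
    """Settings for sessions that should terminate instead of suspending."""
    if not 0 < max_duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError("max duration must be between 1 and 28800 seconds")
    return {
        # Assumed nesting; field names come from the tip above.
        "idlePolicy": {
            "suspendedDurationSeconds": 0,
            "autoResumeEnabled": False,
        },
        # Safety ceiling for sessions that get stuck.
        "maximumDurationInSeconds": max_duration_seconds,
    }


one_hour_policy = per_user_session_policy(3600)
```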

What Isn't Known Yet

It would be dishonest not to include the current limitations worth monitoring:

Lambda MicroVMs has only been available since June 22, 2026. Field experience is thin and details are still emerging:

Maximum image size is undocumented. Aidan Steele verified at launch that the machine building images has approximately 7.2 GB of free disk space. Large images may fail at build time with a cryptic error.

Environment variables require an image rebuild. In Lambda Functions, updating env vars is immediate. In MicroVMs, env vars go in the image — changing them requires create-microvm-image again. Design access to dynamic configuration (Secrets Manager, SSM Parameter Store) as runtime calls, not env vars.
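One way to structure those runtime calls is a small cache with a time-to-live, so values can change without an image rebuild and without a remote call on every read. In this sketch, fetch is a hypothetical callable you would back with a Secrets Manager or Parameter Store lookup, and the five-minute default is arbitrary.

```python
import time
from typing import Callable, Dict, Tuple


class RuntimeConfig:
    """Looks configuration up at runtime and caches each value for a TTL."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical lookup against your config store
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def get(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]
        # Missing or expired: fetch a fresh value and remember when.
        value = self._fetch(name)
        self._cache[name] = (value, now)
        return value


# Demonstration with an in-memory store instead of a real service.
store = {"db_url": "postgres://old"}
config = RuntimeConfig(fetch=lambda name: store[name], ttl_seconds=60)
current = config.get("db_url")
```

A changed value is picked up once the cached copy expires, with no image rebuild involved.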

ECR as source isn't supported despite what the docs suggest. The source is always a Dockerfile in a ZIP in S3. Your Dockerfile can FROM an ECR image, but the process starts from the Dockerfile.

Limited regions at launch. The Lambda team has a track record of expanding regions quickly, but if us-east-1, us-east-2, us-west-2, eu-west-1, or ap-northeast-1 don't serve your case, wait for the expansion.

🎯 ProTip #8: Before designing a production architecture on Lambda MicroVMs, run the full official tutorial: create-microvm-image + run-microvm + auth token + real HTTP request. According to Aidan Steele's measurements on launch day, the run-microvm to RUNNING state cycle took ~2 seconds, and the app serving HTTP 200 OK ~2 additional seconds — but he himself warns to take those numbers with a grain of salt, as he was connecting to us-east-1 from the other side of the world. Suspend and resume each took around 1 second in his tests. Measure it yourself from your region before committing any number to your design: if your UX requires sub-second cold start, that delta matters.
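For the measuring part, a tiny timing harness is enough to get a first distribution instead of a single number. The action you pass in is up to you: a hypothetical function that calls run-microvm and polls until the VM reports RUNNING, or one that issues the first HTTP request. The harness itself makes no AWS calls.

```python
import statistics
import time
from typing import Callable, Dict


def measure(
    action: Callable[[], object],
    runs: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time `action` several times and summarize the wall-clock samples."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = clock()
        action()
        samples.append(clock() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Stand-in action; replace with your own launch-and-wait function.
summary = measure(lambda: time.sleep(0.01), runs=3)
```

Run it from the region and network path your users will actually take, since that is what the launch-day numbers could not capture.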

My Evaluation

If you're building agents that serve users, AgentCore Runtime is still the right starting point. The Memory, Policy, Payments, and Session Storage ecosystem exists for that case, and managed fleet management eliminates an entire class of platform work.

If you're building platforms where users run code, Lambda MicroVMs is the cleanest option AWS has offered yet. Solving this problem previously meant operating Firecracker directly, paying a specialized vendor, or making isolation concessions with containers. Now it's a single run-microvm call.

If you're building coding agents — agents that reason about code and need to execute it safely — the right architecture probably uses both: AgentCore Runtime for reasoning, Lambda MicroVMs for the execution sandbox.

The question you need to answer before choosing isn't technical. It's a product design question: in your system, who executes the code you didn't write?

If the answer is the agent — AgentCore Runtime.

If the answer is your users — Lambda MicroVMs.

If the answer is both — both.

Are you evaluating Lambda MicroVMs for a specific use case? Or do you already have AgentCore Runtime in production and are wondering whether migrating makes sense? I'm interested in hearing what architectures you're considering. The comments are open.

See you in the next article! 🚀
