Sarvar Nadaf for AWS Community Builders

Posted on Jun 30

Introducing Lambda MicroVMs - Isolated, Stateful Sandboxes for Running Untrusted Code on AWS

#ai #programming #aws #productivity

👋 Hey there, Tech Enthusiasts!

I'm Sarvar, a Cloud Architect who loves turning complex tech problems into simple solutions. I've worked with AWS, Azure, DevOps, Data, Analytics, Generative-AI, and Agentic-AI, building real systems for real companies. In this article series, I'll share what I've learned in a way that's easy to follow, whether you're experienced or just getting started.

Let's get into it! 🚀

On June 22, 2026, AWS launched Lambda MicroVMs. It is a new compute primitive inside Lambda that gives you a dedicated Firecracker virtual machine per user or session. It is not an update to Lambda functions. It is a different thing with a different model, different pricing, and different use cases.

If you are building anything where users or AI agents execute arbitrary code and you need strong isolation, fast startup, and persistent state, this is what you reach for now.

What Existed Before and Why It Was Not Enough

Before this launch, if you needed to sandbox untrusted code on AWS, you had three options. Each one forced a compromise.

EC2 instances give you full VM isolation and persistent state. But they are not fast enough for interactive use. Between AMI boot, instance initialization, and user data scripts, you are looking at 30 seconds to several minutes before a user can do anything. That kills the experience for coding assistants or on-demand sandboxes.

Containers (ECS/Fargate) start faster and keep state while running. But containers share a kernel with the host. That shared kernel is a security boundary you cannot fully trust when running code from strangers on the internet. You can layer security on top, but the fundamental model is weaker.

Lambda functions give you real VM-level isolation (they already run on Firecracker) and start in milliseconds. But they die after 15 minutes, they are stateless between invocations, and they follow a request/response model. You cannot give a user a persistent environment where they write code, run it, see the output, install a package, and run again, all within the same session.

Teams building coding assistants, AI agent sandboxes, or multi-tenant notebook platforms had to stitch together custom solutions: EC2 with a lifecycle manager, ECS with heavy security configuration, or Firecracker run directly on bare metal. All of it was operational overhead solving a problem that should have had a managed answer.

What Lambda MicroVMs Actually Is

Lambda MicroVMs is that managed answer.

You package your application code and a Dockerfile into a zip archive, upload it to S3, and call the Lambda API to create a MicroVM image. Lambda executes your Dockerfile, starts your application, and captures a snapshot of the fully initialized environment.

When you need a sandbox for a user, you call run-microvm. Lambda launches a MicroVM from that snapshot with rapid startup.
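The packaging step described above can be sketched in a few lines of Python. This is a minimal illustration, not the official tooling: `package_app` is a hypothetical helper, the file contents are placeholders, and placing the Dockerfile at the root of the archive is an assumption the developer guide should confirm.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def package_app(source_dir: Path, archive_path: Path) -> List[str]:
    """Zip every file under source_dir, keeping paths relative to it."""
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(source_dir.rglob("*")):
            if file.is_file():
                name = file.relative_to(source_dir).as_posix()
                archive.write(file, arcname=name)
                names.append(name)
    return names


# Demo with a throwaway directory; the Dockerfile and app are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp) / "app"
    app.mkdir()
    (app / "Dockerfile").write_text("FROM scratch\n")
    (app / "server.py").write_text("print('hello')\n")
    archive_path = Path(tmp) / "my-sandbox-app.zip"
    packaged = package_app(app, archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        listed = sorted(archive.namelist())
```

The resulting zip is what you would upload to S3 before creating the image.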

Each MicroVM gets:

Its own dedicated HTTPS endpoint (no load balancers or ingress infrastructure needed)
Full VM-level isolation (separate kernel, no shared resources with other tenants)
Persistent state for up to 8 hours (memory and disk survive suspend/resume)
Automatic suspend when idle (you stop paying for compute)
Automatic or programmatic resume when traffic arrives (picks up where it left off)
Vertical scaling up to 4x the configured baseline CPU and memory (e.g., 2 GB / 1 vCPU baseline scales to 8 GB / 4 vCPU)

Users connect over HTTP/2, gRPC, or WebSockets. Authentication is handled through bearer tokens you generate via the CreateMicrovmAuthToken API.

The Mental Model Shift

This is important: Lambda MicroVMs is not request/response.

With Lambda functions, a request comes in, your code runs, it returns a response, the invocation ends. Scaling is automatic. You think in terms of individual invocations.

With Lambda MicroVMs, you spin up a persistent VM for a user or session. It stays alive. It has a dedicated URL. Multiple requests hit the same VM. State accumulates. If the user installs a package, it is still there on the next request.

You manage the fleet yourself. You decide when to create a MicroVM, which user it belongs to, and when to tear it down. There is no automatic horizontal scaling. Your application owns that logic.

This is closer to managing a pool of servers than it is to writing Lambda functions. The difference is you do not manage the infrastructure underneath: no AMIs, no instance types, no patching, no capacity planning for the host.
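To make "your application owns that logic" concrete, here is a minimal in-memory sketch of the session-to-MicroVM bookkeeping. `SessionFleet`, `launch`, and `terminate` are hypothetical names, not part of any AWS SDK. In a real system the two callables would wrap whatever API calls start and stop a MicroVM, and the mapping would live in a durable store.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SessionFleet:
    """Maps each user session to exactly one MicroVM ID (hypothetical helper)."""

    launch: Callable[[], str]          # assumed: starts a MicroVM, returns its ID
    terminate: Callable[[str], None]   # assumed: stops the MicroVM with that ID
    _by_session: Dict[str, str] = field(default_factory=dict)

    def acquire(self, session_id: str) -> str:
        """Return the session's MicroVM, launching one on first use."""
        if session_id not in self._by_session:
            self._by_session[session_id] = self.launch()
        return self._by_session[session_id]

    def release(self, session_id: str) -> None:
        """Tear down the session's MicroVM, if it has one."""
        microvm_id = self._by_session.pop(session_id, None)
        if microvm_id is not None:
            self.terminate(microvm_id)

    def active_count(self) -> int:
        """Number of sessions that currently hold a MicroVM."""
        return len(self._by_session)


# Usage with stand-in callables instead of real API calls.
_counter = iter(range(1, 1000))
terminated = []
fleet = SessionFleet(
    launch=lambda: f"mvm-demo-{next(_counter)}",
    terminate=terminated.append,
)
first = fleet.acquire("alice")
again = fleet.acquire("alice")   # same session, same MicroVM
other = fleet.acquire("bob")     # different session, different MicroVM
fleet.release("alice")
```

Injecting the launch and terminate callables keeps the routing logic testable without touching AWS at all.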

What It Looks Like in Practice

Here is the basic workflow using the AWS CLI. Command names below follow the API naming convention, so check the CLI reference for your installed CLI version before relying on them.

# Step 1: Upload your application package (Dockerfile + code) to S3
aws s3 cp my-sandbox-app.zip s3://my-bucket/my-sandbox-app.zip

# Step 2: Create a MicroVM image from your package
aws lambda create-microvm-image \
  --image-name my-sandbox \
  --s3-bucket my-bucket \
  --s3-key my-sandbox-app.zip

# Step 3: Launch a MicroVM for a user session
aws lambda run-microvm \
  --image-name my-sandbox \
  --memory-size 2048

# Step 4: Generate an auth token for the user to connect
aws lambda create-microvm-auth-token \
  --microvm-id mvm-abc123

# Step 5: When session ends, terminate
aws lambda terminate-microvm \
  --microvm-id mvm-abc123

The user connects to the dedicated HTTPS endpoint returned by run-microvm using their auth token. From there, they interact over HTTP/2, gRPC, or WebSockets depending on what your application exposes.
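On the client side, presenting the token might look like the sketch below. It only builds the request and never sends it. The `Authorization: Bearer` header follows the common bearer-token convention and is an assumption here, since the article does not say how the token is presented. The endpoint, token, and `/health` path are placeholders, and `build_sandbox_request` is a hypothetical helper.

```python
import urllib.request


def build_sandbox_request(endpoint: str, token: str, path: str = "/") -> urllib.request.Request:
    """Prepare (but do not send) an HTTPS request carrying the bearer token."""
    url = endpoint.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},  # assumed header scheme
        method="GET",
    )


# Placeholder endpoint and token; real values come from the launch and token calls.
request = build_sandbox_request(
    "https://mvm-abc123.example.invalid", "demo-token", "/health"
)
```

Passing the prepared request to `urllib.request.urlopen` would send it. A WebSocket or gRPC client would attach the same token in whatever way that protocol and the service expect.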

For shell access (giving users a terminal inside the VM):

aws lambda create-microvm-shell-auth-token \
  --microvm-id mvm-abc123

This returns credentials that connect directly to a pseudo-terminal inside the MicroVM. AI coding tools use this to provide real terminal experiences to end users.

Practical Details That Matter

Startup: MicroVMs launch from pre-initialized snapshots, similar to how Lambda SnapStart works. This skips application initialization entirely: your environment is already warm when it starts.

Duration: Up to 8 hours per session. After that, the MicroVM is terminated.

Suspend/Resume: When no traffic hits a MicroVM, it suspends automatically based on an idle timeout you configure (how long it waits with no inbound traffic before suspending). You stop paying for compute. When a request arrives, it resumes with memory and disk state fully intact. You can also trigger suspend and resume programmatically via the suspend-microvm and resume-microvm APIs. Resume is not a full cold start: the MicroVM restores from a memory snapshot, which is significantly faster than booting from scratch. AWS has not published exact resume latency numbers at launch, so expect to benchmark this for your specific workload.
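The idle-timeout behaviour is handled by the service, but a toy model helps when reasoning about how often a session will suspend, and therefore how many snapshot writes and reads it incurs. `IdleTracker` is a hypothetical class for illustration only, and the 300-second timeout is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class IdleTracker:
    """Toy model of idle-based suspend and traffic-based resume."""

    idle_timeout_s: float
    last_request_at: float = 0.0
    suspended: bool = False

    def on_request(self, now: float) -> str:
        """Record inbound traffic; report whether it hit a running or suspended VM."""
        outcome = "resumed" if self.suspended else "served"
        self.suspended = False
        self.last_request_at = now
        return outcome

    def tick(self, now: float) -> bool:
        """Suspend once the idle window has fully elapsed; return the suspended flag."""
        if not self.suspended and now - self.last_request_at >= self.idle_timeout_s:
            self.suspended = True
        return self.suspended


vm = IdleTracker(idle_timeout_s=300)
events = [
    vm.on_request(now=0),     # first request hits a running VM
    vm.tick(now=120),         # 2 minutes idle: still running
    vm.tick(now=300),         # 5 minutes idle: suspends
    vm.on_request(now=900),   # traffic arrives: resumes
    vm.tick(now=1000),        # 100 seconds idle: still running
]
```

Replaying real request timestamps through a model like this gives a rough suspend count per session before you commit to a timeout value.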

Vertical scaling: You configure a baseline (default is 2 GB memory / 1 vCPU, allocated in a 2:1 memory-to-CPU ratio). During peak activity, the MicroVM can burst to 4x that baseline automatically. For example, a 2 GB / 1 vCPU baseline can burst to 8 GB / 4 vCPU. An 8 GB / 4 vCPU baseline can burst to 32 GB / 16 vCPU. You only pay for the burst resources during the time they are actually consumed.
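The burst arithmetic above is simple enough to capture in a helper. This sketch assumes the 2:1 memory-to-CPU ratio and the 4x factor quoted in this article. `burst_ceiling` is a hypothetical function name, not an AWS API.

```python
from typing import Tuple

MEMORY_GB_PER_VCPU = 2   # 2:1 memory-to-CPU ratio, as stated above
BURST_FACTOR = 4         # maximum burst multiple, as stated above


def burst_ceiling(baseline_memory_gb: float) -> Tuple[float, float]:
    """Return (max memory in GB, max vCPUs) for a given baseline memory size."""
    baseline_vcpus = baseline_memory_gb / MEMORY_GB_PER_VCPU
    return baseline_memory_gb * BURST_FACTOR, baseline_vcpus * BURST_FACTOR


small = burst_ceiling(2)   # 2 GB / 1 vCPU baseline
large = burst_ceiling(8)   # 8 GB / 4 vCPU baseline
```

Both results match the two examples in the paragraph above: 8 GB / 4 vCPU and 32 GB / 16 vCPU.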

Shell access: Full pseudo-terminal (/dev/ptmx) support. The CreateMicrovmShellAuthToken API lets you give users a real terminal inside their VM. This is how AI coding tools provide interactive terminal experiences.

Docker inside: You can run containers inside your MicroVM. Full OS capabilities are available: installing system packages, mounting filesystems, running nested containers. One gotcha: outbound UDP is blocked by default, which breaks DNS resolution inside nested containers. Community workarounds exist for this.

Architecture: ARM64 (Graviton) only at launch.

Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Asia Pacific (Tokyo).

Pricing

Lambda MicroVMs pricing has three components: compute, snapshots, and data transfer.

Compute (US East, ARM/Graviton):

vCPU: $0.0000276944 per vCPU-second
Memory: $0.0000036667 per GB-second
Billed per second (not per millisecond like Lambda functions)

Snapshots:

Snapshot write (on suspend): $0.0038 per GB
Snapshot read (on launch or resume): $0.00155 per GB
Image storage: $0.08 per GB-month (1-week minimum retention)
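To get a feel for these rates, here is a small estimator built only from the numbers listed above. It is a rough sketch, not a billing tool. It ignores data transfer, image storage, and burst usage, so it will not reproduce a full monthly bill, and how snapshot size is measured is an assumption (the `snapshot_gb` parameter is whatever size your snapshots turn out to be).

```python
# US East, ARM/Graviton rates as quoted in this article.
VCPU_PER_SECOND = 0.0000276944
MEMORY_GB_PER_SECOND = 0.0000036667
SNAPSHOT_WRITE_PER_GB = 0.0038
SNAPSHOT_READ_PER_GB = 0.00155


def compute_cost(vcpus: float, memory_gb: float, seconds: float) -> float:
    """Compute charge for running at a fixed size for the given duration."""
    return seconds * (vcpus * VCPU_PER_SECOND + memory_gb * MEMORY_GB_PER_SECOND)


def suspend_resume_cost(snapshot_gb: float, cycles: int) -> float:
    """Snapshot charge for a number of suspend (write) plus resume (read) cycles."""
    return cycles * snapshot_gb * (SNAPSHOT_WRITE_PER_GB + SNAPSHOT_READ_PER_GB)


# One hour at the default 2 GB / 1 vCPU baseline, and one suspend/resume
# cycle assuming (for illustration only) a 2 GB snapshot.
hourly = compute_cost(vcpus=1, memory_gb=2, seconds=3600)
one_cycle = suspend_resume_cost(snapshot_gb=2, cycles=1)
```

At these rates the default baseline works out to roughly $0.126 per running hour, before any snapshot or transfer charges.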

Concrete example - coding assistant platform:

100 developers, each using a 2 GB / 1 vCPU MicroVM for 2.5 hours per day over 20 working days. Each environment suspends 6 times per day during idle periods. Monthly cost: approximately $1,241 total, or about $12.41 per developer per month.

Concrete example - CI/CD job runner:

10,000 jobs per month, each running for 10 minutes in an 8 GB / 4 vCPU environment. No suspend/resume (jobs run to completion). Monthly cost: approximately $1,124 total, or about $0.11 per job.

This pricing is closer to Fargate economics than Lambda function economics. You are paying for dedicated compute time, not per-invocation.

When to Use This

Use Lambda MicroVMs when:

You are building a coding assistant and need to execute AI-generated code safely
You have a multi-tenant platform where each user runs custom scripts
You need agent sandboxes where AI agents execute tools, install packages, and maintain state across steps
You are building an interactive notebook or REPL that needs per-user isolation
You run a vulnerability scanner that executes untrusted payloads
You need game servers that run user-supplied scripts with isolation
You are building reinforcement learning environments where each run needs a fresh, isolated sandbox

Do not use Lambda MicroVMs when:

You have a standard API backend (use Lambda functions)
You need automatic horizontal scaling without managing fleet logic (use Lambda functions or Fargate)
You want a fully managed agent hosting platform (use Bedrock AgentCore Runtime)
Your workload does not involve untrusted code execution (you probably do not need this level of isolation)
You need x86 architecture (ARM64 only at launch)

How It Compares to Bedrock AgentCore Runtime

Both services run on Firecracker. Both support 8-hour sessions. Both have shell access. But they solve different problems.

AgentCore Runtime is a managed agent platform. You deploy your agent code, AWS handles session routing, scaling, teardown, agent communication protocols, and authentication. You do not think about VMs. Your users talk to your agent through a managed endpoint.

Lambda MicroVMs is a raw compute primitive. You get the VM. You manage which user gets which VM. You handle lifecycle, cleanup, and routing. You have full control of what runs inside.

The analogy: AgentCore Runtime is to Lambda MicroVMs what Fargate is to EC2. Same isolation technology underneath, different level of abstraction on top.

If you are building an AI agent and want managed hosting, choose AgentCore. If you are building a platform where each user gets their own isolated environment to run whatever they want, choose Lambda MicroVMs.

Getting Started

You can provision Lambda MicroVMs through the AWS Console, CloudFormation, CDK, or the Agent Toolkit for AWS.

The developer guide is here: Lambda MicroVMs Guide

API reference: MicroVM API

Final Thought

Lambda MicroVMs fills a gap that has existed since serverless became mainstream. The question "how do I safely run someone else's code?" finally has a straightforward answer on AWS that does not involve stitching together three services and writing a custom orchestrator.

It is not magic. You still have to manage your fleet of MicroVMs, handle routing, and build the lifecycle logic. But the hard part (fast, isolated, stateful VMs without managing infrastructure) is handled for you.

For teams building AI-powered developer tools, this is probably the most relevant compute launch of 2026 so far.

📌 Wrapping Up

Thanks for reading! If this was helpful:

❤️ Like if it added value
💾 Save for later
🔄 Share with your team

Follow me for more on: AWS architecture, FinOps, DevOps, and AI Infrastructure.

👉 Visit my website | Connect on LinkedIn | Email: simplynadaf@gmail.com

Happy Learning 🚀
