Tanseer for AWS Community Builders

Posted on Jun 30

Lambda MicroVMs Are Kind of Insane: Tiny Virtual Machines That Boot in Milliseconds and Sleep When You Are Not Looking

#aws #lambda #serverless #stateful

A beginner friendly guide to the Firecracker powered microVMs behind AWS Lambda, the new suspend and resume superpower, and the big question everyone keeps asking: is this still serverless?

Introduction

Imagine a virtual computer so small and so fast that it starts up in less time than it takes you to blink. Now imagine thousands of them running on a single physical server, each one completely walled off from its neighbours, and each one quietly going to sleep the moment it has nothing to do, so that you stop paying for it.

That is not a science fiction pitch. That is roughly what is happening inside AWS Lambda today, and the newer microVM capabilities push the idea even further.

In this post we will start from zero. We will explain what a microVM even is, meet the tiny engine that makes the whole thing possible, walk through what happens when a single request comes in, and then look at the features that make people say this is a little bit crazy. By the end you will be able to answer the question that is dividing the cloud community: after all of this, is Lambda even serverless anymore?

Let us get into it.

So What Exactly Is a Lambda MicroVM?

A microVM is a tiny, lightweight virtual machine. It behaves like its own small computer with a strong security wall around it, but it throws away almost all the heavy baggage of a normal machine, so it starts in milliseconds and uses very little memory. Picture a tiny, fast, single purpose box that appears, does one job, and disappears.

Here is the part that surprises most people. Lambda has been running your code inside these microVMs since around 2018, when AWS introduced Firecracker. Your function runs inside a tiny isolated machine that AWS spins up for it, and that machine is often reused for later requests. The recent excitement is about giving those microVMs new powers, such as the ability to pause when idle and to be reached directly over the internet. So when people say "Lambda MicroVMs," they are really talking about the next chapter of an idea that has been humming away under the hood for years.

Meet Firecracker, the Tiny Engine Doing All the Work

Powering all of this is Firecracker, a virtualisation technology that AWS built and then released to the public as open source. AWS created it specifically to run microVMs safely at massive scale. It is the reason a tiny machine can start running code in as little as 125 milliseconds while still keeping the strong security wall of a real virtual machine. This same engine, built by AWS, quietly runs underneath both Lambda and AWS Fargate.
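You never call Firecracker yourself when you use Lambda, because AWS runs it for you. It is open source, though, and seeing its API makes "tiny virtual machine" feel a lot less abstract. A standalone Firecracker process exposes a small REST API over a Unix socket. Starting a microVM takes roughly four calls: point it at a kernel, attach a root filesystem, set the vCPU and memory size, then start it.

The sketch below builds those calls in Python using only the standard library. The endpoint paths follow Firecracker's getting started guide. Treat the socket path, the kernel and rootfs file names, and the boot arguments as placeholders, and check the docs for the Firecracker version you run.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")  # host is only used for the Host header
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def boot_requests(kernel: str, rootfs: str, vcpus: int = 1, mem_mib: int = 128):
    """Return the ordered (method, path, body) calls that configure and start a microVM."""
    return [
        # Firecracker boots the kernel image directly; there is no BIOS step.
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        # Attach a disk image as the guest's root filesystem.
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        }),
        # Size the machine: this is all the "hardware" you choose.
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        # Finally, boot it.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def send_all(socket_path: str, calls) -> None:
    """Send each call to a running Firecracker process's API socket, stopping on errors."""
    for method, path, body in calls:
        conn = UnixHTTPConnection(socket_path)
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json",
                              "Accept": "application/json"})
        resp = conn.getresponse()
        resp.read()
        conn.close()
        if resp.status >= 300:
            raise RuntimeError(f"{method} {path} failed with HTTP {resp.status}")
```

To try it, you would start Firecracker on a Linux host with KVM using something like `firecracker --api-sock /tmp/firecracker.socket`. Then you would call `send_all("/tmp/firecracker.socket", boot_requests("vmlinux.bin", "rootfs.ext4"))`. Notice how little there is to configure: a kernel, a disk, CPU and memory. That minimal device model is a big part of why it starts so fast.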

How It Works: The Journey of a Single Request

Let us follow one request from start to finish so the architecture clicks.

1. You upload your code. This can be a simple zip file or a container image. You also choose how much memory the function gets, which in turn decides how much CPU power it receives.
2. Something triggers it. A user hits an endpoint, a file lands in storage, or a message arrives on a queue, and Lambda decides your function needs to run.
3. A microVM appears. AWS spins up a Firecracker microVM with your chosen runtime and your code inside. This little box is called the execution environment. If a fresh one has to be created from scratch, that small delay is known as a cold start.
4. Your code runs. The environment initialises, running any setup code that sits outside your handler once. Then your handler runs and a response goes back.
5. The environment is kept around. Instead of throwing the microVM away immediately, Lambda often keeps it ready for a little while. If another request arrives soon, Lambda reuses the same warm environment and skips the startup cost entirely. This is called a warm start, and it is why the second request to a function usually feels much faster than the first.

So the architecture is really a fleet of these tiny machines being created, reused, and recycled constantly, all managed for you so you never touch a server.
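You can actually watch warm starts happen from inside your own function. In Lambda, code at the top level of your module runs once, when the execution environment is initialised. The handler runs on every request. So anything stored at module level survives between requests that reuse the same warm microVM.

Here is a minimal Python sketch. The handler name `lambda_handler` is just the common default, and it must match whatever handler you configure.

```python
import time

# Module-level code: runs once per execution environment, during initialisation.
# Warm invocations reuse the same process, so these values persist between requests.
ENV_STARTED_AT = time.time()
invocation_count = 0


def lambda_handler(event, context):
    """Report whether this request reused an already-initialised environment."""
    global invocation_count
    invocation_count += 1
    return {
        "first_in_environment": invocation_count == 1,
        "invocations_in_this_environment": invocation_count,
        "environment_age_seconds": round(time.time() - ENV_STARTED_AT, 3),
    }
```

Call the function twice in quick succession and the second response will usually show `first_in_environment` as false and a count of 2. If Lambda recycles the environment, the count starts again from 1. The same happens when a concurrent request lands in a second environment. That is exactly why module-level state is great for caching expensive setup but should never hold anything you cannot afford to lose.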

Each One Gets Its Own Front Door: Dedicated HTTP Endpoints

An endpoint is just a web address that something can send a request to. HTTP is the everyday language of the web that browsers and apps use to talk to servers.

For a long time, if you wanted your Lambda function to be reachable over the web, you usually had to put another service, typically Amazon API Gateway, in front of it to handle the incoming traffic. Lambda function URLs give a function its own dedicated HTTPS endpoint, a clean and direct web address that points straight at it. No extra plumbing in between.

The address looks something like this:

https://your-unique-id.lambda-url.your-region.on.aws

You can leave it open to anyone (auth type NONE) or lock it down with IAM authentication (AWS_IAM) so only authorised callers get through. For simple use cases, such as a webhook that some other service calls or a small backend for a single page app, this is wonderfully direct. Your request goes straight to one of your function's execution environments, one of those little microVMs, without a separate gateway in front.
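Here is what the code behind such an endpoint can look like. A function URL hands your function an event describing the HTTP request (method, path, body) and expects back a dictionary with a status code, headers and body.

The sketch below is a tiny webhook receiver in Python that accepts JSON over POST. The behaviour is purely illustrative. The event field names assume the payload format 2.0 shape that function URLs send.

```python
import base64
import json


def lambda_handler(event, context):
    """Accept a JSON webhook delivered through a Lambda function URL and echo it back."""
    method = event["requestContext"]["http"]["method"]
    if method != "POST":
        return {"statusCode": 405, "headers": {"Allow": "POST"}, "body": ""}

    # The body arrives as a string, base64 encoded when Lambda treats it as binary.
    raw = event.get("body") or ""
    if event.get("isBase64Encoded"):
        raw = base64.b64decode(raw).decode("utf-8")

    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "body must be JSON"})}

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"received": payload, "path": event.get("rawPath", "/")}),
    }
```

Because the function URL already handles HTTPS and routing, the handler only has to care about the request itself. With the auth type set to AWS_IAM, callers must sign their requests, and unsigned ones are rejected before your code ever runs.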

The Real Magic: Suspend and Resume

This is the feature that earns the word "crazy."

To understand it, you need one more idea: a snapshot. A snapshot is a complete saved picture of a running machine at a single moment in time, including everything sitting in its memory. If you take a snapshot and then restore it later, the machine wakes up exactly where it left off, as if no time had passed at all.

Firecracker can do this with microVMs. It can freeze a running microVM, save its full state, and bring it back to life almost instantly.
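Firecracker exposes snapshots through the same small API it uses for booting. Pausing a guest and writing it to disk takes two calls. Bringing it back means starting a fresh Firecracker process and loading the snapshot into it.

Note that this is Firecracker's own interface, not something Lambda lets you call. It is also not a description of how Lambda implements suspend and resume internally. The request bodies below follow Firecracker's snapshot documentation, but some field names have changed between releases, so check the version you run. The file paths are placeholders.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket, which is how Firecracker exposes its API."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")  # host is only used for the Host header
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def suspend_calls(snapshot_path: str, mem_path: str):
    """Pause the guest, then write its device state and full memory to disk."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,  # CPU and device state
            "mem_file_path": mem_path,       # guest memory contents
        }),
    ]


def resume_calls(snapshot_path: str, mem_path: str):
    """Load a snapshot into a fresh Firecracker process and resume it immediately."""
    return [
        ("PUT", "/snapshot/load", {
            "snapshot_path": snapshot_path,
            "mem_backend": {"backend_path": mem_path, "backend_type": "File"},
            "resume_vm": True,
        }),
    ]


def send_all(socket_path: str, calls) -> None:
    """Send each call to a Firecracker API socket, stopping on the first error."""
    for method, path, body in calls:
        conn = UnixHTTPConnection(socket_path)
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        resp.read()
        conn.close()
        if resp.status >= 300:
            raise RuntimeError(f"{method} {path} failed with HTTP {resp.status}")
```

In a suspend and resume flow, you would send `suspend_calls` to the running VM and then shut that Firecracker process down, so it stops holding CPU and memory. Later, you would send `resume_calls` to a brand new process started with `--api-sock`. Loading only works on a process that has not booted a VM of its own yet. Once loaded, the guest picks up exactly where it was paused.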

Now think about what that unlocks. Traditionally you had two bad choices. Either you kept a server running all the time so it was ready to respond instantly, and you paid for every idle second. Or you shut it down to save money and then suffered a slow start whenever traffic came back.

Suspend and resume gives you a third option. The microVM can stay paused while nothing is happening, and the headline benefit is simple: you are not paying for a fully running machine while it sits idle. When a request finally arrives, it wakes up quickly and carries on. You get the responsiveness of an always on server with a cost profile much closer to pay only for what you use.

That is the holy grail people have chased for years. Be instantly ready, but only pay when you are actually doing work.

Lambda MicroVM vs EC2 vs Classic Lambda

It helps to see where this sits between the two options most people already know.

EC2 is a rented virtual server that you control and manage yourself. Classic Lambda runs short lived, event driven functions with a hard time limit. The microVM approach lands in the middle, borrowing the best of each.

Feature	EC2	Classic Lambda	Lambda MicroVM
What you manage	The whole server	Just your code	Just your code
How long it can run	As long as you like	Up to 15 minutes per run	Designed for longer and more stateful work than the 15 minute limit
Startup speed	Slower, often tens of seconds to minutes	Fast, with occasional cold starts	Fast, with quick resume from a paused state
Cost when idle	You keep paying while it runs	Nothing, it is not running	The big win is not paying for a fully running machine while idle
Scaling	You set it up	Automatic, scales to zero	Automatic, scales to zero
Best feeling	Total control	Pure simplicity	Control and simplicity together

The short version is this. EC2 gives you full control but you babysit it and pay for idle time. Classic Lambda is beautifully simple but tightly constrained. The microVM model tries to give you longer running, more flexible compute that still scales down to nothing when no one is using it.

What Is This Actually Good For?

A new tool is only exciting if it solves real problems. Here is where this approach shines.

It is great for long running jobs that do not fit neatly into a short window, such as heavier data processing or video work that would bump into the classic 15 minute ceiling.

It suits stateful sessions, meaning work where the machine needs to remember something between requests rather than starting fresh every single time.

It is a natural fit for AI and machine learning inference, where a large model can be expensive to load. You load it once, let the machine pause when traffic is quiet, and resume instantly when the next request comes, instead of paying to keep a big GPU style box running all day.

It works beautifully for interactive sandboxes and code execution, the kind of thing where each user needs their own safe, isolated little environment that spins up fast and tears down cleanly.

And it is useful for bursty traffic, where requests come in unpredictable spikes. You get instant readiness during the spike and near zero cost during the long quiet stretches in between.

When would you not bother? If your workload is genuinely tiny and event driven, classic Lambda is still the simplest possible answer. And if you need deep control over the operating system, custom networking, or hardware that must run nonstop, a traditional server like EC2 is still the right call.

What About the Price?

Cost is usually the deciding factor, so let us compare the three billing styles in plain terms. To follow along, you need one bit of vocabulary: a GB second is just one gigabyte of memory used for one second. It is the unit Lambda uses to measure how much work you actually consumed.

EC2 charges you for the time the server is switched on, whether or not anyone is using it. A box running quietly overnight with zero visitors still appears on your bill. You pay for capacity, not for usage.

Classic Lambda charges you in two parts: a tiny amount for each request, plus an amount for compute measured in those GB seconds. If nobody calls your function, it costs you nothing. You pay for usage, not for capacity.

The microVM model aims for the best of both. You get a more capable, longer lived machine, but the key promise is that you are not charged for a fully running VM while it is paused and idle. You pay mainly for the active time when it is actually doing something.
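To make GB seconds and the difference between paying for capacity and paying for usage concrete, here is a small Python calculator. Every price is a parameter you pass in. Any numbers you plug in should come from the current AWS pricing pages, not from this sketch.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """One month of traffic for a service (illustrative values only)."""
    requests: int           # invocations per month
    avg_duration_s: float   # average active seconds per request
    memory_gb: float        # memory configured, in GB


def gb_seconds(w: Workload) -> float:
    """Lambda's compute unit: gigabytes of memory multiplied by active seconds."""
    return w.requests * w.avg_duration_s * w.memory_gb


def capacity_cost(hours_on: float, price_per_hour: float) -> float:
    """EC2 style: billed for every hour the machine is switched on, busy or not."""
    return hours_on * price_per_hour


def usage_cost(w: Workload, price_per_million_requests: float,
               price_per_gb_second: float) -> float:
    """Classic Lambda style: a per request charge plus a charge per GB-second used."""
    request_part = (w.requests / 1_000_000) * price_per_million_requests
    compute_part = gb_seconds(w) * price_per_gb_second
    return request_part + compute_part
```

For example, one million requests a month at 200 milliseconds each with 0.5 GB of memory works out to 1,000,000 × 0.2 × 0.5 = 100,000 GB seconds. That is only about 56 hours of busy time in total. A server left on for a 720 hour month is billed for all 720 hours regardless. The microVM promise described above is to land on the usage side of this comparison. How paused time is actually billed is something to confirm on the AWS pricing pages.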

Important: cloud pricing changes often, and the exact rates for these options vary by region and configuration. Rather than trust any single number you read in a blog, including this one, always check the current AWS pricing pages before you make a decision.

So Is It Still Serverless?

Here is the fun part, and it is genuinely a matter of debate.

The word serverless never meant there were no servers. It meant you, the developer, never had to think about them. The classic definition came with a few expectations: your code is event driven and short lived, it scales automatically all the way down to zero, you never manage infrastructure, and you only pay for what you use.

Now look at what these microVMs offer. Long running. Able to hold state. Reachable through their own dedicated web address. Sitting paused and ready rather than vanishing completely. To some people that starts to look an awful lot like a small, managed server wearing a serverless costume.

The case for "yes, it is still serverless" is strong. You still never patch an operating system. It still scales to zero. You still pay for value rather than for idle capacity. By the spirit of the original promise, nothing has been broken.

The case for "no, the purity is gone" is also fair. The original idea of a tiny, stateless, event triggered function that lives for a moment and dies has clearly been stretched into something bigger and more persistent.

The honest answer is probably this. Serverless is becoming a spectrum rather than a strict box. On one end you have pure functions that flash in and out of existence. On the other you have long lived managed machines. These microVMs live somewhere in the comfortable middle, and they are quietly redefining what the word can mean. Maybe the better question is not "is this serverless," but "does this let me ship without babysitting infrastructure." If the answer is yes, the label matters a lot less than the freedom.

Wrapping Up

For years, Lambda has been powered by tiny, fast, isolated microVMs running on Firecracker. What is changing is how clever those microVMs are getting. They boot in a blink, they can answer the web directly, and, most remarkably of all, they can doze off when idle so you are not paying for a machine that is doing nothing.

Whether you call it serverless, managed compute, or something brand new, one thing is clear. The line between a server and a function is getting blurrier, and that is a great problem for developers to have. The more the cloud handles for us, the more time we get to spend building the things we actually care about.

If you are new to AWS, the best next step is to try spinning up a simple function, give it a dedicated endpoint, and call it from your browser. Seeing your own tiny microVM answer back is the moment the whole idea finally clicks.

Let Us Connect

If you found this helpful, have a question, or spot something you want to discuss, I would love to hear from you. Reach out at khantanseer43@gmail.com.
