Kazuya

AWS re:Invent 2025 - Lambda Managed Instances: EC2 Power with Serverless Simplicity (CNS382)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Lambda Managed Instances: EC2 Power with Serverless Simplicity (CNS382)

In this video, Stephen Liedig and Archana Srikanta introduce Lambda Managed Instances, a new AWS feature that allows Lambda functions to run on EC2 instances in customer accounts while maintaining the serverless programming model. They explain the three-step setup process: creating a Capacity Provider with instance configurations, creating functions associated with that provider, and publishing versions to trigger deployment. Key differences from standard Lambda include multi-concurrency support (eliminating cold starts), resource-based asynchronous scaling using CPU utilization thresholds, and EC2-based pricing with a 15% management fee. The feature supports over 400 instance types across C, M, and R families, with AWS handling instance lifecycle, OS patching, and auto-scaling. It's designed for high-traffic, steady-state workloads with predictable patterns rather than replacing standard Lambda for bursty, unpredictable workloads. Integration support includes Datadog, ZEET, CloudWatch Lambda Insights, Powertools for AWS, and Infrastructure as Code tools like CloudFormation, SAM, CDK, and Terraform.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction to Lambda Managed Instances: Addressing Customer Needs Beyond Traditional Serverless

Good afternoon, and this is probably one of the final sessions before re:Invent comes to a close. Thanks for joining us. I know we've got lots of competition today with Werner Vogels' keynote, but I appreciate your attendance today. Thank you very much. My name is Stephen Liedig. I'm a Principal Solutions Architect with the Serverless team out of Australia and New Zealand, and I'm joined today by Archana Srikanta, who is a Principal Engineer with the AWS Lambda team. Today we want to introduce you to a new feature that we launched earlier this week called Lambda Managed Instances.

Thumbnail 50

AWS has not just pioneered Serverless but, as you can see here, we've continually and rigorously innovated in this space. From the early beginnings of introducing runtimes like Python, Node.js, and Java, we've evolved to the point where we support virtually any type of runtime today. We've introduced VPC integration, optimized network integration through Hyperplane interfaces, and also introduced things like layers and concurrency, while continually optimizing cold starts to help you build the applications you're building today. We've also along the way introduced some new services, Step Functions and EventBridge, to help you orchestrate millions of workflows and build event-driven architectures.

Thumbnail 100

Over the last 10 years, customers have been really enjoying the benefits of this really wide portfolio of Serverless services to help them run and build modern applications in the cloud today. They're doing that without having to write a lot of code, focusing on business logic and not getting bogged down by scaling, security updates, and other maintenance issues. They're only paying for the services that are driving value for their organization while minimizing waste, and they're able to take advantage of the best practices that we have for distributed architectures and security.

Thumbnail 150

As you can see here, we've got a huge diversity in use cases for Serverless today. We've got financial institutions migrating core banking applications to AWS, leveraging Lambda and other AWS services to cut operational costs and to accelerate future development. We're looking at healthcare providers who've been using Lambda to automate appointment bookings, insurance claims processing, and secure patient portal access. Retail companies are adopting event-driven architectures as a way of being able to innovate quickly and handle peak shopping spikes. Startup organizations are using Serverless to keep their costs close to zero, especially in the early parts of their growth period, while being able to iterate on features quickly. Government agencies as well are leveraging Serverless, agencies like the Australian Bureau of Statistics, who built their last Australian census entirely on Serverless.

Thumbnail 220

Yet, despite all of these successes and all of this diversity, customers are still telling us they like the programming model, but there are still some use cases for which they need to look at other options. That's led them to significant architectural shifts and taking on much greater responsibility from an operations perspective. They're asking for more control around where their functions are running and what they're running on, and also being able to apply some of the early commitments and usage discounts that they're making on other compute services. They're looking at multi-concurrency as a way of optimizing price and performance as well.

Thumbnail 260

What Lambda Managed Instances Offers: Control, Flexibility, and EC2 Pricing Benefits

Lambda Managed Instances is a solution to this. Lambda Managed Instances allows you to keep the same programming model that you're familiar with today and build architectures in the same Serverless way as well, while maintaining a consistent and familiar development experience. We're giving you more control over specialized compute and extensive choices around the compute that your functions are running on, and also taking advantage of no cold starts. In addition to that, we're driving efficiencies and predictability through EC2 pricing mechanisms, while at the same time giving you an option around multi-concurrency invocations.

Thumbnail 310

Thumbnail 360

So what is Lambda Managed Instances? Fundamentally, it's the ability for you to run AWS Lambda functions on EC2 instances of your choice in your account. You've got access to over 400 different instance types across general purpose, compute optimized, and memory optimized instance families to best suit your particular workload needs. And AWS is still handling all of the operational elements. We're dealing with the lifecycle of the instance. We're managing the operating system and the runtime patching that is built into those instances. We're dealing with all the routing and auto scaling and doing that according to your configurations. And at the same time, you're benefiting from the ability to apply EC2 pricing constructs like EC2 Savings Plans, Compute Savings Plans, Reserved Instances, and any other special agreements that you have with us.

Thumbnail 420

So the question is now, when do you use Managed Instances? Managed Instances is not an in-place replacement for Lambda today. Specifically, you would be looking at using Managed Instances for things like high traffic, steady state workloads that have smooth and predictable traffic patterns. You'd be looking at using Managed Instances for applications that have very specific computational needs or memory requirements or network throughput requirements. And for everything else, you would continue to use Lambda. Lambda today, or Lambda default as we're calling it, is really ideal for workloads with unpredictable traffic and where you have short duration and infrequent invocations.

So what we're going to do today, I'm just going to walk you through an experience around how to build out your Lambda Managed Instances environment. And we'll talk about some key differences between what Lambda default provides and also how Lambda Managed Instances works. And then I'll round it off with some partner integrations and information about the developer tooling that we're supporting, as well as some pricing information. Over to you, Arch. Thank you.

Thumbnail 470

Creating a Capacity Provider: Configuring Instance-Level Settings for Lambda Managed Instances

All right. Hi everyone. So as Stephen mentioned, I'm Arch. I'm a Principal Engineer with Lambda, and most recently I've been the tech lead on this project which we're all very excited to bring to you. So Stephen gave us a really great introduction into why we built Lambda Managed Instances or LMI as I'm going to call it for the rest of this talk, and I'm going to take you on a kind of technical deep dive into the experience of actually creating a function on LMI. And as we walk through that experience, I want us to just pay attention to the ways in which the LMI experience is actually very similar and very familiar with the default Lambda experience that you all kind of know and love today. And then in a subsequent section we'll talk about the ways in which LMI is different from Lambda default and how that should inform your choice of when to use which platform.

Thumbnail 510

Thumbnail 520

All right, so the eventual setup that we're going for here, like Stephen mentioned, is in your customer account in your VPC, Lambda will launch EC2 instances and then deploy your function on those EC2 instances. Now these EC2 instances are what we call Lambda Managed Instances, and what that means is it's mostly just a regular EC2 instance except that it's fully managed by Lambda. We handle the launching of this instance. We handle the OS patching of these instances and the entire lifecycle of the instance right up to the termination of the instance. What you can do with the instance is you can see it in your console, you can describe the instance, but you can't touch it in any way, even if you wanted to. So you can't update the instance, you can't edit the instance, you can't SSH into the instance, nor can you actually even terminate the instance. So the entire management of these instances is completely delegated to Lambda as the service managing them. And in terms of billing, like Stephen mentioned, regular EC2 billing applies to these instances along with any kind of pricing instruments that you have with EC2.

Thumbnail 570

Now in terms of these functions that are deployed on your instances, it's what we call execution environments. And for those of you who are not familiar, a Lambda function execution environment is basically a live running copy of your application. So it's your function code, the language runtime underneath it, your layers, your extensions, all of that that's kind of bootstrapped and up and running, ready to handle an invocation. So that's what we call a function execution environment.

Thumbnail 600

All right, so how do you make all this magic happen in your account? The experience here involves three steps. And we'll look at each of these steps in detail, but the first step here is to create a Capacity Provider.

Thumbnail 610

Thumbnail 620

This is where you give us all your instance level configuration and settings. Then you create a function and associate it with that Capacity Provider that you just created. Finally, you publish a function version, and this is the step that makes all the magic happen, so the launching of the instances and the deployment of your function on those instances.

Thumbnail 640

Thumbnail 660

Let's take a look at each of these in a little bit deeper detail, starting with the Capacity Provider. The Capacity Provider is a brand new construct. It's a Lambda construct that we've introduced just for Lambda Managed Instances, and like I said, it's basically all things instances. All of your overrides and settings that you want to provide to us in terms of your instance configuration goes in this Capacity Provider object. Here are some of the settings. The first one is your instance VPC config. The second one is the actual instance types that you want us to use for your functions. Finally, we'll look at some guardrails that you can put in terms of how we scale your instances when the load on your functions goes up.

Thumbnail 690

Thumbnail 710

Let's start with the VPC config. Here we have our new create Capacity Provider API. The first thing in there is the Capacity Provider name. You give it a name, you can put tags on it, and then you have to give us a role. This is the Capacity Provider operator role. This is just a standard IAM role where you're giving Lambda permissions to actually launch and manage those EC2 instances in your account. Then we come to the VPC config. This is standard VPC config, subnets and security groups, and this is the VPC that we'll launch your instances into. Now in terms of required parameters, this is all you really need to create a Capacity Provider. There's a whole bunch of other settings which we're going to talk about, but just know that those all have defaults. They're advanced settings for the power users who really want to fine tune the capacity that's underneath your functions. But in terms of required parameters, that's all you need. You need a role and you need a VPC, and you need to give that to us.
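To make the shape of that call concrete, here is a minimal boto3-style sketch of creating a capacity provider with just the required role and VPC config. The operation and field names (create_capacity_provider, OperatorRoleArn, and so on) are assumptions inferred from the session, not the published API, so treat this as illustrative only.

```python
import boto3

lambda_client = boto3.client("lambda")

# Illustrative sketch only: the operation and parameter names below are
# assumptions based on the talk and may differ from the released API.
response = lambda_client.create_capacity_provider(
    CapacityProviderName="orders-capacity",
    Tags={"team": "orders"},
    # IAM role that lets Lambda launch and manage EC2 instances in your account
    OperatorRoleArn="arn:aws:iam::123456789012:role/LambdaManagedInstancesOperator",
    # The VPC your instances will be launched into; three subnets for three AZs
    VpcConfig={
        "SubnetIds": ["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)
print(response)
```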

Thumbnail 750

Thumbnail 780

In terms of the subnets that you give us in your VPC config, if it is a production application, the standard AWS guidance applies. Give us subnets in three availability zones because when you do that, we will actually spread the instances that we launch, and thus the execution environments across those availability zones evenly. In terms of networking, these are just regular EC2 instances, so they have a primary network interface in your VPC, which means they get an IP address from your VPC. All of the outbound traffic from your function execution environments actually transits through this network interface of the instance. If your function wants to talk to any downstream dependencies, make sure that you have a path through your Capacity Provider VPC to those endpoints of those dependencies.

Thumbnail 800

Thumbnail 830

Also, the application logs that we ship to CloudWatch, those logs also transit through this network interface of the instance. Another thing you want to remember and make sure is that you actually have a path to the CloudWatch endpoint through your VPC. You can do this by either allowing internet access to hit the public endpoint, or you can use a PrivateLink CloudWatch endpoint within your VPC. Just remember that all your logs are also transiting through this instance VPC. In terms of ingress traffic, this is the same as is true of Lambda default functions today. There is no ingress traffic that's coming in through that network interface to your instance or your execution environment, so you can go ahead and close all those inbound rules on the security groups that you give us.
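If you keep the capacity provider's VPC private, one common way to give the logs a path to CloudWatch is an interface VPC endpoint for CloudWatch Logs. This small sketch uses the standard EC2 API; the VPC, subnet, security group, and region values are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Interface endpoint so execution environment logs can reach CloudWatch Logs
# without leaving the VPC; resource IDs below are placeholders.
ec2.create_vpc_endpoint(
    VpcId="vpc-0abc12345def67890",
    VpcEndpointType="Interface",
    ServiceName="com.amazonaws.us-east-1.logs",
    SubnetIds=["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)
```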

Thumbnail 850

The other thing about VPC configuration is because all of the egress traffic from your functions is going through the instance's network, we actually do not allow you to specify a VPC config at the function level. In the create function API, if you're creating a Lambda Managed Instances function, you cannot provide a VPC config. We'll always just use the VPC config that you provide in your Capacity Provider.

Thumbnail 880

All right, let's talk instance types. The full set of instance types that is supported on Lambda Managed Instances is basically these latest generation C, M, and R instance families. C is the compute optimized EC2 instance family, M is general purpose, and R is memory optimized. In terms of sizes, we support the large instance sizes and bigger within these families.

Thumbnail 980

In terms of architectures, we support both Intel and AMD for X86, and then we also support the ARM Graviton instance types. Now within this large set of instance types that we support, by default Lambda will select the instance types for your function based on your function's memory size and configuration, and we'll talk a little bit more about that when we get to the function section. But you can always override if you want to constrain the set of instance types that we use. If you don't want to use this entire set, you can do that via an override, and that's where we come to this instance requirements section within the capacity provider. Here you can specify allowed instance types, which means only use these instance types, or you can specify excluded instance types, which is saying use everything else but these. And then a few other settings in here.

Thumbnail 990

By default, we assume the architecture is x86; you have to override it for ARM. The other thing to remember is that the architecture of your function has to match the architecture of your capacity provider. And finally, for the EBS volumes that are attached to your instances, by default they're encrypted with a service-managed key. Here you can provide your own KMS key that we can use to encrypt the EBS volumes.
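Continuing the earlier hedged sketch, the snippet below adds an instance requirements section that constrains instance types, switches to ARM, and supplies a customer-managed KMS key for the EBS volumes. As before, the field names are assumptions from the session rather than the documented API.

```python
# Illustrative only: field names are assumptions based on the session.
lambda_client.create_capacity_provider(
    CapacityProviderName="memory-heavy-capacity",
    OperatorRoleArn="arn:aws:iam::123456789012:role/LambdaManagedInstancesOperator",
    VpcConfig={
        "SubnetIds": ["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
    InstanceRequirements={
        # "Only use these"; alternatively, ExcludedInstanceTypes means "use everything but these"
        "AllowedInstanceTypes": ["r7g.large", "r7g.xlarge", "r7g.2xlarge"],
        "Architecture": "arm64",  # x86 is assumed by default per the talk
        # Customer-managed key for encrypting the instances' EBS volumes
        "EbsKmsKeyArn": "arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555",
    },
)
```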

Thumbnail 1000

Thumbnail 1010

Thumbnail 1040

All right, scaling. So we have a pretty deep dive into scaling a few sections later, and we'll talk about some of the more advanced scaling configurations there. But here I just wanted to introduce that there is a capacity provider scaling config section. So this is all instance level scaling here that we're talking about. The first setting there is a max vCPU count. This is basically a limit on the maximum instance capacity that we can scale out to as the load increases within your capacity provider. It's mostly useful as a cost control knob, so you can put a hard limit on the instance billing that can occur from a given capacity provider. Again, it's an optional setting; we have defaults, and you can override them only if you have a need to. And like I said, there are more settings in here which we'll talk about in the context of the scaling section.

Thumbnail 1050

Thumbnail 1070

Function Creation and Configuration: Memory, CPU, and Instance Type Selection

All right, so now you have a capacity provider. Your next step is to create a function, and this is where the familiar Lambda experience kicks in. The process to create a function is almost exactly the same thing as you would go through today, with the only little change being that when you're creating a function, you have to associate it with this capacity provider that you just created. This is what lets us know that this is an LMI function and it needs to be deployed on your instances, as opposed to a default Lambda function that goes on our infrastructure. And then we'll just talk about some of the function features that are supported with LMI. And finally, we'll talk about the function memory and CPU settings and how that influences the instance type that we select underneath those functions.

Thumbnail 1100

Thumbnail 1110

Thumbnail 1130

All right, so this is our good old familiar create function API. And in here we have a new section for the capacity provider config, and that's where you provide the capacity provider ARN. It's just as simple as that to make it an LMI function. And you can associate multiple functions with the same capacity provider, in which case all of those functions will share the same instance capacity within that capacity provider.
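Here is what that association might look like with the existing create_function API. create_function and its standard parameters are real; the capacity provider section is sketched from the talk and its exact field names are an assumption.

```python
import boto3

lambda_client = boto3.client("lambda")

with open("orders-api.zip", "rb") as f:
    zip_bytes = f.read()

lambda_client.create_function(
    FunctionName="orders-api",
    Runtime="python3.13",
    Handler="app.handler",
    Role="arn:aws:iam::123456789012:role/orders-api-execution-role",
    Code={"ZipFile": zip_bytes},
    MemorySize=4096,  # MB
    # Assumed field name: associating the function with a capacity provider is
    # what marks it as a Lambda Managed Instances function.
    CapacityProviderConfig={
        "CapacityProviderArn": "arn:aws:lambda:us-east-1:123456789012:capacity-provider/orders-capacity",
    },
)
```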

Thumbnail 1140

Thumbnail 1200

All right, moving on to function features that are supported. Packaging formats, we support both OCI containers and zip format. Language runtime, latest versions of Java, Python, Node, and .NET. So in addition to OS patching that's happening at the instance level, you continue to get the benefit of the actual language runtime being managed and patched by Lambda as well, so that's the same familiar experience. Other features that we support in terms of the observability space, layers and extensions are supported with LMI. In terms of invoke dynamics, function URLs you can use with LMI, and response streaming works with LMI. The invoke timeout is 15 minutes, which is the same as default functions today. But finally, we did announce another big launch from Lambda, which was Durable Functions, which allows you to run longer-running, multi-step applications that can tolerate interruptions, and that also works with LMI.

There are, however, some features that are not supported, or I should say not applicable, to Lambda Managed Instances.

SnapStart is one that is not applicable to LMI because there are actually no cold starts in LMI, and we'll talk a little bit more about that in the scaling section. Because we don't have cold starts, SnapStart is not meaningful in LMI. Provisioned and reserved concurrency are not supported with LMI because we have equivalent concepts in LMI by means of min and max execution environments. Again, this is something we'll see when we talk about scaling.

Thumbnail 1240

Thumbnail 1270

All right, so let's talk about function settings now. As most of you are familiar, create function has a memory size setting where you tell us how much memory your function's execution environments should get. The range goes higher for LMI: we support up to 32 gigabytes of memory for your LMI functions. Another new setting that we've introduced for LMI is the execution environment memory per vCPU. This is basically a ratio of memory to vCPU for your function's execution environments. Based on your memory and this ratio, we will work out how much CPU needs to be allocated. The default is 2 to 1, so 2 gigabytes per vCPU, and it can go to 4 to 1 or 8 to 1 as the other allowed values.

Thumbnail 1310

Thumbnail 1330

Applying those ratios and the memory settings, this is the table of all combinations that we allow. One thing to note here is that we don't allow fractional vCPUs. Depending on the ratio that you're using, your memory will jump in multiples of the ratios. If you're using 8 to 1, you can only have memory that's in multiples of 8. If you're using 4 to 1, you can only have memory that's multiples of 4. In terms of instance type selection, these ratios that we have basically match the instance family ratios that we have. If you're doing a 2 to 1 ratio function, then that's when we'll select the compute optimized instance types. If you're on the other end of the spectrum and you're doing an 8 to 1 ratio, then we'll select the memory optimized instance types underneath your function, and 4 to 1 is the general purpose instance types.
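A tiny helper to illustrate the rules just described (whole vCPUs only, memory stepping in multiples of the chosen ratio); this is plain Python for illustration, not an AWS API.

```python
def vcpus_for(memory_gb: int, gb_per_vcpu: int = 2) -> int:
    """Derive vCPUs from memory using the 2:1, 4:1, or 8:1 ratios described above."""
    if gb_per_vcpu not in (2, 4, 8):
        raise ValueError("ratio must be 2, 4, or 8 GB per vCPU")
    if memory_gb % gb_per_vcpu != 0:
        raise ValueError("memory must be a multiple of the ratio (no fractional vCPUs)")
    return memory_gb // gb_per_vcpu

print(vcpus_for(8, 2))    # 4 vCPUs at 2:1 -> compute optimized (C) instance types
print(vcpus_for(16, 4))   # 4 vCPUs at 4:1 -> general purpose (M) instance types
print(vcpus_for(32, 8))   # 4 vCPUs at 8:1 -> memory optimized (R) instance types
```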

Thumbnail 1360

Thumbnail 1380

Publishing and Deploying Functions: Activation, Scaling Configuration, and Invocation

All right, so now we have a capacity provider, we have a function that's designated as an LMI function, but there's still no instances or execution environments in your account yet. All of that magic happens when you actually publish a function version, which is what actually triggers the deployment in this case. Publishing a function version is functionality that exists today. It's not required for Lambda default functions, so the only change here is with LMI functions you have to publish a version to actually deploy it. There are two ways you can do that. You can either use the existing publish version API, or we have a little bit of syntactic sugar in the create and update function APIs where you can just set the publish flag to true, and then every create and update will automatically publish a version for you behind the scenes.
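Both routes use APIs that already exist today. A small sketch using publish_version, plus the standard function-state waiter to wait for the new version to leave the pending state before invoking it:

```python
import boto3

lambda_client = boto3.client("lambda")

# publish_version is the existing Lambda API; for LMI functions this is what
# triggers the instance launch and deployment described in the session.
published = lambda_client.publish_version(FunctionName="orders-api")
version = published["Version"]

# Wait for the published version to become active before sending invokes.
waiter = lambda_client.get_waiter("function_active_v2")
waiter.wait(FunctionName="orders-api", Qualifier=version)

# Alternatively, pass Publish=True on create_function / update_function_code
# to publish a version automatically with every create or update.
```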

Thumbnail 1420

Thumbnail 1430

Thumbnail 1440

Thumbnail 1460

It's when you publish this version that all the action starts happening in your account. We'll take a look at what that deployment looks like, and then we'll also take a look at the function level scaling configuration. All right, so when you call publish version, the first thing we'll do is we'll actually map your function to the instance type that's appropriate for it. We'll launch those EC2 instances in your account, and on those EC2 instances we'll actually go ahead and initialize, by default, 3 execution environments on those instances. We'll launch 3 instances in 3 availability zones if you've given us 3 and initialize 3 execution environments on them. Until the initialization is complete, your function is actually in a pending state, and only after it goes active can you actually start invoking your function.

Thumbnail 1470

Thumbnail 1480

Thumbnail 1490

All right, now these 3 execution environments that come up as part of your function activation, we call them min execution environments. It's 3 by default, but of course you can always override that, which brings us to our function scaling config section. This is yet another new API for you to override some function-level scaling parameters. This is the put function scaling configuration API. Here you have min and max execution environments, which bound at the function level how many execution environments we can scale in and out.

Thumbnail 1510

In terms of minimum execution environments, as we said, the default is 3. There are some reasons why you might want to override this minimum. If you want to pre-provision capacity for a known peak or known incoming demand or buffer, you can set your minimum to be higher and we'll always keep those execution environments warm and up and running. On the other hand, you can override it lower if you don't care for having 3 execution environments up and running and for that kind of high availability stance. If you're using dev or test workloads, you can override that to be lower.

Thumbnail 1540

In terms of maximum execution environments, by default there is no maximum, so we're basically allowed to scale as much as we need to. You can override it for fair sharing between multiple functions that are mapped to the same capacity provider. So as I said, if you have multiple functions in the same capacity provider, they share the instance capacity within that capacity provider, so you can cap how much each can scale to in order to prevent noisy neighbor disturbance. These are equivalent to the provisioned concurrency and reserved concurrency model that we have with Lambda Default. We're just doing it slightly differently here for LMI.

Thumbnail 1590

Finally, another small trick here is that you can set your minimum and maximum to 0, and what that will do is basically cause your function to completely scale down within your capacity provider. It is in a deactivated state at that point, so invokes won't go through. You'll have to come back and set minimum and maximum to something greater than 0 for us to scale it back up and allow those invokes to go through. So it's just a way for you to deactivate the function. If you're going home at night for your dev test workloads, you don't have to actually delete the function. You can scale it down and then when you come back in the morning you can scale it back up.
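A hedged sketch of that function-level scaling configuration; the operation and field names below are assumptions based on the session, not the published API.

```python
# Illustrative only: operation and field names are assumptions from the talk.
lambda_client.put_function_scaling_config(
    FunctionName="orders-api",
    Qualifier="1",
    MinExecutionEnvironments=6,    # pre-provision above the default of 3 for a known peak
    MaxExecutionEnvironments=50,   # cap this function's share of the capacity provider
)

# Setting both bounds to 0 deactivates the function (e.g. overnight for dev/test);
# raise them again later to scale it back up and allow invokes through.
lambda_client.put_function_scaling_config(
    FunctionName="orders-api",
    Qualifier="1",
    MinExecutionEnvironments=0,
    MaxExecutionEnvironments=0,
)
```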

Thumbnail 1640

Thumbnail 1650

Thumbnail 1660

All right, so you've created a capacity provider, you've created a function, you've published the version, we did the deployment on your instances, and now you're ready to invoke. The beauty of this feature is that your invoke experience is exactly the same as what it is today. So when your invoke comes to us, we will check if your function is an LMI function, if it has that capacity provider associated with it, and if it does, then we will just route all of your invokes to these execution environments that we've deployed on your instances. And because the invoke experience is exactly the same, you actually get all of the event source integrations that are supported with Lambda today. They just work out of the box the same with your LMI functions, including the different invoke types that we have. We have the direct invoke or synchronous invoke as we call it. We also support the event invocation type and this whole slew of event integrations work just as is.

Thumbnail 1700

Understanding Concurrency in Lambda Managed Instances: Multi-Concurrent Execution Environments

So we saw how, with the create function and the invoke experience, we've retained things as closely as possible to the Lambda Default experience, but with just a few extra clicks and a few extra steps with the capacity provider, we've effectively completely changed the infrastructure underneath your functions from service-owned infrastructure to just regular good old EC2 instances in your account. And because of that big capacity shift underneath your functions, there are some differences between Lambda Default and LMI in terms of the management of the underlying capacity that you should be aware of when you're using LMI.

Thumbnail 1740

Thumbnail 1750

Thumbnail 1770

So let's take a look at those. The first one here is concurrency. I'm sure for those of you who've used Lambda before, you're probably intimately familiar with the concurrency of Lambda Default functions. Concurrency has a slightly different meaning in the context of LMI, and Stephen briefly mentioned that we do support multi-concurrent functions in LMI, so we'll take a look at that here. The second one is scaling. This is another thing that Stephen mentioned, which is that we don't have any cold starts in LMI, so we do scaling in a slightly different way in LMI, and we'll take a deeper look at that and understand how scaling happens in LMI.

Thumbnail 1790

And finally, the security boundary. With Lambda Default, you're running in our service account. It's a fully multi-tenant setup. Here with LMI, you're going to be running in your account, which is a fully single-tenant setup, so the boundaries are a little bit different there in terms of security, and we'll talk about that.

Thumbnail 1810

Thumbnail 1830

Thumbnail 1840

Thumbnail 1850

All right, let's start with concurrency. Before we dive into LMI, let's recap a little bit about what concurrency means for Lambda Default. So in Lambda Default, when you have a function and you see an invoke come in, if we don't have any execution environments for your function yet, what we will do is in the path of the invoke, we will say there's no execution environment, we'll initialize a new execution environment, and then execute your invoke within that execution environment. Now while that invoke is ongoing, if we get a second invoke, we'll say, oh that execution environment one is busy, and what that will do is it'll initialize a second execution environment within which your second invoke will execute.

Thumbnail 1870

Thumbnail 1880

Thumbnail 1890

These invokes that cause the initialization of new execution environments, we call them cold starts, and they incur slightly higher latency because we have to actually run your initialization logic as part of the synchronous invoke request path. Now this invoke number three, it comes in after invoke one has completed. So what happens with invoke three is we actually keep execution environment one around in the hope that you'll send us more invokes. So with invoke three, we can just route it to a pre-existing execution environment and we don't have to pay the cost of that initialization, and this is what we call warm invokes. Everyone loves warm invokes because they're super fast and they don't incur that latency cost of the initialization.

Thumbnail 1900

Thumbnail 1920

So the thing to notice here is that in Lambda Default, these execution environments are all what we call singly concurrent, which means that there's only ever one invoke being executed out of a given execution environment at a time. It may get reused for multiple invokes later in time, but at a given time, only one invoke is active per execution environment. So when we say concurrency in Lambda Default, what that really means is just the number of active in-flight invokes or the number of active execution environments that are serving those in-flight invokes. That's what concurrency means in Lambda Default.

Thumbnail 1940

Thumbnail 1950

Thumbnail 1960

Thumbnail 1970

In LMI, things are a little bit different. So here we have the three execution environments that we talked about that were pre-initialized when you published your version. And then when your invokes start to come in, LMI can actually send multiple concurrent invokes to a single execution environment, and this is what we call multi-concurrency. A single execution environment can be handling multiple invokes simultaneously. And this is why we don't have cold starts, because we've pre-initialized these execution environments as part of your function activation. So when the invokes come in, they will always get routed to one of these execution environments that we've pre-initialized. So no cold starts in LMI.

Thumbnail 1990

Thumbnail 2000

Thumbnail 2020

So the first question here is how do you know how much concurrency or how much multi-concurrency can you send to each execution environment? Now we came up with some defaults for the maximum concurrency that an execution environment can take based on the language. After researching all the applications that you all have built on Lambda and other serverless products in our suite, we looked at all of the workloads, and these are the defaults that we came up with. But you can always override it, and this is in our create function API in that new capacity provider config section. You can also specify a per execution environment max concurrency, so if you have certain bottlenecks that we're not aware of in terms of your dependencies or anything like that, you can come in here and override what's the max concurrency an execution environment can take.
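If you do want to pin that ceiling yourself, the session places the override in the capacity provider config section of the function. Below it is sketched on the existing update_function_configuration call, with the concurrency field name being an assumption.

```python
# Illustrative only: the per-execution-environment concurrency field is an
# assumed name based on the session; update_function_configuration itself is
# the existing Lambda API.
lambda_client.update_function_configuration(
    FunctionName="orders-api",
    CapacityProviderConfig={
        "CapacityProviderArn": "arn:aws:lambda:us-east-1:123456789012:capacity-provider/orders-capacity",
        "MaxConcurrencyPerExecutionEnvironment": 20,  # assumed field name
    },
)
```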

Thumbnail 2070

Now we also realized that it's not easy to come up with this magic max concurrency number, and there is no one size fits all. So we have some protections here in place in case you set that max concurrency number to be too high. So let's take a look at how that works. So here you have your managed EC2 instances with a few execution environments deployed on it. And also on these instances, sitting in front of your execution environments, is an agent that we deploy. We're calling it the LMI agent here, and this agent also acts as a proxy to your execution environments when the invokes come in.

Thumbnail 2090

Thumbnail 2100

So when the invokes come into our service, we will try to balance the load evenly across all the execution environments that you have. We'll pick an execution environment and we'll route it to this LMI agent on that instance.

Thumbnail 2110

Thumbnail 2120

Thumbnail 2140

Now if an execution environment has either reached the max concurrency that you've configured, or if it starts running into high resource pressure, either memory pressure or CPU pressure, even before it reaches the max concurrency, then our LMI agent will say "Hey, back off. This thing is running hot. Go try somewhere else," and then we will reroute your invoke to a different execution environment. Now, if all of these execution environments start to heat up and everybody is telling us to back off, then we'll actually just come back to you and throttle your invoke back. We do this, like I said, for what I call goodput protection. We don't want to just blindly send invoke traffic all the way through to your execution environments until you hit this max concurrency number and actually brown out or crash all of your execution environments, resulting in a full outage of your application. So we're measuring and seeing how your execution environments are doing, and if it starts to get to a point where we think it's going to brown out, we'll start rejecting some traffic so that at least you can make forward progress with the capacity that you have.

Thumbnail 2180

Thumbnail 2200

In terms of metrics, if this starts happening, we have new CloudWatch metrics for the exact reason that you got throttled and the bottlenecked resource. You can get throttled because of your max concurrency (that's concurrency throttles), or because of CPU, memory, or disk pressure, so you can see which resource caused those throttles. And another thing about multi-concurrency before we move to the next section is that because your execution environments are handling multiple invokes simultaneously, it is important to have thread safety best practices in mind as you code and develop your application. This is something that's different from Lambda default, where you have singly concurrent execution environments. Here, thread safety best practices mean you need to avoid mutating any shared objects or global objects when you have multiple invokes running at the same time. Use thread-local and thread-safe storage for your data structures. For any shared clients or connections that you're initializing, make sure that their configuration is immutable in the invoke itself. Also, when writing to disk, remember that if multiple invokes are writing at the same time, they can clobber each other, so use request-specific file names for any writes you're doing to /tmp.
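As a quick illustration of those thread-safety practices in a handler (plain Python, safe for multi-concurrent execution environments; the bucket name is a placeholder):

```python
import os
import threading
import uuid

import boto3

# Shared client initialized once; treat its configuration as immutable inside
# the handler, since multiple invokes may run in this environment concurrently.
s3 = boto3.client("s3")

# Thread-local storage instead of mutating module-level globals per request.
_request_state = threading.local()

def handler(event, context):
    _request_state.request_id = context.aws_request_id  # per-invoke, not shared

    # Request-specific file name so concurrent invokes don't clobber each other
    # when writing to /tmp.
    scratch_path = f"/tmp/payload-{uuid.uuid4()}.json"
    with open(scratch_path, "w") as f:
        f.write(str(event))

    # "example-bucket" is a placeholder destination.
    s3.upload_file(scratch_path, "example-bucket", f"uploads/{_request_state.request_id}.json")
    os.remove(scratch_path)
    return {"statusCode": 200}
```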

Thumbnail 2270

Thumbnail 2280

Thumbnail 2290

Thumbnail 2310

Scaling and Security Boundaries: Resource-Based Scaling and VM-Level Isolation

All right, so that was concurrency. Let's talk about scaling. Again, let's recap what scaling means in Lambda default. In Lambda default, the scaling is just organic. It's these cold starts—this is when we initialize new execution environments, and that's really just how we scale. When you send us your big load of invokes, if you're getting a lot of concurrent invokes, we just naturally scale out the execution environments as cold starts in the invoke path. But in LMI, we said we don't have cold starts, so the question becomes, if all of these existing execution environments start to hit that point of saturation, how and when do we actually scale up new execution environments?

Thumbnail 2320

Thumbnail 2340

Thumbnail 2380

Thumbnail 2390

Thumbnail 2400

The scaling in LMI is actually asynchronous and resource-based scaling. What this means is that we're constantly monitoring the resource heat on your execution environments, and when we see that it's starting to get close to a certain threshold, that's when we'll asynchronously scale up new execution environments within your capacity provider. And if that needs new instances, we'll do that as well, so we'll take a look at that here. Here you have three instances with three execution environments. You have a friendly LMI agent there, and the LMI agent is constantly gathering CPU usage stats from your execution environments and your instances and sending that data over to us. We are monitoring the average CPU utilization at the capacity provider level but also at your function level, and we're trying to maintain this target CPU utilization threshold. If you're within this plus or minus 10% of that target threshold, then that's when we call you as being in steady state. You're happy, there's no scaling action happening. If you start to heat up your CPU and you start to get into that range above the plus 10%, that's when we'll start to scale up new execution environments.

Thumbnail 2410

Thumbnail 2420

First, we'll fill up existing EC2 instances, and then if you need more, we'll launch new EC2 instances and deploy more execution environments on those. Now, if you have a burst of traffic that's very spiky and it bursts up faster than we can react and add these instances and execution environments to your capacity provider, that's when you can push your CPU utilization into that goodput protection mode where we have to throttle you until we can actually bring up more instances and more execution environments to meet that load.

Thumbnail 2450

Thumbnail 2470

In terms of scale down, it's the opposite. If you then start to cool down and come below that minus 10% mark, we'll first scale down your execution environments, and then we'll actually scale down instances if we can pack the remaining execution environments back in. On scale down, you're always protected by the minimum execution environments that you configure, which is three by default; we'll never scale you down below that number, even if your function is seeing no invokes.

Thumbnail 2480

Thumbnail 2490

Thumbnail 2500

Thumbnail 2520

Alright, so the question that all of you are probably having is, what is this magic target CPU utilization threshold? So this is where we come back to that create capacity provider scaling config and the more advanced scaling options that I talked about. We have a scaling mode that can be automatic or manual. If it's automatic, that's basically where we are looking at your load patterns and your scaling patterns and automatically tuning that threshold to be at a good place for your application. In manual mode, you can take things into your own hands and manually configure a target CPU utilization percentage.
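Continuing the earlier hedged capacity provider sketch, the scaling settings from this section might look like the following; again the field names are assumptions from the talk, not the published API.

```python
# Illustrative only: field names are assumptions based on the session.
lambda_client.create_capacity_provider(
    CapacityProviderName="orders-capacity",
    OperatorRoleArn="arn:aws:iam::123456789012:role/LambdaManagedInstancesOperator",
    VpcConfig={
        "SubnetIds": ["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
    ScalingConfig={
        "MaxVCpuCount": 256,                 # hard ceiling on instance capacity (cost control)
        "ScalingMode": "MANUAL",             # AUTOMATIC lets Lambda tune the target for you
        "TargetCpuUtilizationPercent": 70,   # steady state is roughly +/- 10% around this target
    },
)
```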

Thumbnail 2530

In terms of metrics to monitor all of this fun stuff, again, these are all new metrics that we've added for Lambda Managed Instances. At the function level, we give you CPU utilization and memory utilization, so this is aggregated across all the execution environments for that function. You can also see the concurrent executions, like where your max limit is versus what you're actually using, and finally, the chart at the end there is the actual number of execution environments that we've scaled up to for your function.

Thumbnail 2560

Thumbnail 2580

And we also have similar metrics out at the capacity provider level, so this is if you have multiple functions within the capacity provider. At the instance level, you can look at memory utilization, CPU utilization, and then the actual count of instances that we've launched within your capacity provider. Alright, so the scaling takeaway here that I wanted to leave you with, and this is again something that Stephen brought up, is with Lambda default, when your functions run on our infrastructure, we are running a very specialized stack that's built on Firecracker, and that is hyper-optimized to handle these spiky, sparse, bursty workloads. It's hyper-optimized for cold starts. It's hyper-optimized to keep those cold start latencies down so that we can actually bring up a new lightweight Firecracker VM in the path of your invoke.

Lambda Managed Instances, on the other hand, is actually built and designed for a different profile of workloads. Like Stephen was saying, this is built more for stable workloads that have a good baseline of traffic and more smooth and predictable workloads. You know, the whole purpose of this feature is to move away from specialized stacks that we are using behind the scenes and to move to more general purpose EC2 instances in your account. So the machinery underneath is very different, and that's why the profile of workloads that Lambda Managed Instances is suited for is different from the profile that's suited for default. So the two are really complementary features of Lambda that you can use together to cover more of your use cases.

Thumbnail 2670

Thumbnail 2700

Alright, the final section here is the security boundary. Like I said, with Lambda default, all of your functions are running in our service account. It's a big multi-tenant account. And in Lambda default, every function's execution environment runs in its own VM because it's a multi-tenant setup. With Lambda Managed Instances, everything's running in your account; nobody else's code is running in your account. It's a single-tenant setup in that sense. And the security boundary here for Lambda Managed Instances is really the capacity provider.

Thumbnail 2720

Thumbnail 2730

Thumbnail 2740

If you want VM level isolation between your functions, the way you would do that is to have separate capacity providers because by definition, the instances within your two separate capacity providers are going to be different. So if you map two functions to separate capacity providers, they will be on different VMs in different instances. Within the EC2 instance, the function execution environments that we deploy are basically containers, so they're separated by a container boundary. Also, if you map multiple functions to the capacity provider like I mentioned, then the execution environments from those different functions can also share the same instances, and again, it's a container boundary there. So the takeaway here really is if you want VM level isolation between your functions, map them to separate capacity providers because that is the VM level security boundary in Lambda Managed Instances.

Thumbnail 2780

Thumbnail 2790

Partner Integrations, Tooling Support, and Pricing: Complete Ecosystem for Lambda Managed Instances

All right, with that, I will hand back to Stephen to talk about partner integration and tooling and pricing. So that was a really good introduction to Lambda Managed Instances, and I'm just going to talk through some of the new partner integrations that we have and introduce you to our launch partners Datadog and ZEET. Datadog provides full observability for Lambda Managed Instances. Customers can monitor key metrics to understand the health and the utilization of the Lambda Managed Instances, and customers can also alert on those metrics and any errors and anomalous behavior that might need attention. Using the automatically correlated metrics, logs, and traces that span upstream and downstream services, customers can also investigate and resolve any issues.

For any new instances that get launched, there is trace support and auto instrumentation, so customers can get automatic trace propagation simply by installing the Datadog extension. Now ZEET also supports Lambda Managed Instances through its autonomous optimization platform that helps improve performance, cost, and reliability of the Lambda EC2 instances and the rest of the environment. This gives engineering and platform teams as well as FinOps teams a single workflow for making safe, data-optimized decisions. Their platform automatically scores Lambda functions to show you which are strong candidates for moving to Lambda Managed Instances, so you don't have to do the guesswork.

Thumbnail 2890

Once you're ready to move, ZEET's co-pilot also lets you migrate and configure functions with a single click; it's a simple, low-friction way of migrating your functions to Lambda Managed Instances. AWS AppConfig's feature flag capability, and the other dynamic configuration that comes with that service, is also fully supported by Lambda Managed Instances. By using the AWS AppConfig agent Lambda extension with your Lambda functions, you can make calling those feature flags simpler, and the extension itself includes best practices that simplify using AWS AppConfig while reducing costs. That cost reduction comes from fewer API calls to the AWS AppConfig service and shorter Lambda processing times.

Thumbnail 2940

Thumbnail 2970

Now Amazon CloudWatch Lambda Insights also provides one-click deployment from the Lambda console. This allows you to filter capacity providers. It gives you the ability to drill down into instance types and functions, as well as providing you with 12 key metrics with one-minute granularity for things like the maximum and average CPU utilization as well as memory utilization statistics. This provides you with a fully integrated experience to monitor your Lambda Managed Instances. Now if you're using Powertools for AWS today, Powertools is really a suite of utilities that helps you standardize application development across a number of different use cases such as observability, batch processing, idempotency implementations, feature flags, and data extraction.

Thumbnail 3000

The Powertools suite is also fully compatible, thread safe, and ready to run on your Lambda Managed Instances. And of course, we've got full Infrastructure as Code support

Thumbnail 3010

through AWS CloudFormation, the Serverless Application Model, and the AWS Cloud Development Kit, as well as Terraform from our partners. All of those APIs that Archana was talking about are fully supported within those frameworks.

Thumbnail 3030

Now from a pricing perspective, AWS Lambda Managed Instances uses EC2-based pricing with the addition of a 15% management fee on top of the EC2 instance costs. The price of the instance itself will largely depend on the discounts that are applied to those instances. However, the management fee is still based on the default price for the EC2 instance. On top of that, you would still be charged the same cost per invocation that Lambda has today, with the exception that you're no longer paying for function duration because everything is running on your own instances.
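A back-of-the-envelope illustration of how those pieces combine, using made-up example numbers (an assumed instance rate and discount; check the official pricing page for real rates):

```python
# Hypothetical numbers purely to illustrate the pricing structure described above.
on_demand_hourly = 0.10        # assumed EC2 on-demand rate for one instance, USD/hour
savings_plan_discount = 0.30   # assumed 30% Compute Savings Plan discount
hours = 730                    # roughly one month
instances = 3

# Instance cost benefits from your EC2 discounts...
instance_cost = on_demand_hourly * (1 - savings_plan_discount) * hours * instances
# ...but the 15% management fee is calculated on the default (undiscounted) price.
management_fee = 0.15 * on_demand_hourly * hours * instances

# You still pay per invocation, but no longer for function duration.
invokes = 50_000_000
request_cost = (invokes / 1_000_000) * 0.20   # standard Lambda request rate at the time of writing

total = instance_cost + management_fee + request_cost
print(f"~${total:,.2f} per month under these assumptions")
```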

Thumbnail 3080

Thumbnail 3090

So to wrap things up, let's have a look at some key takeaways. As I mentioned earlier, Lambda Managed Instances isn't designed to be a replacement for the Lambda functions that you're running on Lambda today. It's really designed for those specific use cases to help you with high-traffic and steady-state workloads or where you need specialized compute options to run functions for specific use cases. For everything else, Lambda provides you the ability to continue running new applications that have unpredictable traffic, short duration, or infrequent invocations. So that's a really key takeaway here.

Thumbnail 3130

And also, when you understand what we've just done for you, basically we've allowed you to define your own execution environments. This is you running Lambda as you do today, the same experience. You get to maintain the same programming model, you get to maintain the same architecture, you get to use the same development tools, but you've got more control over where and how your functions are running. And you're also able to apply all of the cost benefits that you're getting with EC2 on top of that.

So that's all we have for you today. We are really interested to see what you build with Lambda Managed Instances and look forward to seeing what you do with that. Thank you very much.


This article is entirely auto-generated using Amazon Bedrock.
