🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.
Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!
Overview
📖 AWS re:Invent 2025 - Building the future with AWS Serverless (CNS211)
In this video, AWS serverless compute leaders Usman Khalid and Janak Agarwal demonstrate building applications with AWS Lambda's transformational new features. They showcase Lambda Managed Instances, which combines Lambda's simplicity with EC2's flexibility, enabling steady-state workloads with 25%+ utilization and cost optimization through automatic scaling. The session includes live demos of building CRUD APIs using the Serverless MCP Server for AI-assisted code generation with built-in best practices, handling extreme traffic spikes with Lambda's scaling of 1,000 execution environments per 10 seconds, and Lambda Durable Functions for reliable long-running workflows up to one year. Additional launches covered include tenant isolation for SaaS applications, Rust runtime support, and enhanced developer tools with remote debugging capabilities.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Setting the Stage for Serverless Innovation
I think we're starting a little early, so I just wanted to see if we can get a little energy in the room. Janak and I have an awesome set of slides and an awesome set of demos to show you. We're actually going to build some stuff together here. We usually don't do that in a breakout session, but I think it's going to be really cool.
Just a quick show of hands, how many of you are developers? Wow, that's a lot of developers. That's good because you're going to enjoy the building part here. How many of you have heard about the two really big transformational launches in Lambda this re:Invent? Okay, a few hands, so I think that's going to be useful to a lot of developers. How many of you are engineering leaders? Okay, lots of you are engineering leaders. Awesome. So I think there's going to be a bit of a mix of both things here today.
I'll introduce myself in a second. There's going to be a little bit of a mix of Lambda strategy, or serverless strategy all up: what are we trying to do? I've spoken to a lot of customers this week, and I'll share some anecdotes about that too. Every re:Invent has a bit of a surprise. I think this is my ninth re:Invent, so why don't we get going, and I'll share my stories with you and we'll build some cool stuff together.
Okay, we're talking about building the future with AWS Serverless, and we've actually played around with the title a little bit. We were saying we were going to talk about the future of serverless, but I think one of the things we wanted to share with you is what is our strategy, how we think people are going to build in the future, and why serverless is so key here. I'm going to let my partner introduce himself when he comes up to do the first of the demos, but I'm Usman Khalid, folks. I'm the head of serverless compute.
I've been with AWS for almost 12.5 years, and I started my role with a little-known service called Auto Scaling and Auto Scaling Groups, which was a precursor to all of this and is still the thing that powers Lambda, ECS, and EKS. I've been in the serverless space with EventBridge and Step Functions, services that are under me as well at AWS, since 2018 or 2019. So it's been quite a few years with the team here and still going strong, and my passion has always been developers. I've been a developer myself. If you were to Google my name, you'll see I made a breakdancing puzzle game that not a lot of people bought, but it was a great learning experience. So no, I'm not a breakdancer, in case you're wondering.
That's me, and the reason you should listen to me is basically that I'm the guy who sets the strategy for serverless, and Janak and I work really well together on it. What we want to share with you over the next hour is the things we've done, how they connect with that overall strategy, and where we're taking serverless, because serverless really is at an inflection point. We celebrated 10 years of Lambda last year, but all of us here in technology, lots of developers, lots of engineering leaders, are seeing this inflection point, kind of like what the Internet did in the early 2000s. I was a little too young for that, although my gray hair speaks otherwise. And look, I'm not going to talk tons about AI, and I'm still learning how to use it myself. No, I don't believe it's going to change all our jobs, but I think it already is accelerating us, so I'm going to share some anecdotes about that.
So I'm going to set the stage for what the strategy is and the agenda. We have this build challenge, so we're going to build a few things. Janak is going to kick that off, and then we're going to talk about some of our other innovations which were launched a week before re:Invent. So you might have missed those as well. And then we kind of recap and share with you where we're going next. That's what we're going to cover today. Sounds good? Yes, nodding heads? Awesome.
The Evolution of Development: Why Change Matters Now
Alright, so I love this quote. All the engineering leaders here, and the engineers themselves, know that change is constant, right? But my favorite quote is that all change is about change, and I found that the one who said the original version is P.J. O'Rourke, an American journalist. And look, I already signaled this to you folks: we are going through change. The big takeaway for me from talking to customers at re:Invent, especially when they talk to me about serverless and the future of serverless, is that maybe their old platform strategy, the old way of doing things, is just not working.
Time and time again, what I hear from developers, development leaders, and platform leaders is that there's an even bigger struggle between the engineering and platform teams now, because the engineers just want to try things: the cost of trying something has plummeted. And I'm not talking about vibe coding; I'm talking about real production engineering here, and that's what we do at Lambda and serverless as well. This is some of the stuff we're going to show you: how we were able to accelerate, because we use AI as a tool, as an accelerant.
When things have become so much faster, if your deployments, your operations, your patching activities are getting in the way, you've effectively inverted the entire SDLC. And that's not just at AWS; multiple customers are seeing the same thing, and that's why I had more customer conversations this year than in the last few years. People are coming around and saying, "Yeah, we had a Kubernetes-based platform, but what more can we do with serverless? Our developers just want to go faster and faster, and they don't want all the liabilities of managing the compute." Really, what I'm trying to say is that we are at an evolutionary time. We all feel it; I don't have to convince you of that. What the final stage of this is, we don't know. I certainly don't know, and I'm not claiming that I know, but we are going through an evolutionary period.
One of the key parts of our serverless strategy is that we've always focused on developers. Maybe we didn't focus enough on the developer experience, but we do now, and that's a key pillar of our strategy. It has always been about speed and evolution and creating evolutionary architectures, and I'll show you in an architecture what I mean by that. If we're living in a time, and we are, where things are changing rapidly, then the systems, processes, and people that are able to evolve are the ones that will be the most successful. The companies that are able to evolve and transition will be the most successful, so why not your architecture as well?
I didn't check how many of you are system architects, but I'm guessing a lot of the engineers here are systems designers as well. You all know that there's no such thing as the right architecture, right? There are only the right trade-offs. With serverless, yeah, you build highly evolutionary architectures, but just to kick off the conversation: I'm not here to sell you anything either. I'll be as balanced as possible. Obviously I'm very passionate about the space, but there are trade-offs. If you have a highly evolutionary system, it's very hard to manage change, and it's very hard to observe what's broken, because things are more decoupled and changing very rapidly.
For some of the platform team owners, one of the things weighing on their minds is control. How do I govern? How do I make everything compliant while my developers want to move faster and faster? So yes, there are trade-offs we have to manage. But the final thing I always say in this conversation is: look, evolution is not a new thing in tech. Generative AI may have captured so many hearts and minds and been so disruptive for the engineering community here, but speed has always mattered in business, and the faster you are, the more likely you are to win.
Serverless Strategy: Speed, Evolution, and the Hidden Infrastructure
So let's talk about serverless. I don't know if this is the biggest secret, though Janak believes it's the biggest secret in serverless, but look: serverless was always running on servers. I operate a fleet of literally hundreds of thousands of servers, bare-metal servers too, so those are extra fun to patch and scale. What we've done is create a facade for developers, so developers just don't have to worry about the management of the servers, the runtime, scaling, load balancing, or request routing. A lot of those things are just taken care of by us directly.
While there are hundreds of thousands of servers in our fleet, from US East 1 all the way to Asia, and I think we have an African region now too, right? So there you go, all over the planet. But to you, the developer, or to the engineering leaders who've built a serverless architecture, this is what it looks like. In this picture, I'm simply playing with an idea: "Hey, I want to build an agent, or an application, that can guide users during national emergencies." So I'm talking to a lot of MCPs here, and to a bunch of different services that are available. I think I have FEMA here, et cetera.
What I mean by evolutionary here is that each feature, each facet of my idea, is actually one of those horizontal lines you see going through Lambda. I'm obviously using AI; I'm using Bedrock to host my models. Now imagine that, in this particular architecture, I wanted to add a new feature (I deliberately didn't put it on the slide): a notification feature where customers can sign up for notifications over SMS or email. That would just be a branch. I'm not updating a bunch of microservices. It is just a branch.
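To make that concrete, here's a minimal sketch of what such a notification branch could look like as a single-responsibility Lambda function. The event shape and topic wiring are assumptions invented for illustration; the SNS call itself is the standard AWS SDK for JavaScript v3 API.

```typescript
import { SNSClient, PublishCommand } from "@aws-sdk/client-sns";

const sns = new SNSClient({});

// Hypothetical event shape for the sign-up notification branch; a real
// application would define its own contract.
interface NotificationEvent {
  topicArn: string; // e.g. an SNS topic with SMS and email subscriptions
  message: string;
}

// One feature, one function: this branch is deployed and iterated on
// independently of every other branch in the architecture.
export const handler = async (event: NotificationEvent) => {
  await sns.send(
    new PublishCommand({ TopicArn: event.topicArn, Message: event.message })
  );
  return { statusCode: 200 };
};
```

Because the branch owns exactly one responsibility, deploying or rewriting it never touches the other functions in the architecture.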
One of the key things, and I know the tools are heavily evolving, even over the last six months, is something I've seen with my engineering teams: smaller changes, smaller contexts, smaller files are just so much better, so much faster to iterate on, and the AI is far more often right. Building those things is just like one-shotting. I mean, vibe coding really is a marketing term at this point, frankly. Guess what? All of these things are single-responsibility Lambda functions: super easy to manage and update, and super fast. Janak is going to show you some of this when he comes up.
And look, at the end of the day, when you build that architecture, yeah, there were lots of parts to it. There's Infrastructure as Code that you had to probably use to actually create and manage the system. But once you have that set up, I like to talk about the "ilities," right?
Maybe you folks have a platform team where other humans manage and scale servers for you and focus on reliability and durability. But that's not the whole story: even in the platform-team case, they don't know your software, so many things around security and scalability land in the software that the developers are writing too. And if you're choosing an underlying infrastructure technology, then you are responsible for those things. If you ever start thinking about your idea going to global scale, you can see how hard it gets.
And look, at the heart of it, what customers love and why they build, sorry I went too fast, why customers love building with serverless is that those "ilities" that I had in the previous slide are all managed by us. That is kind of the heart of serverless at the end of the day. It's not about having those servers. It's about having no servers to manage and no infrastructure to manage, and then expressing your logic the fastest way possible.
Speed is the key thing. I mean, I already said that speed always wins, and that is our number one goal. Our number one goal has always been, how can we be the fastest to market? How can we take an idea that customers have and be the fastest way to do this? The funny anecdote I would share with you, many years ago, I think it was 2013 or 2014, after about a year of working on Auto Scaling, my boss asked me, "Hey, what's the charter for Auto Scaling?" This is before Lambda, this is before ECS and Kubernetes. I was like, "Hey, it's the fastest way to go take an idea and take it to scale, and let the best ideas win." Obviously we've come a long way from just using Auto Scaling VMs now, but this statement has at least been true for me as I've been in this space of helping people move faster and faster, because this is what matters. I think this is the thing that matters all the way from why customers adopt the cloud all the way to how they build cloud natively to move faster.
I wanted to share a really quick anecdote, a recent one from the last couple of months. CyberArk's platform engineering team basically built their entire platform on serverless. They were able to do the automation work they needed and save something like four months out of the twelve it took them to build new services. This quote itself is probably six months old; I'd say the number is much lower now, given the state of the tools. And speaking of liabilities: I have a Tesla, and I unfortunately paid for Full Self-Driving, which after many years it still doesn't deliver. At the end of the day it comes down to liabilities. The last "ility" is liability, and with serverless, more of the responsibility is on our side of the fence. My engineering teams and I are the ones responsible for the scale, patching, and security of your applications, rather than you. There's still a shared responsibility, but a lot of it is on our side of the fence.
And look, it's not a new technology anymore, although what we've done over the last twelve months and launched at re:Invent really transforms this ten-year-old technology. Serverless is already everywhere. I'm obviously not going to go through all of these; I just wanted to highlight how many big names have major, at-scale applications running on Lambda today.
Building with AI: Introducing the Serverless MCP Server
So let's go. Enough talking about the context of why we're here. I think a lot of people are familiar with the feature, so I want to get into it and introduce my partner, Janak Agarwal, who will introduce himself and walk us through some actual building. Thank you. Am I audible now? All right. Thank you again, Usman.
Hello everyone, I'm Janak Agarwal. I was a developer for around eight years, so my perspective on serverless has been informed by both sides of the equation: being a developer whose services need to run in production flawlessly, and now building tools and services that developers can trust to run their critical workloads on. I still like to think of myself as a good developer, though I'm probably not; I am a product manager, and I lead PM for Lambda.
Lambda has been around for a decade now. And as with any product or technology that manages to stick around for around a decade and more, there are some notions that build up, some preconceived notions, biases even, about what the technology can do, what it cannot do, what is it good for, what is it not good for, and so on. But then there are always some inflection points that come in, and I believe, I really believe, that serverless is at such an inflection point now. So what we're going to do next is we're going to have some fun. Over the next thirty minutes or so, we're going to build an application, and along the way, I'll show you some of the capabilities that we launched that allow you to now bring workloads to serverless that you could not before.
Here is what we're going to build. It's a note-taking application. It's going to have create, read, update, and delete as functionalities.
We're going to scale the application, and then we're going to build new features that our customers will want: encryption and decryption of notes, and analyzing the sentiment. Finally, for those who really like to write lengthy notes, research says attention spans are going down, so we're going to use AI to summarize our notes.
So let's move on to phase one. We're going to build our foundation now, the CRUD APIs. And I'm not going to show you my typing speed; what we're going to leverage here is vibe coding, as Usman was talking about. A key to successful vibe coding is a technology we released earlier this year: the Serverless MCP Server. It enables your favorite AI coding assistant to better convert your natural-language prompts into well-architected, standards-compliant code that you can run in production very fast.
So let us get to it. We're going to do it in three phases. First, we're going to install our MCP servers, and my personal preference is to also install the Docs MCP Server. I've found it really useful when working with newly announced technologies. And I love to hear, and I'm pretty sure all of us love to hear, "You're absolutely right!" from our coding assistants; the Docs MCP Server reduces how often that happens. If you know, you know what I'm talking about. But I like to use it.
In the second phase, we're going to actually write the code for our CRUD APIs. It's going to be a fully serverless architecture: API Gateway for ingress, Lambda functions for the CRUD operations, DynamoDB tables as the serverless database, and CloudWatch structured logging, because observability is really critical in serverless architectures. And I like to think that I'm still a developer, so I like my types; I'm going to be using TypeScript for this.
And then finally comes the build and deployment phase, and we'll see that the MCP Server defaults to a build-and-deployment tool called SAM, the Serverless Application Model. We've purpose-built it for serverless builds and deployments. It really simplifies your builds and deployments, and it streamlines your local testing as well.
Code Generation to Deployment: Best Practices Built In
So what I've done is install the Serverless MCP Server, and here's a picture that shows you the tools. At the time I captured it, about a week before re:Invent, we had 25 tools, all now available to the AI coding assistant. Some of the critical ones provide guidance on which workloads are and aren't a good fit for Lambda, how to build and deploy web apps, and how to build and deploy event-driven architectures, including Kinesis and Kafka. It also knows how to get metrics from CloudWatch: normally you have to know which metric to look at and where to get it, and the server helps fine-tune that for you and assists along the way.
Next, we're going to write the code. So I threw in the prompt we saw on the previous slide, and in about five to ten seconds, Kiro, my AI assistant, tells me that the code files are done and the next step is to build and deploy. I've magnified some images here so we can examine what it did under the hood. You see the project structure that Kiro, my AI coding assistant, made with the help of the MCP Server. You see the template.yaml file: that is the Infrastructure as Code assistance you get out of the box. The package.json simplifies dependency management for your application. The TypeScript file is obviously the business logic, and the tsconfig captures the build steps to run as the TypeScript is transpiled to JavaScript. And then there's some error handling that is now automatically built in.
So without the MCP Server, this was not built in. I don't have a before picture, but the after picture is here. You see there's sufficient input validation. We're doing three retries when it comes to writing to DynamoDB. Good practices. Here are some additional best practices built in. You see the global error handler here. We have consistent HTTP status codes, structured logging with CloudWatch, all available from the get-go in the version zero of the code that was written automatically. So the code generation process is much more compliant with the Well-Architected Framework with this.
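As a rough illustration of those patterns (the article doesn't reproduce the generated code itself), a create-note handler with input validation, three bounded retries on the DynamoDB write, structured JSON logging, and consistent HTTP status codes might look something like this; the table name, environment variable, and note shape are assumptions:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";
import { randomUUID } from "node:crypto";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE_NAME = process.env.TABLE_NAME ?? "notes"; // assumed env var

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  // Input validation before touching the database.
  let body: { content?: unknown };
  try {
    body = event.body ? JSON.parse(event.body) : {};
  } catch {
    return { statusCode: 400, body: JSON.stringify({ error: "invalid JSON" }) };
  }
  if (typeof body.content !== "string" || body.content.length === 0) {
    return { statusCode: 400, body: JSON.stringify({ error: "content is required" }) };
  }

  const note = { id: randomUUID(), content: body.content, createdAt: Date.now() };

  // Up to three attempts on the DynamoDB write, as in the generated code.
  for (let attempt = 1; attempt <= 3; attempt++) {
    try {
      await ddb.send(new PutCommand({ TableName: TABLE_NAME, Item: note }));
      // Structured logging: JSON lines that CloudWatch Logs Insights can query.
      console.log(JSON.stringify({ level: "INFO", action: "createNote", id: note.id }));
      return { statusCode: 201, body: JSON.stringify(note) };
    } catch (err) {
      console.log(
        JSON.stringify({ level: "WARN", action: "createNote", attempt, error: String(err) })
      );
    }
  }
  // Global error path with a consistent status code.
  return { statusCode: 500, body: JSON.stringify({ error: "internal error" }) };
};
```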
The next step for us is to build and deploy. I've provided the build-and-deploy command to my assistant. SAM takes over and uses the YAML file, the TypeScript config files, and the package files to run the deployment for us, enforcing best practices along the way. We'll magnify this image in a bit. It shows us the CloudFormation change set, the delta of all the work we're going to deploy to AWS. It uploads the code to S3. In a second, hopefully, that is complete. Here's the change set, and then as we go to the console, we'll see our functions begin to light up.
The console should load any moment now. There you go. Here are all five functions that we wrote, right there, available to serve traffic. Here are some magnified images. While building, SAM detected that our APIs had no authentication and asked me to confirm that this is really what I wanted; since this was a demo, I said yes. Here's a snapshot of the change set it walks us through: it is adding all of these resources, the roles and permissions for databases and functions and so on. Your build and deployment steps are also much simpler with SAM.
If you think about it, I don't know if I spoke for more than five to seven minutes, but in about that time we have a fully working backend: the CRUD APIs are deployed, entirely via natural language, with the Well-Architected Framework baked in. When we talk to customers, they tell us that generative AI has sped up code generation, but it isn't resulting in shipping the actual software faster. That's because the ship cycle has much more process baked into it: there's Infrastructure as Code, there are code reviews, and so on, and none of that is really getting faster with generative AI. That is where serverless has tried to innovate right across the stack.
We enforce best practices while you generate the code, as we just saw, and during the build and deployment steps. The key takeaways here are that the MCP Server helps you generate code with best practices baked in. I work with a lot of developers; their productivity is high, and they're now actually able to ship software faster too, because the Infrastructure as Code is baked in. What they love most is that the code produced is of consistent quality. A lot of customers complain that some people just write code and send out a code review without making sense of what it does; this helps minimize those problems by producing code of consistent quality.
Hands-Free Scaling: Handling Needlepoint Traffic at Speed
Moving on, let's say our application was picked up by some news media outlet, which results in a bunch of concurrent traffic right off the bat. The way customers tell us they handle this scenario is to overprovision capacity: they provision for peak, with the understanding that at some point they'll go back in and optimize costs by applying the right scaling policies. But provisioning for peak leads to higher costs and a bunch of human-led maintenance, and tomorrow never comes. You're constantly meaning to optimize the scaling policies, but you find yourself building features instead of focusing on optimization.
How do we handle that with serverless? With serverless, we give you hands-free scaling. Our scaling rate is pretty much the fastest among all compute choices you have: one thousand new execution environments every ten seconds. If your function's execution time is one hundred milliseconds, what you're really getting is ten thousand requests per second of new capacity every ten seconds, right off the bat, without lifting a finger. Let's run a load test. I've designed a very basic custom load test. I'm about a minute in here, and you can see that I've increased the traffic by seven hundred to eight hundred times in twenty-five to thirty seconds. There is not a single error and not one throttle. Lambda just absorbs it, and you didn't have to lift a finger.
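The arithmetic behind that throughput claim is worth spelling out; here's a quick back-of-the-envelope calculation using the numbers from the talk:

```typescript
// Figures from the talk: Lambda adds 1,000 new execution environments
// every 10 seconds.
const newEnvsPer10s = 1_000;
const execTimeMs = 100; // assumed function duration of 100 ms

// A classic execution environment handles one request at a time, so at
// 100 ms per request it serves 1000 / 100 = 10 requests per second.
const rpsPerEnv = 1_000 / execTimeMs;

// Every 10-second window therefore adds 1,000 * 10 = 10,000 RPS of capacity.
const addedRps = newEnvsPer10s * rpsPerEnv;
console.log(`~${addedRps} requests/second of new capacity every 10 seconds`);
```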
This is the power that serverless gives you, and the kind of workload where serverless really shines, leading to a lot of happy users and end users, all without any idle costs. So the key takeaway is that our scaling rate is really fast, the fastest among all compute. Let's call this type of traffic "needlepoint traffic." Imagine you're building an application where hundreds of thousands of spectators in a stadium scan a QR code whenever a goal is scored: you can handle that seamlessly. Or a flash sale, and I see a bunch of people wearing that company's shoes, a new shoe drops, a new flash sale starts: you can handle that seamlessly too.
Testing is quite simple, because you're literally only testing your application logic. You're not testing scaling at all; it just works. Lambda provides it out of the box. Each piece of functionality in our application (you saw the four or five APIs we built) scales at the same rate, independently, without affecting the scaling of anything else, so the noisy-neighbor problem is essentially eliminated. And all of this without managing any infrastructure. Right, so far, so good. Let's move on.
Shifting Workload Profiles: The Need for More Compute Options
So we're good people, we listen to our customers, and based on customer feedback we're now building a new feature: encrypting and decrypting notes and analyzing their sentiment. If you think about it carefully as a developer, the profile of your workload is shifting here. It's no longer just CRUD APIs; it's a more CPU-intensive workload. And because we listen to our customers, the feature achieves popularity, meaning the scale-to-zero aspect is no longer super important. There's always some traffic to serve, always some users to service; in other words, there's steady-state traffic. I'm loosely defining steady-state here as a peak-to-mean traffic ratio of around two. How do we handle that with serverless? Well, it was hard.
When we talk to customers, this is what they tell us: in this phase of the application, they really want to drive optimization. They want to optimize costs and performance, probably by leveraging the latest compute-, memory-, and network-optimized instances. And they want to do all of that with the familiar developer experience they currently use and love, with all of the integrations and with fully serverless operations. In other words, they don't want to shift focus away from core business logic; they want to keep the practices they use today but get more choices.
So what did they do before this week? They would rearchitect the solution away from serverless, away from Lambda. And that was an incredibly inefficient use of engineering time: it dilutes focus from business logic and increases ongoing maintenance costs in perpetuity. So we wanted to design a better way to support such scenarios in serverless, on Lambda.
We're delighted to introduce Lambda Managed Instances. The mental model behind this feature is that we want to give you all of Lambda's simplicity in developer experience, integrations, and tooling, and marry it with EC2's specificity and flexibility of choice in compute, network, and memory. With Lambda Managed Instances you get access to the latest instance families like Graviton 4, or probably Graviton 5 as soon as it comes out: the latest generation of memory-, compute-, and network-optimized instances.
All of this continues to be fully managed. As Usman said earlier, serverless is not the absence of servers; we just manage the servers for you. Here we're giving you those options, but we continue to manage the servers: we scale them, patch them, route requests to them, the whole thing. You continue to get the same extensive event source integrations you get with Lambda today. And with Lambda Managed Instances we're also adding a new capability called multi-concurrency, the ability to serve multiple requests from the same execution environment, which has been a long-standing customer ask. When you combine this with EC2's pricing incentives, Savings Plans, Reserved Instances, and so on, your costs really come down.
And we'll see in a second how.
Lambda Managed Instances: Marrying Simplicity with EC2 Flexibility
Using Lambda Managed Instances is as easy as creating a capacity provider. When you create one, you have the option to specify any choices you want, and I emphasize the word option because these are truly optional settings: instance types, scaling policies, max and min, all of it is configurable, but optionally. You can just stay hands-free and let Lambda and AWS take care of it. Once you've done that, you create your function the way you do today and simply configure it with a capacity provider. Lambda will then scale it, patch it, run it, and provision the instances. It picks suitable instances for your workload and drives a continuous optimization loop for utilization.
Now, adding servers to serverless is a tricky thing to get right, so we did deep research here to make sure the experience is as simple as it possibly could be. Let's see how. I'm on the Lambda console now, and I head to capacity providers, the new feature that lights up.
I'll create the capacity provider: give it a name, a VPC, subnets, security groups, and an operator role that has access to manipulate my EC2. Then I head over to the advanced settings, where the action really is. Here you can choose your architectures, Graviton or x86. You can include or exclude certain instance types, or just let Lambda pick for you. You can apply scaling policies, a max to cap your costs or a min to always have some pre-warmed instances available to serve traffic. You can tag them for tracking purposes, and so on.
When we head back a second later, the capacity provider is now active. At this point, no EC2 instance has started, because it's just the capacity provider construct that you've created; there's no charge for creating it. The next thing we wanted to get right was the same developer experience you have with Lambda today. Let's see how we did. Here's the updated create-function flow. You can see that it is exactly the same, with just one new parameter: the capacity provider config.
Along with this, I wanted to highlight two additional features. One is the multi-concurrency support we discussed a moment ago. The other is the ability to customize the memory-to-CPU ratio. Just like with EC2 instances, you can now choose your memory-to-CPU ratio on Lambda to match compute-intensive, memory-intensive, or general-purpose shapes. So you can imagine the classes of new workloads you can now run on Lambda that you could not before.
Next, I'll throw that create-config command to my assistant, then create my function. When I open the configuration tab, there it is: it's configured with the capacity provider. I'll change the memory setting to 4 GB for demo purposes, set multi-concurrency to 64 concurrent requests per execution environment, and change the memory-to-vCPU ratio to 4:1. When I hit save and head back to the capacity provider, I see that the function is now active.
It's at this point that my EC2 instances are created. Here's the EC2 console. If you look carefully, when I provided subnets while creating the capacity provider, I provided them in 3 Availability Zones, so my EC2 instances are now spread across 3 Availability Zones too. My instances, and by extension my application, are AZ-balanced. And once the function is active, the EC2 instances are spun up and your execution environments are pre-warmed, with multi-concurrency enabled, ready to serve traffic.
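Pulling the walkthrough together, here's the demo's configuration summarized as plain data. The values come from the narration above, but the object shape is invented for readability; it is not the actual Lambda Managed Instances API.

```typescript
// Illustrative only: values from the demo narration; this shape is not
// the real Lambda Managed Instances API.
const demoSetup = {
  capacityProvider: {
    network: { subnetCount: 3, note: "spread across 3 Availability Zones" },
    architecture: "Graviton or x86, or let Lambda choose",
    scaling: {
      min: "optional: keep some pre-warmed instances",
      max: "optional: cap instances to cap cost",
    },
  },
  fn: {
    memory: "4 GB",
    memoryToVcpuRatio: "4:1",
    multiConcurrency: 64, // concurrent requests per execution environment
  },
};
```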
So let's start a load test with our synthetic workload. I'm about 13 minutes into the load test; I enhanced the load test tool for this. You can see the traffic is much more steady-state: still increasing, but steady-state by our definition here. We've scaled up from 3 instances to 7 and achieved a utilization of around 25%. The memory and vCPU utilization of the underlying EC2 instances is still pretty healthy, between 15 and 35%.
And remember, the higher the utilization, the greater the cost optimization for you. So the key aspect here is that utilization is baked in. Around an hour into the load test, the traffic is still steady-state, increasing but steady. There are 932 throttles, to be exact, but throttles are fine: we can handle them in our code with retries or queuing. There is not a single error; the error rate is still zero. We've scaled up to 21 instances now, and if you observe carefully, both scale-up and scale-down have been triggered. Scaling tracks the CPU utilization of the capacity provider, and we're still at around 25% utilization, 22.5% in this case to be exact.
I let the load test run for around 90 minutes, and here's a key result I wanted to highlight. A bunch of customers tell us they take on technical debt when they provision for peak as the workload profile shifts: they don't apply the scaling policies properly, or they don't spend time optimizing, and they end up with low utilization rates. At around a 6% utilization rate, my synthetic workload's load test would have cost me $8.50. If I improved it to 9% manually, it would be $5.67. But with Lambda Managed Instances, or LMI, the auto scaling is built in; you actively have to go in and choose to turn it off, so from the get-go you're optimized. Even at the lower end of the scale, we're able to achieve around 25% utilization off the bat.
This was literally a 60-to-90-minute test, and within that window you could see earlier that utilization had reached 28.5%. At that utilization, my cost just for the EC2 would have been $2.05. On top of this, Lambda Managed Instances applies a management fee of around 15% on the EC2 instances. Think about what you're getting for that 15%: automatic scaling, provisioning, patching, and continuous optimization as your workload profile shifts. There's literally no need to rearchitect the solution away from Lambda, so you save all of that time. You keep the same CI/CD pipelines, the same observability tool set, and the same integrations.
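Those figures are consistent with EC2 cost scaling roughly inversely with utilization for a fixed amount of work, which is a useful mental model even if the real billing math has more inputs. A sketch that approximately reproduces the talk's numbers:

```typescript
// For a fixed workload, EC2 cost is roughly inversely proportional to
// utilization: the less idle capacity you pay for, the lower the bill.
const baseline = { utilization: 0.06, cost: 8.5 }; // 6% utilization -> $8.50

function estimatedCost(utilization: number): number {
  return (baseline.cost * baseline.utilization) / utilization;
}

console.log(estimatedCost(0.09).toFixed(2)); // ~5.67, matching the talk
console.log(estimatedCost(0.25).toFixed(2)); // ~2.04, close to the quoted $2.05
// Lambda Managed Instances adds a ~15% management fee on the EC2 cost:
console.log((estimatedCost(0.25) * 1.15).toFixed(2)); // ~2.35 all-in
```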
So, some key takeaways. With Lambda Managed Instances, the mental model again is to give you the simplicity of Lambda and serverless with the specificity and flexibility of EC2. And it is simple to maintain: if four of the five functions that make up your application require scale-to-zero, you can leave them where they are, and move the one that has reached steady state, or has a different, CPU-intensive workload profile, to Lambda Managed Instances. We continuously optimize utilization to deliver the cost benefit. There's zero infrastructure management here, and with the new ability to configure the CPU-to-memory ratio, you can bring in many more workloads that previously didn't run easily on serverless.
And managed instances just work. Usman talked about the "ilities" earlier: reliability, patching, scalability, availability. AWS and Lambda continue to be responsible for all of that, with the same developer experience you get with Lambda today, and without rearchitecting anything. Next, we're going to build a workflow-based architecture. I'm going to ask Usman to come back up, show how to build that, and also talk a little about our strategy. Thank you very much.
Addressing Serverless Trade-offs: Cost, Complexity, and Long-Running Jobs
All right, thank you, Janak. I think LMI does deserve a clap; it was really, really cool. So a couple of stories. I didn't do this in my rehearsal, so he'd be mad at me: the engineers were actually really mad at his demo. I don't know if you saw what he was doing. He was creating this spiky traffic, 1000 TPS for one second only, and for those of you who operate and scale distributed systems, you know this is pretty much the worst workload. The engineers were like, "We said synthetic workload, and this is as synthetic as it gets." But you can see the results of how the system scales and works. It was a fun few weeks getting it out the door.
The second thing I wanted to do is connect back to what I was saying at the beginning. I talked about trade-offs, observability, control, evolutionary architectures, and how you have to use Infrastructure as Code.
Let's talk about where we are with those trade-offs now, given what Janak just showed, and then I'm going to show you a couple more things we launched on Tuesday. For those of you who remember the picture I showed of that FEMA emergency-response application, what was challenging about it? Your application is now highly evolutionary and single-responsibility; you don't have monoliths; you're using Infrastructure as Code. What Janak showed you, with AI generating SAM templates with the best serverless practices pulled in, takes that trade-off away.
One of the hard things, for me personally as a developer (I shared that I used to be a video game programmer, a hardcore C++ developer), is writing YAML. It was always hard, it's still hard, and I don't know why I can't do it, but now I don't have to. I get really awesome results from our MCP Server with the best practices built in. So that's one problem taken away from developers and from infrastructure developers.
Second problem, and I'll again be very direct and forthright with you: people used to say, "Hey, Lambda is too expensive at scale. If my idea ever gets big, I'll hate it, or my boss will hate it, because the costs climb." As Janak showed you with Lambda Managed Instances, we're now able to deliver incredible utilization. Look folks, I shared that I used to run Auto Scaling. I've created hundreds of Auto Scaling groups in my time at AWS, and I don't think I ever managed to get code running on those groups, simply because I was just testing the service. Here, I can go from infrastructure to code to highly utilized code literally in a couple of minutes, and that is the most incredible thing about Lambda Managed Instances.
So those are two fundamental objections to serverless: "it's expensive at scale" and "if my idea gets big, I have to rearchitect, or I have to deal with Infrastructure as Code and that's complex." We've tackled both really well this year. Now let's talk about a third: "Lambda doesn't run long-running workflows," or rather long-running jobs, "and if I ever have a long-running job, I'll hit the 15-minute timeout and really hate my life."
Lambda Durable Functions: Reliable Workflows Without the Complexity
So let's talk about workflows. Look folks, I've been with the workflow services for a long time. Auto Scaling was, and still is, one of the largest users of Simple Workflow behind the scenes, and I'm the engineering leader for Simple Workflow too. I've been talking to customers about workflows for 12 years. Developers just didn't get it, unless you worked at Amazon, because internally we really do get it: when you want to build reliable distributed systems, you need a workflow. The thing that has brought workflows back front and center, where everyone wants to talk about them, is obviously AI agents and AI-based workflows. So there's a ton of interest in this.
This is the system we're building, and the reason we want to build it is that orchestration matters in the new type of applications customers are building. In this case I'm talking about an enhancement to what Janak kicked off: I want to summarize a note from our note-taking application. So these are the steps. You have to retrieve the note from storage, DynamoDB in my example. And here's the thing with LLMs: they're asynchronous. You're waiting for a response from them, and if you want to scale your LLM usage, you have to lean into that asynchrony rather than keeping everything around it synchronous. You generate the summary, and you need to store the result. Those steps quite literally map onto a workflow.
And look, if you were to write this code today, you could run it on anything: EC2, any compute, or on Lambda as it is right now. But you are then responsible for the reliability of all these steps. If you're not using a workflow system, you're responsible for figuring out when to retry and when to roll things back if something has gone wrong. You can see I have manual checkpointing in here somewhere (I don't have the line numbers, unfortunately), and I'm doing a bunch of sleeps and waits. And while you're waiting, especially if you're waiting on Lambda, you're paying for the compute.
What we heard from developers was basically: "Look, I don't want to write code this way, even if it's simple. AI can write this code for me really easily, but I don't want to write it myself and figure out how to make it reliable. I still want to just write code and use my tools and IDE." Pausing and resuming is a really powerful capability: if you look at the most powerful use of AI today, it's AI code generation, and guess where the human is in the loop? It's you, the developers. So it's super powerful to be able to pause a workflow and then resume it. And finally, developers want to use their favorite programming languages.
And so look, we heard you, and we're introducing Lambda Durable Functions (Matt already introduced it). With Lambda Durable Functions, you just write code: simple sequential code. Well, all code is sequential.
Reliability is baked in: retries and workflow semantics are just built in. Right now we support Node and Python; more languages are coming. As I said, the team pushed super hard to get this out for re:Invent. Python and Node are super popular with Lambda, so we've covered the two most common languages there.
Yes, you can suspend and resume long-running operations, and while an operation is suspended, you're not paying for Lambda at all. Finally, the beauty of it: it's still Lambda. Some of you might have heard of durable execution or workflow engines before. What's super unique here is that this is a compute service, Lambda, with that reliable workflow machinery built right in. There's nothing else: you simply write a Lambda function, and I'm going to show you what that looks like.
What durable functions really do behind the scenes is ship with a very simple SDK. If you choose a durable function in the Lambda console (and I'll show you how), the SDK is loaded as part of the runtime for you. It gives you the ability to checkpoint; you decide when to checkpoint. By the way, this whole system is built on top of the same underlying system that powers Step Functions, so for those of you familiar with Step Functions, a lot of this will make sense immediately.
You can checkpoint, and whereas in Step Functions every state is checkpointed, here you're writing the code and you decide when to checkpoint and replay. The idea behind replay is that if you're resuming your function, or there's a failure and you retry, you do not have to re-execute the steps that were already checkpointed; you simply get those results back, so you never waste compute. Tying this back to long-running workflows: we thought long and hard about Lambda Managed Instances here too. We asked, since customers are already paying for the full EC2 instance, should we just let them run Lambda functions for hours? But if you do that, you're building inherently unreliable software. So we said, can we do something better? That's how we got together with the Step Functions team, and the teams collaborated to build a joint capability inside Lambda.
So look, these are the things that make something a durable execution. Getting started is simple: you create a function as usual, but you provide an optional durable configuration. That's how you turn it on. The execution timeout and retention periods have defaults, so all you're really saying is, "I want this function to be durably configured." And I'm going to show you what a running function looks like.
I've already created one of these functions. They come with a new tab on the console called Durable Executions, and for those of you familiar with Step Functions, the workflow observability will feel familiar; we've borrowed a lot from there. I'm going to kick off my durable function (don't worry, I'll walk you through the code in a second), and you'll see a new execution start. Clicking into it, you see the steps described: the workflow has already fetched the note, it's starting to process it, and it's sending it to the LLM for summarization, and you can watch the progress. In my demo there are obviously no failures, but if anything were being retried, you'd be able to quickly pinpoint where your asynchronous task is misbehaving. And there you go, everything is done.
These workflows can run for up to one year. The most canonical use cases are short-lived transactions, sometimes under a second, sometimes a few minutes, but you can absolutely build human-in-the-loop systems here as well, and those can last a long time. So let's look at the actual code. Let's zoom in. The first thing to highlight, for the developers familiar with Lambda: Lambda has a context object. When you create a durable function, you get a durable context object, and it has the things I told you about: waiting, checkpointing (a step), and structured logging for observability, so you can see where you are while the workflow is running.
The durability is again baked in. Here's the step for retrieving the note from DynamoDB. The code you're writing is the step, which comes from the context object. The DynamoDB call, the GetItemCommand, is right there, and the structured logging means that in the console, or whichever observability tool you send your Lambda logs to, you can see exactly what was just done.
One of the coolest things about Step Functions, and workflows in general, is idempotency. Idempotency is super important when you do transactions: it lets you have exactly one unique workflow of a given type and prevents a second instance of that workflow from starting. That's controlled through the durable execution's name. When you start the durable function, you give it a name or an ID, and that lets you maintain consistency so no two executions are working on the same thing. This is a super powerful property for a ton of applications.
And finally, I wanted to highlight wait. Waiting is as simple as calling context.wait, and while you're waiting, we shut down the execution environment, so you're no longer paying for it. In my example I use a very simple wait, but you can also do a callback with a condition, where you wait for some condition to become true; the execution environment then wakes up. You save money while you're waiting, paying for no compute and no resources during that time, and the workflow then resumes from exactly the step after the wait completed.
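Putting those pieces together, here's a sketch of what the summarization workflow could look like as a durable function. The talk doesn't reproduce the SDK verbatim, so the handler signature and method names below are assumptions for illustration; the shape (a durable context whose step calls are checkpointed and replayed, plus a wait that suspends billing) follows the behavior described above.

```typescript
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({});

// Hypothetical handler signature: the real Durable Functions SDK provides
// the durable context; the names below are assumptions, not the published API.
export const handler = async (event: { noteId: string }, ctx: any) => {
  // Each step is checkpointed; on retry or resume, completed steps are not
  // re-executed: their recorded results are replayed instead.
  const note = await ctx.step("fetch-note", async () => {
    const res = await ddb.send(
      new GetItemCommand({
        TableName: "notes", // assumed table name
        Key: { id: { S: event.noteId } },
      })
    );
    return res.Item;
  });

  // Suspend while the asynchronous LLM call is in flight; no Lambda charges
  // accrue while the execution is suspended.
  await ctx.wait({ seconds: 30 }); // hypothetical wait API

  const summary = await ctx.step("summarize", async () => {
    // Call the model here (e.g. via Bedrock); omitted for brevity.
    return `summary of note ${event.noteId}`;
  });

  return { note, summary };
};
```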
In the earlier example, there were a bunch of steps, and I walked through the code quickly. But for the developers here, and the engineering managers who review their architectures, imagine what it would take to build something like this in a traditional architecture. You're using queues, and compute scattered across the steps between the queues, because things aren't bundled properly. Your deployments are more complex, debugging is more complex, and if anything goes wrong, replayability, rehydration, and all the classic challenges of a distributed event-driven architecture come in. With a simple orchestration, a simple deployment, and a simple SDLC, you can build really reliable workflows. And as I said, there's no real comparison for this out there: there are lots of workflow technologies, but this is a compute technology with workflows built into it.
And so look, we've got long-running Lambda functions now that can run up to a year. Anyway, I covered these points already, and so we can get going.
Tenant Isolation: Securing Multi-Tenant SaaS Applications
All right, so we talked about our MCP server and what we're doing with Gen AI development, and there's a whole bunch more we've done. How many of you, by the way, have installed our dev tools for VS Code? A few people, awesome. I'd highly recommend that the developers here check out our developer tools in the AWS Toolkit; I'm assuming most of you use VS Code. One thing you might have missed, for example, is remote debugging of Lambda functions: you can put a breakpoint into a running Lambda function through our tools. So we really are focusing on developer experience. Janak already talked about MCP and how it makes the developer experience super easy, especially around IaC and our best practices. We talked about LMI, which lets you run long-lived, steady-state, large workloads with incredible discounts and a choice of EC2 instances. And I talked about Durable Functions, which let you run really reliable workloads, because reliability and durability are super key for long-running workloads: it's not a question of if you'll have an infrastructure issue, it's when. But we're not done. I want to talk about one more thing we launched a week or so before re:Invent, aimed especially at security-sensitive SaaS applications.
So here's a scenario. You have three customers, or let's say three tenants, of your SaaS, and you're using Lambda. Maybe you're using a global variable in the Lambda function; maybe you have something in a temp file, using our temp storage. And maybe you want to say, "An invocation of this Lambda function should only talk to the DynamoDB table, and only fetch the row, for the customer who's invoking it." In this case, the blue tenant calls, we create an execution environment for them, and maybe there's something left over, some global variables. Then the yellow tenant calls, and the problem is that they might leave side effects in the Lambda function as well, and you can't isolate those side effects if you've written code this way.
And finally, of course, you have a green tenant, and again the request may be sent to the same execution environment with its leftover side effects. And look, nothing here is Lambda-specific; this is true for any compute you run, EC2 instances, containers. One of the hardest challenges is: how do I isolate my customers from each other without creating individual infrastructure for each customer? That's pretty expensive to do. You can do it; traditionally on Lambda, before this feature, you would create a Lambda function per customer. I know it sounds ridiculous, but there are customers and use cases that require it. Well, now you don't have to. A week or so ago we launched tenant isolation: it's still the same Lambda function you're writing, but in the invocation you pass a tenant ID.
What we then do is create individual execution environments that are never shared between tenants. The idea is that if you have sensitive software, if you have AI-generated software, or if you're in an environment where you really want to isolate even your own customers from each other within your infrastructure, now you can, and it's super simple. There's no extra infrastructure to manage, not even extra Lambda functions.
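From the caller's side, that might look something like the sketch below: one function for all tenants, with the tenant identifier passed per invocation. The talk doesn't name the exact Invoke parameter, so the field here is an assumption; check the Lambda API reference for the real name.

```typescript
import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

// One function for all tenants; Lambda routes each tenant to execution
// environments that are never shared with any other tenant.
async function invokeForTenant(tenantId: string, payload: object) {
  return lambda.send(
    new InvokeCommand({
      FunctionName: "notes-api", // hypothetical function name
      // Assumed parameter name for the tenant isolation feature; consult
      // the Lambda API reference for the exact field.
      TenantId: tenantId,
      Payload: Buffer.from(JSON.stringify(payload)),
    } as any)
  );
}
```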
Looking Ahead: The Serverless Roadmap and Developer-First Strategy
Okay, we're almost through, and I know we have a little time, so I'd love to invite Janak up to take some of your questions. But first I wanted to share our strategy, because we're not done yet. I always like leaking a few things: we got some big things done by re:Invent, and there are other big things we're working on that didn't quite make it, so keep your ears and eyes peeled over the next six months. All of it will be in the same vein. Lambda has always been about developers, and Lambda has always been about speed, and we really embrace that idea.
What we're trying to do is get the objections out of the way: objections from platform teams that it can get too expensive, objections that IaC is really hard to get right, and fundamentals like "Can I really do cost control well with Lambda?" All of those things are going to be built in, and we're laser-focused on developers and how developers can move fast with serverless. Along these lines, over the last 18 months, there's the remote debugging I mentioned and some things we didn't even show you. We are heavily invested in the dev tool space to make sure that, for serverless development, you don't have to be a serverless developer; you just have to be a developer to get the benefits.
Alright, so just to recap the whole thing: we have a roadmap you can see that focuses on developer experience, with the fundamentals underneath. What's coming up next is observability. We've heard from customers that they want native OpenTelemetry support. We've already done a bunch of launches over the last few years around structured logging and getting Lambda ready for OpenTelemetry, but we want full OpenTelemetry support. That's key: observability is a key trade-off in an evolutionary architecture, as I mentioned, so we want to continue making it better.
More runtimes. You might have missed this, but we just launched Rust support in Lambda as well, I think a week or so ago, and more languages, more frameworks, and more runtimes are coming. With LMI and Durable Functions, we've opened up a whole new class of applications that you just couldn't run on Lambda before, but we're not done. There's so much more we want to do, because we believe customers shouldn't have to take on managing all those "ilities" I talked about earlier simply because they have a business need that doesn't fit.
And look, integrations are our bread and butter. The way Lambda delivers that speed is that so many things, from EventBridge to SNS to API Gateway to ALBs to SQS to Kafka, are just built in. You're not trying to figure out how to make a technology work with your code; they're built in, and it's our responsibility to make them work. We're going to keep doing more on both the dev tool side and the integration side so that your favorite CI/CD tools and observability tools just work with Lambda.
And look, you might also not know this, but we just recently, again right before re:Invent, so Janak and the product team were quite busy, we launched our roadmap as well publicly. So if you were to just take a look at the QR code, that would take you to the roadmap as well. Give us feedback. We love to hear from you. I'm going to invite Janak over. Janak, we have a few more minutes. Thank you so much, folks, for sticking around and coming out on Thursday. If there are some questions, we'd love to take them. Thank you. Thank you.
; This article is entirely auto-generated using Amazon Bedrock.