🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Building for Efficiency & Reliability with Performance Testing on AWS (CMP351)
In this video, Luis Guirigay, AWS head of infrastructure solutions, explains how to build reliable and efficient applications through performance testing. He distinguishes resiliency (recovery from crashes), reliability (preventing failures), and efficiency (meeting SLOs with minimal resources). He covers six types of performance testing: load testing, stress testing, endurance testing, scalability testing, spike testing, and volume testing. Key metrics include percentiles, transactions per second, latency, and errors. He demonstrates AWS's Distributed Load Testing solution version 4.0.1, which supports JMeter, K6, and Locust frameworks, enabling global multi-region testing that automatically provisions and cleans up infrastructure. The solution deploys via CloudFormation in 5 minutes and can simulate millions of users across multiple regions simultaneously.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Understanding Resiliency, Reliability, and Efficiency Through Performance Testing
Perfect. So before I start, how many of you are developers? OK, cloud architects. Let's say responsible for any type of application or infrastructure. OK, quite a few, so perfect. My name is Luis Guirigay. That's why the host didn't want to pronounce it; he didn't know how. I'm based out of Miami, and my role is head of infrastructure solutions within AWS. The idea of today's session is to understand how you can build applications and systems that are reliable and efficient.
So for that, and this is going to be important. Number one, there are three very important concepts that most people, especially the first two, tend to combine or mix: resiliency, reliability, and efficiency. So resiliency is the ability to recover from a crash. That's simple. If you have a problem, if something crashes, the way you recover is your resiliency strategy. It could be seamless, it could be five minutes later. It's up to you, but that's resiliency. The next one is reliability. Reliability is everything you can do to prevent a failure. So the more you spend on reliability, the less you have to worry about resiliency. That's simple.
And the last one is efficiency. The concept of efficiency is when you're building something, that thing is supposed to do what it needs to do, consuming the least amount of resources but still meeting your SLOs, your service level objectives. Now, the way to achieve the last two is through performance testing. So what is performance testing? By the way, according to Gen AI, that's me. Same shirt, just different shoes. Think about me as the application. Think about the car as the workload, the number of users, the number of transactions, whatever. That picture was taken in a very specific moment in time, but what happens if I put another car on top of me? What happens if I try to walk? What happens if I get to walk for how long or how fast?
For how long can I hold that car? All those questions are things you have to answer in the context of performance testing. So it's the ability to understand how your system or your application will behave under different load conditions. That is performance testing. Most people, when they talk about performance testing, only say load testing, when in reality load testing is just one type of performance testing you should be focusing on. Load testing validates your expected load. Let's say you are a company with a million customers: one million customers is the expected load, and you should be prepared to handle it no matter what. That's why I emphasize the word expected. The next one is stress testing. There are two ways to approach stress testing.
One is, I want to find my performance degradation point or my breaking points. Because especially if you're doing end-to-end testing and you have a lot of endpoints in your scenario, different endpoints might have different performance degradation points. For example, your database may be able to support more requests than your HTTP server and your API. One might be able to do more than API number two. So all of that. The other way to approach stress testing is based on my expected load, which let's say is a million, can I go two times, five times, ten times? So stress testing is all about finding those limits.
The next one is endurance testing or soak testing. How many of you are Java developers or were Java developers? OK, quite a few. So in development or in general, if you are going to have a memory leak, that memory leak will not happen in the first two minutes. It will happen after an extended period of runtime. So that's why it's very important that when you're doing performance testing, you need to understand not only if you want to hit capacity, but for how long can you sustain that particular load because especially if you're running a massive event and that massive event happens to last three hours, you need to be prepared for at least that duration.
The next type of performance testing is scalability, which means if your goal is to support a million users, you want to ensure that you can scale from zero to a million users. This typically happens over a few hours or maybe just a few minutes.
Now, the next one, spike testing, I really like for two reasons. Most people think about scalability in only one direction, but what happens if you're running a promotion that starts at 9:00 a.m.? At 8:55, you might have zero people or maybe 1,000 people on your website. But at 9:00, it's going to go from zero to a million in less than a minute. This is especially important if you're using auto scaling technologies like Kubernetes, ECS, or auto scaling groups. You need to ensure that your system can auto scale fast enough based on the demand.
Here's another important aspect of spike testing. Many people only measure the load on the way up, but what about on the way down? Those sudden changes are very important to analyze and understand. When you go from zero to a million in a minute, you also need to handle going from a million down to 500,000 in a minute. You want to make sure that your system can downscale appropriately and that your application itself can handle that reduction in workload.
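A spike profile like the one described (zero to a million users in a minute, hold the peak, then drop sharply) can be written down as a staged schedule. The sketch below is purely illustrative: the stage numbers mirror the talk's example, and the interpolation helper is a hypothetical stand-in for the staged ramp profiles that tools such as K6 and Locust accept.

```python
# Sketch: a spike-test profile as (elapsed_seconds, target_users) stages,
# mirroring the talk's 0 -> 1M -> hold -> drop example. The helper only
# interpolates the desired user count at a given moment; a real load tool
# would drive workers from a schedule like this.

STAGES = [
    (0, 0),            # start idle
    (60, 1_000_000),   # ramp to 1M users in one minute
    (120, 1_000_000),  # hold the peak for a minute
    (180, 500_000),    # drop sharply: the way down matters too
]

def target_users(elapsed):
    """Linearly interpolate the desired user count at `elapsed` seconds."""
    for (t0, u0), (t1, u1) in zip(STAGES, STAGES[1:]):
        if t0 <= elapsed <= t1:
            return int(u0 + (u1 - u0) * (elapsed - t0) / (t1 - t0))
    return STAGES[-1][1]  # past the last stage: stay at the final level

print(target_users(30))   # halfway up the ramp
print(target_users(90))   # holding at peak
print(target_users(150))  # halfway down the drop
```

Measuring behavior on both the 0-to-60s ramp and the 120-to-180s drop is what distinguishes a spike test from a plain scalability test.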
The last type is volume testing. Volume testing is all about understanding how your system behaves when handling a massive amount of data being ingested or read. You can apply this to your database, file uploads, and so on. It's important to note that it's not one or the other. You can combine all these tests based on what you're trying to achieve. You might want to do one or the other, or multiple types together.
Why, When, and What to Measure in Performance Testing
Here's a cool tip: many people do this only once before they launch. In reality, you should be doing load testing on a regular basis. So, why performance testing? Number one, you can scale and deploy with confidence, no more guessing. We have an impressive amount of compute offerings and database offerings. The easiest way to know which one is best for your particular workload is to test it. That's how we do performance testing.
The other benefit is reliability and efficiency. The more you spend doing performance testing and focusing on your reliability strategy, the less you have to worry about resiliency scenarios. Yes, you cannot prevent a flood, fire, or earthquake; there are major events that you will always have to be prepared for because there is no way to prevent them. However, while opinions differ on the exact figure, a significant two-digit percentage of outages and failures could have been prevented just by doing performance testing.
Better experiences are another key benefit. Whether you are in e-commerce or providing infrastructure or an application for your internal employees, it doesn't matter: you want to make sure that your site, application, or API responds in 3 seconds or less. Otherwise, you are losing that client or customer. The last benefit is cost optimization. If you're developing or launching a site, you might go small, medium, large, and so on. You do a small environment and get a 5-second response time. Then you try medium and get a 2-second response time. Perfect. What do you do next? You go large and get a 1-second response time. Awesome. What's next? Let's go extra large. You're still getting a 1-second response time. Let me go to 2XL. Still a 1-second response time.
At that point, what's the sweet spot? The sweet spot is large. Why? Because you're still getting that 1-second response time that you're going after, but you're not oversizing your system.
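The sweet-spot search above is just "pick the smallest size that still meets the SLO." A minimal sketch, using the illustrative sizes and latencies from the talk (the `sweet_spot` helper and its data are assumptions, not part of any AWS tooling):

```python
# Sketch: choosing the cheapest instance size that meets a latency SLO.
# Sizes and measured latencies are the illustrative numbers from the talk.

measured = [            # (size, measured response time in seconds), cheapest first
    ("small", 5.0),
    ("medium", 2.0),
    ("large", 1.0),
    ("xlarge", 1.0),
    ("2xlarge", 1.0),
]

def sweet_spot(results, slo_seconds):
    """Return the first (cheapest) size meeting the SLO, or None if none does."""
    for size, latency in results:
        if latency <= slo_seconds:
            return size
    return None

print(sweet_spot(measured, 1.0))  # "large": meets the SLO without oversizing
```

Anything larger than the returned size buys no latency improvement, which is exactly the oversizing the talk warns against.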
When you should be doing performance testing, I usually tell this joke: if you are in IT and you are responsible for either code, an application, or infrastructure, and you are breathing, you should be doing performance testing. Whether it's before a major event, through unit testing because as a developer you want to make sure you test it before you pass that milestone, or through infrastructure changes—how many of you are familiar with Graviton, for example? Quite a few, right? So if you're running on x86 and you want to see if you're going to get any performance gains by switching to Graviton, that's how you do it through performance testing. And once again, do it early, do it after, and do it always.
If you have an application that on a daily basis you want to make sure it's behaving the way it's supposed to, set a baseline and do a load test every morning or every day. Identify what's the baseline, what's the sweet spot. That becomes your baseline, and then on a daily basis run it. You want to measure and know when that application behaved, let's say, five percent slower than usual. You want to investigate what happened and why that particular day performance is five percent slower.
Now let's say that tomorrow it's ten percent faster. Do you think you should investigate? Exactly right, you should, because most likely something was supposed to do something and it didn't do it, and that's why it ran faster that particular day. That's why having a baseline and doing daily or often performance testing is extremely important.
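The daily baseline check described above can be sketched as a simple threshold comparison. The thresholds and the `check_against_baseline` helper are illustrative assumptions, following the talk's point that both slower and unexpectedly faster runs deserve investigation:

```python
# Sketch: flagging daily load-test results that drift from the baseline.
# Thresholds echo the talk's examples: ~5% slower or ~10% faster both
# warrant a look (a faster run may mean a step silently didn't execute).

def check_against_baseline(baseline_ms, today_ms, slow_pct=5.0, fast_pct=10.0):
    """Return a verdict string for today's run versus the baseline latency."""
    change = (today_ms - baseline_ms) / baseline_ms * 100
    if change >= slow_pct:
        return "investigate: slower than baseline"
    if change <= -fast_pct:
        return "investigate: faster than baseline (did a step get skipped?)"
    return "ok"

print(check_against_baseline(200.0, 212.0))  # 6% slower
print(check_against_baseline(200.0, 178.0))  # 11% faster
print(check_against_baseline(200.0, 203.0))  # within normal variation
```

Running a check like this after every scheduled test turns the baseline into an early-warning signal rather than a one-time number.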
What to measure? Number one: percentiles. Percentiles basically tell you how the consumers of a particular application or API are experiencing that service. This could be a human experiencing the service, your system calling an external API, or one API calling another API. You don't have to go for the one hundredth percentile (that's up to you), but one hundred shows you everything, including how every single response was handled. You typically want to go for the ninety-fifth or ninety-ninth percentile, because that tells you how, say, ninety-five percent of the requests to that system perceived the response.
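As a rough illustration of why p95/p99 beat averages and maximums, here is a minimal nearest-rank percentile over sampled response times. The latency samples are made-up numbers, not from the talk:

```python
# Sketch: computing latency percentiles from sampled response times.
# Nearest-rank method: the smallest sample such that pct% of all
# samples are at or below it. The sample data is illustrative only.

def percentile(samples, pct):
    """Nearest-rank percentile of `samples` for percentage `pct` (0-100]."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# 19 "normal" responses around 120-140 ms plus one 2000 ms outlier
latencies_ms = [120, 125, 122, 130, 128, 135, 131, 127, 133, 129,
                124, 138, 126, 132, 140, 136, 123, 137, 134, 2000]

print(percentile(latencies_ms, 50))   # the typical experience
print(percentile(latencies_ms, 95))   # the tail most users still see
print(percentile(latencies_ms, 99))   # p99 surfaces the outlier
```

Here p95 stays near the normal cluster while p99 (and p100) jump to the outlier, which is why the talk recommends watching 95 and 99 rather than only the worst case.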
You want to look at transactions per minute and transactions per second. You want to look at bandwidth. Very importantly, you want to look at latency, especially if you are launching a distributed system. Let's say you have a global site and you're expecting traffic from Japan, Europe, Latin America, North America, and so on. You want to make sure that latency is measured not only from one single region but globally. Yet most people who do performance testing today do it from a single region, which is not a real test. Why? Because you're measuring from your garage into your kitchen. You want to see what happens from another neighborhood, from another city, another country, and so on.
Distributed Load Testing on AWS: A Live Demonstration
Errors need no explanation: if you have an error, you go see what it is and why. And last but not least, resources. Especially if you're going to do root cause analysis, you want to see what's happening with your CPU, your memory, and so on.

There are three very popular performance testing frameworks, and I'm going to go really fast now. JMeter has been around forever; it's really good and really complete. If you are into Java and XML, go use it. If you are into JavaScript, you can use K6 from Grafana, which lets you develop testing scripts in JavaScript. And the last one: if you are a Python developer, you can use Locust.

This is where AWS comes into play. If you want to use any one of these, most likely you will have to deploy your own infrastructure for that particular framework. However, at AWS we have a solution called Distributed Load Testing on AWS. It's an open source solution fully supported by AWS. A quick difference compared to AWS services is that it's single tenant: you have to deploy it in your own environment.
Use it, and if you no longer want it, you simply remove it. We provide you with a CloudFormation template that gets up and running in production within 5 minutes. Let me show you how it works. This was recently announced just a few days ago—this is version 4.0.1 to be precise. When you deploy this solution, this is what you get out of the box. You don't have to build, you don't have to code—everything is ready to be used.
I'm going to do a quick test, which I pre-created to save time. Let's say I'm a major sports vendor provider and I want to run a test because something is going to happen tomorrow. I want to make sure my infrastructure can handle the load. The name of the test is "Global Sport Global Test." I can run the test now, tomorrow, a week from now, or I can run it on a weekly or daily basis. If you don't want to use the web UI, you can actually use your CI/CD pipeline and call all these tests through the APIs.
I'm going to run it now and use live data. This is what I was telling you about—you can use a single HTTP endpoint, or you can use JMeter, K6, or Locust within this single platform. You don't have to deploy three different environments, just one for everything. I'm going to keep it simple and use a single HTTP endpoint. You can work with K6 scripts, JMeter scripts, and so on, but I'm going to keep it simple here.
Now, this is where it gets really cool. If you want to do a global test, this environment is set up for 4 regions. You can do it for as many regions as you want, but if you want to simulate real traffic coming from all over the world, this is how you do it. The solution in real time is going to go global to all those regions, spin up all the infrastructure you need for that simulation, hit your endpoint, and when the test completes, clean everything up. You don't have to worry about managing any infrastructure.
I'm doing 4 regions, and for each particular region I can specify a combination of how many unique IP addresses, how many concurrent users, and so on. I already have all this data set up. I'm going to say I want this test to go from 0 to a million in 1 minute, and when I hit 1 minute, I want to stay there for another minute. I'm using the WiFi from the venue, so it's not the fastest one.
What you see now is that, region by region, it's telling me how many unique IP addresses, containers, or tasks I want to use, how many virtual users I want to simulate on each one, and what's happening. I'm provisioning all those tasks or containers, and eventually, when they start running, we will start seeing all the traffic going into the system. I need to finish this in 1 minute 36 seconds, and this is going to take a few seconds, but I do have another test that's already running.

There you go. Here is another test that I started right before this session. You see that this is the average response time, these are all the successful requests, and those are the errors. The errors appear because this environment uses auto scaling, and I built it in a way that surfaces those errors, to show that the instances are too small and I'm triggering the auto scaling event too late. That's why those errors appear, and that's how my ramp up is climbing. Eventually, when I hit my goal, it's just going to stay flat for, let's say, one hour. If I go back to my other test, it's still going. It's going to take a few more seconds, but because I don't have any more time, I'm going to go back here.
If there is something you want to do, maybe not tonight because you're going to dinner or to a party, but maybe on Monday when you get back home, deploy the solution and test it. Within 5 minutes it's up and running in production. If you have any questions, I'm going to be at the AWS village from 3 to 6 p.m. doing a much deeper demo into the solution. Thank you very much.
; This article is entirely auto-generated using Amazon Bedrock.