🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Managing Amazon EC2 capacity and availability (CMP331)
In this video, Brian Phillippi and Carlos Manzanedo Rueda explore AWS EC2 capacity management and availability trade-offs across different usage models. They explain On-Demand Instances with flexibility strategies using attribute-based instance selection, On-Demand Capacity Reservations (ODCRs) for deterministic workloads with sharing capabilities and interruptible reservations, Capacity Blocks for ML targeting GPU instances for training workloads, and Spot Instances with Spot Placement Score for cost optimization. The session introduces EC2 Capacity Manager, a free tool launched in October that provides unified visibility across all usage models, regions, and accounts, helping organizations manage capacity efficiently at scale with detailed utilization metrics and cost insights.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: The Challenge of Balancing EC2 Capacity, Availability, and Cost
Today we're going to cover the topic of managing Amazon EC2 capacity and availability, and the trade-offs that come with it. With me, I have Brian Phillippi. Brian, would you like to introduce yourself? Sure. Hi everyone. My name is Brian Phillippi. I'm a Principal Product Manager at Amazon EC2. I've been with AWS for about eight years now. As a fun fact, today is Cyber Monday, so don't forget to go to Amazon and check out the deals. When I first started at AWS, I used to plan capacity for Cyber Monday, so I live, eat, sleep, and breathe capacity. I'm really excited to be here with you. In my case, I'm Carlos Manzanedo Rueda. I'm the Tech Lead for Compute. I'm a Worldwide Solutions Architect, and I help organizations find those trade-offs, especially on optimization and everything that comes with the trade-off for that optimization. Obviously, capacity and availability, together with price, performance, and cost, are among the top things we need to consider.
Before we start with the agenda, I would like to cover a couple of things. First, please raise your hands if you are part of financial operations or FinOps, or have been doing FinOps. I was expecting that. How many of you come from a platform engineering team, or manage capacity as part of an organization or a platform team? Perfect. I see a good portion of the audience with you on that one. Finally, how many of you are application owners or developers, looking at things like availability? Awesome. So you see that this covers the audience pretty well. I probably left out a few personas, but I am sure that you have to balance the trade-offs between these things, and that is what this talk is going to be about.
We're going to cover first what the challenge is. With the framing that we have set up, you more or less know what it's going to be about. We're going to cover what the usage models are. The usage model will help us define for different types of workloads what is best for your application. Starting with On-Demand Instances, On-Demand Capacity Reservations, Capacity Blocks, and Spot Instances. Finally, we're going to introduce a few things that have been released in the last two months or even in the last week that are perhaps a bit new.
This is a 300-level session, so there's an expectation of some level of hands-on proficiency on the topics that we're going to cover. For those of you that perhaps find new things that we're covering, this is going to give you the pointers to do the homework and learn how you can apply these concepts as part of your day-to-day activities on capacity planning and availability. We will end with some key takeaways. Let's start with the challenge first. We were talking about these trade-offs and how we apply those trade-offs.
I'm sure we have all heard that cloud capacity is infinite, but we know the reality: there are physical limits to how we manage that capacity internally, which is what we will cover here, and there is a trade-off that comes with that. There is a trade-off between how much capacity is available, what the pricing model is going to be, what the low prices we can get with it are, and how we can maximize efficiency. Just to give you an example within that balance, you have probably heard about Spot Instances. Spot Instances are spare capacity that we have in our data centers, availability zones, and regions. We put that spare capacity to your use at a steep discount of up to ninety percent. However, the trade-off is that those instances can be interrupted, so there is a trade-off with availability, as you see. We are going to explore the different options that you have with the different usage models so you can understand that balance of capacity versus availability versus utilization goals.
On-Demand Instances: Achieving Elasticity Through Flexibility
Let's deep dive first into On-Demand Instances. This is probably the usage model that you are most familiar with. On-Demand Instances are basically for your elasticity: you get capacity when you need it, at the point that you need it, which simplifies things with the pay-as-you-go model. We've got more than one thousand instance types at this point in time. My recommendation is to pick the latest generation of those instances. Typically they come with the best price performance, and that way you get a selection of instances that is easier to manage. And we offer those instances in more than 120 availability zones across 38 regions.
We also offer them in Local Zones, so you can manage capacity and availability in places where latency is key for your customers. On-Demand Instances are typically used for stateful workloads with scaling patterns, such as cyclical patterns, or for spurious, temporary, short-term capacity needs. For those times when you are not ready to commit to a longer term, like with savings plans, and you want to test and wait until you are ready for that commitment, it is important to understand how those capacity pools for on-demand work.
At some point when you are selecting a particular instance type, that pool can run low and you may suffer an insufficient capacity error. We do lots of things behind the scenes to make sure that does not happen and to present you with the elasticity we were talking about. We do forecasting to get the capacity up front. We also do placement in the regions where that capacity is needed, removing any bottleneck so we can rack and stack as fast as we can. Behind the scenes we do things like pool balancing: for example, if we have an excess of 16xlarge instances of a particular instance family and at some point there is a need for 4xlarge, we will split each 16xlarge and create four 4xlarge instances out of it. We are balancing that supply and demand, but still, if you select only one instance type, that particular pool may dry out.
Just to give you an intuition of that capacity, we just released an M8i, the new Intel instance, about two months ago. About two months ago we also released the M8a, the AMD version. I would recommend you take a look at those. As you may expect, even if we are racking and stacking as fast as we can, these are new instances, so there is going to be less capacity of those. Even if those are your preferred instances, you need to take into consideration that capacity availability factor.
One way of handling that is being flexible. Let us use the analogy of a buffet. You may want to go for the ravioli, but perhaps at that point there is no ravioli left at the buffet, and that is too bad. I am sure that if you are flexible and you like other pastas, you are still going to get the right meal. Not only that: if you are flexible about the rest of the food, you will get an even better meal out of it. That is the concept of flexibility, and it improves how you tap into different pools of capacity.
You can use that to effectively protect yourself when you are using on-demand instances: still go for your preferred instance, but know that you can always fall back to other instances if you are a bit flexible. The way that you apply that flexibility is by using different instance families, a different CPU manufacturer in x86 (Intel and AMD), different generations, or different sizes. I understand that a few of you, especially in large organizations, may do some level of qualification of instances to verify that they match the service level objectives you need. When you have deterministic workloads with deterministic performance needs, this makes a lot of sense. But I would like you to challenge those assumptions.
I think that in most cases, even when you qualify, what you are after is not a deterministic level of performance. You are after a baseline, and above that baseline you can use other instances. For example, if your baseline is met by a generation 7 instance, generation 8 is probably going to be equally valid. So you can think about those levels of flexibility that we just mentioned. Applying that flexibility is easier than you think. We have embedded all the heavy lifting in both EC2 Fleet and Auto Scaling groups, and those integrate with the rest of our services. For example, behind Karpenter you are going to find EC2 Fleet making that selection, allowing you, through node pools, to create exactly that elasticity and flexibility across multiple instance types. The same goes for Auto Scaling groups, which integrate with AWS Batch, with ECS, and so on.
The way that you use it is super simple. You define an EC2 Fleet or Auto Scaling group, apply the flexibility you are after, and effectively let that fleet or Auto Scaling group manage the instance lifecycle for you. We introduced a concept about 2.5 to 3 years ago called attribute-based instance selection. This allows you to define your flexibility and instance selection through properties or attributes. For example, you can say you would like any type of instance that offers a certain number of vCPUs, a specific amount of memory, on the latest generation or generation 6 onwards, with EBS-only storage. That will define the set of instance types that make up your flexibility group. You can still define instances manually as well, which is perfectly fine.
When you define those instances, for example, just to give you a pattern that you can use for on-demand instances to protect yourself, you can start defining with a prioritized allocation strategy. You will list a group of instances in order, starting with your preferred instance. But if you cannot get that instance, you are okay to get your second preference, your third preference, and your fourth preference. The main idea is that you are protecting yourself and making sure that if that scaling group cannot find your preferred instance, it will jump onto the next one, so you will not suffer those issues.
This is a pattern that I would recommend. Allocation strategies can also be used in the case of Spot instances. For example, the capacity-optimized allocation strategy will pick instances from the deepest pools, avoiding an increased frequency of termination. If you are price sensitive, we have allocation strategies that optimize for price: you list all the instances and we pick the ones that minimize price for you.
As an example of this attribute-based instance selection and how you apply it, you might say you want 8 vCPUs, 8 gigabytes of RAM of type Graviton, and EBS-only storage. That automatically will create that flexibility group that includes m8g.2xlarge and larger, up to c7gn.2xlarge, as you see on the left corner. These are network specialized instances and we will pick up from those. So it is easier to effectively plug that model.
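A hedged sketch of that example in code: the dict below mirrors the InstanceRequirements shape used by EC2 Fleet and by Auto Scaling group launch-template overrides, with the same vCPU and memory numbers as the talk's example; treat the attribute names as boto3-style assumptions to check against the documentation.

```python
# Sketch of attribute-based instance selection (ABIS): instead of naming
# instance types, describe what you need and let EC2 pick matching pools.

def graviton_requirements(vcpus=8, memory_gib=8):
    """Build an InstanceRequirements-style dict for a fleet override."""
    return {
        "VCpuCount": {"Min": vcpus, "Max": vcpus},
        "MemoryMiB": {"Min": memory_gib * 1024, "Max": memory_gib * 1024},
        # "amazon-web-services" selects Graviton (AWS-designed) CPUs.
        "CpuManufacturers": ["amazon-web-services"],
        # EBS-only storage: exclude instance types with local disks.
        "LocalStorage": "excluded",
        "InstanceGenerations": ["current"],
    }

reqs = graviton_requirements()
print(reqs["MemoryMiB"])  # {'Min': 8192, 'Max': 8192}
```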
On-Demand Capacity Reservations: Guaranteeing Capacity for Critical Workloads
If you want that deterministic level of performance or specific instance types, we are going to move to on-demand capacity reservations. So like Carlos mentioned, if you really need to optimize for a specific instance type and you need to have the capacity there, you can use a capacity reservation, which is kind of on the other end of the spectrum in terms of cost and availability. With capacity reservations, you essentially hold or pay for the capacity the entire time you hold the reservation.
Let's talk a little bit about what capacity reservations are. The full name is On-Demand Capacity Reservations, and you might hear me call them ODCRs. Essentially, with these you are reserving a specific instance type and availability zone combination. Looking at the capacity availability hierarchy, we have on-demand, and I would say on-demand is world class in terms of the availability that we can give you, with the number of different pools that we have. However, as Carlos mentioned, you can go up in terms of your availability by being flexible and accessing multiple pools. Finally, if you really need that capacity assurance, that is when you would use a capacity reservation.
The reason we give you assurance is we are actually setting aside that capacity for you. But because we are setting aside the capacity for you, we are not letting anyone else use it or run instances on it. So we charge the entire time you hold the reservation. If you are running an instance within the reservation, there is no additional cost at all. It just looks like a running instance in your cost and usage report. But if you are not running an instance, that is when you will be charged for an unused reservation, which charges basically the same thing as a running instance.
The good thing about capacity reservations, especially if any of you have been with AWS for a long time, is that unlike zonal Reserved Instances, there is no commitment period. You can cancel anytime you want; there is no one- or three-year term.
So when should we use an ODCR? First is if you don't have flexibility. When you're at the buffet, like Carlos mentioned, and you really need ravioli at 6 p.m. on a Friday night in North Las Vegas, that's when you would want to call and make a reservation to make sure you're able to get your food. That's the same thing with the capacity reservation. If you really need a specific instance type at a specific time of day in a specific availability zone, you probably need to use a capacity reservation.
The other time that we recommend using capacity reservations is if your workload is so critical that you absolutely need to be able to spin up instances or bad things happen. Then we would also recommend using a capacity reservation. There are two different ways you can create the capacity reservation. You can create it on demand, which means creating it based on what capacity we have in the pool. When you create it on demand, we'll take capacity and set it aside for you, or you can actually create it at a future date.
For on demand, we often see customers creating the on-demand version of the reservation when they have a workload that takes a while to scale. If I have a workload where I provision one instance at a time, or a couple of instances at a time, to get up to my scale, it can be really frustrating to get 90 percent through my deployment and then run out of capacity. With a capacity reservation, you get a synchronous call that answers, yes or no, is the capacity there. If it is, it creates the reservation for you so that you can spin up those instances at your leisure and know it is going to be successful.
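As a minimal sketch of that synchronous create, here are the parameters you might hand to `ec2.create_capacity_reservation` via boto3; the instance type, zone, and count are hypothetical, and only the request payload is built so the example stays runnable offline.

```python
# Sketch: request payload for an on-demand (immediate) ODCR. The API
# call itself is synchronous: it either sets the capacity aside or fails.

def odcr_request(instance_type, az, count):
    """Parameters for ec2.create_capacity_reservation(**params)."""
    return {
        "InstanceType": instance_type,
        "InstancePlatform": "Linux/UNIX",
        "AvailabilityZone": az,
        "InstanceCount": count,
        # Hold until explicitly cancelled; there is no term commitment.
        "EndDateType": "unlimited",
    }

params = odcr_request("m7i.xlarge", "us-east-1a", 100)  # placeholder values
```

Once the reservation exists, launches that match it consume it at no extra cost, as described above.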
The other time that we see customers using the on-demand version of capacity reservations is when they have already running instances and they want to cover those running instances with a reservation. Like I mentioned, there's no extra cost for covering a running instance with a reservation, but if you need to do maintenance or bounce your fleet, you'll be able to do so and make sure you can get that capacity back when you do the reservation.
The other type we have is future dated capacity reservation. There are a few different use cases for future capacity reservation. One of them would be like a peak event, such as Q4 peak like we're having right now with Cyber Monday and Black Friday, making sure that we have capacity for a big event. Another would be if you're performing some big migration and you want to switch to a new instance type. A future dated capacity reservation can help you do that. And then finally, if you're increasing your baseline by stepping it up, a future capacity reservation also can help.
The way it works is that you are placing an order, like we talked about with calling the restaurant. You request a capacity reservation for the date that you need it, anywhere between 5 and 120 days in advance. Then you specify how long you are planning on holding the capacity. Typically, because we might be racking and stacking new servers for you or setting servers aside, we ask for about 14 days of usage after the reservation starts. You will get a reservation created right then and there, and it will be in an assessing state. Within about 5 days, we come back and put it in a scheduled state for you, and then you just wait for the capacity to become available and launch your instances.
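A sketch of a future-dated request under the 5-to-120-day and roughly-14-day-hold rules from the talk; `StartDate` and `CommitmentDuration` are my assumed parameter names for the future-dated variant of `create_capacity_reservation`, so confirm them against the EC2 API reference before use.

```python
from datetime import datetime, timedelta, timezone

def future_dated_odcr(instance_type, az, count, days_ahead=30, hold_days=14):
    """Sketch of a future-dated capacity reservation request payload.
    Requests must be placed 5-120 days ahead, with roughly a 14-day
    minimum expected usage after delivery (parameter names assumed)."""
    assert 5 <= days_ahead <= 120, "request must be 5-120 days in advance"
    start = datetime.now(timezone.utc) + timedelta(days=days_ahead)
    return {
        "InstanceType": instance_type,
        "InstancePlatform": "Linux/UNIX",
        "AvailabilityZone": az,
        "InstanceCount": count,
        "StartDate": start.isoformat(),
        # Minimum expected usage after delivery, in seconds (assumed unit).
        "CommitmentDuration": hold_days * 24 * 3600,
    }

req = future_dated_odcr("p5.48xlarge", "us-east-1a", 8)  # placeholder values
```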
So we've talked a lot about what ODCRs are, how to use them, and what they're for. Now let's talk a little bit about how we use them. With an ODCR, it works on this principle of matching. Your instance needs to match your reservation and then the two will marry up and you'll be able to launch your instances with it. There are four primary things that you'll need to match with your instance launch and your reservation.
First is instance type: whether it's c6i.xlarge or m7i.xlarge, whatever instance type you need, the instance that you're launching needs to match the reservation. We also have availability zone, platform or operating system, and finally tenancy: whether it's single or shared tenancy.
Most of our customers use shared tenancy. Optionally, if you are using a capacity reservation with a cluster placement group aligned with it, you will need to match that placement group ID with your ODCR. Cluster placement groups help with things like high performance computing. They allow us to put all of the instances in really close co-location with each other so that you get really low latency between the instances.
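The matching rule above can be made explicit with a small pure function; the dict keys here are illustrative, not API field names, and the placement group is only checked when the reservation was created with one.

```python
# Sketch: does a launch "marry up" with an ODCR? The four mandatory
# attributes must match, plus placement group when the ODCR has one.

def matches_reservation(launch, reservation):
    """Return True when a launch request can consume the reservation."""
    for key in ("instance_type", "availability_zone", "platform", "tenancy"):
        if launch[key] != reservation[key]:
            return False
    if reservation.get("placement_group_id") is not None:
        return launch.get("placement_group_id") == reservation["placement_group_id"]
    return True

odcr = {"instance_type": "c6i.xlarge", "availability_zone": "us-east-1a",
        "platform": "Linux/UNIX", "tenancy": "default",
        "placement_group_id": None}
launch = dict(odcr)
print(matches_reservation(launch, odcr))                                  # True
print(matches_reservation(dict(launch, instance_type="m7i.xlarge"), odcr))  # False
```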
Now, you have to think about instance eligibility. For probably 80 to 90 percent of use cases, your instance eligibility is just going to be the default or open instance eligibility. What that does is act like a public pool of capacity that anybody can tap into. The default when you launch an instance and the default when you create a capacity reservation is always going to be open, and so it always just works like magic.
However, if you have a multi-tenant account where you are trying to protect capacity from somebody else in your account from using it, you would want to use targeted. Targeted basically means that when you launch an instance, you have to specify either the reservation ID or a resource group that the reservation is in. That way you can protect and create a private pool of capacity versus that public pool that openness provides. Typically, it is multi-tenant accounts that are using the targeted option.
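A sketch of a targeted launch in boto3 terms: `run_instances` parameters that name the reservation id, which is what creates that private-pool behavior. The ids are placeholders.

```python
# Sketch: launching into a *targeted* capacity reservation. Unlike the
# "open" default, the launch must explicitly name the reservation.

def targeted_launch_params(cr_id, instance_type, subnet_id):
    """Parameters for ec2.run_instances(**params) targeting one ODCR."""
    return {
        "InstanceType": instance_type,
        "SubnetId": subnet_id,
        "MinCount": 1,
        "MaxCount": 1,
        "CapacityReservationSpecification": {
            # A resource group ARN can be used here instead of an id.
            "CapacityReservationTarget": {"CapacityReservationId": cr_id},
        },
    }

params = targeted_launch_params("cr-0123456789abcdef0",  # placeholder id
                                "c6i.xlarge", "subnet-0abc1234")
```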
Going back to what Carlos said, if you architect for flexibility, you still may want to use capacity reservations from time to time, especially if you have spare capacity reservations that are not being used but you want to save them for another time. So with an auto scaling group, how do we do that? How do we make sure that when the auto scaling group picks the reservation, say it wants to pick m8g.2xlarge but you have an m7g.2xlarge, how do you make sure that the auto scaling group is matching up with your capacity reservation?
Last year we launched something called ASG Capacity Reservation Preference. With this you can actually set that preference so that you get better utilization of your capacity reservations. You can either use capacity reservations only, which basically means if you have a capacity reservation that matches something within your auto scaling group, it will launch the instance using the capacity reservation. But if you do not have a matching capacity reservation, it will not launch an instance and you will get an error.
You can also use capacity reservations first, which is similar, but instead of when you do not have a capacity reservation, it will actually try to use on-demand. It will first try to look for a capacity reservation to use. If it does not have one, then it will fall back to on-demand and go back through that list that Carlos was talking about where you try to find the highest priority instance. What this does is help you improve efficiency.
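A minimal sketch of wiring that preference into an Auto Scaling group, assuming the boto3 `create_auto_scaling_group` parameter shape; the preference strings follow the feature as described in the talk, so verify them against the current Auto Scaling API documentation.

```python
# Sketch: the ASG capacity reservation preference.
# "capacity-reservations-only"  -> error out when no reservation matches.
# "capacity-reservations-first" -> fall back to regular On-Demand.

def asg_params_with_cr_preference(asg_name, prefer="capacity-reservations-first"):
    """Partial create_auto_scaling_group parameters (name is hypothetical)."""
    allowed = ("capacity-reservations-only", "capacity-reservations-first")
    assert prefer in allowed, f"prefer must be one of {allowed}"
    return {
        "AutoScalingGroupName": asg_name,
        "CapacityReservationSpecification": {
            "CapacityReservationPreference": prefer,
        },
    }

params = asg_params_with_cr_preference("web-fleet")
```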
If you want to talk about cost and availability, the way to save costs when using a capacity reservation would be to use it fully and get the most bang for your buck out of that reservation. Let's talk a little bit more about how we can do that. One of the best ways that you can get better utilization out of your capacity reservations is through sharing.
Imagine you have a high priority workload that only runs during the day and maybe has a small baseline at night, but in the middle of the night there is not a lot of usage. In order to fill in those gaps, you can actually share the capacity reservation with other lower priority workloads that can be interrupted. That is a really great way that we see for our customers to increase the utilization, especially if they have spiky workloads that need that capacity assurance.
When you share a reservation you can actually share across accounts. You can share across organizations, so you can share one to many if you would like, or you could even share outside of your organization with another account if you want. The way that you do that is you use Resource Access Manager, or RAM, to set up the share.
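In boto3 terms, the RAM share might be set up like this; the share name, ARN, and principals are placeholders, and only the request payload is constructed so the sketch stays self-contained.

```python
# Sketch: sharing an ODCR through AWS Resource Access Manager (RAM).

def cr_share_params(cr_arn, principals, allow_external=False):
    """Parameters for ram.create_resource_share(**params). Principals can
    be account ids, an OU ARN, or a whole organization ARN; set
    allow_external=True only when sharing outside your organization."""
    return {
        "name": "odcr-share",  # illustrative share name
        "resourceArns": [cr_arn],
        "principals": list(principals),
        "allowExternalPrincipals": allow_external,
    }

share = cr_share_params(
    "arn:aws:ec2:us-east-1:111122223333:capacity-reservation/cr-0abc",  # placeholder
    ["444455556666"],  # placeholder consumer account
)
```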
Now, by default, if you own the reservation and you share with someone else, if that person does not use the capacity, then you as the reservation owner will be charged for it. If they do use the capacity, they are just charged for whatever usage they consume.
However, if I'm a central team and I don't get charged anything for unused reservation, but I want the ability to control and modify the reservation while having Carlos pay for it, one way we launched last year was the ability to assign billing ownership. Now I can assign billing ownership to Carlos because he's the one who requested the capacity. I, as a central team, can manage it, but I can give it to Carlos and make him pay for any unused capacity since he was the one who asked for it.
As a side note, we just launched this year the ability to share cluster placement groups and On-Demand Capacity Reservations as well. In the past, reclaiming that capacity took a little bit of manual work. If I'm that high priority workload in the middle of the night and I realize I need that capacity back, in the past what I'd have to do is get on the phone or page the on-call person for the low priority workload. The on-call person gets up—I've been on call, I'm sure many of you have been on call—and you have to get up in the middle of the night and log in. It can take a lot of time and manual effort to go in and take the action to terminate those running instances so that the high priority workload can get the capacity back.
But now when they need the capacity back, they can interrupt it with a two-minute notification. We do all the automation, we interrupt the instances, we give it back to the right reservation, and then you're able to launch your instances. I like to set it up and outline it like this: the On-Demand Capacity Reservation is almost like the parent reservation, and the interruptible On-Demand Capacity Reservation is like the child reservation. There's a connection between the two. I also like the parent-child relationship because I have four kids and my kids are always interrupting me. You can then share that interruptible capacity reservation using RAM share to your organization, and then take it back when you need it.
Now, just one quick note on this: there are producers and consumers with capacity reservations. The producer is obviously the high priority workload, but the consumer, or the low priority workload, needs to make sure they know what they're getting themselves into because they can have their instances interrupted. In order to launch into it, the low priority workload first needs to target that capacity reservation like we talked about, but then they also need to set the market type to interruptible capacity reservation in order to tell us that they're okay being interrupted.
You can set this in a launch template and set it and forget it, and it all works. Then you launch your instances and wait or listen for that interruption notice. Once it comes, you just save your progress and shut down gracefully. Finally, we've talked a little bit about these two models: On-Demand and On-Demand Capacity Reservations. There is a way you can actually tune the cost of this capacity and lower your costs, and that's through something called a savings plan.
With a savings plan, what you're really doing is committing to an amount of compute usage for a certain amount of time. It's an hourly spend commitment for one or three years, and with that commitment you get a discount on the usage that fits into it. With our top three-year EC2 Instance Savings Plans, you can get up to 72 percent off. There are two different flavors of savings plans: Compute Savings Plans and EC2 Instance Savings Plans. I would say the Compute Savings Plans are the ultimate in flexibility.
If you need to run a workload for eight hours in the Americas, then eight hours in Europe, and then eight hours in Asia, you can use a Compute Savings Plan, which gives you a discount on any usage of any instance type in any region. That gets you up to 66 percent off the On-Demand price. Or if you're less flexible, want a specific instance type, and want a bigger discount while staying within a region and an instance family, you can use EC2 Instance Savings Plans.
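The discount mechanics can be sketched with a toy model: usage is billed at the discounted rate and drawn against the hourly commitment, anything beyond the covered amount spills over to On-Demand pricing, and an underused commitment is still paid in full. The rates below are illustrative, not AWS pricing.

```python
# Toy model of a Savings Plan hourly bill (illustrative rates only).

def hourly_bill(od_usage, commitment, discount=0.66):
    """od_usage: what this hour's usage would cost at On-Demand rates.
    commitment: the Savings Plan hourly spend commitment."""
    discounted_usage = od_usage * (1 - discount)
    if discounted_usage >= commitment:
        # Commitment fully used; compute how much OD-equivalent usage it
        # absorbed, and bill the overflow at On-Demand rates.
        covered_od = commitment / (1 - discount)
        return commitment + (od_usage - covered_od)
    # Commitment underused: you still pay the full commitment.
    return commitment

print(hourly_bill(10.0, 3.4))  # ~3.4: $10 of OD usage fully covered for $3.40
print(hourly_bill(20.0, 3.4))  # ~13.4: half covered, half at OD rates
print(hourly_bill(0.0, 3.4))   # 3.4: idle hour still pays the commitment
```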
Transitioning to Additional Usage Models
With that, I'm going to turn it back over to Carlos to talk a little bit more about a couple of different usage models.
Capacity Blocks for ML: Reserved Capacity for GPU-Intensive Training Workloads
Let's keep going with the usage models and different trade-offs. We have covered On-Demand and, for deterministic performance, On-Demand Capacity Reservations. There is one particular model, Capacity Blocks for ML, that targets GPU and accelerated computing instances, including Trainium. These instances are, first of all, in high demand, and being in high demand, as you would expect, means some higher level of scarcity in some cases.
Second, they're harder to be flexible on, especially for training. There are lots of improvements happening in how you manage that kind of flexibility. If you check training libraries or frameworks, you'll find things like cuTile or JAX, plenty of tools that at compile time try to make the most of a domain-specific language and make sure it compiles toward a specific target platform. But the reality is that there's a huge level of performance you can get out of GPUs, and that performance comes from maximizing and getting the low-level things right.
In the case of GPUs like NVIDIA, quite a lot of this training is done using CUDA and maximizing and squeezing down the bottlenecks to get the right price performance, which makes the whole thing much more difficult to get that flexibility across instances. These instances are also expensive, which means that getting into a savings plan might not be the right way to go without doing any experimentation.
Capacity Blocks are the contract that gets you that kind of block of reserved capacity, from one to 26 weeks, so up to half a year. We extended that quite recently. When we request a capacity block and get it, an on-demand capacity reservation is put on top of it. So if at some point we are not using that capacity block in our own account, we can use the sharing techniques you just learned from Brian to hand it over to other parts of the organization and make the most of it.
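In boto3 terms, getting a block is a two-step flow: describe the available offerings, then purchase one by id. The sketch below only builds the search parameters (instance type and dates are placeholders); check the field names against the EC2 Capacity Blocks API reference.

```python
from datetime import datetime, timedelta, timezone

def capacity_block_search(instance_type, count, weeks):
    """Parameters for ec2.describe_capacity_block_offerings(**params).
    A returned CapacityBlockOfferingId would then be passed to
    ec2.purchase_capacity_block() to place the reservation."""
    assert 1 <= weeks <= 26, "Capacity Blocks run from 1 to 26 weeks"
    now = datetime.now(timezone.utc)
    return {
        "InstanceType": instance_type,
        "InstanceCount": count,
        # Search window for when the block may start (placeholder dates).
        "StartDateRange": (now + timedelta(days=7)).isoformat(),
        "EndDateRange": (now + timedelta(days=7, weeks=weeks)).isoformat(),
        "CapacityDurationHours": weeks * 7 * 24,
    }

cb = capacity_block_search("p5.48xlarge", 8, 2)  # hypothetical 2-week block
```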
When do we use Capacity Blocks? As you would expect, training is the right place to think about Capacity Blocks if you know that there's going to be a big exercise with training coming. Make sure that you have that availability and that capacity. If you also have a campaign, a seasonal campaign coming, and product launches and stuff like that, same thing. Obviously, for some level of experimentation, you want to have that capacity over that experimentation without doing that savings plan commitment over time. So those are the places where you think about Capacity Blocks.
You can get from one to 64 instances. When you multiply that out in GPUs, that's 512 B200 GPUs on P6-B200 instances, for example. The moment you put that together with a cluster placement group, to get all the bandwidth that you need, that gives you a really big cluster for training. Makes sense? And obviously most of these run on Linux systems.
The same concept of Capacity Blocks for ML also applies to SageMaker, where SageMaker Training Plans are the equivalent. There are some differences, such as being able to request capacity as little as 30 minutes in advance, and if that capacity is there, it will come directly to you. But it's used for pretty much the same purposes. It plugs in and integrates with SageMaker HyperPod. You can ask for UltraServers as well, so you can also get those P6-B200s that we were talking about.
That experience is exposed both through the dashboard and through an API, so you can use either. As you would expect, you can use it for predictable access to accelerated compute, including Trainium (Trn1 and Trn2), and primarily to make sure that you have all that capacity in the place where you have the data. As I was saying, most of these workloads are less flexible than general workloads, both because of compiling to the target architecture and because of the amount of data you need to move to train those models.
Spot Instances: Leveraging Spare Capacity with Fault Tolerance and Flexibility
Let me move to the other side of the spectrum. You are probably already acquainted with what Spot instances are; I mentioned them in the first part of the session. They are spare capacity that we have in our data centers, sold at a steep discount of up to 90%, though typically you will find 60 to 70% discounts. People forget that if you can run opportunistic workloads on GPUs, you can also access GPUs as Spot instances. Now the question is where to find them. If you haven't heard about the Spot Placement Score, we will talk about that in a moment. Let me cover the basics of Spot instances first, which are mostly used with C, M, and R type instances, just regular instance types.
The key to using those instances is two things. First, managing that two-minute termination notice, so you need some level of fault tolerance. Second, being flexible across many different dimensions. You can be flexible across instance sizes, generations, and types, or on time-shiftable workloads. Typically there is more spare capacity overnight, so you can move some of your batch workloads overnight to find the capacity and scale you need at those lower prices. If you are flexible on region, you can also move to, or discover, different regional availability using a Spot Placement Score.
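The flexibility idea above can be sketched with an EC2 Fleet request that uses attribute-based instance selection instead of pinning one instance type, letting EC2 draw from every size, generation, and type that fits. The launch template name is hypothetical, and the live `create_fleet` call is commented out.

```python
# Sketch: a flexible Spot request via EC2 Fleet with attribute-based
# instance selection. Launch template name is hypothetical.

def flexible_spot_fleet_request(vcpu_min, vcpu_max, mem_min_mib, target):
    """Build a CreateFleet request that is flexible across instance
    dimensions rather than pinning a single instance type."""
    return {
        "Type": "instant",
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": target,
            "DefaultTargetCapacityType": "spot",
        },
        "SpotOptions": {
            # Balances low price against low interruption frequency.
            "AllocationStrategy": "price-capacity-optimized",
        },
        "LaunchTemplateConfigs": [{
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "my-batch-template",  # hypothetical
                "Version": "$Latest",
            },
            "Overrides": [{
                "InstanceRequirements": {
                    "VCpuCount": {"Min": vcpu_min, "Max": vcpu_max},
                    "MemoryMiB": {"Min": mem_min_mib},
                },
            }],
        }],
    }

request = flexible_spot_fleet_request(4, 16, 8192, target=500)
# boto3.client("ec2").create_fleet(**request)  # needs AWS credentials
```

The wider the vCPU and memory ranges, the larger the pool of Spot capacity the fleet can draw from.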
Let me cover the aspect of fault tolerance first. We were talking about those interruptions. We still have that two-minute interruption notice, which we are extending to up to thirty minutes for GPU instances. We also provide something called a rebalance recommendation, which is effectively a notification. In the worst case it arrives only two minutes in advance, like the Spot termination notice, but typically we send it ten to fifteen minutes in advance, telling you that the instance is at elevated risk of interruption. In fact, we can automatically replace that instance with similar-looking instances in Auto Scaling groups or EC2 Fleet.
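Both signals mentioned above surface through the EC2 instance metadata service (IMDS): the two-minute interruption notice at `spot/instance-action` and the rebalance recommendation at `events/recommendations/rebalance`. The classification helper below is pure so it can run anywhere; the on-instance HTTP fetch is only sketched in comments.

```python
# Sketch: reacting to Spot interruption and rebalance signals read from
# the instance metadata service. The HTTP fetch is commented out;
# classify_signals() is pure so it can be tested off-instance.
import json

INSTANCE_ACTION_PATH = "/latest/meta-data/spot/instance-action"
REBALANCE_PATH = "/latest/meta-data/events/recommendations/rebalance"

def classify_signals(instance_action_body, rebalance_body):
    """Return the most urgent pending signal.
    Arguments are raw IMDS response bodies, or None on a 404
    (no signal pending)."""
    if instance_action_body is not None:
        action = json.loads(instance_action_body)
        return ("drain-now", action["time"])           # two-minute notice is live
    if rebalance_body is not None:
        notice = json.loads(rebalance_body)
        return ("replace-soon", notice["noticeTime"])  # elevated risk only
    return ("healthy", None)

# On the instance you would poll both paths with an IMDSv2 token, e.g.:
#   token = requests.put("http://169.254.169.254/latest/api/token",
#       headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"}).text
#   requests.get("http://169.254.169.254" + INSTANCE_ACTION_PATH,
#       headers={"X-aws-ec2-metadata-token": token})
```

A "drain-now" result means checkpoint and stop accepting work immediately; "replace-soon" leaves time to launch a replacement before draining.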
You can even use AWS Fault Injection Simulator to trigger deliberate terminations from your side as part of your testing model, so you can verify that your application actually is fault tolerant to these terminations, see how it behaves, and even build that into your continuous integration. The key thing is flexibility, and, beyond that, finding out which dimension of flexibility matters most for you. I do not know how many of you have heard about the Spot Placement Score. We released it about two and a half years ago. It is an API, and it is in the dashboard as well.
You can select a diversified configuration, similar to the attribute-based instance selection we were just showing. You also set a target capacity: for example, ten thousand or fifty thousand instances out of that attribute-based selection. It gives you a score from one to ten. A nine means you are going to find that capacity with a low level of interruptions; a three means the opposite: you are probably not going to find that capacity, and the frequency of interruption is going to be quite bad. It effectively gives you a number for each of the regions you are trying to access and for all the different Availability Zones.
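The scoring flow above maps to the `get_spot_placement_scores` API in boto3. The instance types and target capacity below are example values; the ranking helper is pure so it runs locally, and the live call is commented out.

```python
# Sketch: ranking regions/AZs with the GetSpotPlacementScores API.
# Example instance types and capacity; live boto3 call commented out.

def viable_placements(scores, min_score=7):
    """Filter and sort a SpotPlacementScores list, best first."""
    good = [s for s in scores if s["Score"] >= min_score]
    return sorted(good, key=lambda s: s["Score"], reverse=True)

request = {
    "InstanceTypes": ["c6i.4xlarge", "m6i.4xlarge", "r6i.4xlarge"],
    "TargetCapacity": 10000,
    "TargetCapacityUnitType": "units",
    "SingleAvailabilityZone": True,  # score individual AZs, e.g. for Spark
}

# resp = boto3.client("ec2").get_spot_placement_scores(**request)
# for p in viable_placements(resp["SpotPlacementScores"]):
#     print(p["Region"], p.get("AvailabilityZoneId"), p["Score"])
```

Setting `SingleAvailabilityZone` to `True` is what makes the API useful for the chatty, single-AZ analytics case mentioned next.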
So if you have something like a Spark workload that is very chatty and you are trying to avoid inter-Availability-Zone charges, it can tell you the right Availability Zone for your analytics workloads. You can always still do the manual selection we were talking about. We have now covered all the different usage models and how to use them at scale. The reality is that most of our customers are going to run a mix of those models, and we just released something that I am super proud of, from Brian's team, that brings all of those things together.
EC2 Capacity Manager: A Unified Solution for Managing Capacity at Scale
Brian, would you like to take it from here? Sure, thank you. As Carlos mentioned, we see all these different usage models used throughout organizations, and we have gone over the reasons you would want to use each of them. At a small scale, capacity is pretty easy to manage: if you are just looking at a single region or a single account, it is not too bad, and you can use the console to see exactly what is going on. However, if you are in a large organization trying to manage capacity across the entire organization, you are dealing with thousands of instance types and potentially thousands of accounts.
Add to this hundreds of Availability Zones, dozens of regions, and a mixture of all these usage models. AWS has a lot of different tools that can help, like Cost and Usage Reports, the Describe APIs, or CloudWatch. However, what we found when talking to our customers is that they had to keep going back and forth between all of these tools to figure out what is going on. It could take them hours to get a good enough picture of their capacity to manage it.
Back in October, we launched something called EC2 Capacity Manager, which brings all of those tools together into a single, uniform view. Now you can see all of your On-Demand usage, ODCR usage, Capacity Block usage, and even Spot data across all of your regions and AWS accounts in one place, and it is absolutely free. We are really excited about this because we think it is going to help you get a better picture, be more efficient, and manage your capacity more effectively.
Let's jump into a short demo. This is the Capacity Manager dashboard. The idea with Capacity Manager is to always start at a really high level, the ten-thousand-foot view, and then give you the tools to drill into your data. Up top we have the dashboard, and the other tabs let you dive into your usage, your reservations, or your Spot. You can also set date filters, from a few hours ago all the way back to ninety days, with daily granularity if you want to see trends, or hourly granularity if you really want to get into the weeds.
Within the dashboard there is an overview section, where you can see really quickly how well you are doing with your Capacity Reservations. When I say Capacity Reservations, I mean both ODCRs and Capacity Blocks, and you can see how efficiently you are actually using those reservations. By utilization I mean whether an instance is on or off, not CPU utilization. You will also see the estimated cost of all of your underutilized Capacity Reservations, so you can see where you are spending money without getting the best value.
You will also see your usage, what percentage of it is covered by a Capacity Reservation, and then even Spot. One aside here: this is one of our test accounts, so normally you are going to see much better than forty-four percent utilization and much better than one hour of interruption time; that is just us testing things out. You can also see where you are getting interrupted and where you are getting good Spot runtime, so that if you are not getting the runtime you want, you can go add more flexibility.
We also give you usage metrics, which you can break out by vCPUs and vCPU hours, by instances and instance hours, and even by estimated cost. You can see what percentage of your On-Demand usage is covered by a Capacity Reservation, what percentage is not, and what is running on Spot. Then you can see usage trends over time to see how things are changing.
You can also deep dive into reservation metrics to see what is used and not used, and there is an unused capacity section, which gives you the instance type and Availability Zone cross-sections where you have the most unused Capacity Reservations. Finally, you can track Spot. Let's look at what it is like to deep dive into one of these. Say I want to find reservations across my organization that I am not using as efficiently as possible. I can click into reservations and see some of the same things: the date filter, plus new things like dimension filters and aggregations. The aggregations are where all of your data lies, and as I said, we start at the top and then let you dive deeper.
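The "unused capacity" idea above can also be approximated outside the dashboard with raw `DescribeCapacityReservations` data, where `AvailableInstanceCount` is the unused portion of each reservation. The helper below is pure and runs on sample data; the live boto3 call is commented out, and the 80% threshold is just an illustrative choice.

```python
# Sketch: flagging underutilized ODCRs from DescribeCapacityReservations
# data, similar to what the Capacity Manager dashboard surfaces.
# "Utilization" means reserved slots occupied by running instances,
# not CPU utilization.

def underutilized(reservations, threshold=0.8):
    """Return (reservation id, utilization) pairs below the threshold,
    worst first. AvailableInstanceCount is the unused portion."""
    flagged = []
    for r in reservations:
        total = r["TotalInstanceCount"]
        used = total - r["AvailableInstanceCount"]
        utilization = used / total if total else 0.0
        if utilization < threshold:
            flagged.append((r["CapacityReservationId"], utilization))
    return sorted(flagged, key=lambda x: x[1])

# resp = boto3.client("ec2").describe_capacity_reservations()
# print(underutilized(resp["CapacityReservations"]))
```

For reservations that stay on this list, the options are the ones covered earlier: share them across accounts, shrink them, or release them.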
You will also see different statistics about your capacity reservations and trends. If I have 715 reservations, that is a lot to think about. If I want to break out those reservations, I could choose something like instance family or availability zone. When I do that, it breaks everything out. Now I have my instance family and availability zone there so I can see exactly that I have 4 reservations of M5 in US East 1 AZ1. Now it is much easier for me to dive into each one of those reservations.
You can also filter the reservations, so if I only want to look at M5, I could click that little plus button, or I can dive even deeper into the reservations and apply those filters. I can see what statistics are about just that line item and see different trends about what is being used or not used. Then I can even dive into specific reservation IDs. If I wanted to, I could even click into the individual reservation IDs and actually make changes to those reservations.
There are a few other features I want to point out, because we are running short on time. One is usage. The usage tab works very similarly to the reservations tab: you can see your total usage trends, what is covered by a reservation, what is not, and your Spot usage, broken out by instance family, Availability Zone, account ID, or whatever you would like. There is also the Spot tab, where you can dive into runtime metrics to find out where you need to spend more time making your Spot fleets more flexible so you get better runtime.
All of this is available through the dashboard, through APIs, or through a data export, which is basically an hourly dump of all of your data into your S3 bucket. The only thing you would ever pay for with Capacity Manager is the S3 storage cost. With data export set up, you get all of your data every hour, and you can then use something like Athena or AWS Glue to run transformations on it if you would like.
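As a sketch of the Athena route just mentioned: once the hourly export lands in S3 and is catalogued, a query can be submitted with boto3's `start_query_execution`. The database, table, column names, and bucket below are all hypothetical; the real export schema should be checked against your own Glue/Athena table definition. The live call is commented out.

```python
# Sketch: querying a Capacity Manager data export in S3 with Athena.
# Table name and columns are hypothetical placeholders.

def export_query_params(database, table, output_bucket):
    """Build Athena StartQueryExecution parameters for a sample query
    totalling unused reserved hours per instance family."""
    sql = (
        f"SELECT instance_family, SUM(unused_hours) AS unused "  # hypothetical columns
        f"FROM {table} GROUP BY instance_family ORDER BY unused DESC"
    )
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {
            "OutputLocation": f"s3://{output_bucket}/athena-results/"
        },
    }

params = export_query_params("capacity_db", "capacity_export", "my-results-bucket")
# boto3.client("athena").start_query_execution(**params)  # needs credentials
```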
Wrapping it all together, Capacity Manager is a fantastic way to take all of the different things we have talked about today and bring them into a single solution, so you can make sure your organization is using everything as efficiently and effectively as possible. You can set it up at the account level, which means all your data in a single account, or, if you have access to the payer account or are a delegated admin of the payer account, at the organization level. The call to action today: set up Capacity Manager in your accounts and your organization. It is completely free, and it gives you a much better view of what is going on without having to jump between all the different tools.
Key Takeaways and Closing Remarks
So to wrap everything up, here are the key takeaways we want to highlight. First, EC2 has a number of different usage models, and there is no silver bullet or single usage model that works best for everybody. We see most organizations use all of the usage models, because they have different workloads to optimize. It is a balance between all of them, and finding that balance is incredibly important.
Second, if you want to keep tuning cost and availability, you can add Savings Plans: if you are willing to commit to a one-year or three-year term, you can get large discounts on your On-Demand usage. Or, if you add flexibility to your workloads, you can improve your capacity availability. Third, if you want to manage your EC2 capacity at scale, use Capacity Manager. It is really easy.
I just want to say thank you so much. Sorry for all the technical difficulties, but I really appreciate your time. If you want to connect with Carlos Manzanedo Rueda or myself, here are our LinkedIn profiles. Thank you very much. Thanks everyone.
; This article is entirely auto-generated using Amazon Bedrock.