🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Optimizing price performance with enhanced Amazon EBS gp3 volumes (STG320)
In this video, Dutch Meyer and Sapna Gupta from AWS explain how to optimize gp3 EBS volumes for performance and cost. They cover the decoupled architecture that allows independent scaling of IOPS, throughput, and capacity, unlike gp2's coupled model. Dutch explains storage fundamentals including the relationship between latency, IOPS, and queue depth, demonstrating how to monitor performance using latency histograms and new CloudWatch metrics. The session details how AWS recently increased gp3 limits to 64 TiB and 80,000 IOPS through shard optimization rather than simply adding more shards. Sapna discusses using Elastic Volumes for seamless modifications, shares VMware Carbon Black's success story saving $25,000 monthly by migrating from gp2 to gp3, and introduces the new clones feature for instant volume copies in dev/test environments. The presentation emphasizes right-sizing volumes to match workload requirements rather than maximizing performance.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Getting the Most Out of GP3 Volumes
Everybody good? You all know you could be at lunch right now, but you're here with me, and I appreciate it. Obviously, this is a sold out session, and since the camera's on me, no one will know if I'm telling the truth. My name is Dutch Meyer. I'm a principal software engineer with Amazon Web Services, Elastic Block Store, EBS. I'm joined here by our principal product manager, Sapna Gupta, and we're going to talk to you today about how to get the most out of your gp3 volumes.
If you don't already know, the gp3 volume type is, as an operator of the service every day, the volume type I see the most of. That's because it's designed to be really the default choice or the best choice for the widest range of workloads. It's called general purpose for a reason. I'm going to talk a little bit about how we think about running that service, how we think about building it, and what you can do to get the most out of it.
Here's our plan. I'm going to start the session and talk mostly about reads and writes down the I/O path. I'm going to stick to storage fundamentals. I'm going to talk about this problem of coupled versus decoupled storage, and I'll go into that in much more depth. What I'm talking about there is the way that your capacity of storage and your performance of storage is coupled together or not. I'm going to talk about how to monitor that performance, make sure you're getting what you're supposed to be getting, and what to do when you start to get out into the extremes of performance.
Then Sapna's going to take over. Do you want to introduce yourself?
Good afternoon, everyone. This is Sapna, Principal Product Manager at AWS. I'm super excited to be here today. Thank you for joining us. I know it's lunchtime, but I hope you all enjoy this session. After that, you can go for lunch. I'll cover the last two topics of this session today. How do you evolve your volume to optimize for price and performance? I'll cover that not just for your primary environment, but I will also talk about how you do the same thing for your secondary environment. So with that, I'll hand it over to Dutch, and then I'll be back.
Thank you so much.
Storage Optimization Philosophy and GP3's Flexible Performance Range
Okay, so I care a lot about storage. I'm really passionate about being a storage optimizer and a builder, but as a confession, in my private life, I am extremely wasteful with storage.
I have by my computer at my house a rack, not in the server sense, but just like a bookshelf full of hard drives. It's basically all the hard drives I've ever had. A couple of them are disassembled because I was curious, but I'll be an egregious waster of space there in my personal life. If you're like me, and I know a lot of you are, you've got a whole bunch of volumes lying around. You're not really using them, whether they're physical disks or EBS volumes, or you're taking a system and you're benchmarking it full speed against something else.
The point I want to make here, and it's really important for gp3, is that an optimized system is not a system running as hard as possible. It's a system that's in balance, where your IOPS and your throughput and your capacity, your latency certainly, and in some cases even your durability is correct and aligned to the workload you have. That's what I mean by optimized today. Storage is about making those hard choices between those different trade-offs, and what I'm going to talk about is how we can build this system, gp3, which is meant to serve the vast majority of customer workloads, and all of them are valid points in that space.
So looking at the range of possible performance characteristics and sizes of gp3, it's quite large. We go from a one gig volume all the way up to 64 terabytes today. That's important not because it's so fast, it is fast, but there are faster volumes you can get. What's important about this for gp3 is that anywhere in that space is a valid gp3 volume with gp3 behavior, gp3 durability, and it's even more complex than that because if you step back, you can move in either direction. You can move in the capacity direction and make a large volume that's cold. That's a valid gp3 volume. You can make a smaller volume that's fast. It's a valid gp3 volume.
How EBS Transforms Coupled Hardware into Decoupled Storage Through Sharding
If those numbers look a little bigger than you're used to, it's because we bumped them in the last month or so. I was actually involved in this. We took the capacity up four times, the IOPS up five times. Again, all valid gp3 volumes. So I want to talk about how we do that. It's important to understand how EBS works as a service.
Then I'm going to talk about what it means to do that scaling, to make a bigger gp3, and how you can take advantage of it for your workloads. Your volume comes to us as provisioned with some kind of entitlement. It's got a set number of IOPS. It's got a size.
You let us know what that is. What we do internally is we take that volume and we shard it. Lots of systems shard. It's a core technique of storage. The thing I want to point out about this sharding is that it lets us take that large span of different GP3 characteristics and reduce it to a smaller set of shard characteristics. So your entitlement from a performance perspective is carried through to the shard that makes up your volume.
Because we can vary the number of shards, we can reduce the span and make it actually less dynamic. So smaller variance in our fleet of the performance that those shards bring. What you get with that GP3 volume is the ability to purchase a decoupled performance capacity volume. I am deeply envious of that ability because on the EBS side, I actually cannot do that. When I go and buy raw commodity storage, you can imagine me buying it off Amazon.com, I cannot get exactly the right volume that Sapna is going to need for her desktop tomorrow.
It is absolutely not what I do. If you look at our fleet, it is actually fairly homogeneous. We vary by vendor, we vary by size and performance a little bit, and we do roll our hardware revisions regularly, but our fleet is actually fairly homogeneous from a disk drive perspective. What we are doing somehow is we are taking a very coupled storage fleet and we are turning it into decoupled storage.
When you, the customer, buy that volume, and let's say you are off in the extreme of performance with high IOPS, we have the freedom on our side to use our fleet size to place it in the right spot. The right place for that volume is probably near some other customer who has a capacity-heavy cold volume. It is kind of the core magic of EBS that we can take our hardware, our physical fleet with very coupled performance and capacity, turn it into a dynamic offering of mixed performance, and then recombine those volumes to utilize that hardware as much as possible.
Scaling GP3 Performance and the Advantages of Decoupled Storage
Two immediate lessons on your side of the stack. One is by specifying the IOPS and the throughput that you need to be exactly what you need, you actually enable me to do a better job placing in my fleet to take advantage of my hardware and it lowers costs for all of us. We actually just pass that on to you for the most part. When we want to increase the performance as we did about a month ago, for GP3, the naive approach would be to add more shards, right? You just scale that out. You get twice as many shards, it goes twice as fast because your IOPS are spread twice as much.
That is actually not what we do. Because when you do that, it has ramifications to the lived experience of GP3. The performance impacts are not what we want for GP3. It means that anywhere there is a hardware event, something unfortunate happens in our fleet, you are more likely to feel that the more shards you have. So what we actually do when we are looking at increasing the range of GP3 is we are making those shards more performant all the way down the stack.
I am going to talk about performance in a minute. I want to circle back on this coupled versus uncoupled performance because we do have a GP2 volume type. People use it, it is fine. But GP2 is giving you that coupled performance. When you buy a GP2 volume, you are getting capacity and performance tied together. It is a fine replica of buying a physical disk. But it does not actually have the characteristics that I want you to have as a customer.
I want you to be able to independently scale those two axes. GP3 in comparison gives you that decoupled performance. If you want more IOPS, you can buy more IOPS, mostly independent of the underlying capacity. What you get there is a cost savings. What this enables you to do is to build a strategy for your volumes depending on your workload. This is the core trick that lets us create this volume type that is the most broadly applicable.
For more transactional processing, you want more IOPS: more operations returning quickly, each at low latency. Those tend to be smaller operations. For larger throughput, you are going to have larger block sizes; IOPS and throughput are connected through the block size that way. Across that entire range, you are getting GP3 latency and performance.
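To make that relationship concrete, here is a minimal back-of-the-envelope sketch (plain Python; the numbers are illustrative, not gp3 limits) showing how block size connects an IOPS rate to a throughput rate:

```python
def throughput_mib_per_s(iops: float, block_size_kib: float) -> float:
    """Approximate throughput implied by an IOPS rate at a given block size."""
    return iops * block_size_kib / 1024  # KiB/s -> MiB/s

def iops_for_throughput(target_mib_per_s: float, block_size_kib: float) -> float:
    """Approximate IOPS needed to sustain a throughput target at a given block size."""
    return target_mib_per_s * 1024 / block_size_kib

# Small, transactional I/O: lots of operations, comparatively little throughput.
print(throughput_mib_per_s(iops=3000, block_size_kib=16))            # ~47 MiB/s
# Large, sequential I/O: the same throughput needs far fewer operations.
print(iops_for_throughput(target_mib_per_s=47, block_size_kib=256))  # ~188 IOPS
```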
Understanding Latency Performance: GP3 vs IO2 Volume Types
Let's talk about performance as measured by latency, because performance is more than just IOPS and throughput. Controlling latency is really important. I monitor latency extensively. It's the thing in EBS that is kept constant. Your experience from a latency perspective is the same, no matter where you are in that range of gp3 volumes.
The gp3 experience is characterized as a single-digit millisecond experience with 99% compliance. In comparison, there are io2 volumes, which are great if that's what you need. That's a very different experience—sub-millisecond. The biggest difference with io2 is really the conformity to that experience. I won't go into the full io2 performance curve since this isn't an io2 talk, but I can say that the incidence of I/O operations that take more than 800 microseconds is 10 times lower on io2 than on gp3. If someone needs that really fast latency and strict compliance to latency, that's the io2 experience. It's just different from the gp experience regardless of how many IOPS or how much throughput you have.
The Relationship Between Queue Depth, IOPS, and Latency
We've talked about coupled versus uncoupled storage and touched on latency and what the product is. Now I want to give you one of my best tools. I use this every single day. This is not about EBS specifically—it's just storage fundamentals. There's a relationship between your latency and your IOPS that's connected by queue depth. This is basic queuing theory. Your latency is your queue depth divided by your IOPS. It's physics, and there's a linear relationship between the two.
Let's say, for example, you have a gp3 volume with some workloads running on it. At noon, you figure everyone's going to be at lunch, so you set your cron jobs or scheduled tasks to all fire at noon. They all launch a bunch of IOPS against the system. What happens? All those operations increase the queue depth. You get a deeper queue of operations coming in. The result is you get more performance. Your IOPS goes up, and your latency holds constant. Your IOPS increases right up until you hit your IOPS entitlement. When you have enough scheduled jobs firing all at the same time to hit that cap, then your latency comes up in response.
Anytime I'm looking at a system and trying to figure out what it's doing from a storage perspective, I have that equation in the back of my head. It will make you seem really smart in front of your peers because you can just throw it out there. By the way, if you jitter your cron jobs to not start on the hour, then I won't see a huge spike in IOPS every single hour. That's my problem, just saying. But you could be a friend.
This slide reiterates the same thing. The queue depth, the IOPS, and the latency are all connected together. You can bring your performance up by increasing your queues up to your IOPS entitlement, and then you're going to start to feel latency. But what this means is that you actually have control of your latency. Even though the latency of the gp3 product is the thing that I hold constant, you actually have a lot of control in both directions. If you drive it too hard from a queue depth perspective, you will see the latency come up, and your instance will start throttling.
We actually have new CloudWatch metrics from a late October release that will tell you when this is happening, so you can diagnose it yourself. There are also new CloudWatch metrics for throughput and latency, so that monitoring is built in and you can look at it directly. The actual relationship between queue depth, IOPS, and latency is going to depend on your workload and your block size. As a general rule, when you're in the 16K range for block size, expect roughly one unit of queue depth per 1,000 IOPS. If you reverse the math on that, it comes to one millisecond of latency. It's all connected exactly by that equation.
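As a rough sketch of that rule of thumb (plain Python; the 3,000 IOPS entitlement and the 16 KiB block-size assumption are illustrative only), you can estimate where latency should sit for a given queue depth:

```python
def expected_latency_ms(queue_depth: float, iops: float) -> float:
    """Little's Law for storage: latency = queue depth / IOPS."""
    return queue_depth / iops * 1000  # seconds -> milliseconds

# Assumption: ~1,000 IOPS per unit of queue depth at a 16 KiB block size,
# capped at the volume's IOPS entitlement.
entitlement_iops = 3000
for queue_depth in (1, 3, 6, 12):
    achieved_iops = min(queue_depth * 1000, entitlement_iops)
    print(queue_depth, achieved_iops, round(expected_latency_ms(queue_depth, achieved_iops), 2))
# Below the entitlement, deeper queues buy more IOPS at roughly constant latency;
# at the entitlement, extra queue depth only buys latency (1.0, 1.0, 2.0, 4.0 ms).
```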
Monitoring Performance with Latency Histograms and CloudWatch Metrics
That is the theory—the basic theory of storage. It is absolutely true. But I think we know in practice that the average doesn't tell the whole story. In practice, we have to monitor across the range of performance characteristics. So how do we monitor? We have CloudWatch metrics and NVMe CLI. There are some new commands here I'll talk about in a second. The best tool that I have to monitor performance is actually a latency histogram. Again, this is not EBS-specific—it's just fundamental storage. I'm going to show you what a latency histogram looks like if you haven't seen it.
I'll walk you through the slide. This is an actual latency histogram. It is not internal EBS. It is something I pulled from my personal cloud desktop on one weekend day. Let's look at what my system is doing for the purposes of this presentation.
You'll see a pulse wave pattern. On the x-axis, that is your latency in microseconds. So 1,024 microseconds, that's about a millisecond. That's where you get the peak. On the y-axis, that's the number of I/O operations that fall into that latency bucket. This is nine times out of ten the best way to look at the latency of your system.
When I am dealing with a latency histogram, the first thing I look at is where my peak is. That's the typical performance. In this case, it just happens to be around a millisecond. It's about what I expect from a gp3 volume. I was hoping I would get something unusual in my graph, but I didn't. It's a normal behaving system.
I also look at how that pulse is skewed. If it's totally normal, I'm getting an even distribution of I/Os that are slower and faster. In this case, I'm skewed left a little bit. So the system is generally under a millisecond. Again, it's pretty good performance for a gp3 volume, especially one that's not under a lot of stress. Queue depth isn't that deep.
I think about how wide this curve is. That's the variability in performance. So here, there's some variance in performance. I'm getting four milliseconds. I'm getting one millisecond. I'm getting things that are completing in a quarter of a millisecond. It's pretty fast.
If this were an io2 volume instead of a gp3 volume, you can sort of predict what you'd see. You'd see that whole pulse skewed to the left because it's faster, and you would see the width narrow because it's a more compliant system. That's just a different lived experience.
When you have a graph like this, it's great to track it over time. You know roughly what your system's doing, and you can watch how that changes. When you start to have a performance problem, it will show up as a new mode, a new bump, usually toward the high-latency side of the histogram. When you see a bump in that high-latency band, that's a bunch of I/Os being delayed by something.
So I want to talk a little bit about what causes those latency outliers that we see and how we handle them. Before I do that, I do want to point this out: this is actually EBS NVMe output, and the tool you can use to grab it is shown there. We also have CloudWatch metrics with this histogram now built in on Nitro instances, but you can use basically any tool to monitor. We use iostat extensively. Any tool that monitors performance can be used to build this kind of histogram.
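If you don't have the Nitro histogram handy, you can build the same picture from any per-I/O latency source (fio logs, blktrace, or application-level timers). A minimal sketch, assuming you already have a list of per-I/O latencies in microseconds:

```python
from collections import Counter

def latency_histogram(latencies_us, bucket_edges_us=(64, 128, 256, 512, 1024, 2048, 4096, 8192)):
    """Bucket per-I/O latencies into power-of-two bins, similar to the NVMe histogram output."""
    counts = Counter()
    for lat in latencies_us:
        # First bucket edge the latency fits under; anything slower lands in an overflow bucket.
        bucket = next((edge for edge in bucket_edges_us if lat <= edge), float("inf"))
        counts[bucket] += 1
    return counts

def print_histogram(counts):
    for edge in sorted(counts):
        label = f"<= {edge} us" if edge != float("inf") else "overflow"
        print(f"{label:>12}  {'#' * counts[edge]}")

# Hypothetical samples; in practice these come from fio, blktrace, or app-level timing.
samples = [210, 430, 640, 700, 890, 950, 1010, 1100, 1500, 3900]
print_histogram(latency_histogram(samples))
```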
Managing Latency Outliers: SSD Physical Limitations and EBS Mitigation Strategies
So let's talk about what causes these outliers. I have to go into how SSDs are built. I'm going to do it on a very shallow level. I can talk for an hour about this. If you want to learn more about SSDs and program/erase cycles, I would love to chat with you outside this session.
But essentially your SSD, whether it's your phone or your laptop or EBS storage, it's a physical device. It's a real disk. It's made up of blocks and those blocks have cells. Inside the cells, there are little MOSFET gates that physically trap electrons. They're super cool. But because they're a physical device, they're subject to the cruel whims of the physical world, which means things break. Those little gates can break.
And when they break, there's a bunch of failure modes for a physical disk. But one of them is that it can find that it's unable to reprogram one of those gates. And when it does, when it finds that, it'll retry a few times, but then it'll declare that it has a dead cell. And what the disk has to do with a dead cell is it's going to have to move information out of that block, mark the block as dead, and go onto a new block.
Now, if I pick a random server in our fleet, the odds are it doesn't have any of these dead blocks. But if I picked a couple of servers, I would see a few. It's normal, it's healthy, it's fine. But when that happens, when the disk has to take on the extra work of moving the data from one block that's been damaged to another, we will see latency events on the order of 10 to 100 microseconds. They can last a little while as the disk works through that workload. It's unavoidable. This is just a fact of living with storage. You have those outliers.
And if I had one or two of those, those could be consistent with a gp3 experience. But as a general rule, I don't want to just pass those upward to the customer because gp3 as a product has its own experience. You would be very unsatisfied with me if I said, "Well, I just failed, so." My job is to present you gp3, not physical disks.
So what we do is we monitor via those same latency histograms, those same throughput metrics. We monitor this very actively and have thresholds that tell us when we have a drive that is behaving outside of the normal operating envelope. What I do is basically the same as the disk or an analogous behavior to the disk. When I see this misbehavior, I route your data off to another server. I take that server in for repair, or sometimes it just needs a rest like all of us.
Effectively, by using that same technique that the disk uses, I can build a higher performance, more stable performance product out of less performant underlying hardware. That's a tool that works all the way up the stack, even on services built on top of EBS. To get into the details, we have to face another one of those really hard choices around storage. There is no one answer here. It's as varied as there are different workloads.
Handling High Latency Events Through Read and Write Hedging Techniques
Let's do a thought experiment. You get a request that goes to some EBS server. The EBS server writes it down. Let's say it's a write request. EBS is replicated for reliability and durability purposes. We reach out to the peer and say we would like you to write this data. And I don't hear back for 500 microseconds, which isn't that much time. It's really short, but for a computer system, it's actually a long time. So what do you do?
Most day one EBS engineers will say you could retry, you could time out, and you could try again. That's not necessarily the wrong answer. At EBS, it's the wrong answer because we are a very latency sensitive product. We want to hold that latency bar as tightly as we can. Retry is usually the wrong answer. Usually when a host is telling you that it's slow, it's because it's going to be slow. The right answer is to move past that host, find a new peer, and write the data there.
Think about read requests too. Retry is valid, but I shy away from it from a performance perspective because I care a lot about latency. I would rather use the throughput of my system to move the data away from a misbehaving peer. One technique we use a lot of is read and write hedging. Imagine a different scenario where the customer issues a read request and they want their data. I send that read down to disk. The disk is slow and takes a few hundred microseconds to come back. It's longer than I expect.
I can proactively issue another read in parallel to the peer that has the data. I don't know if my disk is coming back. It seems slightly delayed, but I'm just going to go grab the data somewhere else and take whichever answer comes back first. Read and write hedging are highly valuable. I can't tell you what the right answer for your system is, except to say that these latency outliers are a fact of life in all storage systems at any level.
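Here's a minimal sketch of that read-hedging idea (plain Python with concurrent.futures; read_from_disk and read_from_peer are hypothetical stand-ins for your own I/O paths, and the timings are simulated): issue the primary read, wait a short hedge delay, and if it hasn't completed, race a second read and take whichever answer comes back first.

```python
import concurrent.futures
import time

HEDGE_DELAY_S = 0.0005  # 500 microseconds: the "slower than I expect" threshold

def read_from_disk(block_id):   # hypothetical primary read path (simulated as slow)
    time.sleep(0.002)
    return f"disk:{block_id}"

def read_from_peer(block_id):   # hypothetical replica read path
    time.sleep(0.0003)
    return f"peer:{block_id}"

def hedged_read(block_id):
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(read_from_disk, block_id)
        done, _ = concurrent.futures.wait({primary}, timeout=HEDGE_DELAY_S)
        if done:
            return primary.result()  # fast path: the primary answered within the hedge delay
        hedge = pool.submit(read_from_peer, block_id)  # slow path: issue a parallel read
        done, _ = concurrent.futures.wait(
            {primary, hedge}, return_when=concurrent.futures.FIRST_COMPLETED
        )
        return next(iter(done)).result()  # take whichever answer came back first

print(hedged_read(42))  # with these simulated timings, the peer usually wins
```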
I could build you a system where this doesn't happen. It's essentially io2 Block Express. It's more expensive, right? But a well-optimized system that suits most workloads, which is GP3, is going to have these outliers. What you can do is use a system called AWS Fault Injection Service. This year, we added EBS Latency Event Injection. So you can fire up Fault Injection Service on your own system. You can actually get a test environment out of a production system where this is a great use case for firing up some fault injection to see how your system behaves to those latency outliers.
Most systems will just tolerate them, especially in the write path. You issue a write and the write comes back later. A little bit of latency is fine. Sometimes when you're in a more transactional database mode, that write is the commit block with a bunch of I/O behind it, and you get what's called head-of-line blocking. That's a big performance issue. So understanding how your system behaves to those kinds of latency outliers is really critical in determining how sensitive you are to latency and what you do if you issue some kind of request hedging or give up on a peer when that happens on your storage system.
We talked about coupled versus uncoupled scaling, building your system to move in multiple dimensions and optimize each of them. So optimizing in the dimension of IOPS or throughput, optimizing in the dimension of capacity. We talked about using queue depth to control latency.
You can manipulate performance, either bringing it up if the system is underfed with a small queue or bringing it down; shorter queues give you a faster response time. We talked about monitoring latency using a histogram, some of the new CloudWatch metrics we have, and the NVMe CLI, which is another tool on Nitro instances. And we talked about choosing how your system responds to high latency really carefully and testing it with the Fault Injection Service.
How your system responds to those different behaviors is what defines the lived experience. It's that lived experience that gets you the GP3 volume type or the IO2 volume type. That's how we build that product separate from the underlying medium. Even when we're dealing with big surges in our workloads, like Prime Day, which recently hit new peaks for EBS, we maintain our EBS performance SLAs through it all.
This has all been workloads in a very static sense, fixed workloads. We're going to talk next about how workloads change over time. I'm going to hand off to Sapna.
Evolving Your Volume: Using Elastic Volumes to Modify Size, IOPS, and Throughput
Thank you, Dutch. I love his passion for block storage, and the way he explains some of the fundamentals of block storage is pretty exciting. All right, let's dive deep into how you evolve your volume.
I'm going to take a step back first. Some of you already know what is the right performance for your specific use case, but some of you may not know or may not be sure about what the right performance level is for you. There are two options. One, you can start with a higher performing volume and then scale it down as you learn more about your volume, use case, and the performance that you need. Or you can start with a tighter performance envelope—for example, GP3 baseline performance—and scale it up as your volume demand grows.
Dutch talked about how you can measure performance and went into some details of latency. All of those data points can help you learn more about what IOPS and throughput you need for your specific use case. GP3 is pretty impressive, especially with our recent launch of higher limits. It gives you a pretty broad spectrum. For example, with data size, you can start pretty small at one terabyte and then go up to 64 tebibytes. With on-demand performance, you can now go up to 80,000 IOPS, and if you need that only for a short amount of time—for example, just for a few days or a few months—you can leave it at that level and then scale back down when you don't need it. That way you're not paying for performance that you're not utilizing.
Striping is a technique that some of our customers use to get the right data size and performance. But with GP3's larger and higher limits, you don't have to worry about the complexity behind striping. You can reduce and minimize that overhead by just increasing your IOPS and throughput independently.
Let's get into the mechanism for evolving or modifying your volume. Elastic Volumes is the way to go. It basically allows you to dynamically modify your size, IOPS, and throughput independently. There are three actions that you can take. First, change volume type. What if you just want to migrate from GP2 to GP3 or GP3 to IO2? You can do that using Elastic Volumes. You can also tune your performance within the GP3 spectrum. You can move from 16,000 IOPS to 80,000, for example.
Size is something I want you to be a little cautious about because you can only scale up—you cannot scale back down. Keep that in mind before you request a larger size. What are the best practices, and what are the different steps you need to take to actually request a modification of your volume? The first one is a snapshot. For those of you who may not be aware, a snapshot is a point-in-time copy of your volume.
You can restore your volume from that specific snapshot. Once you take the snapshot, there are different ways of requesting modifications to your volume. You can ask to migrate from gp2 to gp3, or within gp3 you can request different IOPS and throughput limits. After that, you can monitor the progress of your modification, which I'll show you in the next slide.
I want you to be cautious about point number four. Sometimes the operating system on the EC2 instance doesn't recognize size changes automatically. For example, if you're using a 100 gigabyte volume and increase the size to 200 gigabytes, and the instance doesn't recognize that change, you're paying for 200 gigabytes but only utilizing 100. Extending the file system brings the usable space in sync with the volume's new size.
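As a rough illustration of that last step on a Linux instance (this follows the standard growpart/resize2fs flow from the EBS documentation; the device, partition, and mount point below are placeholders, and an XFS file system would use xfs_growfs instead of resize2fs):

```python
import subprocess

# Placeholders: adjust to your own device, partition, and mount point.
DEVICE, PARTITION, MOUNT_POINT = "/dev/nvme1n1", "/dev/nvme1n1p1", "/data"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["lsblk", DEVICE])                   # confirm the kernel already sees the larger volume
run(["sudo", "growpart", DEVICE, "1"])   # grow partition 1 to fill the device
run(["sudo", "resize2fs", PARTITION])    # grow an ext4 file system to fill the partition
run(["df", "-h", MOUNT_POINT])           # verify the extra space is now usable
```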
Here are a few methods you can use to request modifications. You can use the AWS CLI, PowerShell scripts, or go directly to your console to monitor your status. You can see the volume state and the modification state. Optimizing means you're transitioning to the new limits you've requested.
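For example, here is a minimal boto3 sketch of that workflow (take a snapshot, request the modification, and poll its state; the volume ID and target numbers are placeholders):

```python
import time
import boto3

ec2 = boto3.client("ec2")
volume_id = "vol-0123456789abcdef0"  # placeholder volume ID

# 1. Safety net: take a point-in-time copy before changing anything.
ec2.create_snapshot(VolumeId=volume_id, Description="pre-modification backup")

# 2. Request the change: move to gp3 and raise IOPS/throughput independently of size.
ec2.modify_volume(VolumeId=volume_id, VolumeType="gp3", Iops=16000, Throughput=1000)

# 3. Watch the modification move through 'modifying' and 'optimizing' to 'completed'.
while True:
    mod = ec2.describe_volumes_modifications(VolumeIds=[volume_id])["VolumesModifications"][0]
    print(mod["ModificationState"], mod.get("Progress"))
    if mod["ModificationState"] in ("completed", "failed"):
        break
    time.sleep(30)
```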
Let me dive deep into what happens behind the scenes when you request a modification. If you're curious about what happens when you request higher IOPS or throughput, it's pretty exciting to understand. I'm going to walk through a specific scenario. We're running at massive scale and receive hundreds of thousands of requests to modify volumes. We have to check every single request to determine where your volume was initially placed, say on server A. If server A has the right capacity to increase your performance, we proceed there. If not, we migrate your volume to server B so you get the performance and size you need. This is seamless from your perspective as a customer, but I'm sharing this so you understand what happens behind the scenes.
Let me discuss the different boundaries and challenges associated with EBS. Although we provide a wide range of options for gp3 in terms of size, IOPS, and throughput, there are certain limitations. One is one-way sizing. When you request to go from 100 gigabytes to 200 gigabytes, for example, you cannot go back to 100 gigabytes. There might be some workarounds, but you cannot do this directly, so be aware of that.
The second limitation is the six-hour timer. Between two modification requests for your volume, there's a six-hour wait time. This ensures your volume integrity and consistent performance remain intact. We've kept this boundary in place to guarantee that. The third limitation is related to what I mentioned earlier. Some older EC2 instances may not recognize the size you've requested, and you might need to reboot or run an OS command to ensure your EC2 instance recognizes the change you've requested.
VMware Carbon Black Success Story: Saving $25,000 Monthly with a GP3 Migration
I love sharing customer success stories. We've discussed how to evolve your volume and the mechanism to do so. Now let's look at an example where one of our customers, VMware Carbon Black, was able to optimize their cost and performance by leveraging gp3. Before I get into the details of the scenario, I want to quickly call out what they do and their scale so you can connect with this success story and potentially apply similar principles and strategies to optimize your own costs.
VMware Carbon Black is a leading industry player in cybersecurity, focusing on endpoint detection, application control, and next-generation antivirus. They support over 8,000 customers and handle millions of records per second. To maintain their customer reputation and provide the best application experience, latency is critically important for their use case. They need high-speed volume performance, which is where they are using Amazon EBS.
Here is a high-level architecture showing how they use multiple AWS services. My focus in this session is on EBS. They are using EBS for high-speed volume performance. They process millions of requests and records, sorting and compressing data within the volume using an EKS cluster. For this to work effectively, they need to read and write data frequently, which is why IOPS and performance are critically important.
Let me explain the challenge they faced and how they optimized and mitigated it. Initially, they were using gp2 volumes. As mentioned earlier, gp2 performance and size are coupled together. They wanted 3,000 IOPS, and to achieve that, they were stuck with two options. Either they could pay for an extra 960 gigabytes of storage to get the 3,000 IOPS, or they could use bursting performance, but that would not guarantee the consistent performance they needed all the time. They went with the second option, and that is where they struggled, experiencing throttling. Because of throttling, they also saw their cluster scale up with unnecessary EC2 nodes.
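To see why gp2's coupled model forced that choice, here is the gp2 baseline math in a small sketch (gp2 delivers 3 IOPS per GiB with a 100 IOPS floor and a 16,000 IOPS cap, and small volumes can burst to 3,000 IOPS only while credits last; the 40 GiB starting size is an illustrative assumption):

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 couples performance to capacity: 3 IOPS per GiB, floor of 100, cap of 16,000."""
    return min(max(100, 3 * size_gib), 16_000)

def gp2_size_for_iops(target_iops: int) -> int:
    """Capacity you must buy on gp2 to sustain a given IOPS target."""
    return -(-target_iops // 3)  # ceiling division

current_size_gib = 40                       # illustrative: the data itself needs only ~40 GiB
print(gp2_baseline_iops(current_size_gib))  # 120 sustained IOPS; bursting covers the rest, until credits run out
print(gp2_size_for_iops(3_000))             # 1,000 GiB needed to sustain 3,000 IOPS -> ~960 GiB of unused capacity
```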
Let me show you how they resolved this problem. They moved to gp3 volumes and were able to get a guaranteed 3,000 IOPS without paying for unused storage. With that migration, they directly saved 20 percent on storage costs. They were also able to reduce their EC2 fleet by 50 instances. Altogether, they saved $25,000 per month, which is a significant number.
Here is feedback directly from the customer. They also shared this graph showing that the number of EC2 instances decreased by 50 after they migrated to gp3 and achieved consistent performance of 3,000 IOPS. This anecdote summarizes how the migration really helped them save thousands of dollars per month. This is one story where a customer moved from gp2 to gp3 and optimized both price and performance. You can achieve similar results within the gp3 spectrum with the higher limits we offer today.
Optimizing Secondary Environments: Snapshots, Clones, and Cost-Effective Development Strategies
So far, we have discussed how to improve and optimize price and performance using gp3 for your primary environment. But what about secondary environments that you might be using for development and testing? How should you think about optimizing price and performance for those use cases? Let me share one scenario. Previously, when we did not offer higher limits for gp3, customers were stuck with io2 volumes whenever they needed more than 16,000 IOPS.
In the case of secondary environments, unless latency and durability are critically important, you can now move from io2 to gp3 and pay much less for your development and testing. That way, you are able to optimize costs and avoid paying a premium for io2 when you are just experimenting with your use cases. With the new limits up to 80,000 IOPS, high-performance workloads can now run on gp3, allowing you to achieve production speeds in development and testing environments at a general-purpose price point. This is one scenario. Within the gp3 spectrum as well, you can do much more and strategize how you want to optimize your development environment.
Now, similar to your primary environment, I wanted to touch on how you can make a copy of your volume and how you can create a secondary environment without interrupting your primary production live environment. There are two methods to do so: one is snapshot. A snapshot is a point-in-time copy of your volume. You take a snapshot and then you can hydrate or restore your volume from that snapshot you have taken.
Since we are talking about cost, price, and performance, I wanted to educate you on how EBS snapshots work and how behind the scenes we optimize price and performance for you by avoiding duplicated data. In state one, for example, you have a 10 gigabyte volume. If no snapshot exists, we will first take the full snapshot. In state two, let's say you made a change of four gigabytes. In that state, we are not going to take the full snapshot again. We are only going to take the snapshot of four gigabytes and refer to snapshot A for the six gigabytes of data that you have not touched. That way, we are not duplicating data, and you are not paying multiple times for the full snapshot.
In state three, let's say you have added two gigabytes of data to your volume. Now in state three, we are only going to take the snapshot of two gigabytes, refer to snapshot B for four gigabytes, and then refer to snapshot A for six gigabytes. So you are only paying for the incremental backup here. That is what I wanted you to take away to ensure that we are optimizing the cost for you.
This works even if you have hydrated or restored your volume from a snapshot. For example, if you have a 10 gigabyte volume, you take a snapshot and restore it into a new volume, let's say volume two, and then you change four gigabytes, we only take a snapshot of those four gigabytes and maintain your lineage by referring to snapshot A for the six gigabytes of data that you have not touched.
Some of you might be thinking that with snapshots, you take a snapshot and then hydrate or restore, which involves two steps. What if you want to optimize how quickly your volume is restored or hydrated? Sometimes you need it immediately for critical use cases. So we offer three options here, and depending on your use case and how critical your application is, you can pick and choose. If you care more about cost, you can go with the standard restore, which is free and requires no extra payment. But if you need a predictable recovery time, or if you need your volume at full performance immediately, there are two options: a provisioned initialization rate, where you pick the speed at which your volume hydrates from the snapshot, and Fast Snapshot Restore (FSR), where you get instant access to a fully initialized volume. You pay for those two options, and that is where you have to think about your strategy for balancing cost and performance.
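Here is a minimal boto3 sketch of the snapshot-then-restore path for a dev/test copy, including opting into Fast Snapshot Restore when the copy needs full performance immediately (the volume ID and Availability Zone are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# 1. Point-in-time copy of the source volume (incremental after the first snapshot).
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0", Description="dev/test copy")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Optional, paid: enable Fast Snapshot Restore so restored volumes are instantly initialized.
ec2.enable_fast_snapshot_restores(
    AvailabilityZones=["us-east-1a"], SourceSnapshotIds=[snap["SnapshotId"]]
)

# 3. Restore into a gp3 volume for the cheaper secondary environment.
volume = ec2.create_volume(
    SnapshotId=snap["SnapshotId"],
    AvailabilityZone="us-east-1a",
    VolumeType="gp3",
    Iops=3000,
    Throughput=125,
)
print(volume["VolumeId"])
```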
For the dev/test use case where you just need a copy of your volume, there are certain limitations, particularly if you need that copy instantly. Like I mentioned before, snapshots involve at least two steps: you first create the snapshot, and then you restore your volume from it. Depending on which option you pick, your copy may not be instantly available, and a restore is always required.
This is where we have recently launched clones. If you are looking to instantly copy your volume and want to just start with the testing, with one click of a button you can do so. One API call gives you an instant point-in-time copy with no restore required. So how does this clone work?
We have an EC2 instance. There's also a volume attached to it. Let's say in this scenario, IO2. If you want to optimize the cost here, as I mentioned before, you can actually choose GP3 when you're copying your volume. That way you're not paying for the premium IO2 volume. Instead, you're just paying for the GP3, which is much cheaper. On the console, there's one option. You can click a button and just create a copy.
All right, we have reached the end of our session. I'm just going to quickly summarize the last two topics that we talked about. GP3 gives you a pretty large range of size, IOPS, and throughput. With Elastic Volumes, you can evolve your volume: you can migrate seamlessly from GP2 to GP3, and you can fine-tune and right-size your volume depending on your use case and what you've learned about your data by monitoring performance. You can also modernize your architecture, since we now offer a size limit of up to 64 tebibytes with GP3.
The great thing about Elastic Volumes and GP3 is we are doing this in place. You don't have to worry about stopping your instance. You don't have to worry about detaching your volume. All of that is happening behind the scenes. For you, the live and production data volume is not interrupted. We have about 11 sessions at this re:Invent lined up. We're done with four of them, so I would highly encourage you to look at other sessions if you want to learn more about Amazon EBS and EBS snapshots.
Snapshots in particular, if you want to learn more, we have STG325. We have STG326. I'm co-presenting STG406 on Wednesday if you're interested in having a hands-on experience and learning more about EBS snapshots. You can come join us there. With that, thanks a lot for staying here. We would love to hear your feedback. This is how we evolve our content and make sure it's relevant. So please take the survey and let us know how you like the session and enjoy the rest of your re:Invent. Thank you.
; This article is entirely auto-generated using Amazon Bedrock.