Kazuya

AWS re:Invent 2025 - Maximizing block storage performance for high-intensity workloads (STG319)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Maximizing block storage performance for high-intensity workloads (STG319)

In this video, Mark Olson and Jody from the AWS EBS team demonstrate how to maximize EBS volume performance through a fictitious healthcare company scenario. They explain the differences between gp3 and io2 Block Express volumes, emphasizing io2's sub-millisecond latency (under 500 microseconds) versus gp3's single-digit millisecond latency. Key topics include using AWS Fault Injection Service for chaos testing, leveraging detailed performance statistics with EBS NVMe stats for per-second metrics and latency histograms, optimizing database configurations with 16KB atomic writes to eliminate double-write overhead, and troubleshooting performance issues using CloudWatch metrics and burst balance monitoring. They also introduce the new R8GB instance with 720,000 IOPS and explain how the Scalable Reliable Datagram (SRD) protocol and hardware offloads improve network efficiency. The session includes practical debugging scenarios, Elastic Volumes modifications, and applying Little's Law to understand queue depth and concurrency impacts on storage performance.


; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction: Taking EBS Performance to the Next Level

Good morning. I was walking here today and took a picture because the sunset was beautiful. I realized I can be a pretty good photographer with just the camera in my pocket. But to be a great photographer, I need to know a little bit more. Today, we're going to spend some time taking EBS to the next level and helping you understand how to maximize the performance of your EBS volumes. I'm Mark Olson, a Senior Principal Engineer on the EBS team, and I've been thinking about storage and learning from our customers since 2011.

You may look at my title and think it sounds fancy, but the neat thing about being a Senior Principal Engineer at Amazon is that I still get to write code and I carry a pager. I'm on call and in the trenches with you, understanding exactly what's going on with the system. I get paged quite a bit because I'm on a few different on-call rotations keeping track of the entirety of EBS. Hi, I'm Jody. I lead product management for EBS as well as a handful of other products like AWS Backup, AWS DataSync, and AWS Transfer Family. I also carry a pager, so I can relate to a lot of the scenarios we're going to walk through today.

Thumbnail 100

We're going to use a couple of scenarios today to walk you through things that you can do both beforehand as you're planning your application to make better choices and eliminate surprises that could happen in production later on, as well as a bunch of monitoring tips and tricks using some of the newer features we've shipped over the last year or so. These help you when something bad does happen to isolate the problem, figure it out, and recover as soon as possible.

Thumbnail 130

Setting the Stage: AnyHealth's Infrastructure Requirements

We're going to walk you through a fictitious scenario where we are now two employees of the fictitious company AnyHealth. It's a medical device manufacturing and healthcare software company. What that means is durability is of the highest importance because this is medical information and it's highly regulated, so we need to do a lot in our infrastructure for resiliency and compliance. Mark is the software engineer who is building the application and is on call this week. I had to do a really complicated three-way on-call trade-off in order to not be paged on stage today. I'm not on call because I traded my favors and got off call. I'm the infrastructure admin helping Mark make infrastructure choices.

Thumbnail 200

Thumbnail 210

I'm going to go through a couple of introductory items because there are always a few folks who aren't as familiar with the portfolio. We are block storage: object storage is S3, file storage is FSx and EFS, and so on. We've also got other services like AWS Backup, which coordinates EBS snapshots as well as backing up all kinds of other AWS services. If you haven't taken a look, it's really worth checking out, because it's a great service with a lot of regulatory compliance features that pure snapshots don't have.

Thumbnail 240

Thumbnail 250

Thumbnail 260

On the other side, you've got DataSync, which is a product that helps you move data between on-premises and AWS, or between clouds. It's something we've seen a lot of use for recently with AI workloads blowing up. And then if you're on the edge, we've got AWS Outposts. So, EBS volumes: you use volumes for persistent network-attached storage for EC2. They're independent of your EC2 instance, and you can attach them to any EC2 instance in the same Availability Zone. The biggest difference from ephemeral storage is that when you stop and start your instance, your ephemeral storage goes away, but your EBS volume stays put and you can just reattach it to another instance.

Thumbnail 300

Within the storage portfolio, we've got a bunch of different products split into SSD-backed products and HDD-backed products. For today's talk, I'm really going to focus on our primary two SSD volume types: gp3 and io2 Block Express.

Thumbnail 280

New EBS Data Services Features: Provisioned Performance for Data Movement

In addition to core volume snapshots, we've been putting a lot of work recently into our data services features. Historically, many EBS data services features worked on a best effort basis, making it very difficult to predict exactly when a data movement operation would complete. We started investing in more explicit performance where you can actually provision the performance you want for those services. First, we introduced a fast hydrate feature that we named provisioned rate for volume initialization—I'm trying to change that name, but for now that's what it is, or PRVI for short. With this feature, you can actually set a throughput rate. You can say that when you're creating a volume from a snapshot, you want it done at a specific throughput level so you can know exactly when it's going to finish.
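As a sketch of what that looks like from the API side, here is a minimal boto3 example; the `VolumeInitializationRate` parameter name and the value used are assumptions here, so check the current CreateVolume documentation.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a volume from a snapshot and ask EBS to hydrate it at a fixed rate so
# the completion time is predictable. VolumeInitializationRate (MiB/s) is the
# parameter name assumed here; verify against the current CreateVolume docs.
volume = ec2.create_volume(
    SnapshotId="snap-0123456789abcdef0",   # placeholder snapshot ID
    AvailabilityZone="us-east-1a",
    VolumeType="gp3",
    VolumeInitializationRate=200,          # target hydration rate in MiB/s
)
print(volume["VolumeId"])
```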

Thumbnail 390

Similarly, we have a time-based snapshot copy feature that does the same thing but with time. You can say that for your RTO purposes, you need this backup to complete within a specific number of hours. You can set that, and then the snapshot copy, which previously might happen really fast or really slow depending on various factors, will happen in exactly the timeframe that you need for whatever compliance or internal targets you've been setting.
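A hedged boto3 sketch of a time-based copy follows; the `CompletionDurationMinutes` parameter name is an assumption, so confirm it against the current CopySnapshot documentation.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Copy a snapshot cross-Region and ask for it to finish within a fixed window
# (for example, a 6-hour RTO target). CompletionDurationMinutes is assumed.
copy = ec2.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",  # placeholder snapshot ID
    CompletionDurationMinutes=360,              # must complete within 6 hours
    Description="Time-based copy for RTO compliance",
)
print(copy["SnapshotId"])
```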

Thumbnail 400

Thumbnail 440

Thumbnail 460

Planning Your Infrastructure: Choosing the Right Volume Type

Now I'm going to talk a little about planning your infrastructure alongside your application and the kinds of things you can do upfront to make the best choices. The first thing you've got to do is understand the kind of workload that you're going to be deploying or building. Before you start building, think about what it needs. In this case, for our patient medical records application, we're going to need a transactional database. We're going to build a relational database that needs super high performance and very high durability. For that, we're going to be looking at io2 because that is a volume that has both the maximum performance, the lowest latency, and the best latency consistency, as well as the 99.999% durability that we need.

Thumbnail 500

In other cases when that's not your objective, you can use different volume types. Another useful thing to do with EBS is that you have the flexibility to break up your workload and put different volume types or provision the same volume type but with different amounts of performance for different parts of the workload. For example, if you're running a relational database, you could put the journals on io2 or a more highly provisioned gp3 volume. Similarly with Cassandra, you could do that for commit logs, or even with Kafka, you could use something like one of our HDD-backed volumes for topics, for example, because that involves a lot of sequential I/O.

Thumbnail 520

io2 is our high-performance volume with 256,000 IOPS and the 99.999% durability I talked about. Recently, we launched larger and faster gp3 volumes. We took performance from 16,000 IOPS to 80,000 IOPS. We doubled the throughput performance to 2,000 MB/s, and we increased the maximum volume size from 16 to 64 terabytes. This means that especially if you're running containerized workloads where you can't RAID together volumes, now you can just grow one big huge volume as you need it, which allows for much more flexibility.
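For reference, provisioning a gp3 volume above its 3,000 IOPS / 125 MB/s baseline is a single CreateVolume call; the size and rates below are illustrative, not a recommendation.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# A 16 TiB gp3 volume with IOPS and throughput dialed up past the baseline.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    VolumeType="gp3",
    Size=16384,        # GiB
    Iops=40000,        # within the new 80,000 IOPS gp3 ceiling
    Throughput=1000,   # MB/s, within the new 2,000 MB/s ceiling
    Encrypted=True,
)
print(volume["VolumeId"], volume["Iops"], volume["Throughput"])
```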

Thumbnail 550

Thumbnail 580

We also updated our public-facing latency guidance this year, which is something we don't do that often. io2 has an average latency of under 500 microseconds and has 10 times fewer outlier I/Os than gp3, for example. Let's take a second and look at the performance definition of these two volume types, because it's easy to get lost in the numbers. gp3 is designed for single-digit millisecond latencies, which means that 99% of the time, anything up to 10 milliseconds of latency is within the definition. In practice it far outperforms that, but that's the definition.

Thumbnail 630

io2, on the other hand, is designed for sub-millisecond latencies 99.9% of the time. That's an order of magnitude difference in latency consistency. You can see that as you start to look at outlier I/Os: gp3's range for I/Os is much bigger. This graph will break your brain because the bottom axis is in log scale, but what we really wanted to show is that as you go up from 99 to 99.9 and later to 99.99%, that's where the latency outlier difference between gp3 and io2 appears. We have to treat the products very differently to make sure that io2 stays within that very narrow band.

Thumbnail 700

Using an analogy, I was trying to think of a job where it's really bad if you get to work late. An air traffic controller came to mind. Air traffic controllers are on pretty narrow shifts and they have to show up to work on time. Minutes count, and if you're multiple hours late, that means you've missed a whole shift, and things are going to go very poorly. When you're trying to figure out a way to get to work, suppose my air traffic controller takes the train. You have to decide whether you're going to take the gp3 train or the io2 train.

Thumbnail 710

Thumbnail 720

Thumbnail 730

Thumbnail 740

Thumbnail 770

gp3 is going to be fine as a commute: the gp3 train is on time 99% of the time, so you're going to be late once in 100 days. With the io2 train, you get 99.9% consistency, so you're going to be late once in 1,000 days. Your arrival time range for gp3 is also much bigger than for io2. For gp3, you've got a window of about 20 minutes in which you might arrive on any given day, so you have to do some planning beforehand to make sure that no matter where you land in that range, you can still get to work on time. With io2, you're staying in a very narrow band of only about 2 minutes, and you don't really have to go out of your way to plan for it.

Thumbnail 780

Thumbnail 800

With gp3, you're more likely to have a bigger delay. That's where the 99% versus 99.9% really comes into play. With io2, not only are you staying within a very tight millisecond range on an everyday basis, you're also far less likely to have a bigger delay. If you don't know which volume to choose when you're starting your design, just start with gp3. You can do some testing to see how that works out for you, and use our Elastic Volumes feature, which Mark will talk about later, to modify your volume if it isn't the right choice. That's a good no-brainer.

Thumbnail 820

Testing Strategies: From FIO to AWS Fault Injection Service

I'm going to talk a little bit about testing. These are the things you can do beforehand, before you've actually launched your application. You can do some testing so that you know upfront, before you're paged at 2 o'clock in the morning, how your application is going to respond to different scenarios. Generally speaking, the higher up the stack you get, the better. You want to be as close to your application as you can, so you can simulate the things that happen and see the specific interaction between your infrastructure and your application.

At the lowest level, there's fio testing; fio is a flexible I/O generator. It's just testing raw I/O, and it's great for testing your maximum performance limits. If the spec says this volume can do 16,000 IOPS, you can check that box: yes, it does. It doesn't really tell you anything about your application, but it's good for that purpose. If I compared it to driving a car, it would be like taking the car onto an empty racetrack. The spec says it can do 100 miles an hour; if I'm brave enough, which I'm not, I can go all out and confirm that, yes, it can.
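As an illustration of that kind of raw-limit check, here is a minimal sketch that shells out to fio for a 16 KiB random-read run at queue depth 64; the device path is a placeholder, and you should only ever point fio at a scratch volume.

```python
import subprocess

# Raw fio run: 16 KiB random reads at queue depth 64 for 60 seconds, to check
# whether the provisioned IOPS are actually reachable from this instance.
cmd = [
    "fio",
    "--name=ebs-randread",
    "--filename=/dev/nvme1n1",   # placeholder test volume, never production
    "--rw=randread",
    "--bs=16k",
    "--ioengine=libaio",
    "--iodepth=64",
    "--direct=1",
    "--runtime=60",
    "--time_based",
    "--group_reporting",
]
subprocess.run(cmd, check=True)
```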

Thumbnail 900

Another type of testing you can do is TPC-C, which is a standard OLTP database benchmark.

There are lots of different benchmarks out there that can simulate the typical patterns of the workload or the database you're working with. TPC-C simulates the things that typically happen under high load for a transactional database: think order entries, payments, deliveries, that kind of thing. It has a ton of tuning parameters, so you can adjust it to do whatever you want, but it gives you realistic mixed read-write I/O patterns that show how your application will respond to a typical transactional, database-like workload.

Thumbnail 950

So it's getting closer, but ultimately you're not really doing real testing until you start doing load testing on your application. You're looking at your own specific application and all of its infrastructure and seeing how it responds to the high-traffic conditions you feel are important to test. What's great about that is that it can reveal real bottlenecks and real retry storms, because you're testing full end-to-end behavior. I know we all have compressed cycles and we're trying to ship features and get products out the door, but doing a really thorough job on your load testing can save you a lot of misery later in the game.

Thumbnail 1000

And then lastly, we've got AWS Fault Injection Service, and I'll talk about that one a little more because we did some new stuff with EBS here. FIS is a service that allows you to test your application against specific worst-case scenarios of your own devising. Here you're no longer measuring just performance, you're measuring your resilience, graceful failure, and recovery under controlled chaos. This is a really helpful tool to use.

The thing that is particularly useful about FIS is that when you're doing all of your other testing, you're doing it under fairly normal conditions. Maybe the traffic or the load is high, but everything else underlying is basically functioning as expected. FIS allows you to simulate an infrastructure bad day. These are things that aren't easy to test for because they don't happen very often, but when they do happen, it might cause a lot of problems with your specific application and all your infrastructure.

If load testing is like driving your car through rush hour traffic in your own city to figure out what impact that'll have on your commute, then FIS testing is like testing against a horrible situation that you choose. Maybe there's a highway closed or there's a massive hailstorm, or there's lots of fog. You can pick them out and test against them. FIS works for not just EBS. It tests for everything. I'm just talking about the EBS-specific actions because those are relevant to our presentation.

Thumbnail 1110

In September, we launched some new FIS actions for EBS. We launched latency injection, which can simulate degraded I/O performance on your volume to replicate real-world signals. You can see how your CloudWatch alarms behave and whether they work, you can see OS timeouts, and you can basically check whether the stuff I've baked into my application, the different alarms and guardrails I've set up, actually works when something really bad happens.

Thumbnail 1150

They've got some predefined templates in FIS and the EBS consoles that you can see. They've also got customization. So if you don't know what to use, you can just start with one of these, but then you can actually customize them. If you're a super advanced customer and you know storage really well and you know the kind of things that can go sideways, then by all means have at it.

You can do things like change the percentage of I/Os to which your latency action is applied, which is how often it happens across all of your I/Os. You can change the amount of latency: for io2, you can inject a minimum of one millisecond of latency, and for non-io2 volumes the minimum is ten milliseconds, going all the way up to sixty seconds.

You can simulate a full minute of stalled I/O, such as a stuck volume, and observe what happens to latency. You can decide whether the issue will be persistent or intermittent, and you can split the impact between reads and writes. This tool allows you to easily test many scenarios to understand what happens during rare bad days on your infrastructure. You can also set the duration for these simulated issues, with a minimum of one second and a maximum of several hours, so you can see how your system behaves over extended periods without actually experiencing the problem yourself.
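As a concrete starting point, here is a minimal boto3 sketch of an FIS experiment template for the stalled-volume case using the existing `aws:ebs:pause-volume-io` action; the newer latency-injection actions have their own action IDs and parameters, so look those up in the FIS action reference. The role and volume ARNs are placeholders.

```python
import uuid
import boto3

fis = boto3.client("fis", region_name="us-east-1")

# Pause I/O on one EBS volume for 60 seconds -- the "stuck volume" scenario.
template = fis.create_experiment_template(
    clientToken=str(uuid.uuid4()),
    description="Pause EBS volume I/O for 60 seconds",
    roleArn="arn:aws:iam::111122223333:role/fis-ebs-experiment-role",
    stopConditions=[{"source": "none"}],
    targets={
        "TargetVolume": {
            "resourceType": "aws:ec2:ebs-volume",
            "resourceArns": [
                "arn:aws:ec2:us-east-1:111122223333:volume/vol-0123456789abcdef0"
            ],
            "selectionMode": "ALL",
        }
    },
    actions={
        "PauseVolumeIO": {
            "actionId": "aws:ebs:pause-volume-io",
            "parameters": {"duration": "PT1M"},   # ISO 8601 duration
            "targets": {"Volumes": "TargetVolume"},
        }
    },
)
print(template["experimentTemplate"]["id"])
```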

Configuring OLTP Databases: Atomic Writes and Double Write Protection

I'll now transfer this to Mark, who will discuss OLTP databases. We've completed benchmarking and testing, and we've determined that our medical records application will use a couple of different databases for patient information. We're going to put patient data into a traditional relational database, and for this we'll choose something like MariaDB. You could just as easily have chosen MySQL, PostgreSQL, or even a commercial database like SQL Server or SAP.

We've determined that this application runs best on instances that have a four-to-one ratio of memory to CPU, which fits into the M class of instances in EC2. We're going to use Graviton 4 to achieve even better price performance than previous generations of Graviton. Our database is replicated at the database layer, and one of the nice things about having application-level replication is that there's much more state and knowledge of what data is actually being transferred, allowing the application to make smarter decisions and avoid blind transfers of data that don't really matter.

Thumbnail 1280

As we choose our storage, we're going to split up a couple of different things. We have a write-ahead log, which for databases typically has lower queue depth but requires lower latency. We're going to use an io2 volume for that to achieve nice low latency. For our data volume, we don't have high pressure for this particular part of the workload, but we know the requests will be large, so we're going to use gp3 and provision extra IOPS and extra throughput.

Thumbnail 1390

Databases are interesting, and I'm going to pause here to talk about how I/O works in the stack. You don't need to memorize this image—it's quite complex and looks like a Rube Goldberg machine, and it's also a bit outdated—but the important thing to recognize is all those little orange snake-like things, which are queues. In any storage system, even at the raw device level, there are queues in the storage system.

EBS looked something like this about five to six years ago on Xen instances, and there were many queues in the stack. When your application wants to submit an I/O request, it executes a system call and puts that request onto a queue. The system call is typically picked up by a kernel file system, which maps the request to locations on the disk drive. Whether it's a mechanical hard drive with a swinging arm or an SSD, disk drives have an allocation unit typically called a sector, and they also have a maximum transfer size.

EBS, as a virtualized storage system, has different limitations and doesn't necessarily expose what the actual media is doing, so it may have a different transfer size and allocation unit than the end device might see. This is important to know because as your request gets populated through all these queues and different parts of the stack, it gets split up into smaller requests than the original. These sub-requests may be placed on different devices or, at the very least, on different chips within an SSD. Everything is then merged back together as we come back to the top of the stack.

Thumbnail 1550

and the IO is returned. Now, why is this important for databases? Databases are really concerned about storing your data durably, and one of the things they will do is make sure that your data is actually written. They know that things can be written multiple times to different media chips, so one of the techniques is what's called the double write, or writing it a couple of times to ensure that at least one of those is going to be good and that they can replay it later if there's a power failure. If your storage IO request is only partially written, you are going to get what's called a torn write.

Now in Nitro EC2 instances, both EBS and local instance storage support much larger atomic units. Typically, an SSD will advertise that its atomic unit is 512 bytes. There is not a whole lot you can fit in a single 512 byte sector. This maximum size is advertised through the NVMe device model for both of these. For EBS volumes on all current Nitro instances, that maximum size is actually 16 kilobytes, which fits nicely with what a typical page size is for a database.

Thumbnail 1640

With the right file system configuration, ext4 aligned at the start of the device, using direct I/O and the ext4 bigalloc feature, you can probably disable the double-write protection. You have to be careful to get this configuration exactly right, because if you get it wrong and there is an infrastructure failure, you could end up with corrupted data. But we have worked with RDS to enable this on their database platforms, and they have seen a 30 to 35 percent performance gain because you are not writing twice and eating into that write performance.
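A hedged sketch of the kind of pre-flight check this requires: read the power-fail atomic write unit the NVMe device advertises, and only consider turning off the doublewrite buffer (for example, `innodb_doublewrite=0` with `innodb_flush_method=O_DIRECT` in MariaDB/MySQL) if it covers a full 16 KiB page. It assumes nvme-cli is installed and that the device reports this through `nvme id-ctrl`; verify both for your setup.

```python
import re
import subprocess

DEVICE = "/dev/nvme1n1"     # placeholder EBS volume device
LOGICAL_BLOCK = 512         # bytes; confirm with `blockdev --getss`
DB_PAGE_SIZE = 16 * 1024    # typical InnoDB page size

# awupf is reported in logical blocks minus one; this reporting path is an
# assumption -- check what your device and nvme-cli version actually expose.
out = subprocess.run(
    ["nvme", "id-ctrl", DEVICE], capture_output=True, text=True, check=True
).stdout
awupf = int(re.search(r"awupf\s*:\s*(\d+)", out).group(1))
atomic_bytes = (awupf + 1) * LOGICAL_BLOCK

if atomic_bytes >= DB_PAGE_SIZE:
    print(f"Atomic write unit {atomic_bytes} B covers a {DB_PAGE_SIZE} B page.")
else:
    print(f"Only {atomic_bytes} B atomic writes; keep doublewrite enabled.")
```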

Thumbnail 1650

Debugging in Production: Resolving Database Backlog with CloudWatch Metrics

I am on call. I forgot to hand off my on-call duties, but we are here, so let us go ahead and dive into this problem. We have been paged for a database backlog. In the interest of time, I am going to skip a few steps, but what I would really have started with is CloudWatch, and in CloudWatch I probably would have used Application Insights or some of the other observability tools available there to figure out exactly which instances and resources are causing trouble. We are going to assume that we used that and found the primary database of our patient records as the culprit of this page.

Thumbnail 1690

Let us walk through a few metrics examples that can help us find the problem. The first thing I am going to start with when I look at an instance is the instance status check. This is a roll up of all the infrastructure problems that could exist in your instance, and this basically says is this my problem or your problem? Is this an AWS problem or is this my application problem? The status check failed is the highest level. These metrics are going to be either zero or one. Zero is good, one means something is wrong. Status check failed just says there is something wrong that we have detected with the infrastructure.

If that is high or one, I will dive into some of these other metrics. If I look at the system, that tells me if the infrastructure that hosts my instance failed. This one is low. If it is high, some of the things that I can do and think about are I can stop-start my instance, which will end up placing that instance on a different piece of infrastructure. If I thought ahead, I would have enabled EC2 auto recovery. EC2 auto recovery will do that stop-start on your behalf. Basically, your instance will look like a reboot. Everything will come back on the other side.

Now I might look at the attached EBS metric that says is there a problem with the EBS infrastructure? This is usually not something that is going to be solved with an instance stop-start. EBS is going to be working on the problem to solve it. But if you need to make progress faster than we can, there are a few things that you can do. You can fail over to a replica, you can potentially create a new volume if you have a backup either in the same availability zone, or if it is a larger scale failure, you can create a new volume from a backup in a different availability zone.

And then the last one that I have listed here is instance liveness, and this is us detecting whether your operating system is actually behaving from our perspective. We look at things like whether the operating system is looking at the network card, actually polling the devices, and paying attention to them.

Thumbnail 1820

All of these are low, so it doesn't look like a hard failure. Since we saw a backlog of requests, let's take a look at some of the performance metrics.

Earlier this year, EBS launched a couple of metrics that do some math for you. Previously, we had metrics that told you the total read operations, total write operations, the amount of time during reads, and the amount of time during writes. If you wanted to know your average IOPS or your average throughput, you had to actually do the math, and sometimes it was clunky and didn't work out right. So we launched these average throughput and average IOPS metrics. These are available on a volume that's attached to an instance, which makes sense because if it's not attached to an instance, it's probably not driving any throughput.
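For context, here is the kind of metric math the new averages save you from doing by hand: summing read and write operations and dividing by the period. The volume ID is a placeholder.

```python
from datetime import datetime, timedelta, timezone

import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")
vol = {"Name": "VolumeId", "Value": "vol-0123456789abcdef0"}

# Average IOPS over the last hour, computed from the raw per-minute counters.
resp = cw.get_metric_data(
    MetricDataQueries=[
        {"Id": "r", "ReturnData": False, "MetricStat": {
            "Metric": {"Namespace": "AWS/EBS", "MetricName": "VolumeReadOps",
                       "Dimensions": [vol]},
            "Period": 60, "Stat": "Sum"}},
        {"Id": "w", "ReturnData": False, "MetricStat": {
            "Metric": {"Namespace": "AWS/EBS", "MetricName": "VolumeWriteOps",
                       "Dimensions": [vol]},
            "Period": 60, "Stat": "Sum"}},
        {"Id": "avg_iops", "Expression": "(r + w) / PERIOD(r)",
         "Label": "Average IOPS"},
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
)
result = resp["MetricDataResults"][0]
print(list(zip(result["Timestamps"], result["Values"])))
```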

Looking at the data drive and our write-ahead log drive, both on the throughput side, the write-ahead log looks pretty stable, which is what I would expect since it's just writing requests. On the data drive, we see a little bit of a dip and then a spike. That's interesting because we've got a change in our workload behavior. Maybe something wasn't communicated to us, maybe we onboarded a new customer, or maybe somebody was doing load testing accidentally on production, which I wouldn't recommend. But right where the vertical line is, which is where we got paged, we see a drop-off. That's interesting because we know that IO2 volumes and GP3 volumes are supposed to have consistent performance, so I'm not going to suspect them just yet. I'm going to suspect something that might have some variable performance.

Thumbnail 1920

If you recall, on the instance, I noted that the M8G 4 extra large has something we call burst capability. All EC2 Nitro instances are EBS optimized by default, which means they have some amount of dedicated performance of IOPS and throughput available for EBS. It's important to pick the right size instance so that your configuration, volumes, and instances live in harmony. Here we've got a total of 40,000 IOPS, but we weren't expecting to actually use that. It was just provisioned just in case. Our ability to burst on this instance with a baseline of 20,000 IOPS and then burst up to 40,000 IOPS for about half an hour once a day, or across the entire day, would absorb peaky workloads. Similarly, the throughput bursts as well. But our metrics didn't actually show that we were peaky. We had kind of sustained things, so maybe this wasn't the right choice, or maybe whoever was running the test should have let us know ahead of time.

Thumbnail 2000

Thumbnail 2030

Burst instances also have another pair of CloudWatch metrics you can look at that tell you the burst balance: one for IOPS and one for throughput. Very similarly, we saw the throughput balance metric drop and then go all the way to zero. This is the available burst balance for throughput; IOPS recovered because the IOPS workload isn't actually driving a whole lot of throughput, so once the throughput stopped, our IOPS was able to recover a bit. The other thing that we launched earlier this year is another status check based on those burst bucket limits, so you can plug it into Auto Scaling groups if you want to increase the size of your instance, or maybe scale out your fleet if that's how your application is designed. It gives you an edge trigger that you can alarm on, instead of having to calculate a rate, watch it, and work out what it means.
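If you would rather watch the balance directly, a simple CloudWatch alarm on the instance-level `EBSByteBalance%` (or `EBSIOBalance%`) metric catches the drain before the workload hits the baseline cliff; the instance ID and SNS topic below are placeholders.

```python
import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")

# Alarm when the instance's EBS throughput burst bucket drops below 20%.
cw.put_metric_alarm(
    AlarmName="anyhealth-db-ebs-byte-balance-low",
    Namespace="AWS/EC2",
    MetricName="EBSByteBalance%",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=5,
    Threshold=20.0,
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:storage-oncall"],
)
```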

Thumbnail 2060

Thumbnail 2090

Putting these two metrics side by side, we can start to see the problem. We've got a throughput-driven workload. The throughput drops as soon as we run out of that burst bucket, and the workload falls back to the baseline of 625 megabytes per second. It looks like we don't really have the spiky load that we were planning for, so we probably should increase our instance size. In this case, we can go up to the next instance: the M8G 8 extra large has double the baseline throughput and double the baseline IOPS, and there is no burst on it, so we don't have to worry about running out of burst.

Part of the reason we added burst on smaller instances is so that you could size down if you knew you just had a few spikes here and there and didn't have to provision for peak performance. But you're probably thinking: great, you know your instances, so you can just pick one out of thin air, but how do I know which instance to choose?

Thumbnail 2130

There are a couple of ways that you can get that information. I added a note at the bottom that it's all in our documentation. You can read a giant table, but I actually find it more helpful to look at the DescribeInstanceTypes API. The DescribeInstanceTypes API gives you much more than just EBS performance. It gives you a lot of information about the CPUs available, the network throughput available, different CPU configurations if you have licensing requirements to worry about, and how much memory is on an instance. It's a giant JSON blob for every single instance type, and then you can filter it down by the particular ones you're interested in.

I quickly ran that query and then piped it through a jq query string, which looks like, well, a jq query string; I don't know who designed that syntax, but that's fine. You can get the EBS optimized baseline throughput here, and I look for everything that's greater than 1,200 megabytes per second. You can also get EBS performance specifically, including baseline IOPS, burst IOPS, and burst throughput, and use jq to filter down to what you're looking for. This is how I picked the 8 extra large size. If I wanted a little more headroom, maybe I'd go to the 12 extra large, but I think I'm pretty safe. I talked to my friends, and they were just doing a load test that they shouldn't have been.
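The same filtering works from boto3 if you prefer that to the CLI-plus-jq pipeline; this sketch keeps M8g sizes whose EBS-optimized baseline throughput clears 1,200 MB/s.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Walk DescribeInstanceTypes and keep instance types with enough EBS baseline.
paginator = ec2.get_paginator("describe_instance_types")
for page in paginator.paginate(
    Filters=[{"Name": "instance-type", "Values": ["m8g.*"]}]
):
    for itype in page["InstanceTypes"]:
        ebs = itype.get("EbsInfo", {}).get("EbsOptimizedInfo", {})
        if ebs.get("BaselineThroughputInMBps", 0) > 1200:
            print(itype["InstanceType"],
                  ebs["BaselineThroughputInMBps"], "MB/s baseline,",
                  ebs["BaselineIops"], "baseline IOPS")
```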

Thumbnail 2220

Once you've identified the right instance type, how do you change it? If you've just done a run instances command on the CLI without any sort of auto scaling or other infrastructure management, the thing you're going to have to do is stop the instance, change the instance attribute, and restart it. Now you can script this, so it's kind of a long reboot, but unfortunately we can't change the number of CPUs and the amount of memory while an instance is running live. We need to do it across an instance reboot.
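Scripted, that long reboot looks roughly like this; the instance ID and target type are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"   # placeholder

# Stop, change the instance type attribute, start again. EBS volumes stay
# attached across the stop/start; only ephemeral instance storage is lost.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "m8g.8xlarge"},
)

ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```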

If you have launched it as an auto scaling group, you can just update that launch template and then update your auto scaling group to use that launch template. Then do an instance replacement refresh workflow. Make sure you choose launch before terminate. Launch before terminate is really important. The other option is to terminate before launch, and then you've got a completely dead period as opposed to just a slow period. Launch before terminate will make sure that your instance is available and alive with any health checks you've set up before it terminates the old one.

As part of the stop and start process, the other important thing to remember is that all your storage devices are going to remain the same. But if you did not assign an Elastic IP and you're not behind a load balancer, the IP address of that instance is going to change. So a best practice here is to make sure you've got something stable that you can use to reference that instance.

Thumbnail 2330

The Evolution of EBS Performance: From Standard Volumes to R8GB with SRD

Now on the topic of EBS optimized instances, one of the neat things about us being a storage service as opposed to something you buy and put in your data center is that we're continuously innovating on your behalf. While I'm here telling you how to improve the performance of your application, we're also doing the same thing behind the scenes. If we look through the history of EBS, back when we launched in 2008, this is kind of fun to look at. Our first volume type, the only volume type, was just called EBS Standard, and I think we actually named it Standard after we launched because we didn't have to have a name at first.

It was about 100 IOPS, kind of shared and clunky. It was hard drives with really no quality of service. The instance performance was also shared, so if you overdrive your storage, you might not have enough performance to get that data out the network. In 2012 we launched Provisioned IOPS volumes. It was cool back then. The first Provisioned IOPS volume had 1000 IOPS, and at the same time we also launched EBS optimized instances as an additional option to separate that performance from your network. Those EBS optimized instances had 8000 IOPS.

When we launched Nitro, we went up to 80,000 IOPS and EBS optimized by default. You no longer had to change and select EBS optimized. You're always going to get it regardless of whether you selected it or not. More recently, last year R6IN had 400,000 IOPS, so we're starting to see significant performance improvements over time.

Thumbnail 2420

Thumbnail 2450

Thumbnail 2460

But even that wasn't enough. So just a few weeks ago we launched the R8GB, which gives us quite a bit more: 1.5 times more bandwidth and 1.8 times more IOPS, so it'll do up to 720,000 IOPS and 150 gigabits per second of EBS bandwidth. If you really need a high-performance instance, if you've got a large scale-up workload or a scale-up database like a lot of commercial databases are, this might be the instance for you.

So if we put it all together on that chart, how did we do it? It's a pretty big leap to almost double the performance, and it's not something you can just tweak in software. Let's go back to the slide from before I got paged. Like I said, this is what it looked like before Nitro. What we're going to focus on here is all the steps it takes to get your I/O to the storage server and back. We've got a number of queues. This is pre-Nitro, so we've got a software-based EBS stack, which means even more queues. We've got a software-based EBS driver on the instance hardware, so even more queues. And the network, of course, has queues too, all those little yellow snake things I mentioned before.

Thumbnail 2510

Now remember this picture. I'm going to remove the queues from the picture just to simplify it a bit, right? And I'm going to take a look at what Nitro looks like. Maybe I simplified it too much, but that's fine, we can have the conversation. So there's still queues. I just hid them. The important thing here is if you start with the device in Nitro, a Nitro instance or a Nitro card presents EBS as a PCI device in your instance, and this is PCI pass through. So the hypervisor is out of the picture. We've removed some queues there. The way that this works is the Nitro card has a DMA engine. That DMA engine also does encryption, right? So as we pull the IO data out of your guest instance, we will encrypt that payload.

And then there's another DMA engine on the Nitro card that will put it onto the network, so we bounce through the Nitro card pretty quickly and briefly, streaming it onto the network; it doesn't really sit there for any measurable amount of time. We get to the network and do roughly the same thing on the other side on the EBS storage server. We dequeue it, put it into the software, and do whatever we need to do on media. If it's a read, we'll just look at the local SSDs. If it's a write, we'll do whatever replication we've got. We've got some caching and things like that, but not write caching in the sense that your data would not actually be persistent; we cache for reads more than we cache for writes. Then we populate the response back into your instance.

Thumbnail 2630

So this is all actually pretty efficient in Nitro. We've got some hardware offloads that support this, but it wasn't always this way. The thing that's interesting to zoom into here is the network, and the network is where we can take a lot of liberties because we own that infrastructure and we don't have to present it to anyone else, so we can do whatever we want. When we launched EBS, and even into the early Nitro instances, our EBS storage fabric was largely TCP based. It was pretty optimized, but it was TCP based. So our first step was to improve that. If you've been paying attention for a while, you've probably seen this slide before; if not, I'll go through it pretty quickly. One of the things that we did is we built our own transfer protocol for our data center network. We call it SRD, the Scalable Reliable Datagram protocol. Today this runs underneath every EBS volume attachment on EC2 Nitro instances.

So our design goals for SRD, we took a look at what TCP did and how we built our data center networks. There's been quite a few runs at making TCP efficient in a data center environment, but none of them really worked for us. As we stepped back, we questioned our assumptions, we took a look at our requirements, and realized that part of the problem we were having is that TCP actually did more than we wanted it to do. If we put more of the logic in some of the higher level applications, so like I mentioned before with database replication, your application has more context about what needs to be transferred. We're doing the same thing. We put more of the protocol context into the EBS overlay, so SRD could actually be a pretty great multi-purpose transport not just for EBS but for VPC networking and a few other things as well.

Now that we've done this, this freedom allows us in SRD to route packets using multiple paths through the network. For the case of EBS, we can route every IO request through a different path. Once we get onto the network, there are multiple paths to get to any endpoint. It's called equal cost multipathing. The neat thing about storage is that while we don't want to reorder the data within an IO request, we can take advantage of the fact that anything in flight in a queue can be completed in any order. It's just a queue because that's how we get it onto the device. It's not a queue for any ordering perspective. As long as we complete the IO when we say we're going to complete it, we can complete the second IO before we complete the first IO that's on the queue.

This is really cool because we can send every single IO request down a different path and constantly probe the network, looking for failures and reacting quickly to those failures and routing around them. Now in a data center, if you look at how we built data centers twenty years ago, you might have some sort of routing protocols that would have to converge whenever there's a device failure, and it might take multiple seconds depending on how big your network is. With this, we can route around failures in a matter of milliseconds.

So what makes R8GB even higher performance? We knew early on in our journey with SRD that it was going to be pretty powerful for us. But we weren't really satisfied with what we had built. We loved it. It allowed us to run our network more efficiently, things got faster, and things got cleaner overall in our network. But we knew we hadn't yet done enough. We knew that the next step was to take more advantage of hardware offloads.

Thumbnail 2850

Now, if anybody's done anything with hardware, it takes a little bit longer than it does with software. With software, I wrote a bug yesterday and I can fix the bug today and deploy that fix. With hardware, if you've got a bug, your cycle is months because you've got to wait for the next spin of the hardware. Maybe you can do a revision on the existing hardware, which is a little bit easier than a full tape out of a new chip, but it does take a while. We've been planning on how we can use SRD and more hardware offloads within SRD for quite a while. With R8GB, we're finally able to do that.

Thumbnail 2890

Remember how I mentioned that Nitro does that double-hop thing? With R8GB, we no longer need to do that. We have one DMA engine that can pull your data from the instance, encrypt it, and send it out on the wire to the EBS storage server. On the EBS storage servers, we're now also able to steer those requests directly to the CPUs that are responsible for handling your volume data. So we got more efficient both on the instance side and on the storage side. That's what gives us the ability to launch R8GB with more IOPS and more throughput. The cool part is we're just beginning this journey.

Thumbnail 2910

Managing Cluster Databases: Using Detailed Performance Statistics and Elastic Volumes

Well, we've got another page. This time it's our application DB, which is where we're storing a lot of the doctors' notes, imaging, and things like that. We've decided to use a different database here, more of a hybrid database like TiDB. TiDB is a clustered database, and there's a whole class of these clustered databases that rely on quorum technologies both to shard out and to absorb performance spikes.

Thumbnail 2960

The key differences in how we've configured this: we chose the same four extra large instance, but this time we're managing it via EKS, which is obviously a more modern way to manage our infrastructure. For the storage, since we're scaling out, we're just using gp3 volumes with 400 megabytes per second and 4,000 IOPS, and we're going to put a whole bunch of these in the fleet.

Thumbnail 2970

So instead of going through CloudWatch, I'm going to go through something that we launched last year: our detailed performance statistics. These metrics give you some of the same information that CloudWatch gives you, and in fact a little bit more. One of the really cool things about them is that you can poll them every second, as opposed to every minute, and they're available right on your instance. All you've got to do is run the EBS NVMe stats command. You can also query the device directly with the NVMe CLI, or with direct device I/O if you really want to get detailed, but the EBS NVMe script wraps it all together nicely so you don't have to think about it.

For those of you who are familiar with tools like iostat, these are just counters. They're counters since the volume was attached or the host rebooted, and there are a few events that will reset them. You'll want to poll them and then look at the difference between those pollings, because they just keep incrementing. The EBS NVMe script gives you an iostat-like interface where you can set an interval and a number of times to poll.
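A hedged sketch of that poll-and-diff pattern is below. The exact command name and output format of the EBS NVMe stats tooling vary by distribution and version, so the invocation and field names here are assumptions to adapt, not a documented interface.

```python
import json
import subprocess
import time

DEVICE = "/dev/nvme1n1"   # placeholder EBS volume device
INTERVAL = 1.0            # seconds between polls

def read_counters():
    # Command name, --json flag, and key names are assumptions; adapt them to
    # whatever the EBS NVMe stats tool on your instance actually prints.
    out = subprocess.run(["sudo", "ebsnvme", "stats", "--json", DEVICE],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)

# The counters are cumulative since attach, so rates come from the difference
# between two polls divided by the polling interval.
before = read_counters()
time.sleep(INTERVAL)
after = read_counters()

for key in ("read_ops", "write_ops", "read_bytes", "write_bytes"):
    rate = (after[key] - before[key]) / INTERVAL
    print(f"{key}: {rate:,.0f}/s")
```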

Thumbnail 3060

Here we've got the total number of operations, bytes, and time that your volume's been doing traffic. We've also got the throttle metrics that we showed before for both volume and instance performance. But that's not the best part. The cool part about these detailed performance stats is that we now give you latency histograms. These latency histograms show you the number of IOs that land in these microsecond ranges of time. You can actually characterize what the EBS storage subsystem is doing for your volume.

Now, these are measured after any performance throttles are applied, whether instance throttles or volume throughput throttles, so it really does give you a picture of how EBS is behaving. If you run this side by side with something like iostat, with your own kernel trace probes building histograms, or even with a benchmark like fio, you might see a difference between the two, and that difference tells you a couple of things. One, you may need different application tuning.

Thumbnail 3140

It may be showing you that you're being throttled, so look at the other parts of these metrics; it could be volume throttling, instance throttling, or just application or kernel tuning that you need to do. It really gives you a view behind the curtain, a demarcation point that helps you locate where a drop-off might be happening. If you're using the EBS CSI driver, like we are here, these metrics can also be exported to a Prometheus endpoint and graphed in Grafana. For our platform we already had these being collected, so let's take a look.

Thumbnail 3150

Thumbnail 3170

We can see that the P99 latency is starting to creep up. If we look back at the graph from earlier, the one with the unusual log-scale x-axis, we can see that we're going to get some outliers on gp3. So maybe one of the things we should do is change this to an io2 volume. This is where Elastic Volumes comes into play. You're able to change your volume type, size, and performance characteristics online, but there are a couple of caveats.

For size, the extra capacity is available immediately. So if you're running out of space and you're using Elastic Volumes, you'll get the size right away; the volume will go through an optimizing state while we re-accommodate that extra storage space, but you can use the space immediately and we'll figure it out on the back end. You might have to resize your file system too.

If you want to add IOPS or throughput, as we go through that optimizing state, the performance will be somewhere in between the original and the new higher performance. As we optimize, different blocks get laid out a little bit differently and we might need to reshard them. The caveat here is with latency. I said you could just switch to an io2 volume, but you need to be careful because during that optimizing phase sometimes you will see a little bit of latency impact. On average, your writes are going to have less latency impact than your reads, but that's not always the case.
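If you do decide to change the volume, the Elastic Volumes path is a single ModifyVolume call, and you can watch the modification move through the optimizing state afterwards; the volume ID and IOPS value here are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
volume_id = "vol-0123456789abcdef0"   # placeholder

# Move a gp3 volume to io2 online; keep the latency caveat above in mind
# while the modification is in the optimizing state.
ec2.modify_volume(VolumeId=volume_id, VolumeType="io2", Iops=16000)

mods = ec2.describe_volumes_modifications(VolumeIds=[volume_id])
for m in mods["VolumesModifications"]:
    print(m["ModificationState"], m.get("Progress"))
```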

Thumbnail 3260

Thumbnail 3270

Understanding Latency and Concurrency: Little's Law and Queue Depth

You need to think carefully about whether switching to an io2 volume right on this live instance is the right thing, or should you fail over to a different instance and see if that one behaves a little bit better and gets rid of some of the outliers while you backfill the rest of your cluster with io2. Let's talk a little bit more about why latency matters. You couldn't get out of an EBS performance talk without talking about Little's Law. This is a common equation and a way to help us think about queuing in an average system. Whether it's a single process, a network device, a distributed system, or even a queue at the grocery store, Little's Law is the thing that can help us reason about the capacity of these systems.

It briefly states that the mean concurrency in the system, L, is equal to the mean rate at which requests arrive, lambda, multiplied by the mean time each request spends in the system, W; that is, L = λW. It seems pretty simple and it is pretty straightforward, and it gives us a good way to reason about an ideal system and its long-term concurrency.

Concurrency is a useful measure of capacity because it gives us a measure of consumption of resources, and those resources could be anything like CPU cycles, storage capabilities, or really anything that's measurably limited. Indirectly, we can also use concurrency as a measure of contention. So if the concurrency is high, it's likely contention is also high. How many of you flew into Las Vegas this past weekend? I'm guessing everybody did. I know I did.

Thumbnail 3340

So here's an image of air traffic control. This is a sample of a 4-hour window of an average typical morning of planes flying into the area. This includes both commercial and private flights, but there really aren't many private flights, so it's mostly commercial. On a typical day, you've got an orderly flow of traffic with airplanes converging on their final approach course about 10 to 15 miles out, and it's really easy to balance the system because airlines have a pretty predictable schedule. They arrange that schedule to not overwhelm the airports, and sometimes weather happens.

Thumbnail 3380

It's been said that 1 mile of highway takes you 1 mile, but 1 mile of runway can take you anywhere unless you're stuck getting to your destination because there actually isn't enough runway space. The other week I was at the Las Vegas Grand Prix. This is a 4-hour window around the same time of day, but just a couple of days before the Las Vegas Grand Prix. You can see that there were a lot more flights coming into the area. A few interesting things to note here is that you start to see traffic getting diverted. There are actually 3 main airports in the Las Vegas area: one where the commercial flights go and then 2 general aviation airports.

Those runways are still a fixed resource. They can't build more overnight, and they really don't want to because it would be pretty expensive. We can think of those runways as the ability to support concurrency: only one plane can occupy a runway at a time. As the amount of traffic increased, and that's our lambda, the queue stretched out even further, with airplanes joining the line 50 miles out and even beyond, as opposed to the original 10 to 15 miles.

The other thing that happened is that, to keep an airplane taking off from the LA area from having to fly all the way to Texas before it could come back in, the FAA implemented a reservation program. If I had planned to fly my Bonanza down from Seattle, I would have had to get a reservation and plan my flight, and the airports charge a whole lot to land on that weekend anyway, or I'd be forced to turn away. That kind of reservation system is an admission controller, a throttle.

Thumbnail 3470

You're thinking, okay, airplanes are fun, but what does it mean for my application? Some applications can adjust, but sometimes it's really just a function of your workload. This is where latency matters. If you have a single I/O in a queue and you don't put the next one in the queue until that first one completes, that's called a queue depth of 1. Latency will dictate the IOPS that you can achieve on that storage system.

Say you have 500 microseconds of latency. Each unit of queue depth can then achieve about 2,000 ops, assuming everything is perfect. The lower the latency, the faster we return those transactions, and the lower the queue depth you need to still get the performance you want. If you're designing a storage application, you're going to want to think about how you can drive more traffic to your storage subsystem.
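The back-of-the-envelope version of that, using the numbers from the session:

```python
# Little's Law: L = lambda * W. At ~500 microseconds per I/O, one outstanding
# request (queue depth 1) sustains about 1 / 0.0005 = 2,000 IOPS, so reaching
# 256,000 IOPS on an io2 volume needs roughly 128 requests in flight.
latency_s = 500e-6            # W: mean time each request spends in the system
iops_per_qd = 1 / latency_s   # throughput from a single outstanding request
target_iops = 256_000         # io2 Block Express maximum cited in the session

required_queue_depth = target_iops / iops_per_qd
print(f"{iops_per_qd:.0f} IOPS per unit of queue depth")
print(f"Queue depth needed for {target_iops:,} IOPS: {required_queue_depth:.0f}")
```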

Thumbnail 3540

Key Takeaways and Additional Resources

Databases often have a way to tune the maximum queue depth, but the minimum queue depth is really just going to be a function of your load. On an io2 volume with 256,000 IOPS, you're going to need a queue depth of at least 128. A few notes to take away: plan and think about your application ahead of time. Test and use some tools. The higher up the stack you go, the more accurate it's going to be. Keep track of what's going on with your application and then react to those changes.

Thumbnail 3550

We've got a couple more EBS sessions this week at re:Invent. STG323 goes even deeper under the hood with EBS; it's a chalk talk with two principal engineers, and it should be a fun one. I hope you have a great re:Invent.


; This article is entirely auto-generated using Amazon Bedrock.
