🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Amazon S3 performance: Architecture, design, and optimization (STG335)
In this video, Ian McGarry and Devabrat Kumar from Amazon S3 explain how to achieve massive throughput and low latency using S3's distributed architecture. Key strategies include parallelizing connections through multi-part uploads and range gets, leveraging multi-value DNS answers, and optimizing prefix structures to avoid time-based prefixes at the start. They emphasize using AWS Common Runtime (CRT) for automatic performance optimization. The session also covers S3 Express One Zone, which delivers single-digit millisecond access times and 200K TPS out of the box, scaling to 2 million requests per second. Use cases include ML training, interactive analytics, log streaming with append capability, and model loading for inference pipelines.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Achieving Massive Throughput and Low Latency with Amazon S3
We're going to talk about S3's architecture and design and how you can leverage that to drive massive throughput and get high performance out of S3. My name is Ian McGarry. I'm a director of software development for Amazon S3, and I'm joined by my colleague Devabrat Kumar, who is a principal product manager for S3.
Let's go through our agenda quickly. In this talk, we'll dive deep into how S3 works, its scale, and how you can leverage that to drive massive throughput and get low latency out of S3. We'll also cover Amazon S3 Express One Zone, which is our high-performance storage class. My colleague Dev will be covering that, when to use it, and what the key benefits are.
So the question is, why would you want to drive high throughput and get low latency out of S3? It's an object store. Well, many of our customers look at S3 and see the durability, availability, and cost, but most importantly for these types of workloads, they see the elasticity. It can scale up and down to high throughput as needed. Many customers are running data lakes, analytics workloads, or machine learning training on S3 and are scaling to hundreds of gigabytes per second of sustained throughput. Similarly, they're running interactive querying on logs and model loading on S3 Express One Zone. We'll talk about those use cases and how to get the most out of S3 for them.
Let's dive right in. If you take anything away from this talk at all, it's that you should use Amazon S3 scale to your advantage. There is a lot of infrastructure, disks, servers, and networking that goes into making Amazon S3 work for everybody, and that's actually the key to unlocking your throughput as well. Go broad and go wide across the various fleets we have across the world.
Speaking of scale, I wanted to take you through some cool numbers that I thought would be helpful to give you a sense of the scale we operate at. Amazon S3 currently stores over 500 trillion objects, which equates to hundreds of exabytes of storage. We also serve over 200 million requests per second, and customers are running over a million data lakes on AWS. Scale is definitely the goal. Because we support such scale, we have a lot of infrastructure, and that will be the key to unlocking high throughput today.
S3 Architecture Fundamentals: Understanding the Three Core Components
S3's architecture is quite simple at a very high level. We think of S3 in three simple components in the context of serving get object and put object, which means retrieving data from S3 or persisting data into S3. There's our front end, which is the set of services that route your request to S3 and also includes the services that orchestrate the processing of your requests. By orchestration and processing, I mean running authorization, running authentication, generating events, generating logs for your requests, and all the things that go into actually serving the request. It's also responsible for orchestrating and generating metadata about your object and your request, looking that up in our index, which I'll talk about in a second, and then also persisting and retrieving that data from our disks.
The next component is our index. That is basically a very large distributed key-value store, and it's simply there to map object metadata to the object bytes. A simple example is taking the key name of the object and storing the location of the bytes on the disks so that we know where to go to get the data. It also stores things like creation date and so on. And then finally, there's our storage subsystem. This is responsible for managing all of those disks at scale and figuring out where to place data. These three components work together to fulfill a get object and a put object request, and that's how we think of them. We organize around those internally, and for example, I lead the front end teams who work on all the APIs for S3.
That's the architecture. How does that map to what you do? Let's take a very simple example. I have a 500-megabyte object. It could be a video, could be a document, could be anything, and I want to upload that to S3. I establish a connection between my client and S3, and I start uploading via the request. One key piece of information: when you establish a connection to S3, you can only have one active request on that connection at one time. This will become important in a moment.
So with a 500-megabyte object, I establish my connection and begin my request. There's no actual limit on the amount of data you can upload on a single connection from the S3 perspective, but generally given network client configuration and different constraints, we find most customers can achieve about 100 megabytes per second on any single connection.
So if you take a 500 megabyte object and a 100 megabytes per second connection, that means you can upload that object in about 5 seconds. How do you get that down to 1 second? You want to go faster. The way to do that is to parallelize, and that's what I mean about using S3's breadth. Parallelize many connections across S3 so you can achieve higher throughput. For persisting data, we'll talk about multi-part uploads, and for retrieval of data and parallelizing the retrieval of data, we'll talk about range gets. We're going to talk about spreading connections across many different IP addresses, and we'll talk a little bit about the tools for doing this in a second.
Parallelization Strategies: Multi-Part Uploads, Range Gets, and Multi-Value DNS
Let's go back to our example. You still have a single object, which again could be a video, but now you carve it up into 5 parts and open a connection per part so you can upload those parts in parallel. With each connection getting 100 megabytes per second and the object carved into 100-megabyte chunks or parts, you can upload them all within about a second. But there are more benefits to this. The API for doing a put object in parallel like this is multi-part upload. There are a few key benefits. The first, as I mentioned, is that it improves throughput, but it also improves recovery time. If any individual connection fails, whether because your client fails, because of network trouble, or because a server on the S3 side (physical infrastructure) fails, then you only need to re-upload that individual part.
You can imagine in your initial connection, you get 250 or 400 megabytes of the way through, and your connection times out for whatever reason, perhaps because you go through a bad router or a bad switch. You now have to go back and re-upload all that data again. Not with multi-part upload. Now, if you upload 4 out of those 5, you only need to re-upload the one that failed. These can be uploaded in any order as well. The beauty of multi-part upload is that you can even start an upload to S3 before you have all the data in memory. You can imagine a streaming use case where you have data coming in and you don't know what its size is yet. It's streaming video or it's coming from a stream of data, so you can start uploading in parts, and only when you have all those parts uploaded, then you can complete that multi-part upload and persist your data.
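To make this concrete, here is a minimal sketch of a parallel multi-part upload using boto3 and a thread pool. The bucket name, key, file name, and part size are placeholders rather than anything from the session, and real code would add error handling and retries (or simply let a CRT-based client, covered later, do all of this automatically).

```python
import concurrent.futures
import boto3

s3 = boto3.client("s3")
bucket, key = "ians-bucket", "videos/demo.mp4"  # placeholder names
part_size = 100 * 1024 * 1024                   # 100 MB parts

# Start the multi-part upload and remember its upload ID.
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

def upload_part(part_number, data):
    # Each part is an independent request on its own connection, so a
    # failed part can be retried without re-uploading the others.
    resp = s3.upload_part(
        Bucket=bucket, Key=key, UploadId=upload_id,
        PartNumber=part_number, Body=data,
    )
    return {"PartNumber": part_number, "ETag": resp["ETag"]}

with open("demo.mp4", "rb") as f:
    chunks = list(iter(lambda: f.read(part_size), b""))

with concurrent.futures.ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    parts = list(pool.map(upload_part, range(1, len(chunks) + 1), chunks))

# Complete only once every part has been uploaded (parts can finish in any order).
s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)
```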
There are a lot of benefits, and you can also pause and resume object uploads as well. You can go in individual parts, pause for a period, resume, and then complete at the end. It's the same on reads, except we're not using multi-part upload this time; we're using ranged gets. So again, if you uploaded your 500 megabyte object and now you want to download it again because you plan to use it, you can now do ranged gets, which are 5 individual get requests across 5 individual connections to pull all that data down at the same time and then reconstitute it on the client. So instead of taking 5 seconds to download the whole 500 megabyte object, you can now download across 5 individual connections and get it down to 1 second.
How do you do range gets? Range gets are a little bit different because now you've actually stored all your parts up there, so you need to use the list parts API, which lists out all the individual parts of your object and maps them to ranges, and then you can use the range get API to download those in parallel. Again, the same benefits apply here. If any individual part fails to download due to network trouble or trouble with your client, you only need to download that individual part. This is the key recipe or ingredients to going massively parallel. I've talked about one object and 5 connections. This can scale very broadly, and we have customers who are doing hundreds of thousands of connections today in this way.
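Here is the read-side counterpart: a minimal sketch that downloads the same object in parallel with ranged gets, again with placeholder names. For simplicity it computes fixed-size ranges from the object's Content-Length instead of listing parts; a CRT-based client does this splitting for you.

```python
import concurrent.futures
import boto3

s3 = boto3.client("s3")
bucket, key = "ians-bucket", "videos/demo.mp4"  # placeholder names
range_size = 100 * 1024 * 1024                  # 100 MB per range

# Find the object's total size so we can compute byte ranges.
size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
ranges = [(start, min(start + range_size, size) - 1)
          for start in range(0, size, range_size)]

def get_range(byte_range):
    start, end = byte_range
    # Each range is an independent GET on its own connection.
    resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
    return resp["Body"].read()

with concurrent.futures.ThreadPoolExecutor(max_workers=len(ranges)) as pool:
    pieces = list(pool.map(get_range, ranges))

data = b"".join(pieces)  # reconstitute the object on the client
```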
Let me take you a little bit under the hood. What I mentioned earlier was that you have 5 individual connections. The ideal state is that it's 5 individual connections to 5 different IP addresses of S3. S3 has many IP addresses that all get stored in our DNS zones and they get returned to customers. Ideally, you want to parallelize across many because if we have a problem with one server, you want to quickly be able to recover and try different servers. Similarly, if you're doing a multi-part upload or a ranged get and one of those servers fails, you want to be able to quickly retry. Rather than having all your connections to one single IP address, because if that fails then all your connections fail, you want to spread across. How do you do that? A couple of years ago, the teams that I lead launched multi-value answer DNS. You can see on the right-hand side a dig command. I've just run dig for Ian's bucket at our S3 endpoint in US East 1. If you look at the answer section, which is the response to the DNS query that I made, we've got 8 IP addresses there.
On any DNS lookup, we will return up to 8 IP addresses to the client. Many clients can take advantage of this and actually use all 8 answers rather than just taking one from the top. They cache them and use them to parallelize, but also to retry against. For example, if they fail a connection to one IP, they can pick another one without having to re-resolve DNS. This actually reduces latency on connections and allows you to go in parallel. Many SDKs and clients resolve DNS and then build increasingly larger cached lists of all the IPs they can use. The AWS SDKs for Java 2.x, C++, and Python (Boto3) all take advantage of this and have for the last couple of years. A lot of this is actually baked in under the hood. You're getting it, but if you want, you can run dig against the endpoint today and see for yourself.
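You can see the same behavior programmatically. The small sketch below, using only the Python standard library, resolves the regional S3 endpoint and prints every address returned; a client that caches this list can spread its connections across the addresses and retry against a different one without re-resolving DNS.

```python
import socket

# Resolve the S3 regional endpoint; the answer typically contains
# multiple A records (up to 8) rather than a single IP address.
addresses = {
    info[4][0]
    for info in socket.getaddrinfo("s3.us-east-1.amazonaws.com", 443,
                                   proto=socket.IPPROTO_TCP)
}
for ip in sorted(addresses):
    print(ip)
```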
AWS Common Runtime (CRT): Automating Performance Best Practices
I talked a lot about connection management in theory. A couple of years ago, a few of us got together in S3 and realized we had all these best practices that we were discussing with customers on a case-by-case basis: how to actually use them, how to bake them into their clients, and how to architect their HTTP/2 clients. We asked ourselves how we could give this to everybody for free and keep it up to date so they don't have to think about it very often. It's good to know about, but you don't want to be tinkering with clients every single day. With that, we launched the AWS Common Runtime, or CRT as we call it for short. It has all of our best practices baked in as code, and it has a ton of value. It handles asynchronous I/O. It has an optimized HTTP/2 client baked in. It does authentication and authorization for you. It does things like automatically parallelize uploads and downloads, so all those range requests and multipart uploads I spoke about are handled for you. It will also try to scale to your performance needs, and I'll talk about that in a second.
The other thing it has is built-in retry logic. When I spoke about failures, it already has optimal retry logic baked in. So if you fail a request, it will retry using a cached IP as opposed to re-resolving DNS. You get a performance benefit from that. There's a lot within the CRT, but I just wanted to talk about one single configuration that I really like because of its simplicity, and that is target throughput. I've put up the SDK config and the CLI config example. The target throughput is basically you telling the CRT: I want to get this much throughput out of my client. Just to call out one thing, this throughput is in gigabits per second, not gigabytes, which we were talking about earlier. This is bits, so it's a different scaling factor. The reason it's in bits is that most of our EC2 instances have network interface cards rated in bits. So it's nice for you to think: I want to get 20 gigabits per second out of this S3 client on this instance, which has a 50 gig NIC, for example.
The default is usually set at 10 gigabits per second, which is pretty high, but you can configure this. What it tells the CRT to do is look at the objects you want to upload or download and, taking your target throughput, automatically figure out how many connections to open to try and maximize toward that throughput. I've tested it myself, and it gets pretty close to around 20 if you set it at 20 with the right workload. It's a really nice configuration to set. You say: I want to achieve this throughput and get this performance. Instantiate the client, configure it, and away you go. How do you actually use the CRT? Well, the CRT is embedded in multiple clients, as I mentioned. It's also the foundation of our open source file connector client, Mountpoint for Amazon S3. It is also enabled by default in Boto3 and the AWS CLI on Trn1, P4d, and P5 EC2 instances. The reason is that those have very large CPUs and large network interfaces, so they benefit the most from these performance design patterns. For other instance types, you can go and enable it and configure it yourself. I'm putting a QR code here for getting started with the AWS CRT. Now you know about it. It's definitely the first place to go when you're thinking about how to drive higher and higher throughput to S3.
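If you want to see what the target-throughput knob looks like in code, the sketch below uses the aws-crt-python binding directly; the region and the 20-gigabit target are placeholders, the exact constructor arguments can vary between CRT versions, and in practice most people get the CRT indirectly through an SDK, the CLI, or Mountpoint rather than building the client by hand.

```python
from awscrt import auth, io, s3

# Standard CRT plumbing: event loop, host resolver, and client bootstrap.
event_loop_group = io.EventLoopGroup()
host_resolver = io.DefaultHostResolver(event_loop_group)
bootstrap = io.ClientBootstrap(event_loop_group, host_resolver)
credentials = auth.AwsCredentialsProvider.new_default_chain(bootstrap)

# throughput_target_gbps is in gigaBITS per second; the CRT uses it to
# decide how many parallel connections and part requests to open.
crt_client = s3.S3Client(
    bootstrap=bootstrap,
    region="us-east-1",
    credential_provider=credentials,
    throughput_target_gbps=20.0,
)
```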
Prefix Strategy and Partitioning: Scaling Beyond Initial TPS Limits
We talked a lot about parallelization of connections and going broad. A reasonable question you might ask yourself is: can I just take that and continue to scale indefinitely? The answer is generally yes, but it also depends on your prefix structure. I'll talk about what that means in a second. We talked about those three components: our front end, our index, and our storage. Our index is a subsystem that's responsible for mapping metadata to storage locations. So you can imagine when a request comes in for a get object, we look up that index to find where those bytes are located so we can retrieve them and return them to the client.
Our index works around a prefix structure, and we'll talk about what that means in a moment. Let me take you back to the architecture to provide some context.
What is an S3 prefix? It is any string of characters after the bucket name. You create your bucket name, for example "Ian's bucket," and then any of the keys after that or any of the text after that is your prefix structure. That's used to map into the object data itself.
Let me make this concrete. Here, for example, I have my organization, and I have two divisions or two departments within that organization. I have engineering and I have finance. My engineering organization likes to store their data between prod and test. In prod, they have their production software artifacts that they use to deploy, and in test they have their test software artifacts that they use to deploy. My finance team likes to think about their fiscal artifacts in years because they have deadlines towards the end of the year that they need to prepare for, so they organize based on year. That's what your prefix structure might look like in a bucket.
Prefixes in S3, because it's an object store, are not directories, but it's a very easy and natural way to think about them. It's like directories going down to your data, but it's very important that you don't just think about them as directories because that can lead to suboptimal structures and prefix strategies. Let me explain why this matters.
As I mentioned, it matters because there are limitations per prefix. When you create your bucket for the very first time, when I create Ian's bucket, I can achieve 3,500 puts per second or 5,500 gets per second against that bucket. I can do more if I want to, but I need to think about my prefix strategy underneath. If I create 10 prefixes in my bucket, I can now multiply that by 10. I can achieve 35,000 puts per second or 55,000 gets per second. We automatically scale based on your prefix structure as you drive more and more requests.
As you drive requests that exceed the 3,500 puts or the 5,500 gets, we automatically partition out your prefix structure so you can drive more and more transactions per second. Let me walk through an example. I have my reinvent bucket. Today I can do 5,500 gets per second and 3,500 puts per second. Let's talk about what the first split might look like for my prefixes. I have prefix one and I have prefix two.
You might think of this as sales and engineering or finance and engineering from before. On my first prefix, I can now drive 5,500 gets, and on my second prefix, I can drive 5,500 gets as well. When I'm driving a combined 11,000 requests per second, S3 will automatically split out these prefixes, seeing that there is what we call prefix heat on both, and then splitting out those prefixes. But we can take this a step further.
Very simply, to keep things simple on the screen, I now have A and B under prefix 1 and prefix 2, and now I can drive a combined 22,000 TPS. Again, as I drive more requests to it, S3 automatically partitions out these prefixes. So it's really important to think about how you're prefixing your data beforehand. You want natural splits in your prefixes between different workloads. This can lead to sharp edges if we're not careful, and we'll talk about that next.
Here I have now switched over to a different format. I have day within my prefix, and this is a very common pitfall for users of S3. Having time-based data or day-based data is very alluring. It's very easy to put the day up front because you're changing your data over time and you might have analytics workloads that want to run on a day-by-day basis. So it's easy to put day at the start of your prefix to help you think about your data, but that does lead to suboptimal outcomes.
Let me talk about it. We'll take the exact same example from before, where we have prefix 1 and prefix 2 with A and B underneath, but now the day sits at the left: day 1 is at the start of the prefix, and day 2, day 3, and day 4 will be at the start as well. The problem with this is that once I've generated enough requests to my prefixes, which has resulted in the partitioning to achieve my 22,000 TPS, when I move on to the next day, that's all wasted. It's all gone because I have the day at the start of my prefix instead of later on.
So you want to make sure that day two or any dates are further down or to the right in your prefix structure so that when we are partitioning your prefixes to the left, you're able to reuse those partitions later. I've called it out here, but the partitions from day one are now unused.
We need to do the same splitting on day two while we see sustained load. Now while we're actually partitioning or breaking out these prefixes to drive more requests, you might see HTTP 503s from us. We're telling you to slow down while we're doing the work to break these out. But once we've done the work, then you will sustain those requests per second without seeing any HTTP 503s or slowdowns.
So rather than putting the date at the left, push it down so that the work we do to partition out your prefixes gets reused as the days go on. In this example, I've now pushed day one and day two to the right, so any work we've done on prefix one and prefix two to split them, and then on A and B to split those, gets carried over as the days move on. This is a really common pitfall, and something to definitely take away from this.
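As a quick illustration of the difference, here is a sketch of the two key layouts using made-up prefix and file names; the only thing that changes is where the date sits in the key.

```python
# Date first: every new day lands on a "cold" prefix, so the partitioning
# work S3 did for 2025-12-01 is wasted when 2025-12-02 arrives.
bad_key = "2025-12-02/prefix1/a/events-00042.parquet"

# Date pushed to the right: prefix1/a and prefix2/b stay hot across days,
# so the partitions created for earlier days keep being reused.
good_key = "prefix1/a/2025-12-02/events-00042.parquet"
```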
But whenever you're starting with an S3 bucket just to test or mess around, it's very easy to just create any prefix structure that maps to what you have. But when you know you're going to do high throughput and high scale, take a second, step back, write it down, and make sure you're thinking about what your prefix structure will look like and that you have natural divides to ensure that you can drive the TPS you need to. Here I've got another QR code. It's a best practices reference for optimizing high request rate workloads; it talks a lot about prefixes and gives you some other examples. I think it's a great secondary dive into this. I've given you the high level of how this works in a very simple case, but if you want to go deeper, use the QR code.
S3 Express One Zone: Single-Digit Millisecond Performance for High-Speed Workloads
OK, now I will hand over to my colleague Dev to talk about S3 Express One Zone. Hello, everyone. My name is Devabrat Kumar, and I'm a product manager on the Amazon S3 team. I lead S3 Express One Zone, and I'm going to talk about its performance characteristics, some of its unique capabilities, top use cases customers are using it for, and some key architectural considerations when building with S3 Express One Zone. So let's dive right in.
S3 Express One Zone is the fastest object storage, delivering single-digit millisecond access times. That's really fast. It also offers up to two million requests per second per directory bucket. I'll talk about it a bit later in more detail, but one of the key things to note here, and kind of an extension of what Ian was talking about, is that you get the TPS capacity right when you create the bucket. You can scale it up to two million, but you get 200K right out of the box, and we'll look into why that's important and why it's super relevant for some of the bursty access use cases.
Then you also have some pretty cool differentiated capabilities, specifically in this storage class. For example, you can add data to an existing object while it is in the object storage. You can basically append data to an object while it is in S3 Express One Zone. That's pretty cool for object storage, where objects had been immutable before this capability; you could not actually mutate an object in object storage. You also now have an O(1) rename operation available, which basically means that regardless of the size of the object, you can rename it in constant time. It's a brand new API we launched a few months back this year.
With these capabilities, customers achieve up to ten times faster access times on S3 Express One Zone compared to S3 Standard. So why does all of this matter? It matters because there are a whole bunch of use cases that need high performance. That's kind of intuitive at this point, actually. Let me do a quick show of hands. How many of you in this room have at least one of these problems? Like at least one use case here that you work with. OK, that's a good number. Hopefully all of you use S3 Express already and benefit from this talk.
Let's start with the first one: ML training. Over the past few years, a lot of customers are using S3 Express One Zone to drive very high levels of throughput and very high levels of data transfer speed. As we know for ML training, customers deploy very large numbers of high-performance GPUs, and they want to keep these GPUs busy, right? So your data is in your object storage, you want to get it as soon as possible. So you want to have the fastest data transfer speed that you can attain.
When you try to do that, you run very high levels of TPS. And your transfer speeds could be up to one terabyte per second. And because S3 Express One Zone is built for high performance, you benefit from its capability to scale to these levels.
You can keep your GPUs busy and continue doing meaningful work. This is super popular for ML training these days, and we see a lot of customers adopting it, with some even training foundational models with S3 Express One Zone.
The next use case is interactive query. This is a use case that has existed for as long as I can think of. Customers have data and want to make use of it. They want to run high-performance queries, and many of these queries are interactive where an end user is waiting for the answer to come back. For example, think of an observability analytics use case where you have a whole bunch of log files sitting in S3 and you want to get insights out of those log files. When an end user is running a query, they might burst to tens to hundreds of thousands of TPS because they may need to scan a lot of that data. Obviously, they require low-latency access because they don't want to wait too long. This is exactly when you would want to use S3 Express One Zone to benefit from its high TPS and low-latency access for these interactive querying scenarios.
Next is log and media streaming, which is a different use case. What happens with log and media streaming is that you constantly get new data. Let's say you have applications and internal services creating logs, so new data is coming in. Or let's say you are a broadcast company and you have a continuous feed coming in. You typically write to an existing file. Before we had the append capability, customers used to provision and manage their own storage on top of S3, and that's where they created these files. When you have these files, you have consumer applications. If you're a broadcast company, then you probably want to send it to your end users. If you are an observability company, you want your end users to be able to run analytics on these files. So you want to perform writes as well as reads.
With S3 Express One Zone, because it now has the append capability, you can essentially create these files in the object storage itself, so you don't need to maintain an additional layer on top of it. You can build your files, you can use the append capability, and you can run high TPS and low-latency queries on top and have the full workflow running right from object storage itself. This simplifies architectures, lowers cost, and obviously improves performance because of the unique performance characteristics that S3 Express One Zone offers.
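As an illustration, here is a minimal sketch of the append pattern with boto3, assuming the WriteOffsetBytes parameter that S3 Express One Zone's append support exposes; the directory bucket and key names are placeholders.

```python
import boto3

s3 = boto3.client("s3")
bucket = "log-stream--use1-az5--x-s3"  # placeholder directory bucket name
key = "service-a/app.log"

# First write creates the object in the directory bucket.
s3.put_object(Bucket=bucket, Key=key, Body=b"first batch of log lines\n")

# Subsequent writes append by passing the current object size as the
# write offset, so the log file grows in place instead of being replaced.
current_size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
s3.put_object(
    Bucket=bucket,
    Key=key,
    Body=b"next batch of log lines\n",
    WriteOffsetBytes=current_size,
)
```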
Last but not least on this slide is model loading. Over the last one or two years, a lot of enterprise companies are now building inference pipelines. In an inference pipeline, let's say if you are an e-commerce company and you have a recommendation engine, you may need to update your models because your inventory has changed or your customer usage pattern has changed. Whenever a model gets updated, customers want their inference nodes to pick up the updates as soon as possible.
In a typical inference pipeline, you would have tens of thousands of nodes. We have also seen hundreds of thousands of nodes or even more trying to read a model and weights, which is just a few files, pretty much at the same time. So it's a very bursty access problem. With S3 general-purpose buckets, as Ian was explaining, the TPS capacity scales gradually. You can scale to very high levels, but the scale-up is gradual. If you suddenly have a burst of 100,000 GETs coming in, let's say on a new bucket, then you may get slowdown or error messages. You can retry, but that may slow down the workflow, and that is something that customers are very sensitive to, specifically for these real-world inference pipeline use cases.
This is again where S3 Express One Zone's high TPS out of the box really comes in handy for customers, and we have customers building a bunch of model loading and inference pipelines on S3 Express today. Before we go on to the next slide, I think by this point we understand that S3 Express One Zone is the fastest object storage and offers 10x faster access times compared to S3 Standard. So it's actually a really good option to build your cache on, right? You can keep your data in the general-purpose bucket in your regional storage classes, you get multi-AZ resilience, and have a caching layer that runs out of S3 Express One Zone.
That's exactly why we have a number of customers today using S3 Express One Zone effectively as a cache on top of general-purpose buckets and regional storage classes. One of these customers we are super excited about is Tavily.
Tavily is an AI platform company that serves as the web access layer for agentic workloads and large language models. Previously, they managed and built their own cache, provisioning storage to deliver low latency access to these agentic workflows. However, this approach was harder to scale, and their management overheads were significant. So they started looking into S3 Express One Zone, and today they're running it in production with their caching layer based entirely on S3 Express One Zone.
S3 Express One Zone scales elastically and delivers single-digit millisecond access times. The total cost of ownership for Tavily has gone down by up to six times. This is just an example of how elastic scaling and high performance out of the box can help customers save costs in addition to improving performance.
Architectural Considerations: Directory Buckets, Zonal Storage, and Session-Based Authentication
At this point, I hope you appreciate the differentiated performance of S3 Express One Zone and the popular use cases customers are using it for. Now let's go into some of the architectural considerations that you may want to keep in mind when building with S3 Express One Zone for one of these use cases or any other use case you may find it useful for. There are basically three considerations. The first is its single availability zone nature—it's a zonal storage class. The second one is directory buckets, which is a new construct we announced alongside S3 Express One Zone. The third one is its unique authentication mechanism, which we will talk about and which is optimized for low latency access.
S3 Express One Zone is a single availability zone storage class, which means that when you create objects, they are stored in a directory bucket in a single availability zone. This allows faster access, but that doesn't mean you cannot do cross-AZ access. Once you create your directory bucket and add your objects there, you can access them from a different availability zone. One of the remarkable characteristics of S3 Express One Zone is that the access cost remains the same. Regardless of whether you're accessing your directory bucket from the same availability zone where it is located or from compute in a different availability zone, there is no additional network cost. This is something to be aware of if your fleet is really large and distributed across different availability zones, which is pretty common for use cases like machine learning training and large analytics workloads.
Now let's talk about the second architectural consideration, which I would say is a design concept. The S3 directory bucket is a relatively new type of bucket that we launched two years ago. It exists alongside our general purpose bucket and has a pretty different scaling model as far as transactions per second goes.
With general purpose buckets, as Ian was explaining, it scales TPS capacity based on load. If your TPS load increases on the bucket, then it automatically scales to higher TPS capacity. With directory buckets, as soon as you create a directory bucket, it is already scaled up to 200,000 GET requests per second and 100,000 PUT requests per second. If you have bursty access use cases where it's hard to predict—for example, you have a whole bunch of end customers sending observability queries and you don't know which query is going to be super bursty, or you have a model loading pipeline like I was talking about earlier—then you want your bucket to be already scaled up. That's exactly what directory buckets are. They're already scaled up to 200,000 GETs per second by default and 100,000 PUTs per second by default. You can also request further scaling up to two million GET requests per second. So different scaling models of TPS are relevant for different use cases, depending on the nature of your application.
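For reference, creating a directory bucket looks roughly like the boto3 sketch below; the bucket name and Availability Zone ID are made up, and the name must embed the AZ ID in the <name>--<az-id>--x-s3 format.

```python
import boto3

s3 = boto3.client("s3")

# Directory bucket names follow the pattern <name>--<az-id>--x-s3;
# the AZ ID below is a placeholder, pick one in your Region.
s3.create_bucket(
    Bucket="inference-cache--use1-az5--x-s3",
    CreateBucketConfiguration={
        "Location": {"Type": "AvailabilityZone", "Name": "use1-az5"},
        "Bucket": {"Type": "Directory", "DataRedundancy": "SingleAvailabilityZone"},
    },
)
```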
As the name suggests, directory buckets store their data in directories. The namespace is basically hierarchical. The implication of this is that if you try to run a list operation against your directory bucket and you're trying to list the entire bucket, then your list results are not going to be lexicographically sorted. This is in contrast to general purpose buckets, where your list results are lexicographically sorted.
Why does it matter? We always want to make it easy for our customers to reuse existing applications and existing code. If you are reusing code that works against a general purpose bucket, this is a consideration you want to be aware of. If your application doesn't make any assumptions about the list order sorting, then it's totally fine. But if it does, then this is something you may want to be aware of.
Next, let's talk about authentication. When we launched S3 Express One Zone, we also launched a new authentication mechanism: session-based auth. With session-based auth, you have an API called the create session API that you use to create a session and get temporary credentials that you can then use in subsequent requests for authentication. What this does is amortize or distribute the cost of your authentication across multiple requests, which means that each of your requests actually finishes faster. The reason we did this is to improve latency performance for every single request that you make.
If you architect against REST APIs directly, you would need to use this API to manage sessions and tokens. However, if you use one of our SDKs, then this is done for you. The session management and token management is taken care of by the SDK, and you don't have to worry about it. This is something to be aware of, and we strongly recommend using our SDK and CRT and other tools that I'll touch upon again.
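If you do build against the REST API directly, the flow is roughly the sketch below: call CreateSession on the directory bucket and sign subsequent requests with the temporary credentials it returns. The bucket name is a placeholder, and with an AWS SDK you never call this yourself.

```python
import boto3

s3 = boto3.client("s3")
bucket = "inference-cache--use1-az5--x-s3"  # placeholder directory bucket name

# CreateSession returns short-lived credentials scoped to the bucket.
session = s3.create_session(Bucket=bucket)
creds = session["Credentials"]

# Those credentials (access key, secret key, session token) are then used
# to sign the object requests that follow, amortizing auth cost across
# many requests; the SDKs cache and refresh them for you automatically.
print(creds["AccessKeyId"], creds["Expiration"])
```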
To summarize what we have discussed so far, if you want to use S3 Express One Zone and you have one of the use cases I talked about earlier, you create a directory bucket first, scaled out to 200K TPS by default. You want to have your compute in the same availability zone to optimize for latency. You can keep it in different availability zones as well if your fleet is spread out, but to get the best latency performance, you want to co-locate your compute with storage.
You want to use session-based auth to access your objects. If you're using our SDK, then session management and token management is taken care of for you. With this architecture, you achieve high TPS and low latency access right out of the box. You increase the speed of data access by 10x and benefit from S3 Express One Zone's performance for request-intensive applications like machine learning training and latency-sensitive applications like interactive analytics.
Choosing the Right Storage Class: Matching Requirements to S3 Performance Options
To put it all together, considering what we've discussed and taking a step back, when you are thinking about optimizing performance for your use case, you want to think about your requirements. You want to think about what the latency requirement is for your end user or your application. You want to think about what your access pattern is, whether the requests are bursty or whether the requests gradually increase over time, and you want to think about the kind of throughput that your application and your workload would drive.
If your access pattern is bursty and you have end users waiting and your application is latency-sensitive, then we recommend using S3 Express One Zone. On the other hand, if your TPS load increases gradually over time, or you have predictable access patterns so that you can partition your prefixes as Ian was describing, then you want to use general purpose buckets, and you can use any of our regional storage classes like S3 Standard or S3 Intelligent-Tiering.
Regardless, we strongly recommend using a CRT-based client, that is, an AWS Common Runtime library-based client, because it implements a lot of performance best practices like multipart uploads and range gets for you. You can benefit from the CRT if you are using our SDKs and opt into it, or if you're using Mountpoint for Amazon S3, which is our file client for object storage.
Here is a best practices reference, essentially a blog that puts everything we talked about in the context of a really popular use case these days: checkpoint storage. If you are in the machine learning space, I strongly recommend reading it. Even if you are not, I recommend reading it so that you get a sense of everything we talked about in the context of a real-world problem. On that note, we really appreciate you joining us today and hopefully you found the content useful. We request you to share feedback on the app and enjoy the rest of re:Invent. Thank you.
; This article is entirely auto-generated using Amazon Bedrock.