🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Deep dive into Amazon Aurora and its innovations (DAT441)
In this video, Tim, a Senior Principal Engineer at Aurora, presents DAT441 covering Amazon Aurora's innovations celebrating its 10th anniversary. He explains Aurora's cloud-native architecture with storage across three availability zones, six copies of data, and automatic scaling up to 256 terabytes. Key announcements include Aurora PostgreSQL dynamic data masking using the PG column mask extension, Aurora Create with Express Configuration enabling cluster creation in seconds, and Aurora Internet Access Gateway for VPC-less access. He demonstrates performance improvements with Platform Version 3 achieving 30% better performance, I/O-Optimized configuration reducing costs and improving latency by 6.4x, and Optimized Reads with tiered cache on R8GD instances. Additional features covered include global database switchover improvements (30-second RTO), local and global write forwarding, zero-ETL integration with Redshift and SageMaker, blue-green deployments now supporting global databases, AWS Organizations Upgrade Rollout Policy for fleet management, and integration with agentic AI frameworks like LangChain. He concludes by comparing Aurora PostgreSQL's active-passive architecture with Aurora DSQL's active-active distributed approach.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction to DAT441: Aurora's 10th Anniversary and the Aurora Family
My name is Tim. I'm a Senior Principal Engineer at Aurora, and welcome to DAT441, where we're going to talk about Amazon Aurora and its innovations this year. We've given this talk every year for quite a few years now, but this year it's going to be a little bit different for two reasons. The first one is, if you've seen one of these talks before, I'm not the normal presenter—I'm not Grant. But also, secondly, Aurora turned 10 this year.
If you're not up to speed with that, it means we celebrated Aurora's 10th anniversary. Thank you, and congratulations to Aurora. That cake was delicious, even from the other side of the world. We did a live streaming event where you could watch people who have been working with us for those 10 years and see a bunch of cool demos. I really encourage you to check out that live stream if you'd like—you can still access it via the QR code.
But if you haven't seen this talk before, I want to orient ourselves a little bit. So what is Amazon Aurora? Aurora is a cloud-native relational database that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source. It's fully managed and fully compatible with both MySQL and PostgreSQL. It has a whole bunch of tools that allow us to plug into serverless and machine learning applications. Because it's compatible, your applications don't need to change in order to use Aurora, and your existing extensions keep working with Aurora PostgreSQL as well.
The most recent addition to the Aurora family has been a third engine called Aurora DSQL, which stands for Distributed SQL. This talk won't focus deeply on DSQL—we will talk about it a little bit towards the end. If you're interested in DSQL, I'll put some other talks up at the end that you should have a look at as well.
Aurora Architecture: Storage, Replication, and Automatic Failover
I'm an engineer, so this is a technical talk, and I need to get a little bit technical. We'll talk about the architecture of Aurora first because this underpins a whole bunch of what we're going to talk about today, and I think it's pretty cool. If you're not interested, too bad—you've got to listen to me geek out anyway. Here we have a common picture that you'll see a bunch of times today: three availability zones. At the bottom, we have Aurora storage, which is really what makes Aurora special.
You can see the first thing is that the big blue box covers all three availability zones, so we have three availability zones of storage out of the box with Aurora—no questions asked. We have a bunch of yellow boxes representing storage nodes, and there are normally a lot more than nine of those. We have your application with two application nodes running in two different availability zones. They're talking to one read-write node here, and they're talking directly to the storage.
What are we writing to that storage? We're writing log records—database log records. These are the instructions to change a piece of data that a database would normally write. The special part about this storage is that it understands how to turn those log records into database pages. That means there are no checkpoints like a traditional database, no full page writes, no double write buffers, and no log archival. All those kinds of problems that you'd normally hit with traditional databases don't happen here.
You can also see some of my color coding here. There are six copies of each piece of data written across six of those storage nodes. Every time you write once, we get six copies across three availability zones, and that comes out of the box too. We write those log records, but we read back database pages, and that's because the storage can turn those log records into pages.
What happens if one of those pieces of log record maybe gets broken or gets missed? Well, we'll just peer-to-peer repair it from another storage node in a different availability zone. What happens if a whole storage node goes away? Again, we'll just peer-to-peer repair that from another one. This is all happening under the covers in the database engine, and you don't even know that it's happening.
We can add another replica—here it's a read-only one. It's reading from the same storage, so there's no consistency lag between these two things—it's the same storage. We're doing asynchronous invalidation and update between those two head nodes. So when we run a transaction on the read-write node, the read-only node finds out about it within just a few milliseconds. We can add up to 15 of these replicas in different availability zones or the same availability zone if you want to.
Again, this is not disruptive. We're not having to reboot any of the cluster or any of the other replicas in order to do this. We can add or remove them as we like. These replicas can be different sizes and different types of instances. Here we have an R6G, which is a Graviton processor, we have an R7i, which is Intel, and we have a db.serverless, which we'll talk about some more later as well.
The storage expands up to 256 terabytes automatically. You're not provisioning this space, and you're not provisioning performance here. This is completely elastic and automatic. When something goes wrong—maybe we lose a whole availability zone of those head nodes—we're able to automatically promote one of those replicas to be the read-write node. Your application will check into Route 53, the DNS server there, we'll change the endpoint, and we'll start to talk down to the read-write node. So if you want even faster failover than that—normally that's about 30 seconds because you're waiting for the DNS—you can use the advanced rapid drivers for JDBC, ODBC, Node.js, and a whole bunch of other ones. They'll be able to change over within just a few seconds, so it's up to 66 percent faster. That's the whirlwind tour of Aurora architecture.
Global Database: Multi-Region Replication, Failover, and Switchover Improvements
I'll test you later to see if you've got it, right? One of the things that allows us to do is go multi-region. This is the same picture on the left, and on the right we have another region, called a global database. When we add another region, the most basic thing we can do here is asynchronously replicate the storage. If we look carefully, those orange arrows are coming through the replication server, going over to the replication agent, which are things that I manage, not what you manage. They're coming from the storage, not coming from your head node. Your head node's not involved, so your application doesn't know that global database is happening and doesn't have to make any changes. We replicate over, so when you write, those log records go up through the replication server and agent and into the storage for the other side.
Now we can make some replicas on the other side as well, in region B. They'll be read-only because this is a single writer system, and they'll be able to see that same data within whatever your replication lag is there. That's typically less than a second, depending on your choice of the two regions. You can have up to 10 of these secondary regions, if you like. You can put your application running inside that other region too. This is a really cool pattern for doing low latency region local reads, so you don't have to worry about the cross-region latency.
With global database comes the question of how do I know which region I want to talk to and where is the writer. That's where the global endpoint comes in. This is a DNS name that points to the region that is the writer at this point in time. Right now it's region A. So what happens if region A has a problem? You're going to issue a failover operation. You're going to tell me, Tim, you need to fail over and make region B be the primary region, please. I'll promote one of those instances to be the writer node. You can see there's an allow data loss flag down there in the bottom right corner. That's reminding us that because this was asynchronous replication, there might be a little bit of data inside that asynchronous replication window that might be lost when we do a forced failover.
When we've done the failover and promoted that one to be the writer, we'll also update the DNS name for that global endpoint using the Route 53 data plane API. That's really reliable and really quick, much better than you could do with your own DNS. That was a failover operation, which is when something's broken. Global database switchover happens when something's not broken. We have two regions, they're both healthy, and we just want to move the writer, maybe from region A to region B in this case, perhaps because you want to follow the sun with your operations. We've recently announced some improvements here. We've gone down from about 5 minutes of RTO here to about 30 seconds, so I really want to explain to you how it works.
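To make those two operations concrete, here is a minimal sketch in Python with boto3. The cluster identifiers and regions are placeholders, and the calls shown are the standard RDS global-cluster APIs rather than anything specific to this talk.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Planned switchover: both regions are healthy, RPO is zero, and the writer
# moves to the chosen secondary cluster.
rds.switchover_global_cluster(
    GlobalClusterIdentifier="my-global-cluster",
    TargetDbClusterIdentifier="arn:aws:rds:eu-west-1:123456789012:cluster:my-secondary",
)

# Forced failover: used when the primary region is impaired. AllowDataLoss
# acknowledges that writes still inside the asynchronous replication window
# may be lost.
rds.failover_global_cluster(
    GlobalClusterIdentifier="my-global-cluster",
    TargetDbClusterIdentifier="arn:aws:rds:eu-west-1:123456789012:cluster:my-secondary",
    AllowDataLoss=True,
)
```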
You can see down the bottom left corner, there are those colored squares, which are those log records, right, and they are in a time sequence. This one was written before this one, before this one. I want to do a switchover at this point in time, so I'm going to write a special one of these log records into that log stream, that's my little star there. That's going to flow through the normal replication machinery there. When it pops out in the second region, the second region's going to see that and interpret it as switch over now. We can use this log-based architecture of Aurora to get really fast switchovers. We can get RPO of zero, we're not losing any data, and an RTO of about 30 seconds. If you'd like to learn some more about this, there's DAT441, which you can watch the recording of because it happened yesterday.
Local and Global Write Forwarding with Consistency Modes
Alright, so local write forwarding. This picture here shows we've got two availability zones, we've got two application nodes and two replicas. What I'd really like to do is be able to write from my second availability zone, availability zone 2. But I've got a single writer system, so if I try and send that write to the read-only node, it's not going to work. So I can turn on local write forwarding. Now this does not make us an active-active system. What it does is when we send a write down from availability zone 2, we'll forward the write over to the other side. It'll execute the write exactly as it normally would, and then the results will be pulled back again.
Here's a bit of an experiment I did. One of the questions that comes up is, well how do I know about consistency, because this was asynchronous replication, right? So we have three different consistency modes, or visibility modes. The default one is session. My test here does an update inside a session, and then it selects the thing that I just updated, and then it selects the thing again that I just updated, right? In the session visibility mode, we can see I did the update. It took a few milliseconds because it had to go across to the other side. When I did the first select, I had to wait for it to come back again.
In the eventual consistency mode, this is like the no seatbelts mode, where we don't wait to read our own writes at all. We did the update, it took whatever time it took, and then when we did the select, that came back really quickly because we didn't wait for anything that was happening on the other side, right? Maybe we did not read our own writes in this case, maybe that's a bit confusing for your application, so you have to be careful. The other option, the extreme other end, is global consistency, right, where you always wait for everything that's happening in the whole cluster. You can see that the first update took quite a while because we had to wait for things to come back, and then the selects, they all took the same amount of time too.
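As a hedged illustration of that experiment, here is roughly what it looks like from a client with psycopg2 against a reader endpoint that has write forwarding enabled. The parameter name shown (apg_write_forward.consistency_mode) is my assumption of the Aurora PostgreSQL setting; the MySQL-compatible equivalent is aurora_replica_read_consistency. Endpoint and credentials are placeholders.

```python
import time
import psycopg2

# Connect to a read-only node that has write forwarding enabled.
conn = psycopg2.connect(
    host="my-cluster.cluster-ro-xxxx.us-east-1.rds.amazonaws.com",
    dbname="app", user="app_user", password="...",
)
conn.autocommit = True

with conn.cursor() as cur:
    # Pick the visibility mode: 'session' (default), 'eventual', or 'global'.
    cur.execute("SET apg_write_forward.consistency_mode = 'session'")

    t0 = time.perf_counter()
    cur.execute("UPDATE accounts SET balance = balance + 1 WHERE id = 42")  # forwarded to the writer
    print("update took", time.perf_counter() - t0, "seconds")

    t0 = time.perf_counter()
    cur.execute("SELECT balance FROM accounts WHERE id = 42")  # under 'session', waits to see our own write
    print("select took", time.perf_counter() - t0, "seconds")
```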
Write forwarding is useful, but you need to watch out for your consistency model. We can extend that and talk about global write forwarding. It's a similar picture, but we have two regions now instead of two availability zones. We have the same problem: read-only nodes trying to write can't write. I want to turn on global write forwarding. The same thing happens—I'll forward the query over to the writer region, execute it, and pull it back again. We have the same consistency modes and the same considerations that you need to be aware of. Obviously, the latency numbers will be different because we're talking about cross-region.
Inside Aurora Storage: Log Processing, Coalescing, and I/O-Optimized Enhancements
That's global write forwarding. Again, it's available on most modern engine versions. Now I'm going to dig into the storage a little bit more to explain how it works. Here we're looking at an Aurora storage node and what's happening inside. We have our read-write node, and it runs a little storage daemon, which is basically the driver that lets our engine talk to our storage.
We're going to write this log record called A. It comes into the storage node. Remember, this is one storage node out of six, so this is actually happening six times. It comes into this incoming queue, which is in memory. We drain it from the incoming queue to the hot log, which goes to disk. Now we can acknowledge it because it's safe on the disk. Then we do it again. Here comes record C. It goes into the incoming queue, down to the hot log, acknowledge, and it's all good. Now, if your alphabet is any good like mine, you'll notice that B is missing in the middle there. Maybe it was just delayed in transit, so we can peer-to-peer fetch it from some other storage node. Here comes B. Now we have A, B, and C all in order with no gaps. We can move things over to the update queue and turn these log records into pages again. We call that coalescing. This is all happening really quickly inside the storage.
When we need to do a read request, we have it ready, so your read latency is not impacted. This storage node is also continuously dropping all of those log records and database pages into Amazon S3. This is continuous backup, so you can do point-in-time restore at any point in the last 35 days in your retention window.
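A minimal sketch of using that continuous backup with boto3, with placeholder names; note that the restored cluster comes back without any instances, so one still has to be added.

```python
import boto3
from datetime import datetime, timezone

rds = boto3.client("rds")

# Restore the cluster to a specific point within the retention window.
rds.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="my-cluster-restored",
    SourceDBClusterIdentifier="my-cluster",
    RestoreToTime=datetime(2025, 12, 1, 9, 30, tzinfo=timezone.utc),
)

# Add a writer instance to the restored cluster.
rds.create_db_instance(
    DBInstanceIdentifier="my-cluster-restored-1",
    DBClusterIdentifier="my-cluster-restored",
    DBInstanceClass="db.r8g.large",
    Engine="aurora-postgresql",
)
```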
Now, in I/O-Optimized, we'll talk about this more. This is a configuration option where we change some of this to increase throughput for I/O-heavy applications. What happens is we change that storage driver on the left-hand side, so it's batching more aggressively. It's really good for I/O-heavy applications. With new modern engine versions, we can turn this up to the next level. We changed that incoming queue to be a durable queue. Now when I try to write log record D, it comes in and gets durably queued directly in that box. You don't have to wait for it to drop down to the hot log, so we can acknowledge it directly. This means your latency goes down because I have to do less work, your jitter goes down, and performance improves. We'll see some results for this in a minute.
Aurora PostgreSQL Updates: Performance Gains and Dynamic Data Masking
Aurora PostgreSQL is fully PostgreSQL compatible, so we always take in PostgreSQL updates from upstream. I want to talk through some of the laundry list of what we have here, and this is a very long list with lots of things missing, so I apologize if the thing you're looking for is not here. We support up to PostgreSQL version 17.6, which we just did last week. We support the R8GD instance type with local disk. We'll talk more about that today. We have a bunch of performance improvements from upstream: correlated subquery cache, adaptive joins, and tons of other stuff. We have shared plan cache, which helps when we have multiple PostgreSQL processes running: each has to plan its queries, and they can look into a shared cache, which saves a bunch of memory. We improved read availability for large instances and large replicas so they can boot up faster. We do FIPS 140-3 encryption across the board now. Lots of extensions have been updated, with pgvector being the most interesting one for some people, and we support up to 256 terabytes of volume size. Like I said, there are a whole bunch of other things in there, but I just want to focus on a few of them today.
Let's look at Aurora PostgreSQL performance and throughput over different instance generations. This is a sysbench read-only test, basically a CPU test that's all in memory. The purple line shows an r5.24xlarge, which is the biggest instance we had 10 years ago at the birth of Aurora. The blue line shows an r7i.48xlarge, which is twice as big and two generations newer, still Intel. The Y-axis is queries per second, so higher is better. We can see that's a 2.1 times improvement in performance, which is CPU-bound performance, just by doing a couple of upgrades to your underlying instance. It's pretty straightforward.
Then if we add in the r8g, which is the Graviton version and the next generation, the current one now at 48xlarge, the same size, we could get a 2.7 times improvement against the baseline. That's better than linear scaling. If you did nothing else other than just follow the normal scaling rules that we were giving you, you get 2.7 times over 10 years, which is pretty impressive.
Our first new feature I want to talk about today is Aurora PostgreSQL dynamic data masking.
It is for organizations using Aurora that need to protect sensitive data. For example, you might have account numbers, account holder names, and other personally identifiable information. You also have account balances, and maybe you want to give that to an analyst, but they don't need the exact balance. They just need some rounded version of the balance. That's what dynamic data masking is for. You can mask things out and make them X's, or you can apply different functions to do some rounding. That's what dynamic data masking lets you do, but it lets you have the full expressive power of SQL, so you can still do joins, you can still use all your indexes, and all those things.
Let's see how it works. It's implemented in Aurora in this new PG column mask extension. We define a policy for masking, in this case called "fully masked account data" for my example. We apply this to the accounts table, and then this is a list of the columns that we're going to be masking. For example, customer ID one, we're going to mask and turn it into X's, and account balance one, we're going to apply this round function. That's how we want to do masking. We apply it to all of the roles in this list. This list has only one role called analyst, and it has a weight of 50. This weight is used when we have overlapping policies, so we know how to apply them.
This is pretty useful already, but I want to talk about how it's actually implemented, because I think that's interesting. We have this select query. How does this actually work? Well, inside PostgreSQL, we have this pipeline: parse, analyze, rewrite, and plan. Dynamic data masking plugs in at the query rewrite layer. We rewrite the query to fetch masked results. We don't mask the fetched results. That's how we can get by and have really good performance and make your indexes still work. That's pretty important. You can see the action of this in the describe output here. You can see the mask function being applied to that column.
Aurora MySQL: Recent Features and Improvements
Now for Aurora MySQL, we track upstream. Same story with the laundry list here. I apologize if your favorite feature's not here, the list is just too long. We released 3.11 just a couple of weeks ago. That gives you MySQL 8.0.43 compatibility. We also released a 3.10.0 long-term support version. That gives us an in-memory relay log cache, which is up to 40 percent improvement in log replication throughput if you're using binary log replication. That's pretty awesome. We have those advanced rapid drivers that we talked about, so now they support ODBC as well for MySQL. We have 256 terabyte volume support, and one that's pretty neat for me is global database secondary readers. When something happens in your global database, those readers can now stay up and serve you reads for longer. So it increases your read availability when you have a global issue.
Aurora Serverless: Elastic Scaling, Faster Responsiveness, and Platform Version 3
Now for Aurora Serverless. This is the instance type that I'd like most of you to consider for your workloads. It makes managing your fleets easier because you don't have to worry about instance sizes. This is what we used to call Aurora Serverless V2. V1 went end of life, so now we just call it Aurora Serverless. Naming is hard, I'm sorry, just bear with me with the names.
So what is Aurora Serverless? It's an elastic instance type. You can think about it as just an instance type like a 4X large, except you can plug in and plug out CPU and memory, and we do it for you automatically as the system needs. Every time we do this, there's no impact. It's not like we're rebooting your engine or anything. We're doing this every second, scaling up and down the memory, CPU, and network for this instance. We measure this in units of an ACU, Aurora Capacity Unit, which represents 2 gigabytes of RAM and the associated memory, CPU, and network that goes along with it.
Here on the right, we have a Lambda, and it's talking to a small Serverless instance. Then another Lambda comes along and it wants to talk to that instance, and it sends it tons of queries, so the workload is now too big for the instance and we need to scale it up a little bit. Then a big analytics job maybe comes along, and so we need to scale it up some more. We're scaling up and down, and we price this per second, so as soon as we scale back down again, you're no longer paying for the things that we scaled up.
Here's a concrete example. We start off with an example with 4 ACUs, Aurora Capacity Units. We are watching for changes in these triggers: CPU, memory, network, and I/O. We also have this bucket of credits off to the side. This is how the scaling mechanism works. This scaling bucket is being filled up periodically. As it fills up, it means that if you need to scale now, you have this many credits with which you can scale up to your max ACU limit that you've set. Now I do need to scale. I've triggered something, maybe I want to use some more memory. So I consume some of these credits out of the bucket. I consume 4, and then I consume some more.
So now my instance has scaled up to 12, which is really fast—it's done this in less than a second. But my scaling credits have reduced, and they'll refill again later. What we've done recently is enhance the responsiveness of this, so we've given you faster scaling. The triggers now respond in less than 1 second, our bucket size is bigger to begin with, and it refills faster too.
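For reference, here is a hedged sketch of where that maximum ACU limit is set when creating a Serverless cluster with boto3 (names and values are placeholders); the db.serverless instance class is what makes an instance elastic within the range you choose.

```python
import boto3

rds = boto3.client("rds")

# The cluster carries the scaling range; scaling itself happens automatically per second.
rds.create_db_cluster(
    DBClusterIdentifier="my-serverless-cluster",
    Engine="aurora-postgresql",
    MasterUsername="postgres",
    ManageMasterUserPassword=True,  # keep the password in Secrets Manager
    ServerlessV2ScalingConfiguration={"MinCapacity": 4, "MaxCapacity": 64},
)

# An elastic instance that scales within that range.
rds.create_db_instance(
    DBInstanceIdentifier="my-serverless-writer",
    DBClusterIdentifier="my-serverless-cluster",
    DBInstanceClass="db.serverless",
    Engine="aurora-postgresql",
)
```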
What I want to talk about is some performance examples with this. The other thing we've done with Aurora Serverless recently is introduce Platform Version 3. This gives you up to 30% improved performance. It's available for all new clusters, so all you have to do is restore from a backup or a clone or make a new cluster, and you'll get this. It's not even an option that you need to select anymore. This will give you up to 30% improved performance.
Let's look at both of these things combined. First test here is just for faster scaling. This is Platform Version 2, the yellow one is fast scaling, and the red one is not fast scaling. This is a sysbench read test, so we're pushing pretty hard, going up to 210 or so ACUs. What we first notice is that the yellow curve is a lot steeper—we're scaling up a lot faster than the red curve is. Then because we scaled up faster, that peak performance at the top runs for 3.6 times longer, so we can get your workload done more quickly. And then we scaled down 40% faster too. If you integrate all of that, then towards the end, you'll notice that we used 9% fewer ACU hours, which is what you pay for, so you're paying 9% less for this particular test.
That's great, so that's Platform Version 2 with faster scaling. Now we turn on Platform Version 3, the new thing here. We stick with faster scaling. This is a similar test, just a little bit heavier, so we can push even harder. Look at the purple line there—that's Platform Version 3 with faster scaling, and the yellow one is Platform Version 2. The first thing we notice is the purple one runs for 20% less time because it's going faster, so it doesn't need to run for as long to get the same job done. And then at the end of it, because it's scaled down, we used 20% fewer ACU hours, so that's 20% less on your bill. By combining these two features—faster scaling and Platform Version 3—all you have to do is make a new cluster, and you'll get both of these with no extra effort.
Aurora Create with Express Configuration: Rapid Deployment for CI/CD and Agentic AI
Now let's talk about one of Aurora's newest innovations, which was pre-announced only yesterday: Aurora Create with Express Configuration. How often do people in here create database clusters? My guess is that this hand-drawn diagram, which is of course completely accurate, represents this population of people in here. Over on the right-hand side, there are the enterprise-mindset people who don't create database clusters very often, basically never. Then there are the people in the middle who create one maybe every week or something. But then there's this population that wants to create things almost every second. This is the CI/CD kind of people, right? You want to do things really agile, really quickly. I want to address that orange part there. That's where Create with Express Configuration comes in. This was pre-announced yesterday and is coming soon for Aurora PostgreSQL.
What this does is allow us to create a database cluster in seconds. That lets you open your mind to now using it in those CI/CD pipelines. You can use it from your agentic AI applications—you can create a new database for every single interaction with that, if you like. We can see a few things here. With just these few fields set, the cluster will create in a few seconds. We've got a flexible configuration there. This is Serverless, and we start off with 16 ACUs if you like—this is an editable field. We can change the name. We can only select a couple of PostgreSQL versions at the moment, but you can see that right-hand column says "Modifiable post-creation." So it's yes for most of those things. Even if you don't like the defaults that are here, as soon as the cluster is created, you can change it if you like as well.
This one starts off with 16 ACUs, which is good for a lot of people. But if you want a provisioned system, that's fine—just bring up your Serverless one and switch it over. We can just fail over like we talked about before. Then it's secure by default, so you get encryption turned on and IAM turned on, all by default. And it supports almost all the Aurora features. This is a regular Aurora database cluster that we're making—it's not something special. So you could do Global Database, which I talked about before, zero-ETL, and all the other features we talked about before. Backups work the same. It's all the same.
One of the new bits that I want to talk about a little bit more is this no VPC thing down in the bottom there. That's pretty interesting. Without a VPC, you need a way to get to your database instance. You probably don't want to put your database instance on the internet, which is normally not good practice, right? So Aurora Internet Access Gateway is this new component we've built, which allows us to deal with this.
This is a highly available endpoint that gives us access into your Aurora cluster. It's PostgreSQL wire protocol compatible, and it's not a big heavyweight proxy, so it's not giving you tons of latency to worry about. That means you can access your database cluster from anywhere on your laptop without having to worry about VPNs and VPCs. It really reduces the friction for you and your developers to do whatever they need to do.
This also integrates with AWS IAM, and it integrates with AWS Shield, which provides protections such as DDoS mitigation and other ways to safeguard your data. The Internet Access Gateway comes turned on by default when we do Express Configuration creates. But I have to talk about some of the cool things we've done here. If you would like to see an agentic AI talk, there is another one down the bottom there if you want to listen to me tomorrow. There are two really interesting things that I want to talk about here for Aurora in the very last minute.
A critical component of agentic AI is that it's a loop, so you need memory for this thing to work. Well, I'm a database person, and I figure that Aurora is pretty good at being memory. So how can we do it? We want to remember something like "its favorite color is blue." You can use any one of these open source frameworks. We've partnered with a bunch of people already like Vercel or LangChain. Here's an example of doing it with LangChain. It's only about three lines here. We've created a database cluster with that Express configuration we just talked about, which takes us a few seconds. Then we go in there and enable the pgvector extension just because we want to do some vector embeddings for our LLM, though you don't have to do that. Then we can tell LangChain to start its Docker container and just point it at that PostgreSQL endpoint. We're done. Now you're using Aurora PostgreSQL for agentic AI memory.
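A hedged version of that pattern in Python follows. The package and argument names (langchain_postgres, langchain_openai) reflect my understanding of the current libraries, not the exact snippet shown in the session, and the endpoint and credentials are placeholders.

```python
import psycopg
from langchain_postgres import PGVector
from langchain_openai import OpenAIEmbeddings

endpoint = "postgresql+psycopg://app_user:secret@my-express-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com:5432/postgres"

# Enable pgvector once, because we want vector embeddings for the LLM.
with psycopg.connect(endpoint.replace("+psycopg", ""), autocommit=True) as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")

# Use Aurora PostgreSQL as the agent's long-term memory.
memory = PGVector(
    embeddings=OpenAIEmbeddings(),
    collection_name="agent_memory",
    connection=endpoint,
)
memory.add_texts(["The user's favorite color is blue."])
print(memory.similarity_search("favorite color?", k=1))
```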
This is the LangChain example, but there are a bunch of blog posts and other things that I encourage you to go and read for the other frameworks of your choice like LangChain and all kinds of others. The other thing is MCP servers, which stands for Model Context Protocol servers. This is the glue that lets an agent find the tools that it needs to talk to and communicate with them. AWS open sourced a whole bunch of these MCP servers for all its databases a little while ago. What this allows us to do is run natural language queries that understand the schema of what's going on inside an Aurora database and turn your natural language queries into SQL queries. This is super useful if you're new to a database and you're just exploring trying to find your way around, or maybe if you're a power user, this will make you much more effective.
Patching and Upgrades: AWS Organizations Upgrade Rollout Policy
That's it for generative AI. Patching and upgrades are maybe a bit more close to a lot of people's hearts. Hopefully I've convinced you and you've already got tons of Aurora clusters going already, and now you're worrying about how to upgrade and patch them and keep them secure and up to date. Aurora has always given us a managed experience here. I have a console screenshot showing a pending maintenance action that tells us we need to apply some system update right now, and this is a managed flow where you would just click go and it would do it for you. What do we do differently now? Operating system patches now happen in a rolling fashion, so if you have multiple nodes in a cluster, we'll just roll through them, which increases your availability. That's great.
We have a maintenance window that you've always been able to set. This lets you set the day of the week in which you want to take maintenance on this particular instance. We give notifications so you know if it's happening or not. This pops up in AWS Health, and you can opt into this being automated or not. Some people opt out of it because they want to control it from their own Terraform or something, which is completely fine too. This is one database cluster, and I think it's pretty under control. Now when you've got a fleet of clusters, I've convinced you all, you love Aurora, you've got tons of them, maybe you've got some dev ones, some QA ones, and some prod ones, and you want to sequence the upgrades for them.
If you wanted to do this yourself today, maybe you want the dev ones to happen on Monday, and then you want the QA ones to happen a couple of days later so you have some bake time just in case something goes wrong. The prod ones you want to happen on Saturday so you can do it out of business hours. There's not very much time between Monday and Wednesday, so there's not very much reaction time. Also, I don't know what's happening inside your databases or which ones are your dev ones and your QA ones. So I might announce maintenance on Tuesday, and if you have a setup like this, then your QA ones will be upgraded before your dev ones, which is probably not what you want to happen. What we did was announce the AWS Organizations Upgrade Rollout Policy just last week to tackle exactly this problem. So here we have our same three environments, and what we've done is break them into three waves: first wave, second wave, and last wave.
This is really what you were trying to express to me anyway. Here's what's going to happen: first, we'll notify you with a maintenance notification just like we did before, and then we'll wait for the maintenance window for all the resources in that first wave to come along. Then we'll upgrade all of those instances in the first wave. After that, we'll wait for a much longer time than we would have before, so you've got some time to react. Then we'll wait for the maintenance window to come up for the resources in the second wave and upgrade them. After waiting some longer time, we'll do the same for the last wave as well.
We're not actually blocking if something goes wrong from one wave to the next, but we're giving you enough time so that you can opt out again. You can turn the tag off or turn off automatic minor version upgrades if you'd like to. We're upgrading these things in sequence, so it doesn't matter which day of the week I release an update—it's going to happen in sequence for you. So how do you get on board with this? You have to enable AWS Organizations if you haven't already, and you have to put your accounts into the organization. You have to do the normal things you would do, like enable automatic minor version upgrades and check a maintenance window that says what you want it to say.
Now the interesting part is that we have to set a policy. On the left, we have the console version, and on the right, we have the JSON version of the same thing. This is telling us that it's allowing us to associate a tag—you're probably already using tags on your resources. Here we're going to make a tag called env, which is not a special word; you can call it whatever you like. We're saying that every resource with a tag called env that has the value of prod, for example, I want to put it in the last wave. If there is a tag that I don't recognize or no tag at all, that's going to get the default, which is the second wave in this case. You don't have to do tags at all if that's not what you're interested in; you can just accept the defaults, and everything will be upgraded at once.
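On the resource side, the only work is tagging and keeping automatic minor version upgrades enabled. Here is a minimal sketch with boto3; the ARNs and identifiers are placeholders.

```python
import boto3

rds = boto3.client("rds")

# Tag the cluster so the rollout policy can map it to a wave.
rds.add_tags_to_resource(
    ResourceName="arn:aws:rds:us-east-1:123456789012:cluster:payments-prod",
    Tags=[{"Key": "env", "Value": "prod"}],  # 'prod' maps to the last wave in the example policy
)

# Make sure the instances actually take automatic minor version upgrades.
rds.modify_db_instance(
    DBInstanceIdentifier="payments-prod-writer",
    AutoMinorVersionUpgrade=True,
    ApplyImmediately=True,
)
```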
Once you've created that policy, we can attach it to either your entire organization at the root or to individual accounts in your organization, so you can be selective here. Then when an upgrade actually tries to roll around, you'll see this in the normal way. You'll see in your pending maintenance actions up in the top left that it now says this is going to happen in the second wave. You'll also see this in AWS Health with the same information—you'll get a health notification with the second wave in this case. I think that will really help, so give that a try. That's the AWS Organizations Upgrade Rollout Policy.
Blue-Green Deployments and Zero-ETL Integration for Major Upgrades and Data Pipelines
Now let's talk about blue-green deployments. All of these techniques I've just talked about are fantastic if you're doing compatible upgrades that we can roll through. But when a major version upgrade comes along from open source, these are generally incompatible by definition. So we can't do them in place without taking a bunch of downtime, and probably you don't want to do that. So what is blue-green deployment? Here we have our production environment up the top with a couple of Aurora nodes, some shared storage, and your application talking to it through an endpoint.
When we want to create a blue-green deployment, we're going to create a complete copy of your production environment in one click. We're going to make all the nodes be the same and sync up all the data using logical replication, keeping it in sync as long as you keep this blue-green thing running. Any new writes that are coming into the blue will be pushed down into the green already. Once that's done, then you can upgrade that green environment and test it with your test application. You can do whatever you like; you can spend an hour here or a week in this situation—it doesn't matter.
When you're happy that the upgrade has done what you want it to do, then you can do what's called a blue-green switchover through the CLI or the console, and we take care of renaming all of your AWS resources so that you're not left with a mess afterwards. We make sure that we drain the logical replication first so you're not going to lose any data. We're going to switch the endpoint around, and your production application just keeps on talking to that endpoint. Now it's talking to what used to be called the green environment. The old blue environment is still left behind there, so you have a fallback plan again if something goes wrong.
If the switchover times out for some reason—normally the switchover is less than a minute—we'll automatically undo it and go back to the original blue environment, so you don't have to deal with anything. That's Aurora blue-green. It's a great way to increase your availability when you're doing these major version upgrades. I'd much rather you do the version upgrades rather than stay on an old version. I don't think anybody wants that. It's useful not just for version upgrades either; it's useful for schema changes, static parameter changes, maintenance updates, and anything else that you think is too risky to do in place and you want some blue-green protection from. You can use a blue-green deployment.
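As a rough sketch of that flow with boto3, where the identifiers and target version are placeholders rather than a recommendation:

```python
import boto3

rds = boto3.client("rds")

# Create the green copy of the production cluster on the target version.
bg = rds.create_blue_green_deployment(
    BlueGreenDeploymentName="aurora-pg-major-upgrade",
    Source="arn:aws:rds:us-east-1:123456789012:cluster:my-cluster",
    TargetEngineVersion="17.4",
)

# ... test the green environment for an hour or a week, then switch over.
rds.switchover_blue_green_deployment(
    BlueGreenDeploymentIdentifier=bg["BlueGreenDeployment"]["BlueGreenDeploymentIdentifier"],
    SwitchoverTimeout=300,  # seconds; if it can't finish in time, it rolls back automatically
)
```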
Now the interesting thing that we just talked about with blue-green just last week is support for global database. So now I've got a two-region picture.
Remember, there could be up to 10 secondary regions, but here we have a global database with physical replication. When I want to do a major version upgrade, I can now use blue-green to do that as well. The same idea applies: I make a green environment and keep it in sync with logical replication. Everything I just explained works the same way. We end up with an RPO of zero, meaning we are not losing any data, and the RTO is about one minute, so our switchover time is about one minute. The entire global cluster can do a major version upgrade and be managed that way. It is pretty new, and I encourage you to take a look.
Let me talk about another neat feature that is enabled by the way that Aurora disaggregates its storage. This is Aurora zero-ETL, where ETL stands for extract, transform, load. On the left, we have an Aurora PostgreSQL instance. On the right, we have Amazon Redshift, and we want to get our data from PostgreSQL into Redshift, for example, and we want to do this with really low latency. I do not want you to have to manage the ETL pipeline that goes along with it because that is hard work, it is brittle, it can be expensive, and it can be time consuming. So we can add a zero-ETL integration, which is a single CLI command that you can run. What that does is tell the system to make a pipeline between the Aurora cluster and the Redshift cluster. It is as easy as that. We will have a five to ten second replication lag, so once you have written something into the PostgreSQL side, it will pop up in the Redshift side within five to ten seconds. That is a really good turnaround time for an ETL pipeline.
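That single step looks roughly like this with boto3; the ARNs are placeholders.

```python
import boto3

rds = boto3.client("rds")

# One call sets up and maintains the pipeline from the Aurora cluster to Redshift.
rds.create_integration(
    IntegrationName="orders-to-redshift",
    SourceArn="arn:aws:rds:us-east-1:123456789012:cluster:orders-aurora-pg",
    TargetArn="arn:aws:redshift-serverless:us-east-1:123456789012:namespace/analytics-ns",
)
```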
If you were to do this yourself, you would have to worry about data seeding, which means how do you start this thing off from empty. You would also have to worry about how to do maintenance of this pipeline. We handle all that stuff for you with zero-ETL. The picture gets a bit more interesting because we can do multiple Aurora instances, maybe even different ones with MySQL and PostgreSQL, into the same Redshift that you might already have in that same data lake. We can get even more of them. We can also go from Aurora MySQL into Amazon SageMaker as well, using the same technology. I think this is pretty cool, but what I think is even better is how it works inside. That is what I want to get into: how it works underneath.
I am talking about a MySQL one just because that is the one I like to talk about better. We talked before about how Aurora understands the physical transaction log of what is going on. But in order to do CDC streaming, which is change data capture streaming, we need to have a logical log, not a physical log. That is not the thing that Aurora traditionally knew how to deal with. So we have to capture that CDC log. We also have to be able to seed this data to begin with. It is no good having a change log if we have no basis to base the change on, so we have to seed it.
Firstly, we can start off at the storage layer and do parallel direct export. This is where all of those individual storage nodes export directly into the Redshift storage, in this case, to create a seed. Your head node is not involved in this, it happens really quickly, and it does not impact performance on your head node at all. Once we have done that seed, then we can use what we call enhanced binary logs. This is a binary log for MySQL or the write-ahead log for PostgreSQL. It still works the same way, but this is where we put knowledge of the format of that logical replication log into the storage so we understand how it works. Then, because the storage nodes understand the enhanced binary log, we spin up these CDC streaming servers. They read the CDC log from the storage, not from your head node. By turning this on in your head node, you are not hurting your performance there, which is what commonly happens with MySQL-based systems. That CDC streaming fleet is going to apply filters, it is going to throw away the tables that you do not want, it is going to do all the modifications, and it is going to push that data down into Redshift. There are no arrows between the MySQL head node and Redshift. It is all going through the storage, so you are not feeling any of that performance impact. I think that is pretty neat.
Aurora Storage Types: I/O-Optimized Pricing, Performance, and Optimized Reads with Tiered Cache
Let me talk about Aurora storage types for a minute. We talked about I/O-Optimized before, and I will go into a little bit more detail. Here is Aurora Standard, which is what you get if you do not choose anything. I had the same picture before, and we are going to talk about the life of a few IOs and how much it might cost. The application is going to do a select, so this is read-only. Inside your head node, you have got a cache. We are going to hit on that cache, so it is not going to cost you anything other than a few milliseconds of running the instance. No IO. Or it is going to miss the cache and it is going to go to storage and read a database page, which will be eight kilobytes or sixteen kilobytes depending on your database engine, and it is going to cost you a fraction of a cent. When we do a write, remember we are writing these log records. They can be varying sizes and they can be batched together in different ways. When we do a write up to four kilobytes, that will cost you the same fraction of a penny.
The complexity comes when we do different types of writes. We might batch them together in different ways, and that might not be batched into one IO. It might be batched into up to four IOs because they're four, eight, or sixteen kilobytes in size. This is an unpredictable pricing model, which sometimes works out pretty well, and sometimes we don't like the unpredictability. So we wanted to give you an option to say, how can I make this more predictable? This is where Aurora I/O-Optimized came in last year.
This is a pricing choice. You can choose it whenever you create a cluster or at any time afterwards. You can switch it online once a month with no problem. What this does is change both the way the performance of the system works and the way that it's billed. It's a cluster-level configuration, not an instance-level one, because there are some storage-side changes too, and that storage is shared.
What I encourage you to do is look at the IO proportion of your Aurora bill if you're already using Aurora. If that IO proportion is more than twenty-five percent, look at I/O-Optimized. It will probably actually save you money. I'm not trying to sell you something that's not useful. I think it'll actually save you money and give you better performance and give you predictability. Even if your IO proportion is smaller than twenty-five percent, give it a look. You might like the performance too.
This is available in all modern engine versions from the last eighteen months or so, and it's also compatible with the newly announced database savings plans if you watched the keynote this morning. So how does this work in terms of money? On the left, we have Aurora Standard, which we talked about. This is the normal AWS method where you pay exactly for what you use. You're paying for compute, you're paying for storage in gigabyte-months, and you're paying for IO at some rate depending on which region you're in.
Then in the I/O-Optimized SKU, if you've chosen that, you'll pay for compute at some slight premium, you'll pay for storage at some premium, and you'll pay nothing for IOs—zero cost. You might look at those percentage numbers there and become a little bit worried, but I want you to remember back and say, if your proportion of IO is about twenty-five percent or more, this will probably actually save you money.
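As a back-of-the-envelope check of that rule of thumb: the single 30% premium used below is a simplifying assumption for illustration, not the actual price list, so plug in the current prices for your region.

```python
def io_optimized_saves_money(compute, storage, io, premium=0.30):
    """Compare a Standard bill against a simplified I/O-Optimized bill."""
    standard_bill = compute + storage + io
    optimized_bill = (compute + storage) * (1 + premium)  # I/O charges drop to zero
    return optimized_bill < standard_bill

# I/O is about a third of a $1,500/month bill: I/O-Optimized comes out ahead.
print(io_optimized_saves_money(compute=800, storage=200, io=500))  # True
# I/O is under 10% of the bill: Standard stays cheaper.
print(io_optimized_saves_money(compute=800, storage=200, io=100))  # False
```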
Let's look at this example again. It's very easy to explain because all the numbers are zeros. It doesn't matter if I cache hit or miss, doesn't matter how I batch up the writes, it's just zero. Very easy to understand, very predictable.
Now let's look at not just the pricing part of I/O-Optimized, but what it's going to do to performance. This is throughput. This is a HammerDB test on a sixteen X-large, not the biggest box, but it's a pretty big box and it's pushing pretty hard. The Y-axis is NOPM, which is the metric for this benchmark. Higher is better. We start off with an R5, which is that older instance type, and it's version twelve PostgreSQL. This is an extended support version, an old baseline. It looks okay.
Then we turn on an R6i sixteen X-large, version fourteen, not much difference in the PostgreSQL version there, and we turned on I/O-Optimized. We got a one point nine times improvement basically just by turning on I/O-Optimized and going up one generation. R6i is still not very new. We turn on an R7i, version sixteen PostgreSQL, another ten percent just by doing an upgrade. We go to version seventeen, the most recent one, same instance, we get another ten percent.
All you did was upgrade the PostgreSQL version that time, probably just blue-green to do it. So that's even better. That's good for throughput. Two point three times overall improvement in throughput there, just by turning on I/O-Optimized, probably reducing your bill if you're working the system this hard, and upgrading a few PostgreSQL versions.
Now let's look at the latency of a similar test. This is actually what we built I/O-Optimized for. The throughput ones were just bonus points. This is where the money is. This is sysbench read-only. This is latency on the Y-axis, so lower is better. The blue line is the R5 twenty-four X-large, version sixteen PostgreSQL. We can see how it's getting saturated towards the right-hand side there. The latency's kicking up, and we don't like that.
We've got the pink line, which is an R8G forty-eight X-large, version seventeen with I/O-Optimized. You can see how that line's almost flat. That's really what we want from a latency graph. Over on the low end, we've got a three times improvement in latency. I would be very happy with that. But on the high end, we're even better. We've got a six point four times improvement in the latency.
That's the thread count along the bottom there, so the harder you push, the better this gets. I optimized for latency. Now if we turn this back into throughput, we have the same test again. We can look at it and say this turns into a five times improvement in throughput in those same two baselines, just by turning on I/O-Optimized and going to R8G.
Give I/O-Optimized a shot today, and if you don't like it, that's fine. You can just switch it straight back off again. It's all online. There's no failovers.
So that was all about writes. Now we want to talk about reads too. I didn't forget about that. One part of the puzzle is temporary objects. When your PostgreSQL is doing something like a really big index rebuild or a really big sort, it might run out of memory and need to spill to disk. In a normal system that disk is an EBS disk, so we have some performance considerations there. So if you choose a D instance, like an R8GD, then you have a local NVMe instance storage device inside your database instance. Then when we spill to disk, we'll spill to that device instead, and we'll allocate up to 6 times the size of the memory inside there. That means obviously the latency is a lot lower, and we'll improve our performance dramatically for workloads that use those temporary objects.
Right, so if we turn on I/O-Optimized though, we reduce that 6 times down to 2 times your memory. Still good, still works the same way, but it's only 2 times the memory. So what's the point? Why did I save all this space over on the left there? That's what I want to talk about. In that space we put this thing called tiered cache. So if you're on a regular Aurora PostgreSQL and you do a read, you're going to look in your shared buffers in memory first. If it's there, we'll read it back and we'll be done, no problem. If it's not in shared buffers, we'll go to storage, we'll read it back into the shared buffers, we'll give it to the engine, we're happy.
So when we turn on Optimized Reads with one of those D instances, in that space we're going to allocate room on the disk equal to 4 times the amount of memory we've got, and we're going to use it like we use shared buffers. It's the cache now. And we're going to take a little bit of memory to handle some metadata for us. So now when PostgreSQL needs to read something, we'll check that metadata and see if the thing I want is going to be in the disk cache or not. In this case, it's not. So I'm going to read it from the storage directly into your memory buffer, give it back to the engine like before. So the fact that tiered cache was there did not help or hinder us in this example. But now it's there in the shared buffers, that's good.
Eventually, the shared buffer's going to get full, we're going to have to evict that data out somewhere. So instead of just throwing it away, we're going to evict it down into the tiered cache. We're going to keep a little piece of metadata up there to remember that we've done this. So now when we go and try and read that data, we're going to check that metadata and see that it is there, so I can just fetch it back from tiered cache, and it'll be nice and fast, low latency. Now, if I do an update, I need to invalidate that copy that I have in the tiered cache. But I don't want to go and write to it all the time, that would be pretty bad for performance. So all I need to do is really just flip that metadata bit, so that I forget the fact that I have that data in the tiered cache. That's what we do. So now when I try and read it, it doesn't matter, we're not going to go and look there.
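To make that flow concrete, here is a toy sketch in Python, purely conceptual and not Aurora's implementation: a memory buffer backed by a tiered cache, with a per-page validity bit that is simply flipped on invalidation.

```python
class TieredBufferCache:
    def __init__(self, memory_capacity, storage):
        self.memory = {}                 # shared buffers (page_id -> page)
        self.memory_capacity = memory_capacity
        self.tier = {}                   # local NVMe tiered cache (page_id -> page)
        self.tier_valid = {}             # metadata: is the tiered copy still valid?
        self.storage = storage           # Aurora storage, the slowest path

    def read(self, page_id):
        if page_id in self.memory:                  # 1. memory hit
            return self.memory[page_id]
        if self.tier_valid.get(page_id):            # 2. tiered-cache hit, low latency
            page = self.tier[page_id]
        else:                                       # 3. miss: fetch from Aurora storage
            page = self.storage[page_id]
        self._admit(page_id, page)
        return page

    def write(self, page_id, page):
        self.tier_valid[page_id] = False            # just flip the metadata bit; no tier write
        self._admit(page_id, page)

    def _admit(self, page_id, page):
        if page_id not in self.memory and len(self.memory) >= self.memory_capacity:
            victim = next(iter(self.memory))        # FIFO eviction keeps the sketch simple
            self.tier[victim] = self.memory.pop(victim)   # evict into the tiered cache, not away
            self.tier_valid[victim] = True
        self.memory[page_id] = page


cache = TieredBufferCache(memory_capacity=2, storage={"p1": "v1", "p2": "v2", "p3": "v3"})
cache.read("p1"); cache.read("p2"); cache.read("p3")    # p1 is evicted into the tiered cache
cache.write("p1", "v1-new")                             # invalidation is just a metadata flip
print(cache.read("p2"))                                 # served from the tiered cache, not storage
```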
Recently we added the ability to change the split between the tiered cache and temporary objects. So if you have a workload coming where I know I'm going to need to spill to disk quite a bit, I can increase the temp space. Then when that workload's finished, I can decrease it back again and have a nice big tiered cache. So that's now elastic and dynamic. The R8GD instance is a new one there, and it's exactly the same price as an R6GD. You're going to get, I think, a 165% improvement in performance versus the R6GD. So for the same price, I think you'd better go and check it out if you're using tiered cache, and if you're not, maybe give it a look.
So let's look at some performance examples here. This is a histogram, so we want to be over towards the left and up high as much as we can. It's a latency histogram. So now we've done a sysbench point select and we're doing a uniform distribution. This is a terrible test: it's really random, not like what a normal system would do at all. And this all fits in memory, that's why we've got a really tight spike over on the left-hand side. Latency is good.
Now we do a much bigger test, 340 gigabytes, that does not fit inside a 4X large memory. And so we can see a spike over on the right-hand side where we're reading from Aurora storage. That's 4 times higher latency. So then we're going to increase the size of that test again, and we can see the same kind of pattern. We're reading from Aurora storage and the latency has gone up about 5.5 times versus the baseline. So now we turn on tiered cache. We look at that 340 gigabyte one, we look at that yellow spike. That's in that space there, the mid-range latency, a little bit worse than memory, much better than Aurora storage. That's where all your reads are coming from. Notice there's no spike at all on the Aurora storage side in the yellow. That's because it all fits in tiered cache. So only a 1.5% increase in latency versus the memory baseline.
So now we go to that really big test, the one that doesn't fit in memory at all, and we can still see there are some purple reads over on the right-hand side, but not very many. So overall there's only a 3 times deficit in latency versus the memory baseline for something that was 8 times too big to fit in memory.
This is a really neat approach. But that was a uniform random distribution, so now we're going to look at a Pareto random distribution, something more like what your application probably does, and you can see that it's even better. We have the same set of tests here with tiered cache turned on, and we can see that it's only a 1.4 times latency penalty for that massive test compared to the one that fits in memory. For a more reasonable distribution, it's really powerful.
To give you a real use case quickly, this is the pgvector benchmark, a bigger benchmark with really good recall, and we're comparing an R8GD to an R8G, so that's Optimized Reads versus not. We can see there's over a 3.5 times improvement in the number of queries per second that we can get. Then we can add in the R6GD and compare it against the R8GD. Remember they're the same price, and that gives us over a 1.6x improvement there as well.
Aurora DSQL: Active-Active Architecture and Multi-Region Synchronous Replication
Okay, so that's it for Aurora MySQL and Aurora PostgreSQL. Now we're going to talk about Aurora DSQL for a few minutes. It's not a deep dive like I talked about before. What I'm going to give you is the flavor of the differences between these two engines so you know how to think about them. On the left, Aurora PostgreSQL is fully Aurora and fully PostgreSQL compatible. On the right we have Aurora DSQL. What's common is that we have storage across three availability zones in both of them. A query comes in and it's going to write to a query processing component, a head node as I called it before, and it's going to write some log records to this storage. All looks the same so far.
Then we're going to add in a second query that needs to come through from a second application. In Aurora PostgreSQL like I talked about, there's only one writer, so it has to go to the same place. We're going to use implicit or explicit locking to deal with any kind of conflicts that are happening because of that. On the DSQL side, we'll spin up a new query processor. This is active-active, so it can do multiple writes here. We're going to use optimistic concurrency control instead. You're going to have a bit of a race condition going, and one of them is going to back off and have to fail and retry.
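Client code for that optimistic model typically wraps the transaction in a retry loop. Here is a hedged sketch with psycopg, assuming the adjudication conflict surfaces to the client as a serialization failure; the connection details are placeholders.

```python
import time
import random
import psycopg
from psycopg import errors

def transfer(conninfo, src, dst, amount, max_retries=5):
    for attempt in range(max_retries):
        try:
            with psycopg.connect(conninfo) as conn, conn.cursor() as cur:
                cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s", (amount, src))
                cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s", (amount, dst))
                conn.commit()   # under optimistic concurrency, conflicts are detected here
            return
        except errors.SerializationFailure:
            time.sleep(random.uniform(0, 0.05 * 2 ** attempt))  # back off with jitter, then retry
    raise RuntimeError("transaction kept conflicting; giving up")
```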
If I want to scale out, I can add replicas like I talked about before in Aurora PostgreSQL. Now each of those replicas has asynchronous replication between them, so you have that consistency question we talked about, and it has caching, so you have good performance. Then you might fall out of the cache, depending on the access patterns like we talked about for optimized reads. On the DSQL side, we do this differently. It has its own distributed block store for handling reads, and that's going to scale independently. That means you don't have any caches inside your head nodes, but you have the ability to push down what I call optimized reads there. You have to push down some query predicates so we don't have to be so chatty with the reads.
As we scale up some more, Aurora DSQL will keep on adding query processors, and Aurora PostgreSQL we can do the serverless thing like we talked about before, and that can scale all the way down to zero after five minutes of inactivity. In the same way that DSQL will scale all the way down to zero once we don't do anything. Clearly we can see here that this is an active-passive architecture on the left-hand side with Aurora PostgreSQL with explicit locking. Aurora DSQL on the right-hand side has an active-active architecture. This is all in one region.
Now we'll talk about reads and writes and look a little bit deeper. So we have our availability zones, three of them because we've got three AZs, and a read is going to come through, goes through the AZ endpoint. It's going to go to some query processor that we choose. The query processor says, I need to read from these three storage servers in order to read the data that this query is interested in. Off it goes, fetches the reads, we're done, no question.
Now when I want to do a write, this query is a write, so instead the query processor is going to be spooling up all of the writes that happen as part of this transaction. We're not talking to any of the other components until the transaction commits. And then when the transaction commits, I need to talk to these things called adjudicators. Remember that I could have multiple of these write queries going on at the same time. The adjudicator is the component that decides who wins and who loses. Now in this case, I've touched a bunch of stuff, so I have to talk to two adjudicators because these things are sharded. The adjudicator says, okay, you can commit this one, that's fine. So then we send those spooled writes to one of the journals. The journals are storing these logs across multiple AZs. Then the journal is going to push whatever updates it needs to the storage shards that are impacted. Not all of them, only the ones that are impacted by the things that we're changing, and that's important for scalability as well.
Now the global story, the multi-region story, just very quickly. At the top there, we have Aurora PostgreSQL, and at the bottom we're going to have DSQL. So we have some replicas, we have read-only on the far side. We have asynchronous replication, active-passive. We've talked about this a lot now. That means that when we commit, your commit latency is an in-region construct because the across-region latency is asynchronous. In Aurora DSQL, things work a little bit differently. When we commit, we're going to synchronously talk across to the other regions, so every time you commit, your commit latency is cross-region latency. That's the fundamental consideration to think about between Aurora DSQL and Aurora PostgreSQL.
If you want to learn some more about Aurora DSQL, I encourage you to stop listening to me and listen to some of these other talks. We've got lots of things to choose from there, including chalk talks, breakouts, or workshops. If you've missed it, there will be recordings as well. So on that note, I hope this gives you some feeling for how Aurora works internally and what we've been up to for the last year. Now of course, I'm only doing this because I talk to you and you tell me what you want, so I want you to talk to me again afterwards and tell me what you want for next year. Also tell me in the survey how we can do this better for next time. This is really how we're graded. Thank you very much for your time. See you around.
This article is entirely auto-generated using Amazon Bedrock.