🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.
Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!
Overview
📖 AWS re:Invent 2025 - What's new with AWS File Storage (STG203)
In this video, Christian Smith and Andy Crudge from AWS discuss file storage innovations announced at re:Invent. They cover AWS's five file storage services (FSx for Windows, OpenZFS, NetApp ONTAP, Lustre, and EFS) and recent enhancements across four key areas: price-performance optimization including EFS scale-out performance reaching 2.5M IOPS and Intelligent-Tiering for FSx Lustre; expanded data capabilities through S3 access points enabling FSx integration with Bedrock and analytics services; resilience improvements like File Server Resource Manager for Windows, anti-ransomware protection for ONTAP, and direct-to-vault backups in AWS Backup; and simplified migrations featuring FSx for ONTAP support in Amazon Elastic VMware Service and FlexCache write-back caching for global distributed workloads.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: The Growing Importance of File Storage at AWS
All right, good afternoon, everybody. Welcome to Thursday. Thank you for joining us today. We're going to talk about what's new with file storage. It's Thursday, and our voices are kind of holding up so far, so if you see us exit the stage quickly, it's probably to get a drink so that we can continue talking here. My name is Christian Smith. I lead the worldwide storage specialist team for AWS. And my name's Andy Crudge. I lead the FSx service team at AWS.
We're really happy you're all with us. We're going to talk about a bunch of things that happened this year in terms of things we released and things we announced this week. So this is a little bit of a retrospective and then a look at what we're excited to announce this year at re:Invent, and then obviously give you some insights as to how we think about the investments that we're making in file storage.
So let's talk about why file storage for a minute. File storage overall is the fastest growing storage service within AWS, and it's really because of the use cases that customers are bringing to AWS. We look at these in three different chunks. There is the chunk which is all those applications that run the business. They could be group file shares, they could be your enterprise applications like your Oracle or your SQL Server databases. There's a lot of value to running those in a simple manner across file services, and of course there's a whole backup and disaster recovery aspect, like how do we do business continuity in AWS. These tend to be more lift and shift applications that you're bringing over to AWS.
In the last 10 to 15 years, we've seen this emergence of machine-generated data. It's everything from stock tick data that's now getting into the millisecond kind of granularity. It's genomics data, it's cryo-EM data. I'm sure Nvidia's been in the news quite a bit lately, and you've seen all the talk about all the chips that they're making. Guess what? That's a lot of file storage behind those chips that they're designing in order to get to the final output of that chip. So we're seeing this growth of all this data, and the one common thing is this is a lot of shared storage that needs to be accessed by a lot of compute.
And then of course, with all these chips that are coming to market with Nvidia, there's things like machine learning. Machine learning of course is very high-speed, high-throughput data that needs to be processed quickly to keep those GPUs and CPUs fed, but it's also more than that. It's about people sitting and working in files and folders, people like quant analytics, people like genomics researchers, it's data scientists and how do they curate data and do it in a way that allows them to derive various insights. And then of course it's the ISV applications, so think about various vendors that are creating a managed service around their applications, customers like Salesforce that are using file-based data to serve the needs of their customers. So we're seeing this tremendous growth, and of course we have a rich portfolio to support that, and Andy's going to talk about that.
AWS File Storage Portfolio: Five Services for Diverse Workloads
Cool, thanks Christian. So yes, Christian mentioned there's a really broad range of workloads out there that use file storage, and it's because of this broad spectrum of workloads that we offer a rich portfolio of file storage services, each of which is built to serve different types of workloads that are optimized for different things. When we work with end users and application owners on a particular deployment, often what we see is maybe one of these five file storage solutions is the ideal fit for their particular application. But it's really interesting when we work at the organizational level, especially with some of the large enterprise organizations that we work with. Very often these organizations are actually using several of these file solutions within their environment, again, different file solutions for different types of workloads. But you know, there's really no one-size-fits-all; the different services are really optimized for different types of use cases. So I'm going to walk through really quickly the five services we have and some of the use cases that they're really optimized for.
The first three services I want to talk about are within our FSx family of services: FSx for Windows File Server, FSx for OpenZFS, and FSx for NetApp ONTAP. The reason we launched these services is that we've talked to a number of customers over the years who have a bunch of enterprise file data and file-based workloads, and the underlying storage they've used for these workloads has been traditional NAS appliances on-premises, powered by popular file system technologies like Windows Server, ZFS, and NetApp ONTAP. With these three services, really what we're looking to do is make it easy for customers to migrate these file-based data sets and applications to AWS.
We do that by giving you a like-for-like storage solution that works the same way as the existing NetApp or Windows system that you have on-premises. It gives you the same capabilities, the same feature set, and the same APIs that you use to manage data, and it really makes that migration process super simple: you don't need to worry about reinventing how you manage your data, retraining personnel on how to manage storage, or changing your data management processes as part of coming to the cloud.
The next service I want to talk about is Amazon FSx for Lustre. So for those of you who are not familiar with Lustre, it's the world's most popular high-performance file system. It's really optimized for speed, for high levels of throughput, high levels of aggregate IOPS. We launched this service really to help customers build or bring and run high-performance compute-intensive, GPU-intensive workloads to AWS. For example, these are workloads like genomics analysis, like seismic processing, where you have really large volumes of data and you need to process them really quickly.
Especially over the last few years, a big use case we've been seeing with FSx for Lustre as well has been machine learning-based workloads, machine learning training, and machine learning research, which we'll talk a little bit more about later. Again, these are all workloads that need high levels of aggregate throughput and aggregate IOPS so that your CPUs and your GPUs are really maximized versus waiting on IO.
And then lastly, we have Amazon EFS. This is our fully elastic file system offering: it offers fully elastic storage and fully elastic performance. It's really designed and optimized for simplicity, for born-in-the-cloud workloads, and for builders who are just looking for a super simple shared storage solution that they can use with containerized workloads in ECS and EKS and with serverless workloads on AWS Lambda. EFS is integrated with all of these products. Again, this service doesn't offer the same broad set of NAS data management features as some of our FSx offerings, or the performance scalability that Lustre does, but it's really optimized for simplicity for a lot of these born-in-the-cloud types of workloads that we're seeing.
Optimizing Price Performance: Amazon EFS Enhancements
So let's get into the meat of why we're here today, which is to talk about what's new. We're going to be handing the presentation back and forth between the two of us as we go along. We had a number of releases throughout the year, and we have been increasing the pace of innovation, the number of things that we're doing to enhance the file portfolio. So this is just a little bit of what we're going to talk through: some of these things as a retrospective, and then the launches that we announced this week, which I'm sure some of you may have seen. But I'm really excited to dive deep and get going through these.
So based on your feedback and your requests, we release things that solve customer problems or enable you to do new things, and we look at this in kind of four different buckets of how we're going to organize today's talk. One is how do we optimize price performance, which is we keep hearing I need it to go faster and I need it to be cheaper, right? And so there's a TCO discussion that we like to have of how does hot data stay hot and cold data get optimized without intervention. We talk about doing more with data. You've got a rich ecosystem in AWS of services that exist out there, and one of the things we want to try to do is make it easy to consume those services for where your data is today without having to copy it around to multiple services.
Of course, there's the resilience aspect, which is how do I make my infrastructure more resilient to failures, faults, and threat actors. And then lastly, how do we make it easier for you to get your applications to AWS. So let's start out with the first one and talk about optimizing price performance. We continue to see customers storing more data in their file systems and looking for ways to keep costs in check. What we're trying to do is make sure that your hot data has the performance it needs while your cold data gets cost optimized, and you don't have to take any action to make that happen.
And so you'll look at a lot of the stuff we have, which always has a hot tier and then a cold tier, and we manage the interaction between those two things based on access patterns so that you don't have to think about doing that. But let's talk about price performance and what happened in this past year. So Amazon EFS, Elastic File System, I should say, had two big announcements last year, and I don't think we talked about these in kind of the right way last year. So let's talk about these two things. The first thing is we significantly increased read throughput, going from 30 gigabits per second to 60 gigabits per second, and we also increased write throughput from 3 gigabits per second to 5 gigabits per second. So if you're just doing a lot of high IO, a lot of streaming stuff, we've increased the bandwidth as your file systems continue to grow.
The second enhancement is a substantial increase in IOPS. We've achieved a 10x increase, going from 250,000 to 2.5 million IOPS for reads, and from 30,000 to 50,000 IOPS for writes. We call this feature scale-out performance, and what we didn't talk about is how you get there. Yes, this is all elastic, and yes, you get this performance, but if your CloudWatch dashboards show you hitting the earlier limits (250,000 IOPS or 30 gigabits per second) and we haven't contacted you to say we need to bump this up to a scale-out file system, please feel free to reach out to us, because we do some things behind the scenes to make sure that this scales up to those performance levels.
The second thing we didn't talk about is that we also increased per-client throughput up to 1.5 gigabytes per second, and what we see is that a lot of customers aren't using the EFS Utils package. If you want that per-client throughput, please make sure your systems have it installed, because it helps optimize and tune the client to reach that throughput; a minimal mount sketch follows below. As these file systems keep getting bigger and bigger, one of the requests that comes in is that customers would like to manage fewer file systems. Managing less is better than managing more, and what customers have asked us for is the ability to create bigger multi-tenanted infrastructure.
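As a point of reference, the per-client optimizations come from mounting with the EFS mount helper in the amazon-efs-utils package rather than a plain NFS client. A minimal sketch, assuming the package is installed and using a hypothetical file system ID and mount point:

```python
import subprocess

# Hypothetical file system ID and mount point; substitute your own.
FILE_SYSTEM_ID = "fs-0123456789abcdef0"
MOUNT_POINT = "/mnt/efs"

# "-t efs" invokes the amazon-efs-utils mount helper, which handles TLS and
# the client-side tuning needed to reach the higher per-client throughput.
subprocess.run(
    ["sudo", "mount", "-t", "efs", "-o", "tls", f"{FILE_SYSTEM_ID}:/", MOUNT_POINT],
    check=True,
)
```

And for the multi-tenancy side, the building block is EFS access points, which pin each application or tenant to its own POSIX identity and root directory within a shared file system. A minimal boto3 sketch with hypothetical IDs and paths:

```python
import boto3

efs = boto3.client("efs")

# Each tenant gets an access point with an enforced POSIX identity and an
# isolated root directory inside the shared file system.
response = efs.create_access_point(
    FileSystemId="fs-0123456789abcdef0",  # hypothetical file system ID
    PosixUser={"Uid": 1001, "Gid": 1001},
    RootDirectory={
        "Path": "/tenants/tenant-a",
        "CreationInfo": {"OwnerUid": 1001, "OwnerGid": 1001, "Permissions": "750"},
    },
    Tags=[{"Key": "tenant", "Value": "tenant-a"}],
)
print(response["AccessPointId"])
```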
Intelligent-Tiering Storage Class: Elasticity Meets Cost Optimization
In February of this year, we bumped up the number of access points per file system by 10x, from 1,000 to 10,000. Now you can feel confident that as your application scales, you've got both the performance and the tenancy to segment these file systems for security and compliance. Another capability we've launched within the past year that I want to talk about is the new Intelligent-Tiering storage class that we added to Amazon FSx.
About a year ago, last re:Invent actually, that was our first launch of Intelligent-Tiering, and we added Intelligent-Tiering to our FSx for OpenZFS service. As a quick recap of what Intelligent-Tiering is, it's a new storage class that's really built from the ground up. We actually took a step back and talked to customers and thought through what it is that customers have been asking us for and what they want to see in a new storage class.
First thing is Intelligent-Tiering, unlike previous FSx storage classes, is fully elastic. What this means is you create a file system and you don't even need to provision a level of storage. You're billed based solely on how much data you store. As you add data, you're billed more. As you remove data, you're billed less. It's fully elastic with no provisioning required.
The second thing is the storage class is really cost optimized. As Christian was talking about earlier, it's kind of a mix of warmer and colder data where within an Intelligent-Tiering file system there are actually three storage tiers, and your data is automatically tiered between these based on your access patterns, based on how cold your data is. Pricing for Intelligent-Tiering ranges from about 2.3 cents per GB-month for the Frequent Access tier down to less than 0.5 cents per GB-month for the Archive tier. It's the same storage pricing as on S3 Intelligent-Tiering if you're familiar with S3 IT, and again, our goal here is really to allow you to get the lowest possible TCO for your data regardless of how hot versus cold that data is.
The last thing is we offer an SSD-based read cache on an Intelligent-Tiering file system. This is optional. When you create a file system, you could choose not to provision a cache if you have, for example, files where you're not particularly latency sensitive and you're just streaming large blocks. But for workloads where you have a lot of small files or a portion of your data that's really hot, you have the option to provision an SSD read cache and get super fast performance, sub-millisecond latencies for the hottest portion of your data.
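Pulling those pieces together, here's a hedged boto3 sketch of what creating an FSx for OpenZFS file system on the Intelligent-Tiering storage class with an optional SSD read cache might look like; the deployment type, throughput value, and cache sizing mode below are assumptions to verify against the current FSx API reference.

```python
import boto3

fsx = boto3.client("fsx")

# Sketch only: parameter names and values are assumptions based on the FSx
# CreateFileSystem API for Intelligent-Tiering; verify against current docs.
response = fsx.create_file_system(
    FileSystemType="OPENZFS",
    StorageType="INTELLIGENT_TIERING",        # fully elastic, no provisioned capacity
    SubnetIds=["subnet-0123456789abcdef0"],    # hypothetical subnet
    OpenZFSConfiguration={
        "DeploymentType": "SINGLE_AZ_HA_2",    # example deployment type
        "ThroughputCapacity": 1280,            # MB/s, example value
        # Optional SSD read cache for the hottest, latency-sensitive data.
        "ReadCacheConfiguration": {
            "SizingMode": "PROPORTIONAL_TO_THROUGHPUT_CAPACITY",
        },
    },
)
print(response["FileSystem"]["FileSystemId"])
```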
As I mentioned, we launched Intelligent-Tiering a year ago on our FSx for OpenZFS service, and since then we've had a lot of discussions with customers who are really excited about this launch. One of the things that we've heard pretty consistently is that customers with high-performance workloads have been asking for the same simplicity of use, the same ease of use, and the same price points that we have on Intelligent-Tiering. We talked to a number of customers who run traditional HPC workloads on-premises, with large file systems built primarily on HDD storage, who want low price points in the range of one to two cents and who really want the elasticity and the agility that you get with Intelligent-Tiering, but need the performance of a scale-out, really high-performance file system.
You could probably tell where I'm going with this. In May of this year, a few months back, we added Intelligent-Tiering as a storage class to our high-performance FSx for Lustre file system as well. This is the second file system in the FSx family that offers Intelligent-Tiering.
With this, customers with high-performance workloads can get all the elasticity, the agility, and the low-cost benefits of intelligent tiering along with the high performance that Lustre provides. This is a storage class that's going to cost you in the range of one to two cents for storage, where you can drive terabytes per second of throughput, millions of IOPS, and with that SSD read cache, as I mentioned earlier, you can still get sub-millisecond latencies for your hottest and most active data.
We've talked to a number of customers with high-performance workloads they're looking to migrate from on-premises, as well as machine learning workloads where, if you have large files and you're streaming large volumes of data, this is a really great storage class to optimize storage costs and also improve simplicity of how you're managing your data. As we look over the past few years of how we've been investing in FSx for Lustre, and this has all been based on feedback we've heard from customers and where customers want to see us going, a pretty continuous focus for us has been around improving performance scalability.
A year and a half ago, we launched a 15x increase in metadata IOPS, so this really helps for small file workloads and allowing you to scale to higher levels of performance. And then last re:Invent, we added support for Elastic Fabric Adapter and GPUDirect Storage. What this does is it allows you to drive up to 12x higher per-client throughput to a single file system than you could before, up to 1.2 terabits per second of throughput.
FSx for Lustre: Powering Machine Learning Research with Performance Improvements
Performance scalability is going to be a continuous area of investment for us. Customers tell us their workloads are continuing to grow in scale and throughput needs, and we're going to be continuing to invest in pushing Lustre to continue to deliver even higher levels of performance over time. But I do want to spend a minute talking about a really interesting use case we're increasingly seeing customers run on top of FSx for Lustre, which is machine learning research.
For those of you who tuned into Matt Garman's keynote this week, there were actually a few references to customers running machine learning workloads using FSx for Lustre on AWS today. As we look at how our customers are choosing storage solutions for these workloads and what's leading customers to look at Lustre, one of the common patterns that we see is that customers, especially for machine learning research, are looking for a combination of a couple things.
One of them is high throughput. When you're training and reading a bunch of data from GPU instances, you want to maximize how much throughput you can get to those GPUs, to maximize goodput and optimize your costs. At the same time, a lot of the machine learning researchers who are actually doing the research hands-on, what we've heard from customers is they want a really simple, intuitive, easy-to-use interface to their data, and many of these researchers are used to working with data in a native file system.
The environments we commonly see customers set up is they'll have researchers who SSH into hosts. They have their home directories, they have their Python scripts, they have their data on the same file system, and they're constantly iterating on trying out new ways of processing their data, new ways of training on the data. So this research use case is really an interesting combination of needing high performance for when the researcher says, "Okay, let me try training something now," as well as a really easy-to-use interface so that they can manipulate data and try things out.
Quick question: who here has ever run ls on a Linux host? I think most hands are up here as well. This is a very simple, super common command for researchers and honestly any end user to run against a file system. When you run ls, especially on a distributed file system, what you'll typically see is that running it against a network-attached distributed file system can take a little bit longer than running it against local storage.
The reason is just that there's a lot of metadata returned as part of ls. There are file names, sizes, timestamps, and with a distributed system, under the covers, there are multiple servers that need to be contacted to collect all this information and display it to a user. What that means is that if you're comparing the performance of running ls on a local disk versus network-attached storage, especially for large directories with hundreds, thousands, or tens of thousands of files in them, ls can take a little bit longer.
What we've heard talking to machine learning researchers is that the difference between, say, a millisecond and half a second for an ls may not seem like a huge deal, but when you're an end user going in and interacting with your file system, those milliseconds do actually matter, and it really translates to responsiveness of storage and quality of experience for end users. So one of the enhancements that we launched two weeks ago that I'm really excited to talk about is that we actually made some changes to Lustre to optimize the IO patterns for how ls works under the covers.
In doing so we have been able to reduce the time it takes to list the contents of a directory by up to 5X. This is something that we've done based on feedback from customers running ML research type workloads and home directory type workloads on FSx for Lustre, really doubling down on the ease of use that leads researchers to want to use a native file system for those data sets and making that even more responsive and quick. This is available starting a few weeks ago in the latest clients that we offer as part of FSx for Lustre. This is actually a change we've made to the Lustre client.
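If you want to gauge what that listing latency looks like on your own file systems before and after updating the client, a quick and dirty way is to time a directory scan from Python; the path below is just a placeholder for a directory on a mounted FSx for Lustre file system.

```python
import os
import time

# Placeholder: a large directory on a mounted FSx for Lustre file system.
directory = "/fsx/datasets/train"

start = time.perf_counter()
# os.scandir pulls names plus per-entry metadata, similar to what ls touches.
entries = [(entry.name, entry.stat().st_size) for entry in os.scandir(directory)]
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"listed {len(entries)} entries in {elapsed_ms:.1f} ms")
```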
Let's talk about our last enhancement around TCO. If you're familiar with FSx for NetApp ONTAP, you know there are two basic tiers. There's a performance tier, which is a provisioned SSD tier, and there's an elastic tier which is what we call a capacity pool. You set up policies here such that as data ages, or if you want to offload things like snapshots, they automatically move from the hot tier to the cold tier in an elastic way, which significantly reduces cost. Now, in order to do this, what happens is the data first always lands on the SSD tier and then gets offloaded after one day, 30 days, 60 days, whatever your policy is, into the capacity pool.
What we've heard from customers is sometimes my data sets are larger than what I actually wanted to provision in the SSD tier initially, which is great because we can dynamically scale up the SSD pool. Conversely, I'm doing a big migration and I really want to scale up that SSD pool so that I can absorb this migration as fast as possible, and then as it drains off to the capacity tier, it's now optimized to the age or type of data that's there. Scale-up flexibility has always been there, and what we released in August of this year is the ability to decrease that SSD pool dynamically and online, without customer application intervention. Now you can take that SSD pool, scale it up, use it, consume it, do a batch run if you're doing some sort of job or end-of-month processing, and then when you're done and you want to watch that data go into the capacity pool, you can scale it back down again without any interruption to your applications or end users. This is a great feature that's available today in the Gen 2 FSx for ONTAP file systems, and if you're familiar with the ONTAP lingo, this is the equivalent of taking an aggregate and dynamically shrinking it down.
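In API terms, scaling the SSD tier up or (now) back down is a file system update. A hedged boto3 sketch, with a hypothetical file system ID and capacity, assuming a Gen 2 FSx for ONTAP file system:

```python
import boto3

fsx = boto3.client("fsx")

# Sketch: shrink the provisioned SSD tier after a migration or batch run.
# Decreasing StorageCapacity is described as a Gen 2 ONTAP capability;
# verify current constraints in the FSx documentation before relying on it.
response = fsx.update_file_system(
    FileSystemId="fs-0123456789abcdef0",  # hypothetical file system ID
    StorageCapacity=2048,                  # new, smaller SSD capacity in GiB
)
print(response["FileSystem"]["FileSystemId"])
```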
Doing More with Data: Amazon S3 Access Integration for FSx
Now let's talk about doing more with your data. We have a very rich ecosystem of services, everything from Bedrock to EMR, and what we want to make available to you is the ability to use those services directly against your file data. This was a big announcement this week, and I'll hand it over to Andy to explain.
Cool, thanks Christian. As I mentioned earlier, a lot of the customers who we work with have existing enterprise file data that has historically been sitting on-premises, and one of the things that we're seeing with our FSx offerings, especially Windows File Server, NetApp ONTAP, and OpenZFS, is that customers are able to easily lift and shift that data over to the cloud by having a like-for-like storage solution that makes it easy to do so without rearchitecting their data. This has been great. Customers share with us that they love the ease of management that they're able to get by running on a managed service, the cost savings by having things like elastic storage options, and the agility benefits of being able to scale up and down with the click of a button.
As Christian mentioned earlier though, if you look at AWS, we have dozens and dozens of services that we offer that allow you to do more with your data. These are analytics services, AI services, and ML services. For example, Amazon Athena lets you query your data in place, and Bedrock and QuickSight let you create knowledge bases with your data. Customers tell us that they not only want to bring their file data into AWS to just make it simpler and lower costs for them to manage their data, but they also want to do a lot more with that data and really unlock a lot of the value that's contained within the data sets that they're bringing over.
There are a lot of cloud native services that AWS offers that I just mentioned a few examples of. There's also a number of cloud native applications and third party ISV products that are also out there. For the most part, a lot of these services and applications have historically been written to work really well with data that's stored in S3. That's kind of the native way that a lot of these applications and services work. We've heard pretty consistent feedback from customers. They want to do more with their data. They're really excited by a lot of these analytics services that Amazon offers, and they want to be able to use their file data really easily with those services without needing to copy between file and object and do things like that.
So earlier this year, what we actually launched was an integration between Amazon FSx and Amazon S3 that at a high level allows you to take an FSx file system and access the data in that file system through the S3 service as if your data were in an S3 bucket. The reason we launched this integration was to help customers easily unlock that data that you have in file storage and make it immediately accessible to this really broad set of services that are out there, applications that are out there, and let you do a lot of really cool cloud native stuff with that data.
Earlier this year in June, we launched this for the first time. We added Amazon S3 access to our FSx for OpenZFS service, and this week at re:Invent, we also added S3 access to our FSx for NetApp ONTAP service as well. So let me show you how this works a little bit beneath the covers. With an Amazon FSx file system, you can create this file system in your VPC. You can access your file system from compute instances using standard NFS or SMB clients. None of that changes with this launch. You can continue to do so. But there's a spectrum of analytics and AI and ML services that you may want to also use with that data.
The way that this works is that we've integrated with a feature called an Amazon S3 access point. At a very high level, the main thing I would share is that every access point has an alias associated with it, and this alias is, for all intents and purposes, the same as a bucket name. For any service where you provide a bucket name or an S3 URI, what you can do now is create an S3 access point that's attached to your FSx file system and then use the alias of that access point anywhere you would use an S3 bucket name. And so this allows you to plug your FSx data into these services simply by treating that data as if it were stored in S3.
I do want to call out explicitly as part of this integration, your data continues to remain on your FSx file system. You can continue to read and write that data through NFS or SMB. None of that changes. Your data is not being copied to and from S3. What this is really doing is giving you S3 API access, the ability to access your data that is stored on FSx through the S3 API.
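Since the access point alias behaves like a bucket name, existing S3-based code can often point at FSx data with no changes beyond the name. A small sketch; the alias and object keys below are hypothetical placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical access point alias attached to an FSx file system; it is used
# exactly where a bucket name would normally go.
ALIAS = "my-fsx-data-ap-hrzrlukc5m36ft7okagglf3gmwluquse1b-s3alias"

# List objects: files on the FSx file system show up as object keys.
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=ALIAS, Prefix="reports/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])

# Read a file through the S3 API while it continues to live on FSx.
body = s3.get_object(Bucket=ALIAS, Key="reports/q3-summary.csv")["Body"].read()
```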
The other thing I would say is that a number of our customers are migrating their data sets over to FSx. They're migrating their data, they're migrating their applications, and with this capability, you can now easily use that data with S3-based applications. Another really interesting use case we commonly see with FSx for NetApp ONTAP is customers who have on-premises workloads and applications that aren't yet at a point where it makes sense to migrate them to the cloud; they commonly use FSx for ONTAP as a replacement for their disaster recovery site.
We often hear from customers that what they do is they have a primary site. Historically they've replicated to a NetApp system in a secondary site, and what they've done is they've actually been able to replace that secondary site with an FSx file system instead. And in some cases, they've even shut down that site and reduced their overall management burden. I've talked to a number of customers this week specifically about this new feature and talking through how they're thinking of using it, and a really cool aspect of this launch is that if you are replicating your data into FSx, if you're using FSx as a disaster recovery copy of your data, you can still use an access point with that DR copy and get a read-only view into your data and plug that into a bunch of these services as well.
So this is a really interesting way of getting started: even if you keep your data on-premises, you can replicate it into FSx, use that as your DR copy, and then do a lot of really interesting analytics on that secondary copy of your data. As I mentioned earlier, there are dozens of services that integrate with S3. I'm not going to go through each and every one of them. But one example I did want to walk through, because it comes up I think most commonly in customer dialogues, is creating a knowledge base from data.
Often we talk to storage administrators who say, I have a bunch of data that's in this on-premises NAS. My application users or end users are asking, how can I chat with my data? How can I get better insights into what data I have and use that to help me answer questions using Gen AI more effectively? With this launch, if you have your data on FSx, the steps to doing so are: you create an access point; you go to the Bedrock console and say, I want to create a knowledge base with this S3 URI, again using the access point alias; you specify a chunking strategy and an embedding model; and you choose a vector database, which can also be Amazon S3 Vectors, which was launched this week, and that's it.
Once that's done going through your data, you will have a knowledge base you can use just like that, without needing to copy your data between file and object. Your data can continue to live in your file systems where you have it stored. On top of that, it's a very low-cost way to get started with creating a knowledge base, especially when you incorporate something like S3 Vectors into the mix, because maybe you don't need that high-throughput performance; getting started is now hundreds of dollars, not thousands, to go do something like that.
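To make those steps concrete, here's a hedged sketch of adding an FSx-backed access point as a data source on an existing Bedrock knowledge base. The knowledge base ID, access point ARN, prefixes, and chunking values are placeholders, and whether an access point ARN is accepted in place of a bucket ARN here is an assumption to confirm against the current bedrock-agent documentation.

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Sketch only: attach an S3 data source whose "bucket" is the access point
# that fronts the FSx file system. All identifiers below are placeholders.
response = bedrock_agent.create_data_source(
    knowledgeBaseId="KB0123456789",
    name="fsx-file-share",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {
            "bucketArn": "arn:aws:s3:us-east-1:111122223333:accesspoint/my-fsx-data-ap",
            "inclusionPrefixes": ["shared-docs/"],
        },
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {"maxTokens": 300, "overlapPercentage": 20},
        }
    },
)
print(response["dataSource"]["dataSourceId"])
```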
Just to extend that capability and the power of S3 access points on file systems, we have a service called AWS Transfer Family. Surprisingly, there are a lot of protocols that were built 30 years ago that never seem to die. Things like SFTP and FTPS are the ways that businesses interact with each other, and they do it in secure and auditable ways. This could be market data, regulatory data, or healthcare data. It turns out these protocols are still the primary way in which businesses transact with each other for these common types of data.
We saw this trend and created a service called AWS Transfer Family. It's a managed service: we carry all the security and compliance certifications around it, you don't have to manage servers or software, you can just be a consumer of the service, and it's been a fast-growing service since its launch five years ago. We always extend the developer-centric view here, so you can create event-driven automation, you can do it as code, and we have malware detection hooks so that you can scan data as it's coming in.
Playing that forward a little bit: prior to this S3 access points launch, your primary target for Transfer Family was either EFS or S3. With the launch of S3 access points for FSx, we can now extend that reach and let customers use different file systems as the target. One question is, well, you've got two, why do you need a third? It turns out customers are using FSx for specific needs. We had a financial services customer who was using FSx for their market research data, and they wanted to enhance it with third-party data and bring it right into their file systems for their researchers to use and derive new insights from. This makes it really easy to extend that paradigm.
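As a rough illustration of that pattern, here's a hedged boto3 sketch of an SFTP endpoint whose users land on FSx data via an S3 access point alias. The role, alias, key, and home directory layout are placeholders, and using an access point alias as the home directory "bucket" is an assumption to verify against the Transfer Family documentation.

```python
import boto3

transfer = boto3.client("transfer")

# Sketch: a service-managed SFTP server backed by S3-style storage.
server = transfer.create_server(Protocols=["SFTP"], Domain="S3")

# A partner user whose home directory points at the FSx-backed access point
# alias, so files they upload land directly on the FSx file system.
transfer.create_user(
    ServerId=server["ServerId"],
    UserName="partner-feed",
    Role="arn:aws:iam::111122223333:role/TransferAccessRole",  # placeholder role
    HomeDirectory="/my-fsx-data-ap-hrzrlukc5m36ft7okagglf3gmwluquse1b-s3alias/incoming",
    SshPublicKeyBody="ssh-rsa AAAA...placeholder",  # placeholder public key
)
```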
Building Resilience: Advanced Data Protection and Disaster Recovery Features
Resilience, very important topic. In today's threat landscape, we're commonly hearing from customers that data protection is no longer optional, it's really essential. This is both a key focus area for the customers we talk to and it's a huge focus area for us as well on the service team. We want to make it easy for customers to have defense in depth, multi-layered defense strategies to protect against a variety of different threats that may be out there. Today I want to talk about some of the recent launches we've had to make it even easier for you to protect your data in the space.
The first of these is on our FSx for Windows File Server file system. Just two weeks ago, we added support for a Windows Server feature called File Server Resource Manager, or FSRM. This is a commonly used feature for customers who use Windows Server to manage their data, and it offers a number of different storage and data management and protection capabilities. For example, it allows you to set quotas on different folders in your file system. It allows you to classify the files in your file system based on their contents, based on other properties. It also lets you screen which files can be written to your file system.
For example, you could set up a rule to say I don't want .EXEs stored in this particular file system. You can also generate storage reports of the kinds of data you have in your file system to give you deeper insights into the data that you're storing on FSx for Windows. Lastly, even though FSRM is a Windows Server feature, as part of bringing it to FSx for Windows File Server in the cloud, we've tried to make this as cloud-native as possible. To do so, we've actually integrated FSRM on FSx for Windows File Server with Kinesis and with CloudWatch Logs, so that FSRM logs and events are delivered directly into these cloud-native services.
Another really popular feature we're increasingly hearing customers talk about is Autonomous Ransomware Protection, or ARP, which is available in NetApp ONTAP. This is a capability that we brought to FSx for ONTAP in April, and the reason we launched this feature is that, when we talk to customers about ransomware events they've had, unfortunately by the time you realize an attack has occurred, it's sometimes too late, or there's some portion of your data that's really difficult to get back.
What ARP does, once you enable it, is automatically scan the changes being made to your file system and try to detect if anything suspicious is going on. If it detects something, it automatically takes a response action on your file system to reduce the time it takes to protect yourself and be able to restore your data.
So for example, if it detects that a whole bunch of encrypted files are being created in the file system, it may automatically create a snapshot. This is actually a locked snapshot of your data that can't easily be deleted, and it can be set up to generate alerts and create additional evidence, giving you more insight into what occurred. Based on that, you can take a look at what happened, and if it were a true attack, you can restore that snapshot and go back to a previous state. But again, the goal here is really to reduce the time between something occurring on your system and your ability to respond, providing an additional layer of protection against some of these ransomware attacks.
Another capability in a similar vein is SnapLock. So SnapLock is a write once, read many, or WORM feature that we brought to FSx for ONTAP a few years ago. It's used really to lock the contents of a particular volume so that those contents cannot be changed over time. We see SnapLock commonly used by customers in highly regulated industries such as financial services or healthcare, and also just by customers who are looking for an additional way to protect their data. We commonly hear SnapLock come up in the context of having an additional ransomware protection strategy.
We offer two forms of SnapLock on FSx for ONTAP. There's a compliance mode, most commonly used by customers in highly regulated industries, where once you lock a piece of data, there's really no way to change it until its retention period expires. And there's also an enterprise mode, where an administrator can go in and unlock data if need be; that's really meant for more internal governance reasons. It's a really commonly used feature on FSx for ONTAP, and it's often used as part of a ransomware protection story.
Earlier this year, we actually made SnapLock completely free on FSx for ONTAP. This had originally been a licensed feature of ONTAP, so in addition to what you're paying for your storage, you would pay an additional license fee for the ability to use this feature. When we were talking to customers, we would often hear that having an additional, protected copy of the data is super important, and we really just want to make it easy for customers to do so. Because of these fees, customers would sometimes copy their data into another storage solution and try to build complex architectures to meet their compliance requirements as well as their TCO requirements. By removing SnapLock charges on FSx for ONTAP, what we're really allowing customers to do is keep their data in their file system, lock the data they need to lock, and not lock the data they don't. Our goal with making this free is to simplify the data protection architecture for customers who want an immutable copy of their data.
I'm going to shift gears again to talk about AWS Backup. AWS Backup is a native AWS service that protects all of our stateful services within AWS. It also now has the capability to protect EKS for stateful containers, which we just announced this last week. We released a feature over a year ago called Logically Air-gapped Vaults, and the keyword there is logical, because you can never really shut off the network to your data in AWS. The main premise is allowing you to back up data to a special vault that has a different permission set and access model such that the data is immutable. And it's immutable to all the various threats, right? A root user can't delete the data, you need special permissions to restore the data, you can restore the data to other accounts through Resource Access Manager if an account is compromised, and it isolates the data from the core accounts that you're protecting in the first place. This has been a popular feature in that defense-in-depth conversation: you can restore that data into another isolated account for forensic analysis, to see if and when something happened, without having to worry about those backups themselves being compromised.
There are also some enhancements to this. Integration with GuardDuty was announced this week, so you can now actually scan that data as it's going into the vault. What we announced here, though, is more of a TCO play with logically air-gapped vaults.
The way this worked before was that when you backed up your data, you first needed to create a local copy of the backup, and then it would replicate a copy of that data into a logically air-gapped vault as a second copy, which means you had two copies of data. We now support backing up directly to a logically air-gapped vault, so you can eliminate one of those copies and write the data straight into the vault. This works for EFS, S3, and all of our FSx file systems except ONTAP, but ONTAP has all that richness of the other features we covered, which provide the same sort of defense-in-depth capabilities.
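For orientation, the flow is roughly: create a logically air-gapped vault, then target it from your backup plan or an on-demand backup job so no intermediate copy is needed. A hedged boto3 sketch with placeholder ARNs and retention values; whether a given resource type supports direct-to-vault should be checked against the current AWS Backup documentation.

```python
import boto3

backup = boto3.client("backup")

# Create a logically air-gapped vault with enforced minimum/maximum retention.
vault = backup.create_logically_air_gapped_backup_vault(
    BackupVaultName="lag-vault-demo",
    MinRetentionDays=7,
    MaxRetentionDays=365,
)
print(vault["BackupVaultArn"])

# Sketch: start an on-demand backup of an FSx file system directly into the
# vault (all ARNs below are placeholders).
backup.start_backup_job(
    BackupVaultName="lag-vault-demo",
    ResourceArn="arn:aws:fsx:us-east-1:111122223333:file-system/fs-0123456789abcdef0",
    IamRoleArn="arn:aws:iam::111122223333:role/service-role/AWSBackupDefaultServiceRole",
)
```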
Additionally, this year, around the same time we announced some of the enhancements to access points, we also announced Amazon EFS cross-account replication. Backup is one thing, and you're backing up to a vault, for example, but you have to go through a restore process in order to activate that data again in another account to be able to use it. That's great for a lot of customers where the RTO and RPO meet their needs. Some customers say, I need to fail over faster; I need an almost standby copy in another region, ready to go in the event we declare a disaster.
One of the requests we got is that it's not just about being able to replicate from region to region, which we've supported now for over a year. Customers need to do that in a secure way that separates the permission structures between accounts, so that if something happens and they declare a disaster, whether it's an outage, an application issue, or a compromised account, whatever is happening in the primary account can't also happen in the secondary account. So in February this year, we announced cross-account replication, which means you can isolate those permission structures between the two accounts and ensure you have a different permissioning model on the secondary side, and you can still do the same things: you can fail over, you can test, and you can fail your data back to the primary. So if you ever do any DR scenario planning, you have the flexibility to go do it.
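As a rough sketch of what setting this up might look like in boto3, with the destination account's file system and IAM role as placeholders; the cross-account parameters shown here are assumptions, so confirm them against the current EFS replication documentation.

```python
import boto3

efs = boto3.client("efs")

# Sketch only: replicate a source file system to a file system that lives in
# a different account and region. The destination FileSystemId and RoleArn
# are assumptions based on the cross-account replication model.
response = efs.create_replication_configuration(
    SourceFileSystemId="fs-0123456789abcdef0",  # primary account (hypothetical)
    Destinations=[
        {
            "Region": "us-west-2",
            "FileSystemId": "fs-0fedcba9876543210",             # owned by the DR account
            "RoleArn": "arn:aws:iam::444455556666:role/EfsReplicationRole",
        }
    ],
)
print(response["Destinations"][0]["Status"])
```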
Simplifying VMware Migrations with Amazon Elastic VMware Service and FSx for ONTAP
Lastly, this is our last topic; we've got a couple of minutes left here. We're going to talk about simplifying data migrations. We realize there are a lot of conversations going on about what to do with a VMware environment on-prem: how do we migrate it to AWS, or how do we set ourselves up for the ability to migrate and then potentially modernize later down the road? Moving these workloads can be a complex endeavor that can take years, and so what we're trying to do is make it easy for you to get over to AWS as fast as possible.
If you followed along on the announcements, we announced a service called Amazon Elastic VMware Service, which is the fastest path from VMware on-premises to VMware on AWS. Your IP addressing can stay the same, your network access can stay the same, your permissioning can all stay the same, and so it's really easy to bring that environment over. What we heard from customers with the launch of VMC back in the day was, yes, but I have my enterprise storage that I use on-prem, my NetApp or other enterprise storage, and I want those same capabilities when I move over to AWS.
What we're happy to announce is that when Amazon Elastic VMware Service launched, we announced support for Amazon FSx for ONTAP immediately with that release, which means you can preserve that same operating model plus get all the benefits of ONTAP. This includes things like being able to clone VMDKs instantly in as little space as possible, deduping all my VMDKs so they're stored in the smallest space possible, and replicating from on-premises to cloud to start a DR scenario, and then eventually maybe cutting over the migration and bringing all my VMs up on the far side if I'm not using VMware-native tools. So you get a lot of those enterprise capabilities for managing your VM environment in a simple and easy-to-use way.
We support it out of the gate in four different access modes: you can use it as an NFS datastore for those of you using NFS datastores, you can attach it as a block store via iSCSI, and you can also connect via NFS or SMB directly to guest VMs if you have datasets you want to process through those VMs directly.
It's really easy and simple to get up and running, and it's particularly helpful if you have workloads that are more data-heavy than compute-heavy; it really balances the TCO of the overall solution.
FlexCache Write-Back Capability: Enabling Global Distributed Workloads
Last but not least, FlexCache. For those of you who are not familiar with it, FlexCache is a really popular NetApp ONTAP feature, and what it allows you to do is if you have two ONTAP systems, you can configure one of those to be a local cache for the other. We commonly see FlexCache used in a variety of different ways. For example, maybe you have an origin on-premises, you have on-premises data, and we see customers creating an FSx file system in the cloud and using that as an in-cloud cache for on-premises datasets. We commonly see the reverse as well, where customers will migrate their data to FSx. They'll have some use cases where you need low-latency access from on-premises, and they'll use an on-premises NetApp system as a cache for that cloud data.
Lastly, we see really interesting use cases where customers are completely in the cloud, or maybe they're in a mix of cloud and on-premises, but they have geographically distributed teams that all need to work on the same data. We have customers who are replicating between the US and APJ, and they have teams spread across the globe where having local caches where some of these distributed remote teams are allows everyone to see the same data and also have pretty fast performance of that data regardless of where they are.
Historically, FlexCache was what we call a write-around cache. What this means is that if you go to read data and the data is not in the cache, it'll synchronously go to the origin, pull it into the cache, and then serve it to your clients. On the write path, it works similarly: in this example with an FSx file system in the cloud and an on-premises origin, if you want to write data through a write-around cache, what really happens under the covers is that your writes have to go to your on-premises origin first before they can be acknowledged to the client.
What that means is for workloads that are read-heavy, that's perfectly fine, but for workloads where you're actually looking to write data, that latency of needing to go all the way back to the origin, especially if the origin and the cache are in pretty different parts of the globe, could be higher. It could be tens or hundreds of milliseconds depending on how the networking is set up. A capability that ONTAP released and that we brought to FSx for ONTAP earlier this year is a write-back cache feature for FlexCache.
What write-back cache allows you to do is, if you're reading data, reads work the exact same way. Cache hits get served out of the cache, cache misses go back to the origin. But with write-back caching, if you enable this capability, what it means is if you write data into the cache, that cache will actually take out the locks that it needs to from the origin to make sure there's data consistency, but it will hold on to those writes and allow those clients to get much faster write performance. The cache will flush the data as needed back to the origin asynchronously from what the clients are doing.
What this capability allows you to do is get much faster write performance for some of these workloads because your writes now don't need to go all the way back to the origin, which again could be on the other side of the globe from the cache. Instead, your writes can go directly from your client to the local cache without needing to traverse potentially across an ocean. As I mentioned earlier, this is a feature we brought to FSx for ONTAP earlier this year in May, and we're seeing a lot of customer excitement and interest in using this, especially around this kind of global distributed namespace type of use case.
We're really commonly seeing cases where folks have teams across the world that need access to the same core dataset, but everyone wants low-latency access to the data as well. This is a capability that makes that much more possible because you can have these local caches where you're both able to get fast read performance as well as fast write performance. With that, I want to say thank you to everyone for your time and for attending the session. Christian and I are going to stick around if folks have any questions afterwards, but thank you.
; This article is entirely auto-generated using Amazon Bedrock.