🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.
Overview
📖 AWS re:Invent 2025 - Amazon EKS Auto Mode: Evolving Kubernetes ops to enable innovation (CNS354)
In this video, AWS introduces Amazon EKS Auto Mode, demonstrating how it fundamentally simplifies Kubernetes operations by managing compute, storage, and networking infrastructure. The session features live demos deploying applications from scratch, including a 20-billion parameter LLM model on GPU instances, showcasing automated node scaling via Karpenter, built-in NVIDIA support, and SOCI parallel pull reducing container startup times by 60%. Capital One's Dan Levine shares their enterprise adoption journey, highlighting how Auto Mode eliminated infrastructure management pain and reduced support tickets to "golden silence." Key features covered include expanded regional availability including GovCloud, advanced networking configurations, custom KMS encryption, FIPS compliance, and static capacity options. The presentation emphasizes Auto Mode's shift in the shared responsibility model, where AWS manages cluster capabilities, OS patching, and node lifecycle while maintaining full Kubernetes conformance.
Note: This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Opening Demo: Deploying Applications on Amazon EKS Auto Mode in Minutes
Alright folks, hello and welcome. Today we're going to be talking about Amazon EKS Auto Mode. I know a lot of you are probably familiar with re:Invent sessions where they spend 45 minutes telling you how great something is and then 5 minutes showing you how it works. We're going to flip the script a little bit and start with the demo, showing you what's possible with Amazon EKS Auto Mode.
My name is Sai Vennam. I'm a Principal Solutions Architect at AWS. I'm joined by Alex Kestner, who's the Principal Product Manager and the lead PM for Amazon EKS Auto Mode, and we also have a guest who'll join us on stage here in a little bit from Capital One as well. So stay tuned for that, folks.
Like I said, I do want to go ahead and start with a demo. I want to show you folks what's actually possible with Amazon EKS Auto Mode. We're going to start here with our familiar Amazon EKS dashboard, but don't worry folks, we're going to go to the terminal here pretty soon. I just want to show you what we're starting with. We're going to make this just a little bit bigger first and go to clusters. I have a cluster that I created not too long ago, and there's nothing in the cluster right now. It's completely empty: zero pods, zero nodes. But I promise you it's production ready. The interesting thing is that with any typical EKS cluster, when you run kubectl get pods across all namespaces, you expect to see some things in there: components for pod-to-pod networking, maybe kube-proxy, maybe the CNI, the Container Network Interface. But with an Auto Mode cluster you'll see nothing in there, and I promise you it is still production ready.
The first hint that it's production ready comes from looking at the Custom Resource Definitions in the cluster. You'll see things in here like node claims and node pools; that's the underlying open source Karpenter project that gives us dynamic node autoscaling. You'll also see indications that we can create ingresses and load balancers. Essentially, our cluster has these components managed in the control plane so it's ready to serve real workloads. But they're not running in our data plane: no nodes, no pods, nothing we're responsible for, nothing we're paying for, and most importantly, nothing we need to manage when it comes to things like cluster upgrades.
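The checks described above can be sketched roughly like this (a minimal sketch, assuming kubectl is already configured against a fresh Auto Mode cluster; exact CRD names may differ by version):

```shell
# The data plane starts completely empty: no pods, no nodes.
kubectl get pods --all-namespaces   # expect "No resources found"
kubectl get nodes                   # expect "No resources found"

# The hint that the cluster is production ready: Karpenter-style CRDs
# (node pools, node claims) are already registered by the managed
# control plane, even though nothing runs in your account yet.
kubectl get crds | grep -E 'nodepool|nodeclaim|nodeclass'
kubectl get nodepools               # Auto Mode ships a general-purpose pool
```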
Let's do something a bit more interesting. I'm going to clear this out and then, in the bottom terminal, load up EKS Node Viewer. This is just a tool that lets me see what nodes are in my cluster. Again, no nodes. Over here I want to apply a very simple retail store sample application, just the UI component. Remember, there are no nodes, right? But in just a matter of a few seconds, you saw Karpenter pick it up, and it chose a c6a.large instance. It's not ready yet; it's starting the node. Any second now, it's at 16 seconds, 17 seconds. You can see on the bottom right, the node started and joined our Kubernetes cluster. It picked a c6a.large, as you can see, and we have one pod still pending but scheduled on that node. In a second here, it should switch to running. Boom, there we go.
Let's do something a bit more interesting. We're going to scale this deployment up to 10 replicas now. I don't think all 10 replicas are going to fit in that first node. Nope, see? So it fits three in that first node, and then to fit the remaining seven, it's going to spin up a c6a.xlarge. By the way folks, it's deciding the nodes for me. I didn't script this ahead of time. I didn't tell it what to do. It's going to pick the right node for the right set of workloads. Boom, there we go. The node is ready. The pods are scheduled, and in a few seconds here we should see them all flip to running. So there you go. You can see all 10 running, bound on the two nodes, and the deployment scaled up. By the way, it's also working on spinning up an Application Load Balancer for me, but that takes a couple of minutes, and I'm trying to show you the full demo in 5 minutes. So let's clear this out and port forward the service that we've created here. Then once that's ready, switch to a browser.
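The demo flow above looks roughly like this (a sketch; the manifest file name and the `ui` deployment/service names are assumptions, not shown on screen):

```shell
# Deploy just the UI component of a sample app; Karpenter, built into
# Auto Mode, notices the pending pod and launches a right-sized node.
kubectl apply -f retail-store-ui.yaml        # hypothetical manifest name

# Scale beyond what the first node can hold; Karpenter adds a second,
# larger node to fit the remaining replicas.
kubectl scale deployment ui --replicas=10

# Watch the pods bind across the two nodes.
kubectl get pods -o wide --watch

# Skip waiting for the ALB and port-forward the service locally instead.
kubectl port-forward svc/ui 8080:80
```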
Here we go, and we'll go to localhost 8080. And there we go. This is our Retail Store Sample Application. Pretty basic, just a front-end UI application. We did scale it to 10 replicas, and you saw that with a fresh EKS Auto Mode cluster, we were able to start serving traffic and dynamically scale the nodes where the workloads run in a matter of minutes. That's really the idea here: we want to show you how rapidly you can get workloads going. So that's my quick 5-minute demo. I have a deeper dive demo coming here in a little bit. This was the what of how things work, but for us to start understanding the why, I want to pass it to Alex, our lead PM on EKS Auto Mode.
He's going to talk a little bit about how we got here. Take us back to the beginning. Off to you, Alex.
The Evolution of Kubernetes and Amazon EKS: From Complexity to Simplicity
Awesome, thanks. Thanks everyone for joining us today. Really excited to be with you. I think that it's so exciting to see how quickly you can get started up and running with EKS Auto Mode today, but it's also important to understand why customers choose Kubernetes and how EKS evolved to get where we are today with Auto Mode.
Stepping back, Kubernetes was created to help customers manage the increasing complexity of the cloud. Today it's the leading cloud operating model, and its popularity is only increasing. In 2024, 93% of organizations surveyed by the Cloud Native Computing Foundation, up from 84% the year before, were either using Kubernetes in production or actively evaluating it for their organization. Without a doubt, Kubernetes is the de facto standard for operating in the cloud.
This is because Kubernetes is incredibly useful. It has a relatively simple set of APIs for managing large groups of servers and coordinating how your applications run across them. Because it was born both in the era and from the complexities of the cloud, it also prioritized consistency. After all, what good is simplicity if there are different versions of simple for different applications, let alone all the different places your applications need to run?
And while Kubernetes itself covers the vast majority of the functionality that you need to build, deploy, scale, and operate any kind of exciting new application you can think of, it's also extensible, which makes it incredibly powerful. There are currently 195 open source projects managed under the Cloud Native Computing Foundation and hundreds more landscape projects which run on, integrate with, or extend Kubernetes. You can even write your own.
Okay, so Kubernetes is great. I think you all know that; that's why you're here. Actually, using Kubernetes is great, but running and operating Kubernetes clusters can be quite hard. This is why eight years ago, in fact right here at re:Invent in 2017, we launched Amazon EKS. It was a response to feedback we got from customers that managing Kubernetes at scale was hard. They had to monitor, scale, and manage the Kubernetes control plane to meet their requirements for security and resiliency, let alone find ways to integrate their clusters with the other AWS services their applications needed.
Since then, Amazon EKS has emerged as the most trusted way to run Kubernetes, offloading the undifferentiated heavy lifting of hosting the Kubernetes control plane to AWS while remaining fully Kubernetes conformant. This allows customers to focus on running their applications instead of managing their cluster's control plane. And we didn't stop there. Over the last eight years, we've been consistently busy delivering new features and enhancements for not only EKS, the AWS service, but also the broader Kubernetes open source community.
We started with this basic managed control plane and have steadily added capabilities for compute management, auxiliary software, security, scalability, networking, observability, troubleshooting, and even recently this year, EKS capabilities itself, a new launch that you should certainly check out as it's very exciting for managing operational software on top of the cluster. With more than 250 launches over the last seven to eight years, from expanding into additional AWS regions, reducing pricing, introducing open source projects, and more, we've been very busy.
We've even created brand new open source projects like Karpenter that powers Auto Mode, Kro, and ACK that have become industry standards. We've gone deep into every aspect of operating Kubernetes in what we think might be the largest scale anywhere. Which is why we now run tens of millions of clusters for customers every year, and this keeps growing.
But when we spoke with some of our customers running these tens of millions of clusters, they told us that while EKS made managing clusters easier, there was still more that we could do to make deploying and operating Kubernetes applications easier, which is why last year at re:Invent we launched EKS Auto Mode. Auto Mode is not only a major evolution for easily running production-ready Kubernetes clusters, it fulfills our vision of how we think Kubernetes should operate in the cloud.
What is EKS Auto Mode: AWS-Managed Compute, Storage, and Networking
Auto Mode provides Kubernetes conformant and AWS managed compute, networking, and storage for any new or existing EKS cluster. This makes it easier for you to leverage the security, availability, scalability, and operational excellence of AWS for your Kubernetes applications. Auto Mode allows you to create application-ready clusters like the one that you just saw from Sai at the beginning of this talk, preconfigured with the essential capabilities and best practices from our eight years of running tens of millions of clusters.
It dynamically scales and cost optimizes cluster compute based on your application's needs. It selects, provisions, secures, and upgrades AWS managed EC2 instances within your account using AWS controlled access and lifecycle management. And it handles operating system patches, health monitoring, updates, and limits security risks with ephemeral compute, strengthening your security posture by default.
It also meaningfully simplifies cluster upgrades, reducing the operational overhead of running Kubernetes clusters and enabling you to focus on the activities that are critical to your business instead of maintaining infrastructure. All of these features help reduce the time it takes to launch new applications and allow your organization to get new products or modernized applications to market faster. Because Auto Mode is Kubernetes conformant, it means you're able to do so while continuing to take advantage of that vast ecosystem of open source tools and software.
Auto Mode helps improve your application's availability as well. By managing the infrastructure where your applications are running, you're better able to meet the needs of your users and organization. Finally, it does all of this while automatically optimizing for cost efficiency, both of the compute and your team's time. Reducing the operational overhead of running Kubernetes applications enables you to focus on, again, those things that are critical to your business.
I've talked a lot about the work that it takes to manage all of these Kubernetes clusters and the infrastructure behind them. Let's take a look at what that actually looked like, practically speaking, before EKS Auto Mode. Customers had to provision a cluster, install all the essential Kubernetes plugins required to run production-grade applications, select, configure, and launch the best compute for those applications, and finally, with those pieces in place, they could finally deploy the applications into their cluster. But this is just the beginning of actually running these applications.
With the applications deployed, you now have to continuously monitor all of this infrastructure and software, repairing it when unexpected issues inevitably arise, and upgrading it as new Kubernetes versions or operating system patches are released so they can remain up to date, compliant, and secure. All of this was operational work that you had to take on, which was reflected in the architecture of a non-Auto Mode cluster. So let's come back and look at what this looks like with EKS Auto Mode.
With just one click or API call, Auto Mode provides a fully automated, AWS-managed, Kubernetes-conformant compute, storage, and networking for any EKS cluster. Once you've created a cluster with Auto Mode enabled, there's nothing else you need to do before you can begin deploying applications like you saw from Sai. Auto Mode automatically provides the infrastructure your applications need, scales resources to meet your application's changing demand, automatically optimizes for cost, and monitors and repairs nodes as needed. All of this eliminates a ton of operational work so that you and your teams can focus on building applications that drive innovation.
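That single API call can be sketched with the AWS CLI roughly like this (a hedged sketch: the cluster name, account ID, role ARNs, and subnet IDs are placeholders, and the exact flag shapes may vary by CLI version):

```shell
# Hypothetical one-call cluster creation with Auto Mode enabled:
# managed compute, block storage, and load balancing in one request.
aws eks create-cluster \
  --name demo-auto-mode \
  --role-arn arn:aws:iam::111122223333:role/eks-cluster-role \
  --resources-vpc-config subnetIds=subnet-aaaa,subnet-bbbb \
  --compute-config '{"enabled":true,"nodePools":["general-purpose","system"],"nodeRoleArn":"arn:aws:iam::111122223333:role/eks-node-role"}' \
  --storage-config '{"blockStorage":{"enabled":true}}' \
  --kubernetes-network-config '{"elasticLoadBalancing":{"enabled":true}}'
```

The console quick-create flow shown later in the session wraps this same API.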
Architecture and Shared Responsibility Model: How Auto Mode Changes the Game
The other way you can think about this is what does the architecture of a cluster look like both without Auto Mode and with Auto Mode. Here's what it looks like for a cluster without Auto Mode. The classic AWS-managed cluster control plane is on the left in an AWS account, and all of the other Kubernetes plugins, software, infrastructure provisioning, and other AWS resources that your applications need are on the right in your account.
With Auto Mode, this changes quite a bit. You can see that not only does AWS run the cluster control plane as we always have, we also now run a set of essential and integrated Kubernetes capabilities for compute, storage, and networking. Straddling our and our customers' accounts, Auto Mode also launches EC2 Managed Instances, a new operating model for EC2 that is shared with similar features in ECS and Lambda, announced just this year. This allows you to delegate operational responsibility for the instances themselves to an AWS service like EKS with Auto Mode, while they continue to reside in your account and otherwise function like normal EC2 instances. Other supporting infrastructure resources like load balancers and EBS volumes continue to exist in your account as they have since EKS launched.
One other way that we like to think about how the balance of what each of us is responsible for changes with Auto Mode is with the shared responsibility model. This shows what part of an architecture is our responsibility and what part a customer is responsible for. With standard EKS clusters, we're responsible for all of the AWS global infrastructure, foundational services, as well as the cluster control plane. Everything above that in this diagram, from cluster capabilities to compute and compute lifecycle management, operating system patching, monitoring, and repairing, all of these are fundamentally our customers' responsibility. While some of the 250+ features we've launched over the last eight years have helped make managing parts of this easier, they're ultimately in your sphere of responsibility to manage.
With Auto Mode, we move this boundary of the shared responsibility model considerably. We take on far more of the undifferentiated heavy lifting, including all cluster capabilities for compute, storage, and networking, but also operating system patching, monitoring, health, and repairing the EC2 instances where applications actually run. This lets you focus on your application, your application security, how it's monitored, how it's operating, and any other essential plugins or add-ons that you might need. This lets you focus on the things that are essential to your business as opposed to managing infrastructure, of course.
Deep Dive Demo: Deploying GPU-Based LLM Workloads with Auto Mode
It's one thing for me to tell you about how great Auto Mode is, but it's another to see it yourself, which is why I want to hand it back over to Sai to go a little bit deeper in his next demo. Alright, let's do it, folks. Thanks for that, Alex, and by the way, these last two slides, I just like flipping back and forth between them really fast, and then you can see just how much we'll manage for you with Auto Mode, right? The operational overhead that I couldn't offload was figuring out how to do that in PowerPoint. There you go. That's why I'm here. That's why I'm here, Alex.
Alright, awesome. Well, it's time to flip into a demo. Let's see Auto Mode in action, folks. I gave you a very quick five-minute flow of how all of this works, but how about we jump back to where we were. Essentially, we had deployed the UI application and scaled it up, but I want to do something a bit more interesting. I don't think any of you folks have seen any other speakers talk about Gen AI at this conference, right? No. Well, let me be your first.
We want to show you folks a real meaty LLM deployed in EKS Auto Mode from scratch on an NVIDIA-based instance. We want to show how accelerated compute and GPU support come batteries included in an Auto Mode cluster, and hopefully it'll give you a good sense of how much we're actually helping you manage on the operational side. So first I'm going to downscale these. Let's go ahead and run a scale command, and I'll take this deployment down to zero replicas. By the way, that autocomplete is courtesy of Kiro CLI. Check it out.
Okay, and just like that, you can see in a matter of seconds, honestly instantly, those nodes are now at zero percent utilization. They're underutilized, so what do we expect to happen? Well, you'll see for yourself. Next, we want to apply a node pool that's going to be our accelerated compute. What do I mean? Well, if we want to run an LLM, we can't just run it on basic CPU- and memory-based instances. We need a proper GPU attached, and we want an NVIDIA GPU.
So let me open up Kiro again, my favorite code editor, and we're going to go to the GPU node pool. Let me scroll down and show you folks how you configure a node pool. Before, we kind of took it for granted: there's a general-purpose node pool that comes with every EKS Auto Mode cluster and lets us deploy general-purpose workloads. But if you want to do something more interesting, and most of the time you're going to want something custom for your workloads, you're going to want to customize the type of nodes you deploy.
You may remember that with managed node groups, you would have to explicitly list out all the instance types you want to use; it configures an Auto Scaling Group, integrates with Cluster Autoscaler, and there's just a lot going on there. With Karpenter, the underlying open source project that powers dynamic node auto scaling in EKS Auto Mode, instead of building up a list of instances, we filter down based on our requirements. We tell it we want instances that are either Spot or On-Demand, and if we're lucky today we'll get a Spot instance, which is major cost savings on G5 and G6 instances. We also want the G instance class with an instance generation greater than four, which again gives us the two families we're looking for. Within those families there's large, xlarge, 4xlarge, and so on, and we give it an architecture as well. So we're telling Karpenter: here are my requirements, figure out what I want. Alright, let's keep going. I'm going to go back to the CLI and actually apply the deployment.
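The node pool described above can be sketched roughly like this (a hedged sketch: the label keys and the NodeClass reference follow Auto Mode conventions as I understand them and may differ by version; the pool name is a placeholder):

```shell
# Hypothetical GPU node pool for an Auto Mode cluster: filter by
# requirements instead of listing instance types one by one.
kubectl apply -f - <<'EOF'
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # prefer Spot when available
        - key: eks.amazonaws.com/instance-family
          operator: In
          values: ["g5", "g6"]            # G class, NVIDIA GPUs
        - key: eks.amazonaws.com/instance-generation
          operator: Gt
          values: ["4"]                   # generation greater than four
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
EOF
```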
Recently OpenAI open sourced their GPT models, which they call GPT-OSS. I'm going to deploy the 20-billion-parameter model, which does require a pretty capable NVIDIA GPU. As soon as I created that, you can see here that about eight seconds ago it decided to spin up a g5.4xlarge. And hey, by the way, take a look at the two instances that we had scaled down earlier: those nodes are already being deleted because we're consolidating. If we had scaled to, say, three replicas rather than zero, it probably would have removed just one of the instances.
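The deployment that triggers the GPU node launch looks something like this (a sketch; the image URI, names, and port are assumptions since the manifest isn't shown in the talk):

```shell
# Hypothetical deployment of the gpt-oss 20B model. Requesting an
# NVIDIA GPU is what steers the pod onto the GPU node pool; Auto Mode
# ships the NVIDIA device plugin, so nvidia.com/gpu works out of the box.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpt-oss
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpt-oss
  template:
    metadata:
      labels:
        app: gpt-oss
    spec:
      containers:
        - name: server
          image: 111122223333.dkr.ecr.us-west-2.amazonaws.com/gpt-oss:20b
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: "1"
EOF
```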
Batteries Included: Creating Clusters and Running AI Models with Minimal Configuration
This 20-billion-parameter model is 14 gigs, so while it pulls the image, which takes some time, we're going to switch gears and take you to the console; we'll come back to it in a minute. Earlier I cheated a bit and we started with an existing Auto Mode cluster. It was version 1.33. I do want to show you how we create a cluster, but over here, if we go down to the version of the cluster itself, we can see we can upgrade it to 1.34, and there's also a create backup button. So one important thing to understand as a customer or consumer of Auto Mode: you get to decide when to upgrade the control plane. We won't take that over for you. This is a major operation and you should be in control of it. But we do handle the node upgrades and all the components that Alex showed earlier. Maybe I can just quickly jump back to that slide and show you folks all those components that might be running inside a cluster.
We'll upgrade those for you as well. You don't have to worry about the worker nodes; the next time they're restarted, they'll be on the latest version. This makes your life easier and takes some of that undifferentiated work off your plate. I do want to quickly call out this create backup button. Just a few weeks ago, maybe a month ago, we announced support for Amazon EKS in AWS Backup. So now, with the click of a button, you can back up your clusters, which we suggest might be a good idea before you upgrade. It's up to you, but upgrading the control plane version of your cluster is an irreversible action, so you might want to create a backup first, and AWS Backup makes that much easier. Just a quick call out.
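For reference, the control plane upgrade itself can also be kicked off from the CLI (a sketch; the cluster name is a placeholder):

```shell
# Upgrade only the control plane version. With Auto Mode, the nodes
# and managed components are rolled to the matching version for you.
aws eks update-cluster-version \
  --name demo-auto-mode \
  --kubernetes-version 1.34
```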
Okay, let's get back into it. So this is our existing cluster, but I want to show you folks how to create a fresh one. We'll go to clusters, hit create cluster, and we have this relatively new quick configuration flow for EKS Auto Mode clusters. You can go to custom configuration and set everything yourself, but we wanted to show you, yes, buzzword incoming, a one-click deploy option for Auto Mode. Let's zoom in a little and go through the options. We'll scroll down. We can randomize a name for the cluster, so serious-pop-crab, I like it. Let's scroll down, pick a version, and then look, I know Kubernetes SRE experts that still stumble around creating cluster roles and node roles. We made this easy for you: hit create recommended role, click, click, click, and we'll make one for you. In this case, I happen to have a couple of roles in the account that I can already use. Makes life easier. I know a lot of you probably use eksctl out there, which does this for you; well, now we do it for you in the console as well.
Okay, let's scroll down here. We have the VPC preloaded and the subnets as well, but I actually want to dive into something a bit more interesting. If we go to these quick configuration defaults, there's a section here; let me zoom in so you can see it a little better. Essentially, with any Auto Mode cluster, these are the components you get out of the box, and I know we've talked about this before, but I really want to drive it home. Application load balancing: when we create a service and need load balancers, application load balancers or network load balancers, we'll look at the Kubernetes objects you deploy and create the corresponding resources in the AWS backend. That comes out of the box. Block storage: the EBS CSI driver, the Container Storage Interface, which is just the way Kubernetes helps you create backend storage for stateful workloads, comes out of the box as well.
Compute auto scaling, I think you all know what this is. I've said it five times already: it's Karpenter, the underlying open source project that we released a few years ago. We'll handle that for you as well. And folks, if you've ever set Karpenter up before, there are some roles you've got to configure, and maybe a managed node group to run it on. With Auto Mode you don't have to; we run it in our control plane. GPU support, that means batteries included, like I said before. Without Auto Mode, you're doing this thing where you have to line up the kernel version of your Amazon Machine Image and the instance with the NVIDIA device driver version. You also have to install the NVIDIA device plugin, and then maybe you want tools like the NVIDIA DCGM exporter, that's Data Center GPU Manager, if you're familiar with it, which exposes NVIDIA GPU metrics to Prometheus. That comes out of the box for you.
And then these last couple of bullet points: kube-proxy, VPC CNI, CoreDNS. I know it's a ton of acronyms; just know it's pod-to-pod networking, the networking that any Kubernetes cluster needs. We handle that for you as well. That means not just that you don't have to worry about the nodes they run on, but also that we'll upgrade them for you. So those are all the components that come with it. We'll scroll back down and hit create, and now you see what I had to do to prepare the cluster for the five-minute demo earlier: you hit create and you're good to go. Okay, now hopefully I've given it enough time.
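As one example of those built-in capabilities, exposing a service through the managed application load balancing can be sketched like this (a hedged sketch: the IngressClass controller name follows Auto Mode conventions as I understand them, and the `ui` service is a placeholder):

```shell
# Hypothetical Ingress using Auto Mode's built-in load balancer
# support; EKS provisions the ALB in your account automatically.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb
spec:
  controller: eks.amazonaws.com/alb
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ui
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ui
                port:
                  number: 80
EOF
```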
It looks like that pod has been scheduled on the g5.4xlarge from earlier. Remember, this is the GPT-OSS 20-billion-parameter model. We're just going to describe the pod; there's only one, so it comes back very quickly, and I want to call something out here. We successfully pulled the GPT-OSS model image from my private ECR; I put it there before this talk. It's 14 gigs, right? You can see that, I think it's 14 billion bytes, and it was able to pull that in just over a minute.
Quick tech recap: a couple of years ago, AWS open sourced a project called SOCI. Back then it was for lazy loading, but later on we released a capability called parallel pull. Essentially, it allows you to pull multiple layers of a container image concurrently. Now, with generative AI, it's extremely important that time to first token be as low as possible. Pulling a 14-gig, 20-billion-parameter model in about a minute: before SOCI, that would have taken 2 to 3 minutes, so that's already a dramatic improvement. With Auto Mode, we configure that for you out of the box. On these G instances, for example, we get the parallel pull offered by SOCI ready to go, and this really shows you the kinds of things that I didn't have to do to make this demo work.
I mean, you saw it. It was a fresh Auto Mode cluster. I applied the configuration, essentially a manifest with one deployment and one service, and it integrated with the NVIDIA GPU drivers. It gives me GPU metrics streaming to Prometheus. I have SOCI parallel pull support out of the box. All of these things that I didn't have to do were handled for me, and that's what I mean by batteries included. Another way to think about it: the infrastructure for Auto Mode is GPU aware.
Okay, I think we've given it enough time. By the way, after the model pulls, it then needs to start, which takes a couple of minutes. Let's switch to another tab here. We're just going to port forward the GPT-OSS service that we created earlier. Boom, there we go. Now let's ask it a simple question. I have a copy-paste here: it's going to curl localhost 8000, ask it what is machine learning, and I pipe it to jq just to pretty-print the output.
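That request can be sketched like this (the service name, endpoint path, and payload are assumptions; many open source model servers expose an OpenAI-compatible API on port 8000):

```shell
# Forward the model's service locally (service name is hypothetical).
kubectl port-forward svc/gpt-oss 8000:8000 &

# Ask the model a question and pretty-print the JSON response with jq.
curl -s http://localhost:8000/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "gpt-oss-20b", "prompt": "What is machine learning?", "max_tokens": 200}' \
  | jq .
```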
Okay, taking a look here, we can see it's a thinking model. The reasoning here is: what is machine learning, they expect a concise explanation, and there we go, there's a response. So what did we show here? Essentially, it's infrastructure that moves at the speed of your ideas. We wanted to run an LLM inference workload. We had a deployment, we wanted GPU-based instances, so we first created a node pool, told it the type of nodes that could support this workload, and then created the deployment itself.
And really, when I was thinking about how to make an exciting demo for Auto Mode, it came down to showcasing all the things I didn't have to do. It was the streamlined experience, and hopefully I got the point across that it really is that easy to deploy even a meaty, production-ready, 14-gigabyte LLM in a matter of minutes.
Capital One's Journey: From Fragmented Kubernetes to Centralized EKS Auto Mode
Okay, so I know a lot of you have the question on your mind right now. This is all great for a demo, but what about for a regulated, high scale environment? Well, to talk about that, I would love to introduce Dan Levine from Capital One to talk a little bit about how they started using Amazon EKS Auto Mode and how they got to where they are today. Here you go, Dan.
Hey folks, thank you very much, Sai, for the introduction. My name's Dan Levine. I've been working at Capital One for about 8.5 years now, and my main focus has been the implementation and maintainability of Kubernetes at scale. Like Sai said, Capital One is a highly compliant, highly regulated company that needs to be risk averse. To operate that way, we need certain assurances before using a technology like EKS Auto. So I'm going to talk about a few things today. I think the thing that might be most poignant for all of you is a little bit about our journey.
We didn't get here overnight. There have been 10 years of buildup since Kubernetes was released that led us to the decision to adopt EKS Auto Mode. I also want to talk about some of the solutions that we've come up with along the way to make it more palatable for an enterprise such as ours. And of course, this is an EKS Auto talk.
I want to talk a little bit about EKS Auto Mode, the benefits that we've received from it within our enterprise, and how our users are using it. Like all good speeches, let me take you back 10 years to when Kubernetes was first born. At that time, a lot of different platform teams within Capital One were trying to implement Kubernetes. We were bootstrapping etcd, we were managing control planes, and we were doing all of this in several different areas of the business. All of these platform teams that thought Kubernetes was the right use case for them had different opinions about how to do all of these things. It was a challenge that created a lot of arbitrary uniqueness within our enterprise, and we weren't able to solve a lot of problems quickly.
The three main challenges that we hit because of this were high SRE effort, constant churn, and scalability issues. High SRE effort because everyone was doing something a little bit differently, and they were spending a lot of hours trying to figure out how to do this correctly. High churn because at Capital One, we have a set of compliance software that needs to run on Kubernetes within the enterprise, and this software needs to be maintained, updated, and controlled. Scalability issues because frankly, Kubernetes was a little young at that point in time, but also nobody was working together on a centralized solution.
So what did we do? We moved to a centralized solution. We called it the federated model. Within Capital One, we decided that Kubernetes, and at the time Amazon EKS, should be managed by centralized tooling that a team would carefully curate for all of the platform teams within the enterprise. These platform teams would then be able to contribute more to the business value that they were actually creating with their platforms. In order to do that, we had some tenets, some core principles that we really wanted to enforce to make sure it could work at the scale that we needed. We had several platform teams that wanted to use Kubernetes.
Those tenets were to be able to meet our platform teams where they were. We wanted to do it with them and not for them. So we created a sense of shared ownership and empowerment across the enterprise with our platform teams that use Kubernetes, to ensure that they were getting a product that they not only wanted but liked. But how did we do that? We looked upstream. We looked at Kubernetes to see how it was solving this problem within the community, and we used a concept they invented, the Special Interest Group, or SIG. When we had critical challenges within our enterprise, we would create a SIG that involved subject matter experts, and that SIG would solve the problem.
For example, if we needed to ship metrics off to somewhere, you'd get the engineers that had the pain and the people that were passionate about the problems in the same room, in the same SIG, to solve these problems. All of their solutions would then come right back to the centralized tooling that we maintained and pushed out to the platform teams. This was great. We were operating at a higher scale, we were innovating, and we were doing Kubernetes in a mature manner. But we still had some pain.
Two areas primarily where we experienced pain were infrastructure management and container management. Infrastructure management, because when you get to the level of scale that we did, you need to make sure that you're patching operating systems, you're using the right instances, and you're not overprovisioning and spending hundreds of thousands of dollars on dead air CPU cycles. Container management because, like I said before, we had compliance software that needed to be pushed out to every single cluster. That software needed to be maintained and updated. We were spending hundreds and hundreds of hours maintaining all of this and making sure that it was doing the right thing in our enterprise, and it caused a lot of friction.
Enterprise Adoption: How Capital One Validated and Deployed EKS Auto Mode
And that's where we get to today. That's where we get to EKS Auto Mode. EKS Auto Mode, like Sai and Alex have been saying, manages a lot of that for us. AWS is in the business of taking away that management pain because frankly, I've used Kubernetes for a long time, and I don't want to manage it anymore. I've been doing it enough. Now I know what you're thinking before I say how EKS Auto Mode addresses the pain. I do want to acknowledge you don't take a massive financial institution, switch on Auto Mode on a Monday, and say great. That's terrifying. There are several things that you need to do in order to adopt, and I want to talk about that adoption really quickly. We decided that to dogfood EKS Auto Mode, we would use it with our central delivery platforms.
The centralized team that created the tooling. Well, let's throw EKS Auto at that. Once that was working, once the control plane was behaving like we expected with all the software and compliance containers that we were handing it, we didn't have promises, we had data. We had actual proof that EKS Auto was solving the problem that we believed it to. Once we did that, the conversation became a lot easier. And that data was addressing the two pain points that I talked about earlier.
Automated infrastructure management. Karpenter backs EKS Auto Mode. With Karpenter in EKS Auto Mode, we no longer have to worry about am I provisioning the right instances, am I using CPU that I don't need. We give it a deployment, we let it run, Karpenter does the rest. It's managed for us. Also, simplified container management. My team is no longer spending hours and hours focused on upgrading containers like EBS CSI driver or the load balancer controller that are table stakes to run a Kubernetes cluster at scale within a massive enterprise.
Right now we're looking at some other use cases for this. The two primary ones for us are machine learning. In a multi-tenant platform, these make great use cases for EKS Auto Mode because of the scale that they have, especially in a multi-tenant platform that has several different applications. A platform team won't have to worry about what they're throwing at the cluster again. They deploy it, it schedules the workload, and infrastructure is handled for us.
Now, just to close here, there have been a lot of key metrics that we have deemed valuable in our usage of EKS Auto Mode, but I think one of the best ones is a little more intangible. It's silence. And what do I mean by that? Well, before EKS Auto Mode, a lot of our customers would come to us and say, hey, why is my managed node group not updating? Why is the EBS CSI driver stuck in a crash loop backoff? Why isn't it on the version that I need it to be on? Now our Slack channels are silent. Nobody's asking questions anymore around these problems that we were dealing with and spending massive amounts of hours troubleshooting with our platform team partners, and that silence is golden.
New Features and Expanded Availability: Regional Support and Advanced Configuration Options
So with that, I'd like to close and hand it back over to Alex to talk a little bit about some of the great features that they've delivered over the last year with EKS Auto Mode. Thank you very much. Thanks, Dan. Appreciate it, Dan. Dan, stay on stage with us. It's all good. You know, so one of those features, for example, is this SOCI parallel pull and unpack feature. We just launched that this fall. It drastically reduces container startup times and is just one of the many things that we've been working on. You know, just like those 250-plus features that EKS itself has been launching over the last eight years, EKS Auto Mode has been busy responding to customer feedback and building new capabilities that delight customers and make their lives easier.
We've been making continuous improvements, and while all the details are available in our AWS user guide for EKS, and there's a release notes page specifically, I want to highlight some of the most significant additions that we've made over the last year. There are six key areas that we've been focusing on, each in response to specific feedback that we've been getting from our customers as they've looked at Auto Mode, thought about whether they could use it, and considered what they might need in order to adopt it.
And so first I want to talk about expanded availability: how we brought Auto Mode to new regions, including AWS GovCloud (US) and Local Zones, making it accessible for public sector workloads and edge computing use cases. Next, we'll explore the advanced configuration options that give you more control over networking and security while maintaining the operational simplicity that's core to EKS Auto Mode. Then we'll dive a little bit deeper into that SOCI parallel pull feature, explain how it's automatically enabled on GPU instances, and why that makes it such a game changer for AI/ML workloads.
We'll also cover our enhanced security and compliance capabilities that make it easier for customers like Capital One to adopt Auto Mode across large enterprises and highly regulated industries. After that, we'll discuss a couple of capacity management features, and then finally we'll wrap up with thoughts about how all these work together to continue to evolve Kubernetes operations.
So first, one of the most requested features for Auto Mode has been broader regional availability, especially for government and regulated workloads. I'm excited to share that EKS Auto Mode is now available in all the commercial regions where EKS is available with the exception of the AWS China regions in Beijing and Ningxia. And in 2025 we expanded that to include the AWS GovCloud regions, enabling agencies and organizations with sensitive workloads to take advantage of Auto Mode's operational benefits while meeting their compliance requirements. It's a natural fit for Auto Mode's high operational and security bar.
Additionally, this year we've brought Auto Mode to AWS Local Zones, allowing you to place Kubernetes applications closer to your end users or on-premises systems, reducing latency while maintaining the operational simplicity that Auto Mode provides. This has been particularly valuable for customers running latency-sensitive applications like real-time analytics or games.
We've been significantly expanding the configurations that are available in Auto Mode without affecting its simplicity, and to be honest, this has been one of the largest areas that we've been investing in over the last year. We took a very deliberate approach when we launched Auto Mode to give it a relatively simple set of configurations, and we wanted customers to come to us and tell us what specific other knobs and buttons they might need for it to be a good fit for their use case. With that feedback, we've been busy delivering the kinds of customizations that customers need.
First, one of the most highly requested features that we had for Auto Mode was the ability to use separate subnets and security groups for pods as opposed to nodes, which you can now do through pod subnet selector terms and pod security group selector terms. This may not be as familiar because when you're running a standard EKS cluster through the VPC CNI, you do this through a mode called custom networking. This approach achieves the same thing but does so in what I think is a little bit of a cleaner user experience that's very similar to other kinds of configurations that you may be familiar with in Auto Mode.
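As a sketch of what that configuration might look like, assuming field names that mirror the feature names from the talk (verify against the EKS Auto Mode NodeClass reference before use, and note the tags here are placeholders):

```yaml
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: custom-networking
spec:
  subnetSelectorTerms:              # subnets for the nodes themselves
    - tags:
        network-tier: nodes
  podSubnetSelectorTerms:           # separate subnets for pod ENIs
    - tags:
        network-tier: pods
  podSecurityGroupSelectorTerms:    # separate security groups for pod traffic
    - tags:
        network-tier: pods
```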
We've also added a bunch of other advanced networking configurations, things like the ability to enable or disable associating a public IP address with the instances that Auto Mode launches. You can also specify whether or not the traffic on nodes that Auto Mode launches should be routed through a proxy, so you can specify that certain destinations should go through a forward proxy while others, like localhost destinations, shouldn't. This is essential to support environments that require this kind of proxy for outbound connectivity or compliance reasons.
For customers, we've heard a lot that they needed to provide various private certificate authority material to the nodes that Auto Mode runs to be able to authenticate with other systems in their environment. So you can now do this through the certificate bundles parameter. This lets you provide a set of public key material for Auto Mode instances to use as they communicate with other services in your network. This is particularly valuable for organizations with internal PKI systems or custom certificate authority requirements. These additions make it so that Auto Mode's operational simplicity is paramount while still giving you the configuration flexibility that you need to meet your specific organization's requirements.
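A hedged sketch of the certificate bundles parameter on the node class follows. The bundle name is made up and the exact field shape may differ from the current API; treat this as illustration only:

```yaml
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: internal-pki
spec:
  certificateBundles:
    - name: corp-root-ca       # illustrative name for an internal CA bundle
      data: |                  # PEM-encoded certificate chain
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----
```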
Enhanced Capabilities: SOCI Parallel Pull, Security Features, and Capacity Management
And I just want to add one thing here, folks. When we launched Auto Mode, we started listening to how customers were using it, and so I just want to stress something you said, Alex. Talk to us. We will share with you our public roadmap on GitHub. I know Alex probably looks over the issues on our roadmap every day. It's probably in his morning routine. So we really are listening, and tell us how you want to use Auto Mode and he'll make it happen.
Yeah, and to be honest, that's exactly right. There's a number of other things that we've already heard, things that I think we'll hear in the future about if only Auto Mode would let me set X, Y, or Z, it'd be a great fit for my use case. That's exactly what we want to hear, and that kind of feedback is invaluable for me as a product manager to figure out what we should be doing next and how we can help enable our customers to offload all of this kind of operational burden that Auto Mode allows you to delegate to us. So please keep it coming.
We saw this live in Sai's demo, the SOCI parallel image pull and unpack, and this is a common pain with containerized workloads, especially those using AI/ML use cases where the time it takes to download that image and unpack it can delay really meaningful metrics for these kinds of scenarios like time to first token. These are increasingly critical as companies adopt more and more AI/ML and inference use cases.
We've addressed this, as Sai showed and mentioned, by implementing the SOCI parallel container pull and unpack capability, dramatically reducing the startup time by parallelizing these operations for different layers in the container. The best part, as Sai mentioned, is that this is also automatically enabled for every GPU-enabled instance in Auto Mode, as well as those that don't have GPUs but have local NVME storage. So these are things like G series instances from Sai's demo or the P series with other Nvidia hardware, Trainium instances, as well as a variety of others that just happen to have that really fast local NVME storage. There's zero configuration required for this. We've done a bunch of work to figure out what the right settings are because there's a fair amount of them for SOCI and applied them automatically for the given instance that happens to be launched at that moment.
This drastically reduces the container startup time, as you saw in Sai's demo, from 2 to 3 minutes down to 1 minute and 15 seconds or something like that. We've seen up to 60% and even sometimes higher percentage reduction in the total pull time. As your organizations start to think about how they can leverage AI, this will become an increasingly important factor in how to serve those workloads quickly and efficiently.
As Dan said, Capital One works in a very highly regulated industry, and compliance and security are a top priority for us. We've added several important capabilities over the last year to make sure that Auto Mode can meet even the highest security standards. First and foremost, you can now encrypt both the root volume and the data volume of instances launched by Auto Mode using custom encryption keys that you provide. These are KMS keys that you provide to AWS, which then allow you to encrypt all of the data that resides on those instances.
For customers in regulated industries and as part of the launch into the US GovCloud regions, we've also enabled FIPS validated cryptographic modules for EKS Auto Mode's controllers and the instances that we launch. This means that you can activate these FIPS cryptographic modules using a setting in the node class for Auto Mode, so advanced security FIPS. This will ensure that your clusters comply with Federal Information Processing Standards. A key thing for a lot of customers in the public sector or that have customers of their own that are in the public sector and run in these GovCloud regions is to be able to attest that their infrastructure meets these requirements. This is one piece in how you can help enable that with just a single setting in a YAML.
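Put together, the encryption and FIPS settings described above might look like the node class sketch below. The `fipsEnabled` field name and the key ARN are hypothetical stand-ins for the settings mentioned in the talk; consult the EKS Auto Mode documentation for the real schema:

```yaml
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: regulated
spec:
  ephemeralStorage:
    # customer-managed KMS key (ARN is a placeholder)
    kmsKeyID: arn:aws-us-gov:kms:us-gov-west-1:111122223333:key/example-key-id
  fipsEnabled: true   # hypothetical name for the FIPS setting mentioned above
```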
One of my favorite new features addresses a common IAM challenge. Sai was alluding to how even seasoned operators will struggle with getting the right sets of IAM resources stood up, and often it's a really sensitive operation to delegate out to large swaths of an organization. It may not be something that every user in a given team should be able to do, that is creating new IAM roles or attaching various policies to them. You can now enable teams to use Auto Mode without having to grant them those additional permissions. They just need an instance profile. This simplifies the access management and maintains your security posture.
I think this is a great real-world example of the principle of least privilege. You don't really need someone to be able to create IAM roles to launch Auto Mode instances, just that instance profile. If you're in an organization where very careful attention is paid to the permissions that different users or personas get, this feature will make it that much easier to get started with Auto Mode. You won't have to go ask your security team for permissions they're going to want to think twice about giving you. Instead, the very simple instance profile is a great way to work around that.
These features are fundamentally a reflection of our commitment to security by default, which Auto Mode has had since its launch through the choice of Bottlerocket as its operating system and various other security enhancements that Auto Mode has. We hope that it helps give you the tools that you need to meet your own security and compliance requirements.
The last group of features that we have here are around the capacity itself, the instances in your cluster. While Auto Mode's dynamic capacity management powered by Karpenter is perfect for a lot of workloads, there are absolutely scenarios that require more predictable or reserved capacity. That's why we've added these two important features around how Auto Mode handles capacity management. First, Auto Mode can now prioritize reserved capacity, so on-demand capacity reservations or capacity blocks for ML, which will help increase the cost efficiency of your existing investments in these pre-purchased options.
You can configure this using a new stanza in the node class (you'll recognize a theme here around terms): capacity reservation selector terms. This lets you very flexibly target reservations by ID or via tag, and lets Auto Mode say, hey, if I can fulfill the compute needs for a workload using reserved capacity, it'll always try to do that first rather than spinning up an on-demand or spot instance, for example.
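Based on that description, the stanza might look like the following, with a made-up reservation ID and tag. This is a sketch of the feature as described in the talk, not the authoritative schema:

```yaml
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: reserved-first
spec:
  capacityReservationSelectorTerms:
    - id: cr-0123456789abcdef0   # target a specific on-demand capacity reservation
    - tags:
        team: ml-inference       # or match reservations (or ML capacity blocks) by tag
```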
I mentioned Auto Mode by default has this very dynamic and flexible compute management model powered by Karpenter, but not all workloads need that kind of dynamism. For that we've introduced a feature we're calling static capacity. This is a very different style feature. This makes capacity management in Auto Mode work very differently than it does by default in that you can tell Auto Mode that you want to have a set number of instances always running.
Regardless of how many pods are in the cluster or how many pods are pending, which is typically how new instances come into an Auto Mode cluster, this allows you to pre-provision capacity independently from the workloads in your cluster and ensures that you always have the resources you need for mission-critical applications, or those that simply don't need to auto scale. I think these features give you a lot more control over how Auto Mode manages capacity while maintaining all of the operational benefits that you get from Auto Mode. It helps you optimize for performance, costs, and your team's time, which is really critical.
Key Takeaways: The Future of Kubernetes Operations with EKS Auto Mode
So as we wrap up, I want to leave you with a few key takeaways from our journey today through EKS Auto Mode and kind of what it means for how we think Kubernetes operations looks in the future. First, Auto Mode is a fundamental shift in how we think about operating Kubernetes. While 93% of organizations are using or evaluating Kubernetes, the operational complexity can be a significant challenge. EKS Auto Mode allows you to move from manual, complex cluster management, that pain on the maze slide in Dan's section, to a fully automated AWS managed cluster infrastructure.
Second, this isn't just about reducing operational overhead. It's about enabling innovation. As Dan told us, when your platform teams aren't constantly firefighting infrastructure issues, they can focus on what really matters: building features that drive value for your business and help you and the rest of your organization innovate. Third, Auto Mode maintains the power and flexibility of Kubernetes while drastically simplifying its complexity. You still get full access to the entire Kubernetes ecosystem. Auto Mode is Kubernetes conformant. Anything that you can do in Kubernetes, you can do in Auto Mode. Because it's Kubernetes, all of those tools continue to work seamlessly; you simply don't have to bear the burden of that heavy operational lift.
Finally, enterprise requirements don't have to mean operational complexity. Auto Mode is ready to meet your security, compliance, and performance needs while keeping things simple. Whether you're just starting on your Kubernetes journey or looking to modernize your existing platform, give Auto Mode a shot and tell us how it works out for you, and I think you'll experience what we've been discussing here firsthand. Thank you all so much. Thank you, Sai, Dan, and enjoy the rest of your re:Invent. Thank you. Thanks.
; This article is entirely auto-generated using Amazon Bedrock.