🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Simplify your Kubernetes journey with Amazon EKS Capabilities (CNS378)
In this video, Jesse Butler and Sriram Ranganathan from the Amazon EKS team introduce EKS Capabilities, a new feature layer that extends beyond cluster lifecycle management to help customers scale Kubernetes workloads. They explain how EKS Capabilities provides fully AWS-managed implementations of three open-source tools: Argo CD for GitOps-based continuous deployment with deep AWS integrations (ECR, CodeCommit, Code Connections, Secrets Manager), ACK (AWS Controllers for Kubernetes) for managing AWS resources through Kubernetes custom resources with sophisticated IAM role selectors enabling multi-tenant and cross-region deployments, and kro (Kube Resource Orchestrator) for creating custom platform abstractions that simplify developer experiences. The controllers run in AWS service accounts rather than customer clusters, eliminating operational overhead. The presenters demonstrate how these capabilities work together to implement modern platform engineering patterns, discuss architectural considerations including centralized versus decentralized management models, and emphasize the importance of properly designing IAM and RBAC permissions for secure, scalable deployments.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Amazon EKS Team Presents New Capabilities for Platform Services
Thank you for joining us. This is an amazing conference with so many people here. Thank you for taking the time out of your busy schedule to be with us. So my name is Jesse Butler. I am a Principal Product Manager in the Amazon EKS service team. I'm joined by my colleague Sriram. Hey everyone, I'm Sriram Ranganathan. I'm a Senior Product Manager with the Amazon EKS service team. We're really excited to talk with you about something we've been working on for several months called Amazon EKS Capabilities.
I'm going to start with some context, go through the theory of why we built this set of features and how we hope they will benefit you, and then Sriram will dive into some of the details. Starting with setting context, if anybody's seen any of the launch announcements, you know that this is related to platform services and growing your cluster experience through these foundational services for GitOps. I like this quote from Alan Kay, who is the father of the Smalltalk programming language and object-oriented programming: "Simple things should be simple, and complex things should be possible." This is a great analogy to abstractions and why we abstract things in systems.
A more relevant quote to our discussion is from Edsger W. Dijkstra: "The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise." This is a better software engineering concept around abstractions. We don't want to hide the complexity under abstractions because often there's power in them. If you look at AWS or Kubernetes, there's so much you can do with that cluster. The last thing we want to do is put a layer on top of it that obfuscates. We want to actually elevate.
Kubernetes itself is an abstraction or set of abstractions. Kubernetes makes complex things possible. We like to look at Kubernetes as the platform layer for many customers. It's the front door to AWS. It's often said that Kubernetes is complex itself. I personally have worked on fairly large software projects, scheduling thousands or hundreds of thousands of processes across hundreds or thousands of computers. That is possible, but it is hard. It takes a lot of concerted, deep technical effort and a lot of resources. Kubernetes democratizes scaled distributed computing.
This isn't my opinion. The world kind of agrees. Eighty percent of enterprises are using Kubernetes in production with another thirteen percent reporting piloting or investigating for future adoption. This has become a de facto standard in cloud computing because it is a good abstraction and a powerful way to layer functionality into your systems.
The Power of Kubernetes: Extensibility, Ecosystem, and Relative Simplicity
Simplicity is one of the things that we're looking for, and we'll talk a little bit about how making simpler experiences for our customers is one of our EKS tenets. We also need to think about consistency at scale. This is where things like declarative configuration and these cloud-native practices have come to be. System consistency is incredibly important at the scale that many of you operate at, and if you're just getting started in your scaling journey, the scale you aspire to reach makes consistency paramount.
We also have extensibility as a core requirement for building platforms in the modern world, and Kubernetes is incredibly extensible. These abstractions are good and powerful, but you can also extend them and build out any of your own custom needs right into the cluster. One of those things around extensibility is the result of this huge, vast ecosystem of tools and projects that are available for you to use. Kubernetes itself is incredibly resilient and very powerful, but it is not itself an end-to-end production-ready application platform, as many of you probably already know.
You need to install things in the cluster to make it be what you need it to be to ship and run software at scale. The CNCF has over two hundred projects at this point, hundreds of compatible tools. This ecosystem is vast with really unlimited customization beyond that. You can build your own controllers and have your own custom resources in the cluster to do just about anything you want, from managing AWS cloud resources to ordering Domino's pizzas.
When we look at simplicity and all of that amazing abstraction and extensibility, we're really talking about relative simplicity. Kubernetes itself isn't a simple thing, but it is relatively simple when you compare it to other things. For example, the Kubernetes core API has about 1,500 methods, and the AWS SDK has about 10,000. Now these are vastly different things with different purposes: AWS is a vast collection of products and features that do a lot of things. But there is relative simplicity if you compare the two, and what's nice about EKS and using Kubernetes as the front door to AWS is that you can capture those powerful primitives and abstract them for your end users, or for yourself, as you scale your workloads.
Evolution of Amazon EKS: From Managed Control Plane to Auto Mode
Seven years ago when we started, Amazon EKS was launched right here at re:Invent in 2017. Our main motivation was to make Kubernetes more accessible to AWS customers. We knew that this abstraction and the set of primitives and these standards would evolve, and we were right. We're very happy to have been right because the growth of EKS has been enormous. EKS started as a managed control plane, right, so we knew that doing Kubernetes the hard way while educational was actually hard. So the managed control plane was where we started and we've evolved ever since with this mission to help you build reliable, stable, and secure applications with Kubernetes clusters.
It's very important to note that it's Kubernetes—it is upstream compliant Kubernetes—so we get to benefit from all of the extensibility as well as you do. So if we look at one of the feature layers for EKS around the data plane and node management, when we started with that managed control plane, we had self-managed nodes. This was fine, right? The real complex part was in the control plane and actually customers wanted self-managed nodes. They wanted that autonomy and over time that became burdensome to manage all of those nodes directly. And so we put a lifecycle API on it with managed node groups, allowing you to create groups of nodes that you could manage with API calls and manage that lifecycle.
Moving to last year's re:Invent, we announced EKS Auto Mode, which is a fully managed data plane. So that gives you an end-to-end push button cluster where you just don't have to worry about nodes at all. We fully manage everything as well as the storage and networking integrations. Right, so this is an example of the evolution of EKS starting with what we know customers need at the moment and evolving over time based on your feedback and also based on where the direction of Kubernetes is going as well as the direction of AWS.
When customers start with EKS, we're primarily talking about building and managing clusters, and EKS makes it really easy to do that. At this point with Auto Mode, you go into the console and click a button; rather than growing coffee beans, roasting them, and brewing pots of coffee, you can simply grab a cup from the kitchen, come back, and pretty much have a cluster up and ready to go. Our mission since launch has been to continuously improve cluster lifecycle management for you, to make it a better experience over time, adding new feature layers like Auto Mode and hybrid nodes. These are all commitments toward that vision.
Scaling Challenges: The Need for Platform Engineering Teams
But when you look at scaling your workloads and using more clusters and expanding your workloads across multiple accounts and multiple regions, you're really scaling with Kubernetes. EKS helps you build and manage production-ready clusters. It's up to you to manage and scale your Kubernetes workloads, networking components, storage resources, all the integrations that you have, plus any custom solutions that you need. All of those things come into the cluster and become things that you manage maybe with Helm charts or maybe with other methods.
This can work really well to get started, and even at scale it can work really well. There are a lot of people who are very good at automation and governance, but as scale continues you have more regions and more accounts, depending on how autonomous your teams need to be. Compliance gets hard, auditing gets hard, and scale just brings more operational burden. If we look at how most customers are dealing with cluster management beyond the API, there's a lot of Terraform use, some CloudFormation, and we're seeing some Kubernetes-native IaC.
So what we have here is sort of the imperatively applied, external-to-the-cluster IaC, mostly Terraform, and this works really well. We have version-controlled declarative configuration that's reconciled against the source of truth. Terraform does its thing and you get a cluster, and this can work as you scale as well. With two or three teams going across a couple of regions as you start to scale, this can work incredibly well.
But at some point we do hit a point of friction. We experience this problem where there are just too many cooks in the kitchen or too many pipelines for one small team to manage. If something goes wrong, we end up in this position of trying to find where it went wrong, how to remedy it, and who even owns it. This can be a problem with scale.
So we start with single clusters and want to get to this all-singing, all-dancing, cross-region, cross-account, scalable thing. How do the customers who have been most successful with EKS get there? What we see is that there is an inflection point where multiple clusters come into the mix and you want to get to that next level of abstraction and automation. This is the point where we see a lot of customers start investing pretty heavily in dedicated platform engineering resources. We move away from decentralized teams and more toward a structured platform engineering path to get to the next level.
This is something that we see commonly with all of our most scaled customers, but all of them do it a little differently. This is bespoke investment. There is engineering work to be done here that is not necessarily the business logic that you want to ship to customers. At our largest customers, there are dedicated platform teams whose customers are the internal development teams. This is a place where we see a lot of investment and a lot of commonality, but things are slightly different.
Cloud Native Characteristics and Common Platform Foundational Components
One thing that is true is that these platform engineering efforts follow these core cloud native characteristics that we see in Kubernetes also repeated throughout the ecosystem. Declarative configuration and continuous reconciliation are the bread and butter primitives of these systems. Programmatic discoverability and observability are really important at scale. If you want to run 400 clusters with a team of five people, you need to be able to automate and you need to be able to watch where you are.
Active drift detection and automated self-healing are really a sort of holy grail. Wouldn't it be great if the system, using its underlying primitives, could keep itself whole? What if an imperative change comes in? What if something fails? Can something automatically fix that at two a.m. for me and Slack me so I can see it with my morning coffee rather than getting paged at two in the morning? On top of all of this, we want these things to be standards-based. Linux took off not because it was the best operating system on the planet, but because it was an open standard that we could all contribute to and make better. That is what we see with Kubernetes. It is an open standard. It is portable and it is governed.
When we look at these characteristics and think about Kubernetes, we check all of the boxes. This is how the system was built. It is important to note, as we start looking at something like EKS Capabilities and wondering whether we need it in our world, that people who are new to Kubernetes and new to DevOps are often learning DevOps through Kubernetes. They start using Kubernetes and discover that these aspects of the system are in fact just DevOps best practices that, through trial by fire, have become the principles and standards that we know.
Kubernetes was designed with these principles in mind. So we see that Kubernetes is in fact a reference implementation of these underlying characteristics that are ideal for systems. The reality is that there is no ideal system for everyone. Every customer has different needs and different requirements. You may have different governance and compliance for restricted, regulated workloads and restricted environments, and just generally culture, which is different across every customer. Everybody has different needs, so we cannot say to take the capabilities off the shelf, buy that product, and just use it. That is not how that works. We have to think about all of the different ways that we have to keep the standards open and keep the primitives light so that you can build what you need to in order to scale.
One thing that we see across all of our most successful scaled customers in EKS are these common platform foundational components. We are looking at workloads, infrastructure for cloud resources, and infrastructure for the underlying clusters and related production databases, that kind of stuff. These are the foundational components that every team has to solve for. Depending on the line of business, the culture, or even the point of scale, all the other things may be different, but these are three things that everyone has to account for at some point as we begin the scaling journey.
GitOps as a Reference Implementation and the Introduction of EKS Capabilities
We looked to GitOps, and I think this was a very trendy and buzzy word for a while. We have been consistently guiding customers toward GitOps since 2019 or 2020, so since before it was a buzzword, through it being a buzzword, and now it's no longer a buzzword. I think of GitOps as a reference implementation. It takes these best practices, these cloud native system fundamentals, and how we think of a system running well in the cloud today, and it turns it into a reference implementation.
These four characteristics are what make a GitOps system. You can build it with cron shell scripts on a Raspberry Pi if you want to. It doesn't have to be Kubernetes, but it does have to have its desired state expressed declaratively; declarative configuration is a requirement for practicing GitOps. That desired state is versioned and immutable, which just means that if it's a YAML file, it goes into Git: once it's versioned, it's immutable. The desired state is automatically applied from source, so when you push something to Git, it automatically ends up in your running system.
Now, those first three things describe anybody using Git and doing ops. That's GitOps. The last one is the most important, and the thing that makes this such a good fit for Kubernetes: the desired state is continuously reconciled. That covers everything from imperative changes made by a junior admin who hits the wrong button, all the way through to nodes falling over and having to repair themselves after a software update. All of these things are handled from within the system outward, not imperatively by you getting paged at 2 in the morning. The desired state is reconciled by agents within the system. This is probably the most defining characteristic of a GitOps system, and it's why Kubernetes is such a good match.
So GitOps, using this reference implementation, we can work toward building ideal platforms. We can't build ideal systems for everybody because everybody has different requirements, but the platform can be ideal. One of the things we were motivated to do earlier this year is to see if we could find some commonality and some things that we could help provide in EKS as native features to help with this practice of GitOps for workloads, for cloud resources, and for clusters themselves. Building a fleet management system has never been possible for us because every other customer we talk to has wildly different requirements. So with this reference implementation being extensible, you can actually build your own, and we can help support you with that.
So GitOps works really well, and it's usually open source software running on the Kubernetes cluster. For example, you might use Argo CD for deployments, ACK or Crossplane for cloud resources, or any number of other solutions. Argo CD is kind of a de facto standard. Looking at the system, you'd say yes, okay, now we have manageable growth: you can add more clusters and manage those deployments from a single control plane. But as you keep adding clusters, we eventually end up in a situation where we're scaling out decentralized GitOps environments based on open source that you're self-managing in your clusters, and we're kind of back in the same position. We see teams spending a lot of time here, and this is where the shift to needing dedicated platform teams comes in. We end up managing the thing that helps us abstract things away, and then managing the problems with that abstraction, chasing our own tails and running into issues in a system like this as well.
So Kubernetes-based platforms—this is the way that you scale. But we think that we have a way to help you do it a little more efficiently, with that foundational bespoke complexity being something that we handle. If you look again back at our feature evolution, this time thinking about things in your cluster, we start with Helm charts. We point you to the CNCF and say have a nice time. There are 4,000 Helm charts. EKS add-ons was a step in the right direction, just like managed node groups. It puts a lifecycle API around some of the most commonly used operational add-ons. So what comes next? That's why we're really excited to introduce you to EKS Capabilities. This is the first time EKS has added a feature layer that goes beyond cluster lifecycle management, and we expect a full part of the roadmap to be dedicated here. We're really excited that we're moving beyond managing cluster lifecycle to also helping you build and scale with Kubernetes on your clusters.
EKS Capabilities Vision: Open Standards, Fully Managed, and New Innovations
With EKS capabilities, you can click a button to start using these features. They will evolve as you use them, and we'll be adding more features and refinements over time. You'll be able to increase your velocity from day one and remove friction in your workflow. If you're just starting out on your scaling journey, this is a great place to start, even if it feels overwhelming and you think you won't need it initially.
If you start using GitOps—either self-managed open source or the capabilities you can enable with a few clicks—you'll be ready for the future. Starting here means you won't ever have to do that refactoring later and figure out how to manage everything while you're doing it. You just start with these primitives and you're ready to scale. It's all open and based on Kubernetes open standards.
Right now, the three capabilities we've launched are all based on well-known open source projects that are either de facto standards, emerging innovations, or things that AWS stands behind with full support. These are all things that customers already use and are interested in using more of. Argo CD is where our GitOps engine comes in. This is used by most of our customers practicing GitOps. Argo CD has really emerged as its own de facto standard alongside Kubernetes.
What we see a lot is Argo CD being self-managed and predominantly driven out of the platform team. Many customers actually use Argo CD only for their platform, while their development teams aren't using it to deploy workloads at all; imperative pipelines with Bash scripts and Jenkins have worked fine for them, and we see this starting to evolve beyond the platform team. We also have ACK and kro, which give us the infrastructure part. Another pattern we see with customers who have been practicing GitOps, some for years, is that they really enjoy it and have a lot of success with it, but they stop short of managing infrastructure with it. They only use it for, say, cluster add-ons or maybe just internal workloads.
We think the real power in the system comes when you can bring cloud infrastructure to the party and start managing your clusters, your workload resources, and everything else together. ACK and kro help you do that. Before we dig into the details of how these work, I want to be clear about what a diagram looks like for a self-managed platform with these software components. EKS runs the cluster for you: in our account, we run the entire control plane, and if you're using Auto Mode, we'll also manage your nodes for you.
In your account, you install Helm charts for Argo CD, ACK, and kro, and you have the CRDs to work with. You create applications, resources, and instances, and end up with S3 buckets and RDS instances in your accounts. With EKS Capabilities, we run all of the controllers and their dependencies in our service accounts, just like we do with the cluster control planes. Argo CD, ACK, and kro run fully in EKS infrastructure. They don't take up compute resources and pod slots in your clusters; they free up your compute and your pod slots for your workloads, and we manage the lifecycle, patching, scaling, resilience, and availability of your capabilities for you.
We do install the CRDs in your cluster, so you're still in full control of your applications, your ACK resources, and other custom resources that come through the capabilities. Our vision for this new feature layer is shown through the three initial capabilities we're offering. They're all open standards and Kubernetes native experiences, which we think is very vital. You're choosing to use Kubernetes whether you're starting out and ready to scale or you've been with us for years. You're choosing it in part because it is a standard.
You have a standardization layer, multiple environments, and multiple teams potentially deploying workloads to multiple cloud environments or on premises. You want that standardization. It's very important to us, as we build more features into the future, that those remain portable for you as well. We want to reduce friction, so these will always be fully managed for you; there will never be a capability that requires you to run or operate its components in your cluster.
However, we might have additional CRDs in the cluster to actually help accelerate velocity and help you fine-tune your configurations. We'll talk about IAM role selectors with ACK in a bit, but there might be others of those in the future. We're also really excited about new innovations.
Bringing kro into EKS Capabilities is really the first time EKS has offered software that is not generally available. This is software that is evolving, and it's something the community is really excited about. It was recently donated to the Cloud Native Computing Foundation, so it's very visible in the ecosystem, but it is not GA. I will clarify that EKS Capabilities itself is GA: our ability to manage and run that software on your behalf is fully generally available in all commercial regions where EKS is available. What you might find, however, is that we will lean into new innovations and new trends in the Kubernetes ecosystem, and this is a way for us to do that for you.
Understanding EKS Capabilities: Creation Patterns and IAM Role Configuration
I'm going to pass it to my colleague Sriram. He's going to dig into the features and talk a little bit about the details. Thanks, Jesse. Hey everyone. So what is EKS Capabilities? EKS Capabilities is an extensible set of platform features that extend your EKS clusters. Unlike self-managed installations, AWS fully manages all EKS Capabilities for you. What this means is you don't have any operational overhead. You don't have to worry about installation, configuration, patching, or upgrades. All of that is automatically taken care of by AWS.
If you kubectl into your Kubernetes cluster and look for the Argo CD controllers or, for example, the ACK controllers, you're not going to find them. They run in AWS-owned service accounts and are fully managed, which is why you won't see them in your cluster. How do you go about creating a capability? All capabilities follow a common pattern when it comes to creation. When you create a capability, you first identify which capability you want. You give it a name so that you can refer to it later, decide which cluster you want to enable the capability on, and then pass a capability role. Depending on the capability, there might be some additional configuration needed.
We will touch upon those as we go through the next slides. All of these capabilities can be created in multiple ways: through the CLI, through the EKS console, or with any of your favorite infrastructure-as-code tools. There are no restrictions. When we built EKS Capabilities, we introduced a new service principal, which you see on screen here. When you create the capability role, you need to ensure that the trust policy trusts this particular service principal. This is key: it's not your OIDC provider, it is the EKS Capabilities service principal. You can always scope this down to specific resources by adding additional conditions.
Once you're done with the trust policy, now comes the time for permissions. Depending on how you want to use the capabilities and what integrations you want, you might need to add certain permissions for each of the capability roles. In the case of Argo CD, depending on whether your source is coming from CodeCommit, ECR, or CodeConnections, you have to give it appropriate permissions. With ACK, depending on the AWS resources you plan to use, you have to correspondingly give the underlying permissions. As an example, if you plan to use S3, then you need to give S3 permissions. If you plan to use RDS, you need to give it RDS permissions.
With kro, there are no specific permissions required; it is confined to the Kubernetes cluster. But if, for example, you plan to use ACK along with kro, then for whatever AWS resources your kro definitions create, you have to give the corresponding permissions to the ACK capability role, not to the kro capability role. We will walk through hands-on examples of the three EKS Capabilities: Argo CD for GitOps automation with deep AWS integration, ACK for infrastructure provisioning with scoped, flexible IAM roles, and kro for creating platform abstractions. These are the building blocks; you decide how you want to use them.
Argo CD Capability: GitOps-Based Continuous Deployment with Deep AWS Integration
What is Argo CD? I know Jesse touched briefly upon what Argo CD is. Argo CD is a GitOps-based continuous deployment tool. Your Git repository becomes the source of truth, and Argo CD ensures that your cluster state matches what you have defined in Git. It supports drift detection and automatic reconciliation. What I mean by that is if you go and make an imperative change, Argo CD can automatically detect it and make sure that it brings the system state back to the desired state so that there is no drift. You have options to turn that off, but it is typically not the best practice.
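As a concrete illustration, upstream Argo CD expresses this automatic reconciliation through an Application's syncPolicy. This is a minimal sketch of the relevant stanza using standard upstream fields, on the assumption that the managed capability exposes the same options:

```yaml
# Fragment of an Argo CD Application spec (upstream fields).
# Omitting the "automated" block disables automatic sync entirely.
syncPolicy:
  automated:
    prune: true     # delete resources that were removed from Git
    selfHeal: true  # revert imperative, out-of-band changes to the cluster
```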
One of the good things about Argo CD is that it lets you connect a central Argo CD instance to multiple clusters. So if you have applications that need to be deployed to different clusters, you can always do that from a central cluster. In this illustration, we have the Argo CD capability enabled on a central cluster, and you can connect multiple EKS clusters and push your deployments to them.
The good thing is these EKS clusters can be in the same account, different account, same region, different region. It does not matter. You can connect to any EKS cluster within your portfolio. Typically when you're connecting to multiple different EKS clusters and different accounts, different VPCs, different regions, you need to worry about the networking aspect of it. How do you do VPC peering? How do you set up transit gateways? The cool thing about EKS capabilities is you don't have to worry about any of it. That is fully managed for you. You don't have to worry about how to reach the clusters. We take care of that for you. You just need to give us the cluster ARN and we establish the connection.
Here is a specific example of how you go about creating the actual Argo CD capability. We have the capability type selected as Argo CD, we give it a name, and we identify the cluster on which Argo CD should be enabled. There is a role, which is simply the Argo CD capability role. And in this particular case, we also have an additional configuration: the EKS Argo CD capability is integrated with AWS IAM Identity Center, which is how we enable single sign-on for the Argo CD UI and CLI. So you need to ensure that Identity Center is enabled and pass in the Identity Center configuration when you're creating the capability.
This illustration shows how you can create the same Argo CD capability if you're more comfortable with the console. It's pretty simple: most of the information is pre-filled for you. You just select the role, select the Identity Center provider, click Next, and the capability gets created for you. In case you don't have a role already available, there is also a nice option to create the role directly through the console itself. It is mainly for the getting-started experience; if you want, you can always scope it down as you go through the role creation process. But if you're okay with the getting-started defaults, you don't have to change anything. Everything is pre-filled for you; you just agree to it, and it creates the role and automatically selects it for you.
Once you create the capability, this is where you will land. It gives you basic details about the capability: when it was created, its ARN, and whether there are any health issues. In the top right corner, you see the capability issues. If for whatever reason the capability ran into an error or became degraded, you can click on it and it will give you details about what the error was and how to recover from it.
Argo CD is fully managed, but you still have access to your favorite Argo CD UI, and you also have access to the Argo CD CLI. From the UI perspective, you don't have to configure anything: we provide a hyperlink in the EKS console, you click on it, enter your credentials, and you're logged into the Argo CD UI. So there are no restrictions compared to what you would typically do with a self-managed installation. Now that we have created the capability, let's dive deeper into some of the Argo CD core resources.
One of the main resources from an Argo CD perspective is the Application resource. The Application resource tells Argo CD where the source code comes from and which cluster it is destined for. In this case, we have the source coming from my repository, and it is going to the demo cluster. But what exactly is the demo cluster? How does Argo CD know what this random string called demo-cluster really means? Let's put a pin in that; it will become clear as we go through the next slides. There's also an attribute called project. Again, I want to put a pin in that; in subsequent slides, we will cover how the project plays a role in this entire setup.
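A minimal sketch of such an Application manifest, using standard Argo CD fields; the repository, path, namespace, and cluster name are placeholders, and the namespace the managed instance reads Applications from is an assumption to verify against the EKS documentation:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: sample-app
  namespace: argocd        # assumed namespace for the managed Argo CD instance
spec:
  project: team-a          # AppProject; covered a little later
  source:
    repoURL: https://github.com/example-org/sample-app.git
    targetRevision: main
    path: deploy
  destination:
    name: demo-cluster     # resolved through a cluster secret, shown next
    namespace: sample-app
```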
Anytime you want to use a cluster as a valid target for Argo CD, you first need to register it with Argo CD. And how do you register a cluster? You create a Kubernetes secret and give it the special label that you see on screen here, along with the cluster name, the demo-cluster we previously referenced. This is where the mapping between the cluster name and its ARN happens. In a typical self-managed installation, instead of the ARN, you would provide the Kubernetes API server URL. With the EKS-managed capability, you don't have to worry about the URL; you can just specify the ARN and it will work.
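A sketch of what that registration secret could look like, following the upstream Argo CD cluster-secret format with the cluster ARN in place of an API server URL. The names, account ID, and namespace are placeholders; check the capability documentation for where the managed instance reads these secrets:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: demo-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster   # marks this secret as a cluster registration
type: Opaque
stringData:
  name: demo-cluster                          # the name Applications refer to
  server: arn:aws:eks:us-west-2:111122223333:cluster/demo-cluster
```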
Similar to clusters, we also need to register all of our source repositories. How do you go about registering source repositories? The same way as clusters: you create another secret, this time of type repository, and we can connect to any of your private GitHub, GitLab, or Bitbucket repositories. The good thing is that EKS Capabilities comes integrated with AWS Secrets Manager, so if you want to store all of your Git secrets in Secrets Manager, you can do that and just reference them. What this means is that we do not pull the Secrets Manager credentials into the cluster and store them; they are read at runtime, used to authenticate, and remain in Secrets Manager.
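For reference, the upstream repository-secret format looks roughly like this; the repository URL is a placeholder, and how you point it at a credential held in AWS Secrets Manager is specific to the capability, so treat the comment as a pointer to the docs rather than exact syntax:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: private-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository  # marks this secret as a repo registration
stringData:
  type: git
  url: https://github.com/example-org/platform-config.git
  # Credentials can stay in AWS Secrets Manager and be referenced here instead of
  # being embedded; the exact reference keys are documented with the capability.
```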
Typically in a production environment, you do not want everyone deploying any code to any cluster; there need to be certain security guardrails. AppProject is the way Argo CD enforces these constraints. With an AppProject, you define which source repositories code can come from, which destination clusters that code is allowed to reach, and who can actually deploy it. That last part is defined by the Argo CD RBAC highlighted here. So the Argo CD AppProject acts as the container that gives you these security boundaries.
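A hedged sketch of an AppProject that restricts sources, destinations, and deployers, using standard Argo CD fields; the repository patterns, cluster ARN, and role names are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a
  namespace: argocd
spec:
  sourceRepos:
    - https://github.com/example-org/team-a-*            # allowed source repositories
  destinations:
    - server: arn:aws:eks:us-west-2:111122223333:cluster/demo-cluster
      namespace: team-a-*                                 # allowed target namespaces
  roles:
    - name: deployer
      policies:
        # Argo CD RBAC: members of this role may only sync team-a applications
        - "p, proj:team-a:deployer, applications, sync, team-a/*, allow"
```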
Next, let's take a look at some of the EKS-specific integrations that come with the Argo CD capability. In this particular case, we are pulling a Helm chart from ECR. If you look at the repo URL, it's not a GitHub URL; it is an OCI URL. What this really means is that for any of the EKS-specific integrations like ECR, you do not have to create a repository secret or do credential management. The capability IAM role can automatically authenticate to your ECR repository and pull your Helm charts into your cluster. With a self-managed installation, that is not the case: you have to create the repository secret and manage the credentials yourself.
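A sketch of what the source stanza of such an Application might look like; the account ID, region, and chart name are placeholders, and whether the capability expects the oci:// prefix is something to confirm in the EKS documentation:

```yaml
source:
  repoURL: oci://111122223333.dkr.ecr.us-west-2.amazonaws.com/charts  # ECR registry, not a Git repo
  chart: sample-app
  targetRevision: 1.2.3
# No repository secret is needed here: the Argo CD capability role
# authenticates to ECR on your behalf.
```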
The same thing goes for AWS CodeCommit. You again specify just the CodeCommit URL, and the IAM role can automatically authenticate to it. You don't have to worry about credential management or creating any repository secrets. There's one more integration we have, called AWS CodeConnections.
Code Connections lets you connect to your GitHub, GitLab, or Bitbucket without managing personal access tokens or secrets. The way it works is through an OAuth handshake. You need to first register it once with your Git provider, and once the authentication is established, you can directly reference it from your Argo CD applications. Again, the authentication is automatically handled for you. You don't have to create the credentials. You don't have to worry about managing the secrets.
So what are the key takeaways? The key differentiators are the direct integrations with multiple AWS services like Secrets Manager, ECR, CodeCommit, and CodeConnections. These simplify how you connect different source repositories to the Argo CD instance, and they are unique to the EKS capability for Argo CD. Now let's take a look at ACK.
ACK Capability: Managing AWS Resources with Sophisticated IAM Role Selectors
ACK, or AWS Controllers for Kubernetes, lets you manage AWS resources using Kubernetes custom resources alongside your applications. Define an S3 bucket or an RDS database in a YAML file, apply it to your Kubernetes cluster, and ACK handles the rest, provisioning those resources for you. ACK is built on GitOps principles with continuous drift detection and reconciliation, so ACK ensures that whatever you have defined within your Kubernetes cluster remains the source of truth, and when something deviates, it brings it back to the desired state.
How do we go about creating an ACK capability? As we said earlier, all the capabilities follow the same creation pattern: you specify what type of capability you want, which cluster you want it on, the name of the capability, and the IAM role the capability should assume. This slide is just showing it through the console as well. Here is an example of an ACK resource. In this case, we are creating an S3 bucket. If you look at it, it just looks like any other Kubernetes resource, but the kind is Bucket.
Bucket is not a native API supported by Kubernetes. It is a custom resource, provided by the ACK controllers that the ACK capability runs for you. Once you apply this Kubernetes YAML to your cluster, it creates an S3 bucket with the name you have specified and the settings you have defined for versioning and blocking public access.
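A sketch of such a manifest using the upstream ACK S3 controller's schema; the bucket name and namespace are placeholders, and field names may vary slightly between controller versions:

```yaml
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: demo-bucket
  namespace: production
spec:
  name: demo-bucket-111122223333        # the actual S3 bucket name (globally unique)
  versioning:
    status: Enabled                     # keep object versions
  publicAccessBlock:                    # block all forms of public access
    blockPublicACLs: true
    blockPublicPolicy: true
    ignorePublicACLs: true
    restrictPublicBuckets: true
```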
The IAM role selector is a really important concept for sophisticated ACK deployments. It lets you map multiple IAM roles to different namespaces, and it enables cross-region resource management, team isolation, and least-privilege access. The IAM role you specify here is used by matching namespaces; in this case, the ACK target-account role is used by the production namespace and by any namespace starting with prod-. If you do not want namespace-specific roles, that's fine: just drop the namespace selector and it will apply cluster-wide. It's completely flexible, and it's up to you how you want to manage it.
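The role-selector configuration is specific to the ACK capability, so the sketch below is hypothetical: the API group, kind, and field names are illustrative stand-ins, and only the shape of the idea (a role plus namespace matching) is taken from the session:

```yaml
# Hypothetical sketch only: the real group, kind, and field names come from the
# CRDs the ACK capability installs; check the EKS Capabilities documentation.
apiVersion: capability.example/v1alpha1   # placeholder, not a real API group
kind: IAMRoleSelector
metadata:
  name: production-teams
spec:
  roleARN: arn:aws:iam::111122223333:role/ack-target-account-role
  namespaceSelector:
    names:
      - production      # exact-match namespace
    prefixes:
      - prod-           # any namespace beginning with "prod-"
```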
Let's say you want to do cross-region deployments using ACK. It is as simple as specifying the region annotation (sketched below), and the resources will be provisioned in the corresponding regions. Now let's say you have one Kubernetes cluster with multiple teams working on it. They all need to spin up different AWS resources, and they all need different permissions. How do you do that? You define multiple IAM role selectors with multiple roles and assign them to different namespaces. Kubernetes RBAC controls who has access to those namespaces, and IAM permissions control which resources they can spin up. This way you can also operate a multi-tenant ACK environment from a single cluster.
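Cross-region placement in upstream ACK is driven by an annotation on the resource; a minimal sketch with placeholder names and region:

```yaml
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: eu-logs-bucket
  annotations:
    services.k8s.aws/region: eu-west-1   # provision this bucket in eu-west-1
spec:
  name: eu-logs-bucket-111122223333
```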
Let's recap the key takeaways so far. We have seen two capabilities: Argo CD and ACK. Argo CD handles application deployment with direct AWS integrations; ACK handles infrastructure provisioning with sophisticated IAM patterns. That covers deployment and infrastructure, but developers still need to know about all the underlying resources. How do we abstract that away for developers? That's where kro comes in.
Kro Capability: Creating Platform Abstractions Through Custom Resource Orchestration
kro, the Kube Resource Orchestrator, lets platform teams create custom Kubernetes APIs by composing existing resources. Think about it this way: a web application could be composed of a Deployment, a Service, and an Ingress, and it might have an S3 bucket and an RDS database. Today, developers need to know about all of that. Wouldn't it be convenient if you could combine all of them into one entity called a web application? Platform teams can create a new custom resource with kro called WebApplication and expose only certain things as configurable by developers. Developers do not need to know the underlying complexity; they can just create instances of the web application by passing a few basic parameters in simple YAML, without worrying about anything underneath.
Creating a kro capability is similar to what we saw with the ACK capability, so I'm not going to dive into the details. The only change is that the capability type is kro instead of Argo CD or ACK. The same thing can be done through the console.
Now we come to the resource graph definition, which is core to how the Kube Resource Orchestrator works. The resource graph definition (RGD) is where platform teams define custom APIs, and this is where you encode your best practices. The schema section defines which fields developers can configure; in this case, they can specify the app name, the number of replicas of the web app, and whether they need an RDS database. The resources section defines what gets created. In this particular case, the RGD creates a WebApplication resource type.
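A trimmed sketch of a ResourceGraphDefinition following kro's upstream format; because kro is pre-GA, the schema syntax and field names may shift, and the resources shown are abbreviated placeholders:

```yaml
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: webapplication
spec:
  schema:
    apiVersion: v1alpha1
    kind: WebApplication                  # the new API developers will consume
    spec:
      name: string
      replicas: integer | default=2
      createDatabase: boolean | default=false
  resources:
    - id: deployment
      template:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: ${schema.spec.name}
        spec:
          replicas: ${schema.spec.replicas}
          # ...selector and pod template omitted for brevity
    # ...Service, Ingress, and an optional ACK-managed RDS instance would follow
```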
This is all developers really need to create. The platform engineers have created the RGD, so when developers need a web application, it's as simple as this: they create a custom resource of the WebApplication kind, specify the name, say how many replicas they need, and say whether they need the database. You can make this as simple or as sophisticated as you want.
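The corresponding developer-facing instance, assuming the sketch of the RGD above; the group, version, and names are placeholders that follow kro's defaults:

```yaml
apiVersion: kro.run/v1alpha1
kind: WebApplication
metadata:
  name: checkout-service
spec:
  name: checkout-service
  replicas: 3
  createDatabase: true   # ask the platform abstraction for an RDS database too
```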
kro also gives you the power of prescriptive patterns. Developers only need to specify the name and region when creating an S3 bucket. In this particular case, an S3 bucket is embedded within a kro RGD with all the best practices built in. The platform team has already encoded that whenever somebody in the organization creates an S3 bucket, it comes with versioning enabled, lifecycle policies for cost optimization, public access blocked, and encryption enabled. These settings are immutable from a developer's perspective: they create the S3 bucket, but the best practices are already encoded once the platform team has distributed the RGDs.
Let's say the platform engineering team needs to update an existing RGD: you want to change the lifecycle policy retention from 90 days to 30 days and add intelligent tiering. You make the change to the RGD and deploy it to your clusters, and all the underlying resources automatically get the update. It is completely transparent from a developer's standpoint; they do not even know the change has happened. Versioning resource graph definitions means that platform teams can evolve their abstractions over time, which is critical for large organizations with many teams and for backward compatibility.
Standards are enforced through code, not documentation. One of the cool things about kro is that it supports CEL expressions. CEL (Common Expression Language) expressions are how you embed logic into your resource graph definitions. As you see here, the resource graph definition for the production environment comes with larger DB instance sizes, multi-AZ enabled, and a backup retention of 30 days versus 7 days for non-production environments. You can put this kind of logic in your RGDs based on the environment the instances are created for, and kro will make sure these policies are applied for you.
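For illustration, a fragment showing how such environment-dependent values might be expressed with CEL inside an RGD resource template; the surrounding resource and the field names are assumptions, only the CEL conditional syntax is standard:

```yaml
# Fragment of an RGD resource template (hypothetical fields around real CEL expressions).
spec:
  dbInstanceClass: '${schema.spec.environment == "production" ? "db.r6g.xlarge" : "db.t3.medium"}'
  multiAZ: '${schema.spec.environment == "production"}'
  backupRetentionPeriod: '${schema.spec.environment == "production" ? 30 : 7}'
```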
Before kro, with independent custom resources, customers would need to work out the ordering of these resources, inject configuration across them, and create custom patterns for teams to consume. Let's see how that changes with kro. With kro, you can easily create composable abstractions and group different resources together. If certain resources need to be created before others, you can control the order. For example, the VPC needs to exist before the cluster is created, so kro waits for the VPC to be created, grabs the VPC ID, and passes it into the EKS cluster creation.
All three of these capabilities work together to create a complete platform engineering solution: kro provides the abstraction layer, ACK handles the AWS resource provisioning, Argo CD handles the GitOps deployment, and the IAM role selector ensures proper permissions. This is modern platform engineering on AWS. What are the key takeaways? These features demonstrate why EKS Capabilities is more than just managed installations. They provide deep AWS integrations and sophisticated IAM patterns that aren't available with self-managed solutions. This is infrastructure management designed for AWS.
Key Considerations: Operational Models, Multi-Cluster Designs, and IAM-RBAC Integration
With that, I will pass it back to Jesse to cover the next slides. Thanks for your attention. Hey, thanks everybody for following along with all of the how-to-get-started details. I wanted to call out that we have documentation, blogs, and other resources, so you don't have to remember all of this. There's a lot to cover and we're just touching the surface here. We'll have more content as we move forward, and we'll have workshops and things like that, so I just wanted to call that out. We're giving a very high-level treatment here.
Before you leave the room today, I wanted to give you some considerations when you're thinking about capabilities: primarily which ones you'd want to choose for which types of workloads and requirements, what your governance and operational model is or should be, what multi-cluster system designs might look like and how that impacts multi-tenancy considerations, and also permissions. Before you start clicking buttons and onboarding teams, you should plan ahead.
Consider how your designs might look and how that impacts the principle of least privilege. Think about how you want to manage those permissions and design them into the future so you can scale effectively. There are also considerations around using self-managed open source components versus using capabilities. These are all things you should be thinking about as you start exploring this set of features.
EKS Capabilities are designed to interoperate, but they are not required to do so. This is the first thing to think about. If you only want to use Argo CD, you can. If you are only interested in kro, that is fine. These capabilities are designed to work together, but they are all independent. We also support any compute type on EKS, whether it is hybrid nodes, EKS Auto Mode, self-managed nodes, or anything in between. There is some really powerful functionality where you can click a button in the console, make a few more selections, and have an EKS Auto Mode cluster with all three capabilities enabled. However, you do not need to use EKS Auto Mode with them. These primitive features are meant for you to design and build the system that you need.
Operational model is probably the biggest thing to look at as you start automating more of your infrastructure and workloads. Centralized management is the classic platform engineering model, where you have a centralized management cluster, or fleets of management clusters, that you use to orchestrate workloads and cloud resources across the workload clusters. You onboard teams, probably into namespaces or other tenancy models, and you provision AWS resources and clusters through those management clusters. This can really simplify operations for teams, but it requires a bigger investment in dedicated platform engineering.
Decentralized management might help you achieve a little more velocity up front, but there are trade-offs to consider when each cluster and each team has some amount of autonomy. Can you keep compliance in check? Can you do your audits correctly? That is where GitOps comes in really handy: because everything is in Git, it is all verifiable and all immutable artifacts. So you kind of get the best of both worlds, where you can start fast but keep things on the rails. The main thing to think about is how many teams you have or will have, and how much autonomy they might need in the future.
Multi-cluster system designs, with that management control plane, are the bread and butter of modern platform engineering with Kubernetes. What does that look like? There is hub and spoke, and there is local cluster; these are really the two options. Most highly scaled customers with mature, large platform engineering teams prefer hub and spoke because it centralizes the operations and costs of managing these components. With local cluster, we see that more with scaling platform engineering efforts, or when a large organization has multiple efforts all working independently of each other, potentially in different business units. There are trade-offs to consider here, particularly around scale and how much operational burden you expect a team to take on, before it becomes a situation where your small decentralized teams start building their own dedicated platform engineering efforts, which is probably not ideal. Those are the trade-offs to weigh for the scale you have now and also for the future.
It is really important to respect both IAM and RBAC in the cluster and how they work together, especially when bringing Argo CD to the party, which has its own additional RBAC controls. Think about this as a continuum of defense in depth, but also of specificity. Think of IAM, the capability role, and any other roles you use with the IAM role selector as granting service permissions to the capability: this is what the capability can do or can be configured to do. The controls in the cluster govern who can do what, where: RBAC controls the Kubernetes resources and where they can be created. These are intimately connected, particularly if an Argo CD application defines a kro specification which creates ACK resources. Now anyone with access to create that resource in your cluster effectively has access to create those AWS resources in your accounts. This is transparent, and probably very obvious to those of us sitting in the room, but it is something to really get fluent with when you start designing systems at scale. It is very powerful, just like Terraform, Ansible, and Salt.
When you think about infrastructure as code, you're really talking about taking the keys to the kingdom and putting them at somebody's keyboard. So you have to think about that with Kubernetes RBAC. You have that whole next level of defense, saying the system can do these things in AWS, but only these personas and principals in the cluster can do them.
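As a small example of the cluster-side half of that defense in depth, standard Kubernetes RBAC can limit who is allowed to create ACK resources at all. This sketch, with illustrative names, lets one team's developers manage S3 Bucket objects only in their own namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ack-s3-editor
  namespace: team-a
rules:
  - apiGroups: ["s3.services.k8s.aws"]   # ACK S3 custom resources
    resources: ["buckets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ack-s3-editor-binding
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-developers              # mapped through EKS access entries or aws-auth
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: ack-s3-editor
  apiGroup: rbac.authorization.k8s.io
```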
Conclusion: Building and Shipping Faster with EKS Capabilities
Those are the key takeaways from all of these observations, and there are many more; we're actually going to build some of these considerations into extended narratives and guidance in our best practices guide. The thing to take away right now, as you start looking at this, whether you're a scaling customer or somebody with a mature platform effort already, is how you want to simplify operations.
Is it simpler for you to have push-button GitOps for teams to get started? Do prescriptive resources actually make sense to put teams on the rails for self-service? Or do you need to consolidate into a platform team for regulated workload requirements or organizational standards? If you already have a mature platform team, are you planning for scale? Do you know where you're headed, and can capabilities help offload some of the foundational pieces?
If you're a mature platform team, you likely already have an enormous amount of differentiated value built into your platform that we would never presume to replace, but these foundational components can perhaps free you up to build more of it. Ultimately you want to build and ship faster and safer, and EKS Capabilities is our next step, the next evolution of features, to help you do that.
We're continuing to execute on this vision, which is one of the reasons I still love being on the service team after six years: helping you focus on building and shipping software, not on managing the small pieces of a cluster that all of your colleagues at the conference also have to manage. Anything that we find as a commonality, we're going to try to do on your behalf, so you can offload that undifferentiated heavy lifting to us and focus on your differentiated value.
Before we leave, I wanted to call out some additional sessions at the conference and some resources. There are some sessions here worth taking a look at: a builder session and two workshops. These are foundational GitOps and platform engineering content. I believe some of these workshops are self-driven as well. We have a ton of workshop content and learning materials on our websites, along with the session resources.
We always have our documentation, EKS Workshop, and EKS Blueprints. These are all of the things that come along with the service and we support all of these directly. So if you ever have ideas for new content or you find yourself looking to learn something that we're not quite covering, let us know and we're happy to help you with that.
Thank you so much for taking your time. I hope the rest of the conference is great for you. If you see us in the hall, please flag us down and we're happy to talk. Thanks a lot.
This article is entirely auto-generated using Amazon Bedrock.