
Stop Managing Kubernetes Infrastructure Manually — Use EKS Capabilities Instead

If you've ever spent hours wiring Helm charts, debugging IRSA roles, or babysitting controller upgrades in your Kubernetes cluster, this article is for you.
I recently built a developer platform on Amazon EKS where a single YAML manifest creates a complete application stack — Kubernetes Deployment, Service, and an AWS SQS Queue — all managed through kubectl. No Terraform for the queue. No Helm chart for the controller. No controller pods eating cluster resources.
The secret? EKS Capabilities — a GA feature (November 2025) that runs ACK and KRO as fully managed services on AWS infrastructure, outside your cluster.

Here's exactly how I did it, including the RBAC gotcha that took me a while to figure out.


What Are EKS Capabilities?

Traditional approach: you install ACK controllers and KRO into your cluster using Helm. You manage versions, node resources, IRSA roles, and upgrades yourself.

EKS Capabilities approach: AWS runs the controllers in their own accounts. You enable them with a single API call. AWS handles scaling, patching, and upgrading. You pay per capability (hourly base + usage).

Think of it like the difference between self-managing a database on EC2 versus using RDS. Same technology, zero ops.

I combined three building blocks in this project (two managed capabilities plus one manual piece of glue):

  • ACK (AWS Controllers for Kubernetes): manages AWS resources (DynamoDB, SQS) through Kubernetes CRDs
  • KRO (Kube Resource Orchestrator): defines reusable resource bundles as custom Kubernetes APIs
  • RBAC (manual): grants KRO the permissions it needs to manage child resources

The Architecture

Here's what the final system looks like:


Developer applies one WebApp manifest
            |
            v
    KRO (managed by AWS)
    Decomposes "WebApp" into:
    |-- Deployment (2 nginx pods)
    |-- Service (ClusterIP:80)
    |-- SQS Queue (via ACK)
            |
            v
    ACK (managed by AWS)
    Creates real AWS resources:
    |-- SQS queue in us-east-1
    |-- DynamoDB table in us-east-1

The developer writes about a dozen lines of YAML. KRO expands it into three resources, ACK provisions the AWS infrastructure, and everything is reconciled continuously.


Step 1: Infrastructure with Terraform

I used the terraform-aws-modules/eks module to set up the foundation:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "Eks-Capabilities"
  cluster_version = "1.34"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  cluster_endpoint_public_access           = true
  enable_cluster_creator_admin_permissions = true

  eks_managed_node_groups = {
    main = {
      min_size       = 2
      max_size       = 4
      desired_size   = 2
      instance_types = ["t3.medium"]
    }
  }
}

Terraform also creates the IAM role that EKS Capabilities will assume:

resource "aws_iam_role" "eks_capabilities" {
  name = "Eks-Capabilities-capabilities-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "capabilities.eks.amazonaws.com" }
      Action    = ["sts:AssumeRole", "sts:TagSession"]
    }]
  })
}

The role gets an inline policy with DynamoDB and SQS permissions — the minimum ACK needs to manage those services.
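That inline policy isn't shown above, so here is a minimal sketch of what it could look like. The exact action list is my assumption of a reasonable minimum, not the project's verbatim policy; for production you'd scope `Resource` down from `"*"` to specific ARNs.

```hcl
# Hypothetical inline policy for the capabilities role: just enough
# for ACK to manage DynamoDB tables and SQS queues.
resource "aws_iam_role_policy" "eks_capabilities" {
  name = "ack-dynamodb-sqs"
  role = aws_iam_role.eks_capabilities.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "dynamodb:CreateTable",
        "dynamodb:DescribeTable",
        "dynamodb:UpdateTable",
        "dynamodb:DeleteTable",
        "dynamodb:TagResource",
        "sqs:CreateQueue",
        "sqs:GetQueueAttributes",
        "sqs:SetQueueAttributes",
        "sqs:DeleteQueue",
        "sqs:TagQueue",
      ]
      Resource = "*"
    }]
  })
}
```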


Step 2: Enable Capabilities with One Command Each

After terraform apply and aws eks update-kubeconfig, enabling capabilities is two API calls:

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
ROLE_ARN="arn:aws:iam::${ACCOUNT_ID}:role/Eks-Capabilities-capabilities-role"

# Enable ACK
aws eks create-capability \
  --region us-east-1 \
  --cluster-name Eks-Capabilities \
  --capability-name ack \
  --type ACK \
  --role-arn $ROLE_ARN \
  --delete-propagation-policy RETAIN

# Enable KRO
aws eks create-capability \
  --region us-east-1 \
  --cluster-name Eks-Capabilities \
  --capability-name kro \
  --type KRO \
  --role-arn $ROLE_ARN \
  --delete-propagation-policy RETAIN

Wait about a minute for each to reach ACTIVE status. That's it — no Helm, no controller pods, no IRSA configuration.
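If you're scripting the setup, you can poll instead of watching the console. This is a sketch that assumes the CLI follows the usual AWS pattern and pairs `create-capability` with a `describe-capability` subcommand; the `--query` path is likewise an assumption, so verify both against your CLI version.

```shell
# Poll until a capability reports ACTIVE. The describe-capability
# subcommand and the 'capability.status' query path are assumptions
# based on standard AWS API naming, not verified documentation.
wait_for_capability() {
  local name=$1 status=""
  until [ "$status" = "ACTIVE" ]; do
    status=$(aws eks describe-capability \
      --region us-east-1 \
      --cluster-name Eks-Capabilities \
      --capability-name "$name" \
      --query 'capability.status' --output text)
    echo "capability $name: ${status:-PENDING}"
    [ "$status" = "ACTIVE" ] || sleep 15
  done
}
```

Then `wait_for_capability ack && wait_for_capability kro` blocks until both are ready.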


Step 3: Create AWS Resources with kubectl

With ACK active, creating AWS resources feels like creating any Kubernetes resource:

DynamoDB Table:

apiVersion: dynamodb.services.k8s.aws/v1alpha1
kind: Table
metadata:
  name: app-orders-table
spec:
  tableName: Eks-Dev-orders
  attributeDefinitions:
    - attributeName: orderId
      attributeType: S
    - attributeName: customerId
      attributeType: S
  keySchema:
    - attributeName: orderId
      keyType: HASH
    - attributeName: customerId
      keyType: RANGE
  billingMode: PAY_PER_REQUEST

SQS Queue:

apiVersion: sqs.services.k8s.aws/v1alpha1
kind: Queue
metadata:
  name: app-notifications-queue
spec:
  queueName: Eks-Dev-notifications
  visibilityTimeout: "30"
  messageRetentionPeriod: "345600"
  receiveMessageWaitTimeSeconds: "10"

kubectl apply and within seconds, real AWS resources appear in your account. kubectl get table and kubectl get queue show their status. If someone deletes the queue manually in the AWS console, ACK recreates it. That's Kubernetes reconciliation applied to cloud infrastructure.
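It's worth cross-checking from the AWS side as well. A small helper using standard AWS CLI calls, with the table and queue names taken from the manifests above:

```shell
# Confirm the ACK-created resources actually exist in the account:
# an ACTIVE table status and a queue URL mean both reconciled.
verify_ack_resources() {
  aws dynamodb describe-table \
    --table-name Eks-Dev-orders \
    --query 'Table.TableStatus' --output text
  aws sqs get-queue-url \
    --queue-name Eks-Dev-notifications \
    --query 'QueueUrl' --output text
}
```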


Step 4: Define a Platform API with KRO

This is where it gets interesting. As a platform engineer, I don't want every developer writing Deployment + Service + Queue manifests. I want them to declare what they need, not how to build it.

KRO lets me define a ResourceGraphDefinition — essentially a template that registers a new Kubernetes API:

apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: webapp
spec:
  schema:
    apiVersion: v1alpha1
    kind: WebApp
    spec:
      appName: string
      image: string
      replicas: integer
      serviceName: string
      queueName: string
  resources:
    - id: deployment
      template:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: ${schema.spec.appName}
        spec:
          replicas: ${schema.spec.replicas}
          selector:
            matchLabels:
              app: ${schema.spec.appName}
          template:
            metadata:
              labels:
                app: ${schema.spec.appName}
            spec:
              containers:
                - name: app
                  image: ${schema.spec.image}
                  ports:
                    - containerPort: 80

    - id: service
      template:
        apiVersion: v1
        kind: Service
        metadata:
          name: ${schema.spec.serviceName}
        spec:
          selector:
            app: ${schema.spec.appName}
          ports:
            - port: 80
              targetPort: 80
          type: ClusterIP

    - id: queue
      template:
        apiVersion: sqs.services.k8s.aws/v1alpha1
        kind: Queue
        metadata:
          name: ${schema.spec.appName}-queue
        spec:
          queueName: ${schema.spec.queueName}
          visibilityTimeout: "30"
          messageRetentionPeriod: "345600"

After kubectl apply, KRO registers WebApp as a first-class Kubernetes resource. Developers can now kubectl get webapp just like they'd kubectl get deployment.
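Before handing the new API to developers, you can confirm the registration took. The WebApp instances in this article use the `kro.run` API group, so listing that group's resources should show the generated type:

```shell
# List the custom APIs registered in the kro.run group; after the
# ResourceGraphDefinition reconciles, a webapps entry should appear.
list_kro_apis() {
  kubectl api-resources --api-group=kro.run -o name
}
```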


Step 5: The RBAC Gotcha (The Part That Took Me Hours)

Here's what nobody tells you about EKS Capabilities with KRO.

The capabilities IAM role gets an EKS access entry with two policies:

  • AmazonEKSACKPolicy — manages ACK custom resources
  • AmazonEKSKROPolicy — manages KRO's own CRDs (ResourceGraphDefinitions, WebApp instances)

But neither policy grants KRO permission to manage the child Kubernetes resources it creates. When KRO tries to create a Deployment or Service on behalf of a WebApp, it quietly fails with permission errors: the WebApp just never becomes Ready, and nothing surfaces loudly to tell you why.

The fix: a ClusterRole and ClusterRoleBinding that grants KRO's identity the permissions it needs:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kro-resource-manager
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["sqs.services.k8s.aws"]
    resources: ["queues"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kro-resource-manager-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kro-resource-manager
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: "arn:aws:sts::<ACCOUNT_ID>:assumed-role/Eks-Capabilities-capabilities-role/KRO"

The key insight: KRO's Kubernetes identity is the STS assumed-role ARN with /KRO appended. You can find this by checking the EKS access entries for your cluster.
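Listing those access entries is a one-liner with the standard EKS CLI. A sketch, using the region and cluster name from the setup above; the capabilities role should appear among the returned principal ARNs:

```shell
# Show which IAM principals have access entries on the cluster.
# The capabilities role shows up here alongside your own identity;
# append /KRO to its assumed-role form for the RBAC subject.
list_access_principals() {
  aws eks list-access-entries \
    --region us-east-1 \
    --cluster-name Eks-Capabilities \
    --query 'accessEntries' --output text
}
```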

After applying this RBAC manifest, KRO can create and manage Deployments, Services, and SQS Queues — exactly what the WebApp ResourceGraphDefinition requires.


Step 6: Deploy — 11 Lines of YAML

Now the developer experience:

apiVersion: kro.run/v1alpha1
kind: WebApp
metadata:
  name: orders-app
  namespace: default
spec:
  appName: orders-app
  image: nginx:1.27
  replicas: 2
  serviceName: orders-app-svc
  queueName: Eks-Dev-notifications
kubectl apply -f kro-webapp-instance.yaml

Within seconds:

$ kubectl describe webapp orders-app
State: ACTIVE
Conditions:
  ResourcesReady: True - all resources are created and ready
  Ready: True

$ kubectl get deployment orders-app
NAME         READY   UP-TO-DATE   AVAILABLE
orders-app   2/2     2            2

$ kubectl get service orders-app-svc
NAME             TYPE        CLUSTER-IP       PORT(S)
orders-app-svc   ClusterIP   172.20.159.171   80/TCP

$ kubectl get queue orders-app-queue
NAME               SYNCED   AGE
orders-app-queue   True     30s

One manifest. Three resources. Cloud infrastructure included.


Using the WebApp

Once deployed, here's how to interact with your application:

Access locally via port-forward:

kubectl port-forward svc/orders-app-svc 8080:80
# Open http://localhost:8080

Test from inside the cluster:

kubectl run test-curl --rm -it --image=curlimages/curl \
  -- curl http://orders-app-svc.default.svc.cluster.local

Scale up or down:

kubectl patch webapp orders-app --type merge -p '{"spec":{"replicas":4}}'

Update the image:

kubectl patch webapp orders-app --type merge -p '{"spec":{"image":"nginx:1.28"}}'

Deploy another app using the same template:

apiVersion: kro.run/v1alpha1
kind: WebApp
metadata:
  name: payments-app
spec:
  appName: payments-app
  image: node:20-slim
  replicas: 3
  serviceName: payments-svc
  queueName: Eks-Dev-payments

Same command, different app, full stack created automatically.

Delete everything (app + infrastructure):

kubectl delete webapp orders-app

KRO removes the Deployment, Service, and SQS Queue in the correct order.


Second Gotcha: Kubernetes Naming Rules

When KRO creates child resources, the Kubernetes metadata.name must follow RFC 1123 — lowercase alphanumeric characters, dashes, and dots only.

My original ResourceGraphDefinition used ${schema.spec.queueName} for both the Queue's metadata.name and spec.queueName (the actual AWS queue name). Since the AWS queue was named Eks-Dev-notifications (with uppercase), Kubernetes rejected the resource.

The fix: use ${schema.spec.appName}-queue for metadata.name (always lowercase) and keep ${schema.spec.queueName} for spec.queueName (the AWS-side name that supports mixed case).

Small detail, but it will save you 30 minutes of debugging.
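If you want to catch this before kubectl does, the rule is easy to check locally. A quick sketch of the RFC 1123 subdomain pattern Kubernetes applies to metadata.name (names are also capped at 253 characters, which this check ignores):

```shell
# Succeeds only for names Kubernetes accepts as metadata.name:
# lowercase alphanumerics, '-', '.', starting and ending alphanumeric.
is_valid_k8s_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$'
}

is_valid_k8s_name "Eks-Dev-notifications" || echo "rejected: uppercase"
is_valid_k8s_name "orders-app-queue" && echo "accepted"
```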


What I'd Add Next

  • Argo CD — EKS Capabilities supports managed Argo CD, but it requires IAM Identity Center (SSO). For teams without SSO, self-managed Argo CD works just as well and eliminates manual kubectl apply entirely.
  • External Secrets Operator — sync secrets from AWS Secrets Manager into Kubernetes automatically, so developers never handle credentials.
  • More KRO templates — a WorkerApp (Deployment + SQS Queue, no Service), a CronJob bundle, an API bundle with Ingress.

Key Takeaways

  1. EKS Capabilities eliminate controller management. No Helm charts, no version tracking, no controller pods. AWS runs them for you.

  2. ACK makes AWS resources Kubernetes-native. DynamoDB tables and SQS queues become just another kubectl get.

  3. KRO is a platform engineering accelerator. Define your golden paths as ResourceGraphDefinitions. Developers get simple APIs. Platform teams enforce standards.

  4. RBAC for managed capabilities is not automatic. KRO needs explicit Kubernetes permissions to create child resources. This is the most common setup issue I've seen.

  5. Kubernetes naming rules apply everywhere. Even when you're creating AWS resources through Kubernetes, the metadata.name must be lowercase RFC 1123 compliant.

The complete code for this project is available on GitHub.


Built with Amazon EKS, ACK, KRO, and Terraform. Infrastructure provisioned in us-east-1.
