<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Taufique</title>
    <description>The latest articles on DEV Community by Taufique (@taufique_c757012ce6181590).</description>
    <link>https://dev.to/taufique_c757012ce6181590</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3716697%2Ff1d94c71-e972-4ac6-9558-98993983d1e1.jpg</url>
      <title>DEV Community: Taufique</title>
      <link>https://dev.to/taufique_c757012ce6181590</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/taufique_c757012ce6181590"/>
    <language>en</language>
    <item>
      <title>EKS Cluster Upgrade steps with Zero downtime</title>
      <dc:creator>Taufique</dc:creator>
      <pubDate>Tue, 20 Jan 2026 04:25:47 +0000</pubDate>
      <link>https://dev.to/taufique_c757012ce6181590/eks-cluster-upgrade-steps-with-zero-downtime-9cb</link>
      <guid>https://dev.to/taufique_c757012ce6181590/eks-cluster-upgrade-steps-with-zero-downtime-9cb</guid>
      <description>&lt;p&gt;EKS Cluster Upgrade steps with Zero downtime.&lt;/p&gt;

&lt;p&gt;Introduction: &lt;/p&gt;

&lt;p&gt;Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service offered by Amazon Web Services (AWS) that simplifies the process of deploying, managing, and scaling containerized applications using Kubernetes on AWS infrastructure. Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications.&lt;/p&gt;

&lt;p&gt;EKS abstracts away the complexities of Kubernetes cluster management, allowing users to focus on developing and running their applications. With EKS, AWS manages the control plane, which includes components like the API server, scheduler, and etcd, ensuring high availability and scalability of the Kubernetes control plane.&lt;/p&gt;

&lt;p&gt;One of the key concepts in EKS is node groups. Node groups are a collection of EC2 instances (virtual machines) that act as worker nodes in the Kubernetes cluster. These nodes run the containerized applications and execute tasks such as scheduling, networking, and storage. Node groups can be configured with various instance types, sizes, and configurations to meet specific workload requirements.&lt;/p&gt;

&lt;p&gt;Node groups in EKS are highly flexible and scalable, allowing users to add or remove nodes dynamically based on workload demands. This elasticity ensures optimal resource utilization and cost efficiency. Additionally, node groups can be spread across multiple Availability Zones for improved fault tolerance and high availability.&lt;/p&gt;

&lt;p&gt;Objective: &lt;/p&gt;

&lt;p&gt;As a microservice cluster, EKS undergoes periodic upgrades to maintain its stability, security, and performance. It's essential to keep the EKS cluster updated as AWS may discontinue support for older clusters over time. Regularly updating the EKS cluster ensures compatibility with AWS services, receives ongoing support, and incorporates new features and security patches provided by AWS.&lt;/p&gt;

&lt;p&gt;Overview:&lt;/p&gt;

&lt;p&gt;When upgrading an EKS cluster from one version to another (e.g., from 1.24 to 1.28), it's crucial to minimize downtime for applications. During the upgrade process you may encounter errors about node groups whose version is incompatible with the control plane. To avoid downtime, follow the steps below. &lt;/p&gt;

&lt;p&gt;Key components:&lt;/p&gt;

&lt;p&gt;Create Manifest for Intermediate Version: Generate a manifest for the intermediate version (e.g., 1.26) of the EKS cluster.&lt;/p&gt;

&lt;p&gt;Create Node Group with Intermediate Version: Deploy a new node group using the intermediate version (1.26). This ensures compatibility between the cluster and the new nodes.&lt;/p&gt;

&lt;p&gt;Update Deployment Affinity: Update the affinity settings in your application's deployment configuration to ensure a smooth rolling update to the new node group. Set the max surge to 100% to maintain application availability during the update.&lt;/p&gt;

&lt;p&gt;Delete Old Node Group: Once the new node group is successfully deployed and your application is running smoothly on it, delete the old node group with the previous version (1.24).&lt;/p&gt;

&lt;p&gt;Update EKS Cluster Version: In the EKS console, update the cluster version from the intermediate version (1.26) to the target version (1.28).&lt;/p&gt;

&lt;p&gt;Create Manifest for Target Version: Generate a new manifest for the target version (1.28) of the EKS cluster.&lt;/p&gt;

&lt;p&gt;Create Node Group with Target Version: Deploy a new node group using the target version (1.28) of the EKS cluster.&lt;/p&gt;

&lt;p&gt;Delete Intermediate Node Group: Once the new node group with the target version is operational and applications are running smoothly, delete the node group with the intermediate version (1.26).&lt;/p&gt;

&lt;p&gt;Update kubectl Version: Finally, update the kubectl version on your local machine to the latest compatible version with the upgraded EKS cluster.&lt;/p&gt;

&lt;p&gt;By following these steps, you can minimize downtime during the EKS cluster upgrade process and ensure the smooth transition of your applications to the new cluster version.&lt;/p&gt;
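&lt;p&gt;As a quick sketch of why the overview needs an intermediate version at all: EKS control-plane upgrades move one minor version at a time, so reaching 1.28 from 1.24 takes four sequential upgrades. The loop below (plain Bash, no AWS calls) prints the required order:&lt;/p&gt;

```shell
#!/bin/bash
# Print the sequential control-plane upgrades needed to go from 1.24 to 1.28.
# EKS only allows moving one minor version per upgrade.
current=24
target=28
while [ "$current" -lt "$target" ]; do
  next=$((current + 1))
  echo "upgrade control plane: 1.$current to 1.$next"
  current=$next
done
```

&lt;p&gt;Each printed step corresponds to one "Upgrade now" action in the EKS console.&lt;/p&gt;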

&lt;p&gt;Cluster Upgrade process in EKS console:&lt;/p&gt;

&lt;p&gt;To begin, note that the cluster is currently at version 1.24. For optimal performance and compatibility, aim to keep your cluster at the latest N-1 version (currently 1.28, with 1.29 being the latest), which provides access to new features while maintaining stability and support from AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpv712s0luwyiaz9r52q4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpv712s0luwyiaz9r52q4.png" alt=" " width="800" height="135"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Within the cluster there is one node group, currently aligned with the cluster's version 1.24, which also requires updating as the cluster moves forward to ensure seamless operation and compatibility with AWS services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4v9ggcirohtppvitxri.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4v9ggcirohtppvitxri.png" alt=" " width="800" height="84"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select the "Upgrade Now" option to initiate the upgrade process from version 1.24 to version 1.25, ensuring the cluster remains up-to-date with the latest enhancements and features.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx1juew3kg3f2i0vuw0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx1juew3kg3f2i0vuw0g.png" alt=" " width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The cluster has been successfully updated to the newer version, ensuring it remains current with the latest features, improvements, and security enhancements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb16q8493ma11jl64zhkd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb16q8493ma11jl64zhkd.png" alt=" " width="800" height="132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repeat the previous steps to upgrade the cluster from version 1.25 to version 1.26, ensuring continued alignment with the latest advancements and optimizations in Kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx28v3xjuo5c36xxmc42.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx28v3xjuo5c36xxmc42.png" alt=" " width="800" height="124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After upgrading the cluster to version 1.26, you may encounter an error when attempting to further upgrade to version 1.27 due to the need to update the node groups to match the cluster version. Refer to the provided image for details on the error message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobkgf7p07k8jhurhcy7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobkgf7p07k8jhurhcy7u.png" alt=" " width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;
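&lt;p&gt;This error comes from the Kubernetes version-skew policy: for these releases, a kubelet may run at most two minor versions behind the API server, so EKS refuses a control-plane upgrade that would leave node groups too far behind. A small illustrative sketch of the check (plain Bash, version numbers are examples):&lt;/p&gt;

```shell
#!/bin/bash
# Illustrative version-skew check: a 1.24 node group would be three
# minor versions behind a 1.27 control plane, exceeding the supported skew.
control_plane=27   # minor version after the attempted upgrade
node_group=24      # minor version of the existing node group
skew=$((control_plane - node_group))
if [ "$skew" -gt 2 ]; then
  echo "blocked: node group 1.$node_group is $skew minors behind 1.$control_plane"
else
  echo "ok: skew of $skew minor versions is within the supported range"
fi
```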

&lt;p&gt;Creating a manifest file for version 1.26: &lt;/p&gt;

&lt;p&gt;To minimize downtime, we can create new node groups with version 1.26, avoiding the need to delete existing ones. Below is the manifest file for creating the new node groups:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: zerodowntime-cluster
  region: ap-south-2
  version: "1.26"
vpc:
  id: "vpc-0d13d6b46ae0a1e28"
  cidr: "172.31.0.0/16"
  subnets:
    public:
      Zero-Pub-2a:
        id: "subnet-075876fd0bcead8a2"
      Zero-Pub-2c:
        id: "subnet-0d97057ad00f8f0c9"
managedNodeGroups:
  - name: zerodowntime-clusterNG-new
    ami: "ami-0879d75b626dd9f41"
    amiFamily: AmazonLinux2
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh zerodowntime-cluster --container-runtime containerd
    minSize: 1
    maxSize: 2
    desiredCapacity: 1
    instanceType: "t3.xlarge"
    volumeSize: 20
    volumeEncrypted: true
    privateNetworking: true
    subnets:
      - Zero-Pub-2a
      - Zero-Pub-2c
    labels: {role: zero-downtime-test}
    ssh:
      publicKeyName: zero-downgrade-cluster
    tags:
      nodegroup-role: Zero-downgrade-cluster-role
      nodegroup-name: zerodowntime-clusterNG-new
      Project: POC
      Env: Zero-app
      Layer: App-OD
      Managedby: Workmates
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
      withAddonPolicies:
        autoScaler: true
        externalDNS: true
        certManager: true
        ebs: true
        efs: true
        albIngress: true
        cloudWatch: true
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Access the Jenkins server and switch to the Jenkins user. Create a new YAML manifest file, save it with a .yaml extension, and paste the provided manifest for creating the new node groups with version 1.26.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqdkk24vp4bugnenq1hi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqdkk24vp4bugnenq1hi.png" alt=" " width="800" height="105"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following is the content of the manifest file saved in YAML format.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ttej5u0ldvy7sgik4wi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ttej5u0ldvy7sgik4wi.png" alt=" " width="513" height="606"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Verify the available node groups in the cluster using the following command.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl get nodegroup --cluster=zerodowntime-cluster --region=ap-south-2
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F99hbk4imhpmcf2s4jgc1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F99hbk4imhpmcf2s4jgc1.png" alt=" " width="800" height="60"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see in the above picture, there is only one node group.&lt;/p&gt;

&lt;p&gt;Creating the node groups from the 1.26 manifest file:&lt;/p&gt;

&lt;p&gt;Create the new node groups from the 1.26 manifest using the command below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create nodegroup -f &amp;lt;manifest-file.yaml&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9huuv86n5nt62y3seaop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9huuv86n5nt62y3seaop.png" alt=" " width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you can see the second node group, on version 1.26, was created successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tk1pcucm8pmng3cscy4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tk1pcucm8pmng3cscy4.png" alt=" " width="800" height="100"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, update the affinity in each deployment file and configure a rolling update with a max surge of 100%.&lt;/p&gt;

&lt;p&gt;Updating Affinity in Deployment:&lt;/p&gt;

&lt;p&gt;Once the node groups are created:&lt;/p&gt;

&lt;p&gt;Modify the deployment YAML files to adjust the affinity and the rollout strategy:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 100%
    maxUnavailable: 0%
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
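&lt;p&gt;With these settings, a rollout first creates a full replacement set of pods (100% surge) and only then removes the old ones, so ready capacity never drops. A quick sanity check of the arithmetic for a hypothetical 4-replica deployment:&lt;/p&gt;

```shell
#!/bin/bash
# What maxSurge=100% / maxUnavailable=0% means for a 4-replica deployment.
replicas=4
surge_pct=100
unavail_pct=0
max_pods=$((replicas + replicas * surge_pct / 100))    # old + new pods allowed at once
min_ready=$((replicas - replicas * unavail_pct / 100)) # pods that must stay ready
echo "peak pods during rollout: $max_pods"   # prints 8
echo "minimum ready pods: $min_ready"        # prints 4
```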

&lt;p&gt;Also, update the labels so the pods target the new node group (for example, via its role: zero-downtime-test label).&lt;/p&gt;
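&lt;p&gt;As a sketch of what the affinity change can look like, the deployment's pod template can require the new node group via the role label set in the manifest earlier (field names follow the standard Kubernetes nodeAffinity schema; this assumes the old node group does not carry the same label):&lt;/p&gt;

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: role
              operator: In
              values:
                - zero-downtime-test
```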

&lt;p&gt;Apply the same affinity changes to all applications:&lt;/p&gt;

&lt;p&gt;Repeat the above process for all deployments, ensuring consistent updates to the affinity and labels across all applications. This ensures that all applications utilize the new node groups effectively.&lt;/p&gt;

&lt;p&gt;Deleting the node groups with the older version: &lt;/p&gt;

&lt;p&gt;First, check the node groups in the cluster with the command below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl get nodegroup --cluster=zerodowntime-cluster --region=ap-south-2
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq7jz5rzj4x1747j6shu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq7jz5rzj4x1747j6shu.png" alt=" " width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since there are now two node groups, we can go ahead and delete the old one with the command below. Note that --disable-eviction bypasses pod eviction (and any PodDisruptionBudgets) and deletes pods directly, so confirm workloads have already rolled over to the new node group first.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl delete nodegroup --cluster=zerodowntime-cluster --name=&amp;lt;old-nodegroup-name&amp;gt; --region=ap-south-2 --disable-eviction
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjduipf14j1j3rdt7qamj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjduipf14j1j3rdt7qamj.png" alt=" " width="800" height="109"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you can see the old nodegroup is deleted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faskio0aqae2ozi4r2kql.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faskio0aqae2ozi4r2kql.png" alt=" " width="800" height="89"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Upgrading the cluster from version 1.26 to 1.28 in the EKS console:&lt;/p&gt;

&lt;p&gt;Since the node group version now matches the cluster version, go ahead and update the cluster from 1.26 to 1.27.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7v35kcnyj69z6t0qlcy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7v35kcnyj69z6t0qlcy.png" alt=" " width="800" height="132"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88emsi5pb4qgkcxlc47l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88emsi5pb4qgkcxlc47l.png" alt=" " width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The cluster has been successfully updated to the newer version, ensuring it remains current with the latest features, improvements, and security enhancements.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fochrsdo07blzwn6uth5y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fochrsdo07blzwn6uth5y.png" alt=" " width="800" height="131"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repeat the previous steps to upgrade the cluster from version 1.27 to version 1.28, ensuring continued alignment with the latest advancements and optimizations in Kubernetes.&lt;/p&gt;

&lt;p&gt;The cluster is now upgraded to version 1.28.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczabq4af6ntbesx3b4td.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczabq4af6ntbesx3b4td.png" alt=" " width="800" height="146"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Creating the manifest file for version 1.28:&lt;/p&gt;

&lt;p&gt;Now we need to prepare a manifest for version 1.28 to bring the node groups up to 1.28.&lt;/p&gt;

&lt;p&gt;First, retrieve the AMI ID that supports cluster version 1.28 using the command below, specifying your region.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.28/amazon-linux-2/recommended/image_id --region ap-south-2 --query "Parameter.Value" --output text
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsai4u69k0uwa9ru9bu9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsai4u69k0uwa9ru9bu9.png" alt=" " width="800" height="35"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add the AMI ID to the manifest and save the file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6ezcl6a1c16rekj3rn8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff6ezcl6a1c16rekj3rn8.png" alt=" " width="800" height="91"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This manifest file targets version 1.28 of the EKS cluster node groups.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnohj3er8mbncxebbhgow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnohj3er8mbncxebbhgow.png" alt=" " width="527" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the below manifest file and update the necessary fields.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: zerodowntime-cluster
  region: ap-south-2
  version: "1.28"
vpc:
  id: "vpc-0d13d6b46ae0a1e28"
  cidr: "172.31.0.0/16"
  subnets:
    public:
      Zero-Pub-2a:
        id: "subnet-075876fd0bcead8a2"
      Zero-Pub-2c:
        id: "subnet-0d97057ad00f8f0c9"
managedNodeGroups:
  - name: zerodowntime-clusterNG
    ami: "ami-06cd43657081b7a56"
    amiFamily: AmazonLinux2
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh zerodowntime-cluster --container-runtime containerd
    minSize: 1
    maxSize: 2
    desiredCapacity: 1
    instanceType: "t3.xlarge"
    volumeSize: 20
    volumeEncrypted: true
    privateNetworking: true
    subnets:
      - Zero-Pub-2a
      - Zero-Pub-2c
    labels: {role: zero-downtime-test}
    ssh:
      publicKeyName: zero-downgrade-cluster
    tags:
      nodegroup-role: Zero-downgrade-cluster-role
      nodegroup-name: zerodowntime-clusterNG
      Project: POC
      Env: Zero-app
      Layer: App-OD
      Managedby: Workmates
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/ElasticLoadBalancingFullAccess
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
      withAddonPolicies:
        autoScaler: true
        externalDNS: true
        certManager: true
        ebs: true
        efs: true
        albIngress: true
        cloudWatch: true
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Creating the node groups from the 1.28 manifest file:&lt;/p&gt;

&lt;p&gt;First, check the node groups available in the cluster with the command below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl get nodegroup --cluster=zerodowntime-cluster --region=ap-south-2
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxexan3baly4b5cnmxh3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxexan3baly4b5cnmxh3.png" alt=" " width="800" height="63"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see in the above image, there is only one node group, carrying the -new suffix.&lt;/p&gt;

&lt;p&gt;Now go ahead and apply the manifest file to create the new node group on version 1.28.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create nodegroup -f &amp;lt;manifest-file.yaml&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frhhadjsfuvckj22njgxe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frhhadjsfuvckj22njgxe.png" alt=" " width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now you can see a new node group was created on version 1.28, matching the EKS cluster version.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35ofkgzqla9nmf6y133f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35ofkgzqla9nmf6y133f.png" alt=" " width="800" height="102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Updating Affinity in Deployment:&lt;/p&gt;

&lt;p&gt;Once the node groups are created:&lt;/p&gt;

&lt;p&gt;Modify the deployment YAML files to adjust the affinity and the rollout strategy:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 100%
    maxUnavailable: 0%
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Also, update the labels so the pods target the new 1.28 node group.&lt;/p&gt;

&lt;p&gt;Apply the same affinity changes to all applications:&lt;/p&gt;

&lt;p&gt;Repeat the above process for all deployments, ensuring consistent updates to the affinity and labels across all applications. This ensures that all applications utilize the new node groups effectively.&lt;/p&gt;

&lt;p&gt;Deleting the node groups with the older version: &lt;/p&gt;

&lt;p&gt;Once the affinity is updated, delete the node group carrying the -new suffix (which is on version 1.26) using the command below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl delete nodegroup --cluster=zerodowntime-cluster --name=zerodowntime-clusterNG-new --region=ap-south-2 --disable-eviction
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;As you can see in the image below, the node group with the -new suffix is in the Deleting state.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduphxc0nowyx9erwvah1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduphxc0nowyx9erwvah1.png" alt=" " width="800" height="103"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now the old node group is deleted, leaving a single node group on version 1.28.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozuq0dv088t9zshp9pnf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozuq0dv088t9zshp9pnf.png" alt=" " width="800" height="90"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Installing or updating kubectl:&lt;/p&gt;

&lt;p&gt;Determine which version of kubectl you are using with the command below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl version --client
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvfdq9ivmaiy0pv3ubm8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvfdq9ivmaiy0pv3ubm8.png" alt=" " width="800" height="57"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Refer to the AWS documentation link below for kubectl installation instructions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;kubectl can be installed or updated on macOS, Linux, and Windows; the steps below cover Linux.&lt;/p&gt;

&lt;p&gt;curl -O &lt;a href="https://s3.us-west-2.amazonaws.com/amazon-eks/1.28.8/2024-04-19/bin/linux/amd64/kubectl" rel="noopener noreferrer"&gt;https://s3.us-west-2.amazonaws.com/amazon-eks/1.28.8/2024-04-19/bin/linux/amd64/kubectl&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4yl37hnuuatyc6bctsav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4yl37hnuuatyc6bctsav.png" alt=" " width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Apply execute permissions to the binary.&lt;/p&gt;

&lt;p&gt;chmod +x ./kubectl&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7t8r212mdiuyvdm1lcvy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7t8r212mdiuyvdm1lcvy.png" alt=" " width="800" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy the binary to a folder in your PATH. If you already have a version of kubectl installed, we recommend copying this binary to $HOME/bin/kubectl and ensuring that $HOME/bin comes first in your $PATH.&lt;/p&gt;
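&lt;p&gt;The PATH-precedence point above can be checked with a small sketch; the copy step is commented out because it assumes the downloaded binary is present in the current directory.&lt;/p&gt;

```shell
# Create $HOME/bin and put it first on PATH so this kubectl wins lookup.
mkdir -p "$HOME/bin"
# cp ./kubectl "$HOME/bin/kubectl"   # run this after downloading the binary
export PATH="$HOME/bin:$PATH"
# Record the first PATH entry so it can be inspected:
printf '%s\n' "${PATH%%:*}" > path-first.txt
cat path-first.txt
```

&lt;p&gt;If the first entry printed is $HOME/bin, the freshly installed kubectl will shadow any older system copy.&lt;/p&gt;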

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg4913m7lp08vh5tzlzwj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg4913m7lp08vh5tzlzwj.png" alt=" " width="800" height="39"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Optional) Add the $HOME/bin path to your shell initialization file so that it is configured when you open a shell.&lt;/p&gt;

&lt;p&gt;echo 'export PATH=$HOME/bin:$PATH' &amp;gt;&amp;gt; ~/.bashrc&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsebhz0i33uymjm18zyi0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsebhz0i33uymjm18zyi0.png" alt=" " width="800" height="49"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Verify the installed kubectl version again with the command below.&lt;/p&gt;

&lt;p&gt;kubectl version --client&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g303s0h76vu84y93q5n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g303s0h76vu84y93q5n.png" alt=" " width="800" height="114"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>kubernetes</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Alerting Setup in EKS using Prometheus</title>
      <dc:creator>Taufique</dc:creator>
      <pubDate>Sat, 17 Jan 2026 15:54:43 +0000</pubDate>
      <link>https://dev.to/taufique_c757012ce6181590/alerting-setup-in-eks-using-prometheus-159p</link>
      <guid>https://dev.to/taufique_c757012ce6181590/alerting-setup-in-eks-using-prometheus-159p</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;u&gt;Introduction&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Overview:&lt;/p&gt;

&lt;p&gt;This SOP provides detailed instructions for configuring alerting in an Amazon EKS cluster using Prometheus. Prometheus is an open-source monitoring and alerting toolkit widely used in Kubernetes environments for real-time monitoring and proactive alerting based on metrics. The integration ensures timely notifications about potential issues, enabling swift action to maintain system health and reliability.&lt;/p&gt;

&lt;p&gt;Prometheus:&lt;/p&gt;

&lt;p&gt;Prometheus is a powerful open-source monitoring system designed for collecting, storing, and querying time-series data. It is highly scalable and well-suited for cloud-native environments, especially Kubernetes. Prometheus collects metrics from configured targets, evaluates defined rules, and enables queries using its PromQL language. With its robust integration capabilities, Prometheus is a cornerstone of modern observability stacks, offering insights into application and infrastructure performance. &lt;/p&gt;

&lt;p&gt;Alerts for Pods:&lt;/p&gt;

&lt;p&gt;Alerting for pods involves monitoring the health, resource usage, and performance of Kubernetes pods and triggering alerts when specific thresholds or conditions are breached. For example, alerts can be set for high CPU or memory usage, pod restarts, or readiness and liveness probe failures. This ensures teams are promptly notified of potential issues, enabling them to take corrective actions to maintain application availability and reliability. &lt;/p&gt;

&lt;p&gt;Objective:&lt;/p&gt;

&lt;p&gt;The objective of this SOP is to:&lt;/p&gt;

&lt;p&gt;Set up Prometheus in an EKS cluster for monitoring.&lt;br&gt;
Configure alerting rules for key performance metrics and resource utilization.&lt;br&gt;
Integrate Prometheus with Alertmanager to route alerts to notification channels like email.&lt;/p&gt;

&lt;p&gt;Key Components:&lt;/p&gt;

&lt;p&gt;Amazon EKS: The managed Kubernetes service that hosts your applications.&lt;br&gt;
Prometheus: The monitoring and alerting toolkit for collecting metrics.&lt;br&gt;
Alertmanager: A component of Prometheus responsible for managing alerts and routing them to configured endpoints.&lt;br&gt;
Kubernetes Metrics Server: A lightweight service for gathering resource metrics like CPU and memory.&lt;/p&gt;

&lt;p&gt;Prerequisites:&lt;/p&gt;

&lt;p&gt;EKS Cluster: A fully operational EKS cluster with kubectl configured for access.&lt;br&gt;
Prometheus Operator: Deployed in the EKS cluster for managing Prometheus configurations.&lt;br&gt;
Alertmanager: Installed alongside Prometheus in the cluster.&lt;br&gt;
IAM Permissions: Sufficient AWS IAM permissions to manage resources in the EKS cluster.&lt;/p&gt;

&lt;p&gt;Procedure&lt;br&gt;
Initial Setup:&lt;/p&gt;

&lt;p&gt;Login to the server:&lt;br&gt;
ssh -i "" &lt;a class="mentioned-user" href="https://dev.to/ip"&gt;@ip&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6wgkobi35vj5zbp98pz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6wgkobi35vj5zbp98pz.png" alt=" " width="800" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Login to Jenkins User:&lt;/strong&gt;&lt;br&gt;
sudo su - jenkins&lt;/p&gt;

&lt;p&gt;cd &lt;/p&gt;

&lt;p&gt;mkdir &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31k9q338mtkuhv0mnja8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F31k9q338mtkuhv0mnja8.png" alt=" " width="769" height="110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;cd &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01pkc5a0whkrtddah94k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01pkc5a0whkrtddah94k.png" alt=" " width="800" height="89"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installation of Prometheus Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check for the Prometheus repo; if it is not present, add it through Helm, the Kubernetes package manager:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;helm repo ls&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2veul93v6q6r5269qu9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2veul93v6q6r5269qu9.png" alt=" " width="800" height="110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The repo is not added, so add the Prometheus repo using the commands below:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;helm repo add prometheus-community &lt;a href="https://prometheus-community.github.io/helm-charts" rel="noopener noreferrer"&gt;https://prometheus-community.github.io/helm-charts&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;helm repo update&lt;/p&gt;

&lt;p&gt;helm pull prometheus-community/kube-prometheus-stack&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqfyxo9fz58y12cgsm9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flqfyxo9fz58y12cgsm9n.png" alt=" " width="800" height="136"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Extract the tar file&lt;/p&gt;

&lt;p&gt;tar -xvf &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6uqlz87lz5cd1cx1oy5y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6uqlz87lz5cd1cx1oy5y.png" alt=" " width="800" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adding Affinity in the Values file:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Find the values.yaml file inside the prom stack.&lt;br&gt;
cd into the extracted directory and check the files.&lt;br&gt;
Take a backup of values.yaml. &lt;/p&gt;
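&lt;p&gt;The backup step can be as simple as copying the file before editing; here touch stands in for the real chart file so the sketch is self-contained.&lt;/p&gt;

```shell
# Sketch: back up values.yaml before editing it.
# touch stands in for the real chart file; in the extracted chart directory
# the file already exists, so skip the touch there.
touch values.yaml
cp values.yaml values.yaml.bak
ls -l values.yaml.bak
```

&lt;p&gt;If a later helm upgrade misbehaves, the .bak copy gives you a known-good values file to diff against.&lt;/p&gt;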

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvsyqfog39nipmx3facu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvsyqfog39nipmx3facu.png" alt=" " width="800" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy the content inside the values file and open it in an IDE for modification.&lt;/p&gt;

&lt;p&gt;Before making the modification, find the role label of the node group as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e71w80j8922sqtev5bx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e71w80j8922sqtev5bx.png" alt=" " width="800" height="98"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1bt3xctivc5z5dlewuz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1bt3xctivc5z5dlewuz.png" alt=" " width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the below syntax for Affinity:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: role
              operator: In
              values:
                - Production-Magnifi-Monitoring-NG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Before modification of Affinity:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4de742iif732gvrwuhxc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4de742iif732gvrwuhxc.png" alt=" " width="709" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After Modification:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmnk9egu8o8m8ovdlwku8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmnk9egu8o8m8ovdlwku8.png" alt=" " width="747" height="289"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After modifying the entire YAML file, replace the old content with the newly modified content.&lt;/p&gt;

&lt;p&gt;Go to the charts folder, find the grafana folder, and then find the values file inside the grafana folder.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F728i76jiarun22xoqrje.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F728i76jiarun22xoqrje.png" alt=" " width="800" height="142"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow the above steps to modify the affinity in the grafana values file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploy the prom stack using Helm Package Manager:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;helm install prom-stack . -f values.yaml -n monitoring --create-namespace&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgnzmgiw8xsd622smih90.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgnzmgiw8xsd622smih90.png" alt=" " width="800" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check that all pods are in the Running state:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;k get all -n monitoring&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5mq76xd2acfzrizboww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5mq76xd2acfzrizboww.png" alt=" " width="800" height="472"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Setting up the Alerts:&lt;/p&gt;

&lt;p&gt;Delete all the default Prometheus rules except the following two:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;prom-stack-kube-prometheus-kubernetes-apps&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;prom-stack-kube-prometheus-k8s.rules.container-cpu-usage-second&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Find the prometheusrule.&lt;/p&gt;

&lt;p&gt;k get prometheusrules -n monitoring&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5qhmvbnkisa6f82ua72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5qhmvbnkisa6f82ua72.png" alt=" " width="800" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Delete the default rules&lt;/p&gt;

&lt;p&gt;k delete prometheusrules  -n &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmgs4td6dnttc4bjibzc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmgs4td6dnttc4bjibzc.png" alt=" " width="800" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check that only the rules we kept remain:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;prom-stack-kube-prometheus-k8s.rules.container-cpu-usage-second&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;prom-stack-kube-prometheus-kubernetes-apps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;k get prometheusrules -n monitoring&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7da5p37dsjjxm87ltlg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7da5p37dsjjxm87ltlg.png" alt=" " width="800" height="78"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Edit the above rule.&lt;/p&gt;

&lt;p&gt;k edit prometheusrules prom-stack-kube-prometheus-kubernetes-apps -n monitoring&lt;/p&gt;

&lt;p&gt;Delete all the entries under the rules: section of the spec.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih69oa1m869w2m2cb96e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih69oa1m869w2m2cb96e.png" alt=" " width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add the new rules as given below:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2er6cd2qwodmhhs894j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2er6cd2qwodmhhs894j.png" alt=" " width="660" height="618"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Rule:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rules:
  - alert: MagnifiProductionPodRestart
    annotations:
      description: Pod {{ $labels.namespace }}/{{ $labels.pod }} has restarted {{ $value }} times.
    expr: kube_pod_container_status_restarts_total{namespace="prod"} &amp;gt; 0
    for: 1m
    labels:
      severity: production-critical
  - alert: MagnifiProductionPodPending
    annotations:
      description: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been pending ({{ $value }}).
    expr: kube_pod_status_phase{namespace="prod", phase="Pending"} == 1
    for: 1m
    labels:
      severity: production-critical
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Add the SMTP credentials to the Prometheus secrets by following the steps below:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create an SMTP user.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2tzn2wp3maqbzktz341.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2tzn2wp3maqbzktz341.png" alt=" " width="800" height="76"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sik663fglzbp5yo7ajm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sik663fglzbp5yo7ajm.png" alt=" " width="800" height="260"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create Access Key and Secret Key:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkrtpq2g3isgpzu4zbvjc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkrtpq2g3isgpzu4zbvjc.png" alt=" " width="562" height="67"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the below format and add the credentials.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;global:
  resolve_timeout: 5m
  smtp_from: 
  smtp_smarthost: email-smtp.ap-south-1.amazonaws.com:587
  smtp_auth_username: 
  smtp_auth_password: 
  smtp_require_tls: true
route:
  receiver: support
  group_by:
    - job
    - monitor_type
    - severity
    - alertname
    - namespace
  routes:
    - receiver: support
      match:
        alertname: &amp;lt;Alert_Name_1&amp;gt;
    - receiver: support
      match:
        alertname: &amp;lt;Alert_Name_2&amp;gt;
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
receivers:
  - name: support
    email_configs:
      - send_resolved: true
        to: 
      - send_resolved: true
        to: 
templates:
  - '/etc/alertmanager/config/*.tmpl'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After making the modifications, Base64-encode the configuration above.&lt;/p&gt;

&lt;p&gt;Link: &lt;a href="https://www.base64encode.org/" rel="noopener noreferrer"&gt;https://www.base64encode.org/&lt;/a&gt;&lt;/p&gt;
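&lt;p&gt;Instead of a website, the encoding can also be done locally with the base64 utility; -w0 is the GNU coreutils form that disables line wrapping (macOS base64 does not wrap by default). A stand-in config file is created here for illustration.&lt;/p&gt;

```shell
# Encode the edited Alertmanager config and round-trip it to verify.
# The printf creates a minimal stand-in file for this sketch.
printf 'global:\n  resolve_timeout: 5m\n' > alertmanager.yaml
base64 -w0 alertmanager.yaml > alertmanager.yaml.b64
# Decode and compare to confirm the encoding is faithful:
base64 -d alertmanager.yaml.b64 > roundtrip.yaml
diff alertmanager.yaml roundtrip.yaml && echo "encoding verified"
```

&lt;p&gt;Doing this locally also avoids pasting SMTP credentials into a third-party website.&lt;/p&gt;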

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtws8ox2np11h32eanef.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtws8ox2np11h32eanef.png" alt=" " width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update the Alertmanager secret with the change:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;List the secrets in the monitoring namespace:&lt;/p&gt;

&lt;p&gt;k get secrets -n monitoring&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp8s8ixga3eslpv27g2hy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp8s8ixga3eslpv27g2hy.png" alt=" " width="800" height="181"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Edit the Alert Manager secret &lt;/p&gt;

&lt;p&gt;k edit secrets alertmanager-prom-stack-kube-prometheus-alertmanager -n monitoring&lt;/p&gt;
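&lt;p&gt;As an alternative to editing the secret in place, the patch can be assembled locally and reviewed first. The secret name and the alertmanager.yaml key follow the article; the config content here is a minimal stand-in.&lt;/p&gt;

```shell
# Build a merge patch that replaces the alertmanager.yaml key of the secret.
# The config below is a stand-in; use your real, edited alertmanager.yaml.
printf 'global:\n  resolve_timeout: 5m\n' > alertmanager.yaml
B64="$(base64 -w0 alertmanager.yaml)"
cat > secret-patch.json <<EOF
{"data":{"alertmanager.yaml":"${B64}"}}
EOF
grep -q "alertmanager.yaml" secret-patch.json && echo "patch ready"
# Apply with (requires cluster access):
# kubectl patch secret alertmanager-prom-stack-kube-prometheus-alertmanager \
#   -n monitoring --type merge --patch-file secret-patch.json
```

&lt;p&gt;This keeps the exact bytes you are about to store in version-controllable files rather than inside a live kubectl edit session.&lt;/p&gt;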

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq73i3em3etyk3ml90vb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq73i3em3etyk3ml90vb.png" alt=" " width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Remove the old Base64 value of the alertmanager.yaml key.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisq0eg9tfdl1vdn600ip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisq0eg9tfdl1vdn600ip.png" alt=" " width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add the encoded secrets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6k28057k85v90stgx5c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh6k28057k85v90stgx5c.png" alt=" " width="800" height="82"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alerts Testing:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pod Restart Testing:&lt;/p&gt;

&lt;p&gt;vi restart-pod.yaml&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: always-restart-pod
spec:
  restartPolicy: Always
  containers:
    - name: my-container
      image: nginx:latest
      command: ["/bin/sh", "-c", "exit 1"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39ltont7evcsp9me6kd4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39ltont7evcsp9me6kd4.png" alt=" " width="800" height="52"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1huhfwxs5c0jrtuuiqgt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1huhfwxs5c0jrtuuiqgt.png" alt=" " width="800" height="138"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Got Alert for Pod Restart Firing and Resolved:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmeo5u53nqyeyjs0fj6yr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmeo5u53nqyeyjs0fj6yr.png" alt=" " width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fibfi8w3lrmjoq5exerfu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fibfi8w3lrmjoq5exerfu.png" alt=" " width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Received the Pod Pending alert, firing and then resolved. To trigger it, create a test pod that references a non-existent image:&lt;/p&gt;

&lt;p&gt;vi pending-pod.yaml&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: pending-pod
spec:
  containers:
  - name: my-container
    image: non-existing-image:latest
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;k apply -f pending-pod.yaml -n prod&lt;/p&gt;

&lt;p&gt;k delete po pending-pod -n prod&lt;/p&gt;
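&lt;p&gt;The Pending alert above is raised by the Prometheus rules shipped with the monitoring stack. As a sketch of what such a rule looks like (the rule name, threshold, and labels here are illustrative assumptions, not the stack's exact defaults), a custom PrometheusRule could be written as:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-pending-alert        # illustrative name
  namespace: monitoring
spec:
  groups:
  - name: pod-state.rules
    rules:
    - alert: PodPending
      # fires when any pod stays in the Pending phase for 5 minutes
      expr: sum by (namespace, pod) (kube_pod_status_phase{phase="Pending"}) &gt; 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is Pending"
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;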

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bepcpapg127vrnxyr7l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bepcpapg127vrnxyr7l.png" alt=" " width="800" height="123"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Received the Pod Pending alert, firing and then resolved:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa48rfv5mw88klg6igkd7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa48rfv5mw88klg6igkd7.png" alt=" " width="800" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ufqbzjpanm1ibp2k728.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ufqbzjpanm1ibp2k728.png" alt=" " width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expose grafana prom and alertmanager Services:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Check the services in the monitoring namespace&lt;/p&gt;

&lt;p&gt;k get svc -n monitoring&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq9g1ykd0cwelp3i8ull.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq9g1ykd0cwelp3i8ull.png" alt=" " width="800" height="152"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create an Ingress manifest for Prometheus, Grafana, and Alertmanager.&lt;/p&gt;

&lt;p&gt;vi prometheus-ingress.yaml&lt;/p&gt;

&lt;p&gt;Replace the service names and ports below with those of your own services.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: 5m
  name: prod-prom-grafana-monitoring
  namespace: monitoring
spec:
  rules:
  - host: grafana-prod-mumbai.illusto.com
    http:
      paths:
      - backend:
          service:
            name: prom-stack-grafana
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
  - host: prom-prod-mumbai.illusto.com
    http:
      paths:
      - backend:
          service:
            name: prom-stack-kube-prometheus-prometheus
            port:
              number: 9090
        path: /
        pathType: ImplementationSpecific
  - host: alert-prod-mumbai.illusto.com
    http:
      paths:
      - backend:
          service:
            name: prom-stack-kube-prometheus-alertmanager
            port:
              number: 9093
        path: /
        pathType: ImplementationSpecific
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Apply the Ingress rules.&lt;/p&gt;

&lt;p&gt;k apply -f prometheus-ingress.yaml&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkp0waf2zy0bco7vwntb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkp0waf2zy0bco7vwntb.png" alt=" " width="800" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the ingress.&lt;/p&gt;

&lt;p&gt;k get ingress -n monitoring&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5asn955ep07y4i461ulk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5asn955ep07y4i461ulk.png" alt=" " width="800" height="53"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add the Route53 records:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flojro5f4ro21fil01cqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flojro5f4ro21fil01cqg.png" alt=" " width="800" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsp6df5ek1x722rfy3rl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsp6df5ek1x722rfy3rl.png" alt=" " width="766" height="604"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grafana Dashboard:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhu64klelhca7b5o8t1az.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhu64klelhca7b5o8t1az.png" alt=" " width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prometheus Dashboard:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo09b4hntrofhdoj73n72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo09b4hntrofhdoj73n72.png" alt=" " width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AlertManager Dashboard:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6y2fy35aueg4urkxbx6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6y2fy35aueg4urkxbx6.png" alt=" " width="800" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Scope&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This SOP applies to DevOps, monitoring, and SRE teams tasked with maintaining the reliability and performance of applications deployed on Amazon EKS. It is applicable for both staging and production environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Roles and Responsibilities&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;DevOps Engineers: &lt;/p&gt;

&lt;p&gt;1. Responsible for deploying and configuring Prometheus and Alertmanager.&lt;/p&gt;

&lt;p&gt;2. Define and maintain alerting rules based on organizational needs.&lt;/p&gt;

&lt;p&gt;3. Act on received alerts to resolve issues and ensure high availability.&lt;/p&gt;

&lt;p&gt;4. Validate alerting configurations to ensure compliance with security protocols.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Enforcement&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;1. Policy Compliance: All alerting configurations must follow organizational monitoring and alerting standards.&lt;/p&gt;

&lt;p&gt;2. Access Control: Only authorized personnel are allowed to modify Prometheus and Alertmanager configurations.&lt;/p&gt;

&lt;p&gt;3. Auditing: Regular audits of alerting rules should be conducted to ensure effectiveness and compliance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Conclusion:&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Alerting in EKS using Prometheus provides a robust mechanism to proactively monitor the health and performance of applications. By following this SOP, teams can ensure timely notifications for critical issues, reducing downtime and maintaining application reliability in dynamic Kubernetes environments.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Deploying and Configuring ALB Ingress Controller on EKS</title>
      <dc:creator>Taufique</dc:creator>
      <pubDate>Sat, 17 Jan 2026 15:20:47 +0000</pubDate>
      <link>https://dev.to/taufique_c757012ce6181590/deploying-and-configuring-alb-ingress-controller-on-eks-1g7o</link>
      <guid>https://dev.to/taufique_c757012ce6181590/deploying-and-configuring-alb-ingress-controller-on-eks-1g7o</guid>
      <description>&lt;p&gt;&lt;u&gt;&lt;strong&gt;1. Introduction&lt;/strong&gt;&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;Overview:&lt;/p&gt;

&lt;p&gt;This SOP explains the purpose, scope, and best practices for setting up an ALB Ingress Controller. The AWS Load Balancer Controller manages AWS Elastic Load Balancers for a Kubernetes cluster. You can use the controller to expose your cluster apps to the internet. The controller provisions AWS load balancers that point to cluster Service or Ingress resources. In other words, the controller creates a single IP address or DNS name that points to multiple pods in your cluster.&lt;/p&gt;

&lt;p&gt;The controller watches for Kubernetes Ingress or Service resources. In response, it creates the appropriate AWS Elastic Load Balancing resources. You can configure the specific behavior of the load balancers by applying annotations to the Kubernetes resources. For example, you can attach AWS security groups to load balancers using annotations.&lt;/p&gt;

&lt;p&gt;Objective:&lt;/p&gt;

&lt;p&gt;The objective of this document is to provide a comprehensive guide for implementing the AWS ALB Ingress Controller on AWS. This setup will enable efficient routing and load balancing of incoming traffic to Kubernetes services running on an Amazon Elastic Kubernetes Service (EKS) cluster using Application Load Balancers.&lt;/p&gt;

&lt;p&gt;Key Components:&lt;/p&gt;

&lt;p&gt;AWS ALB Ingress Controller: &lt;/p&gt;

&lt;p&gt;The AWS ALB Ingress Controller is a Kubernetes controller that manages AWS Application Load Balancer configuration to route incoming traffic to Kubernetes services.&lt;/p&gt;

&lt;p&gt;Amazon EKS: &lt;/p&gt;

&lt;p&gt;Amazon Elastic Kubernetes Service is a managed Kubernetes service provided by AWS that  simplifies the process of deploying, managing, and scaling containerized applications using Kubernetes.&lt;/p&gt;

&lt;p&gt;Amazon Route 53: &lt;/p&gt;

&lt;p&gt;Amazon Route 53 is a scalable and highly available Domain Name System (DNS) web service provided by AWS. It enables routing traffic to AWS resources, including the AWS ALB Ingress Controller.&lt;/p&gt;

&lt;p&gt;AWS Application Load Balancer (ALB): &lt;/p&gt;

&lt;p&gt;ALB automatically distributes incoming application traffic across multiple targets, such as EC2 instances, containers, and IP addresses, within one or more Availability     Zones.&lt;/p&gt;

&lt;p&gt;Prerequisites:&lt;/p&gt;

&lt;p&gt;Before proceeding with the implementation, ensure the following prerequisites are met:&lt;/p&gt;

&lt;p&gt;AWS Account: &lt;/p&gt;

&lt;p&gt;Access to an AWS account with permissions to create and manage resources such as EKS  clusters, Route 53 records, and Application Load Balancers.&lt;/p&gt;

&lt;p&gt;Kubernetes Cluster: &lt;/p&gt;

&lt;p&gt;An Amazon EKS cluster should be provisioned and running. Ensure the cluster is  properly configured with networking, IAM roles, and necessary node groups.&lt;/p&gt;

&lt;p&gt;kubectl CLI: &lt;/p&gt;

&lt;p&gt;Install and configure the kubectl command-line tool to interact with the Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;Helm (Optional):&lt;/p&gt;

&lt;p&gt;If using Helm for deploying the AWS ALB Ingress Controller, ensure Helm is installed  and configured.&lt;/p&gt;

&lt;p&gt;Access to DNS: &lt;/p&gt;

&lt;p&gt;Have access to manage DNS records, as you will need to create DNS records to route  traffic to the AWS ALB Ingress Controller.&lt;/p&gt;

&lt;p&gt;Procedure&lt;/p&gt;

&lt;p&gt;Step 1: Create IAM Role using eksctl&lt;/p&gt;

&lt;p&gt;Create an IAM policy.&lt;/p&gt;

&lt;p&gt;1. Download an IAM policy for the AWS Load Balancer Controller that allows it to make calls to AWS APIs on your behalf.&lt;/p&gt;

&lt;p&gt;curl -O &lt;a href="https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.7.2/docs/install/iam_policy.json" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.7.2/docs/install/iam_policy.json&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96hy7v6w5im88nqthmmm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96hy7v6w5im88nqthmmm.png" alt=" " width="800" height="66"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2. Create an IAM policy using the policy document downloaded in the previous step.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws iam create-policy \
    --policy-name AWSLoadBalancerControllerIAMPolicy \
    --policy-document file://iam_policy.json
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtr8eopli61n7o65s0r4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtr8eopli61n7o65s0r4.png" alt=" " width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create IAM Role using eksctl&lt;/p&gt;

&lt;p&gt;Replace my-cluster with the name of your cluster, 111122223333 with your account ID, and then run the command. If your cluster is in the AWS GovCloud (US-East) or AWS GovCloud (US-West) AWS Regions, then replace arn:aws: with arn:aws-us-gov:.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create iamserviceaccount \
    --cluster=my-cluster \
    --namespace=kube-system \
    --name=aws-load-balancer-controller \
    --role-name AmazonEKSLoadBalancerControllerRole \
    --attach-policy-arn=arn:aws:iam::111122223333:policy/AWSLoadBalancerControllerIAMPolicy \
    --approve
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;For example, for the demo cluster used here:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create iamserviceaccount \
    --cluster=demo-cluster \
    --namespace=kube-system \
    --name=aws-load-balancer-controller \
    --role-name AmazonEKSLoadBalancerControllerRole \
    --attach-policy-arn=arn:aws:iam::975050347443:policy/AWSLoadBalancerControllerIAMPolicy \
    --region=us-east-1 \
    --approve
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i9izzn2u3rkatxqd7qg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i9izzn2u3rkatxqd7qg.png" alt=" " width="800" height="116"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above error occurred because the IAM OIDC provider is not enabled for the cluster.&lt;/p&gt;

&lt;p&gt;Commands to configure the IAM OIDC provider:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export cluster_name=demo-cluster
oidc_id=$(aws eks describe-cluster --name $cluster_name --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Check whether an IAM OIDC provider is already configured for the cluster; if not, associate one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl utils associate-iam-oidc-provider --cluster $cluster_name --region us-east-1 --approve
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhtpzvsnwktc6rv3tkp1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhtpzvsnwktc6rv3tkp1.png" alt=" " width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkv6inffo4fuosbipg79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwkv6inffo4fuosbipg79.png" alt=" " width="800" height="194"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 2: Install the AWS Load Balancer Controller&lt;/p&gt;

&lt;p&gt;Install the AWS Load Balancer Controller using Helm v3:&lt;/p&gt;

&lt;p&gt;Add the eks-charts Helm chart repository. AWS maintains this repository on GitHub.&lt;/p&gt;

&lt;p&gt;helm repo add eks &lt;a href="https://aws.github.io/eks-charts" rel="noopener noreferrer"&gt;https://aws.github.io/eks-charts&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Update your local repo to make sure that you have the most recent charts.&lt;/p&gt;

&lt;p&gt;helm repo update eks&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vh87hj1o1kwy8ckz9xu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9vh87hj1o1kwy8ckz9xu.png" alt=" " width="800" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To view the available versions of the Helm Chart and Load Balancer Controller, use the following command:&lt;/p&gt;

&lt;p&gt;helm search repo eks/aws-load-balancer-controller --versions&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi41niccc26j4qytbhk2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi41niccc26j4qytbhk2m.png" alt=" " width="800" height="233"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3. Install the AWS Load Balancer Controller.&lt;/p&gt;

&lt;p&gt;Replace my-cluster with the name of your cluster. In the following command, aws-load-balancer-controller is the Kubernetes service account that you created in a previous step.&lt;/p&gt;

&lt;p&gt;For more information about configuring the helm chart, see values.yaml on GitHub.&lt;/p&gt;

&lt;p&gt;Ref Link: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/aws/eks-charts/blob/master/stable/aws-load-balancer-controller/values.yaml" rel="noopener noreferrer"&gt;https://github.com/aws/eks-charts/blob/master/stable/aws-load-balancer-controller/values.yaml&lt;/a&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
    -n kube-system \
    --set clusterName=my-cluster \
    --set serviceAccount.create=false \
    --set serviceAccount.name=aws-load-balancer-controller
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;If you're deploying the controller to Amazon EC2 nodes that have restricted access to the Amazon EC2 instance metadata service (IMDS), or if you're deploying to Fargate, then add the following flags to the helm command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--set region=region-code \
--set vpcId=vpc-xxxxxxxx
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
    -n kube-system \
    --set clusterName=demo-cluster \
    --set serviceAccount.create=false \
    --set serviceAccount.name=aws-load-balancer-controller \
    --set autoDiscoverAwsRegion=true \
    --set autoDiscoverAwsVpcID=true \
    --set region=us-east-1 \
    --set vpcId=vpc-08ce046624ab1b564
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82dk9sj6bc2x40qrsigy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82dk9sj6bc2x40qrsigy.png" alt=" " width="800" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 3: Verify that the controller is installed&lt;/p&gt;

&lt;p&gt;Verify that the controller deployment is available.&lt;/p&gt;

&lt;p&gt;kubectl get deployment -n kube-system aws-load-balancer-controller&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsux6xsv0tyfuaollg9p8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsux6xsv0tyfuaollg9p8.png" alt=" " width="800" height="94"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 4: Deploy a sample application&lt;/p&gt;

&lt;p&gt;Prerequisites:&lt;/p&gt;

&lt;p&gt;At least one public or private subnet in your cluster VPC.&lt;/p&gt;

&lt;p&gt;Have the AWS Load Balancer Controller deployed on your cluster. For more information, see What is the AWS Load Balancer Controller?. We recommend version 2.7.2 or later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34w8nxk69pstwgg4qpht.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34w8nxk69pstwgg4qpht.png" alt=" " width="800" height="43"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Deploy the game 2048 as a sample application to verify that the AWS Load Balancer Controller creates an AWS ALB in response to the Ingress object.&lt;/p&gt;

&lt;p&gt;Complete the steps for the type of subnet you're deploying to.&lt;/p&gt;

&lt;p&gt;If you're deploying to Pods in a cluster that you created with the IPv6 family, skip to the next step.&lt;/p&gt;

&lt;p&gt;If the subnets are public, execute the command below:&lt;/p&gt;

&lt;p&gt;kubectl apply -f &lt;a href="https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.7.2/docs/examples/2048/2048_full.yaml" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.7.2/docs/examples/2048/2048_full.yaml&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the subnets are private, follow the steps below:&lt;/p&gt;

&lt;p&gt;Download the manifest.&lt;/p&gt;

&lt;p&gt;curl -O &lt;a href="https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.7.2/docs/examples/2048/2048_full.yaml" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.7.2/docs/examples/2048/2048_full.yaml&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Edit the file and find the line that says alb.ingress.kubernetes.io/scheme: internet-facing.&lt;br&gt;
Change internet-facing to internal and save the file.&lt;/p&gt;

&lt;p&gt;Apply the manifest to your cluster.&lt;/p&gt;

&lt;p&gt;kubectl apply -f 2048_full.yaml&lt;/p&gt;

&lt;p&gt;If you're deploying to Pods in a cluster that you created with the IPv6 family, complete the following steps.&lt;/p&gt;

&lt;p&gt;Download the manifest.&lt;/p&gt;

&lt;p&gt;curl -O &lt;a href="https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.7.2/docs/examples/2048/2048_full.yaml" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.7.2/docs/examples/2048/2048_full.yaml&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open the file in an editor and add the following line to the annotations in the ingress spec.&lt;/p&gt;

&lt;p&gt;alb.ingress.kubernetes.io/ip-address-type: dualstack&lt;/p&gt;

&lt;p&gt;If you're load balancing to internal Pods rather than internet-facing Pods, change the line that says alb.ingress.kubernetes.io/scheme: internet-facing to alb.ingress.kubernetes.io/scheme: internal.&lt;/p&gt;

&lt;p&gt;Save the file.&lt;/p&gt;

&lt;p&gt;Apply the manifest to your cluster.&lt;/p&gt;

&lt;p&gt;kubectl apply -f 2048_full.yaml&lt;/p&gt;
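&lt;p&gt;The annotation edits described above all land in the metadata of the Ingress in 2048_full.yaml. Put together, the metadata for an internal, dual-stack deployment would look roughly like this (a sketch based on the example manifest; verify the names and annotations against the file you downloaded):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
  annotations:
    alb.ingress.kubernetes.io/scheme: internal             # changed from internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/ip-address-type: dualstack   # added for IPv6-family clusters
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;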

&lt;p&gt;After a few minutes, verify that the Ingress resource was created.&lt;/p&gt;

&lt;p&gt;For testing, I have used the public option here:&lt;/p&gt;

&lt;p&gt;kubectl apply -f &lt;a href="https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.7.2/docs/examples/2048/2048_full.yaml" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.7.2/docs/examples/2048/2048_full.yaml&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ctt4x781cgd5w54lmj3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ctt4x781cgd5w54lmj3.png" alt=" " width="800" height="61"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check whether the deployed application is working properly.&lt;/p&gt;

&lt;p&gt;kubectl get all -n game-2048&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0qpf6cj32o6694d5698.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0qpf6cj32o6694d5698.png" alt=" " width="800" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the ingress &lt;/p&gt;

&lt;p&gt;kubectl get ingress -n game-2048&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkjjgx8tbwveiujfkdpj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkjjgx8tbwveiujfkdpj.png" alt=" " width="800" height="61"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the application using the endpoint&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8fsblej3zakbob1jxzj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8fsblej3zakbob1jxzj.png" alt=" " width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Up to this point the controller setup is complete; configuring ACM is next.&lt;/p&gt;

&lt;p&gt;Step 5: Create an ACM certificate for the required domain.&lt;/p&gt;

&lt;p&gt;Sign in to the AWS Management Console and open the ACM console &lt;/p&gt;

&lt;p&gt;Choose Request a certificate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a8nljp5rgrzfkgbfgne.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a8nljp5rgrzfkgbfgne.png" alt=" " width="800" height="67"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuuwprwn5l85rtute8pn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuuwprwn5l85rtute8pn.png" alt=" " width="800" height="272"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the Domain names section, type your domain name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo3hy6lacmzbmmmelvaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo3hy6lacmzbmmmelvaf.png" alt=" " width="800" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Validate the Certificate by adding the records in DNS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqlcag1t7h5xzobine7s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqlcag1t7h5xzobine7s.png" alt=" " width="800" height="127"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fykxh6lqy63idko0a6zrb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fykxh6lqy63idko0a6zrb.png" alt=" " width="800" height="170"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add the records in Route53&lt;/p&gt;
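&lt;p&gt;The ACM console's "Create records in Route 53" button adds the validation record automatically; as an alternative, the same CNAME can be added with the AWS CLI. A sketch of a change batch file (the record name and value are placeholders to be copied from the ACM console):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "_&lt;validation-record-name&gt;.yourdomain.com.",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [
          { "Value": "_&lt;validation-record-value&gt;.acm-validations.aws." }
        ]
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This would be applied with aws route53 change-resource-record-sets --hosted-zone-id &lt;zone-id&gt; --change-batch file://validation.json, substituting your hosted zone ID.&lt;/p&gt;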

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixio1l5s3dfcfsvk0ko7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixio1l5s3dfcfsvk0ko7.png" alt=" " width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Update the AWS Load Balancer Controller with the ACM certificate:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=demo-cluster \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set autoDiscoverAwsRegion=true \
  --set autoDiscoverAwsVpcID=true \
  --set region=us-east-1 \
  --set vpcId=vpc-08ce046624ab1b564 \
  --set enableShield=false \
  --set enableWaf=false \
  --set acm.enabled=true \
  --set acm.defaultRegion=us-east-1 \
  --set acm.managed=false \
  --set acm.certArn=arn:aws:acm:us-west-2:XXXXXXXX:certificate/XXXXXX-XXXXXXX-XXXXXXX-XXXXXXXX
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Here enableShield and enableWaf are optional (they enable AWS Shield Advanced and AWS WAF for the ALB); acm.enabled turns on ACM integration, acm.defaultRegion sets the default region for ACM, and acm.managed is set to false because the certificate ARN is supplied explicitly in acm.certArn.&lt;/p&gt;
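&lt;p&gt;The same settings can be kept in a values file instead of a long chain of --set flags. A sketch using the keys from the command above (the ARN is a placeholder; note that the acm.* block mirrors this article's command rather than the upstream chart's documented values.yaml, so verify those keys against the chart before relying on them):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# alb-values.yaml (hypothetical file name)
clusterName: demo-cluster
serviceAccount:
  create: false
  name: aws-load-balancer-controller
autoDiscoverAwsRegion: true
autoDiscoverAwsVpcID: true
region: us-east-1
vpcId: vpc-08ce046624ab1b564
enableShield: false
enableWaf: false
acm:
  enabled: true
  defaultRegion: us-east-1
  managed: false
  certArn: arn:aws:acm:us-west-2:XXXXXXXX:certificate/XXXXXX-XXXXXXX-XXXXXXX-XXXXXXXX
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Installed with: helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system -f alb-values.yaml&lt;/p&gt;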

&lt;p&gt;Step: Final testing phase with ACM by deploying the Prometheus stack.&lt;/p&gt;

&lt;p&gt;To test the ingress, deploy kube-prometheus-stack:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack --create-namespace -n monitoring
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjsx3277dotu936aoi7br.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjsx3277dotu936aoi7br.png" alt=" " width="800" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ingress rules creation&lt;/p&gt;

&lt;p&gt;Create an ingress file for Prometheus:&lt;/p&gt;

&lt;p&gt;vi prometheus-ingress.yaml&lt;/p&gt;

&lt;p&gt;Replace the service names and ports below with those of your own services.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: 5m
  name: prod-prom-grafana-monitoring
  namespace: monitoring
spec:
  rules:
  - host: grafana-prod-mumbai.illusto.com
    http:
      paths:
      - backend:
          service:
            name: prom-stack-grafana
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
  - host: prom-prod-mumbai.illusto.com
    http:
      paths:
      - backend:
          service:
            name: prom-stack-kube-prometheus-prometheus
            port:
              number: 9090
        path: /
        pathType: ImplementationSpecific
  - host: alert-prod-mumbai.illusto.com
    http:
      paths:
      - backend:
          service:
            name: prom-stack-kube-prometheus-alertmanager
            port:
              number: 9093
        path: /
        pathType: ImplementationSpecific
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
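&lt;p&gt;Note that this manifest uses the nginx ingress class. If the AWS Load Balancer Controller installed earlier should provision the ALB (and terminate TLS with the ACM certificate) instead, the relevant fields would look roughly like this sketch; the alb.ingress.kubernetes.io/* annotations come from the controller's documentation, and the certificate ARN is a placeholder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:XXXXXXXX:certificate/XXXXXXXX
    alb.ingress.kubernetes.io/ssl-redirect: "443"
spec:
  ingressClassName: alb
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;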

&lt;p&gt;Apply the ingress manifest (k is an alias for kubectl):&lt;/p&gt;

&lt;p&gt;k apply -f prometheus-ingress.yaml&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4sfgp4b4n1gum0uwqvu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4sfgp4b4n1gum0uwqvu.png" alt=" " width="800" height="70"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the ingress:&lt;/p&gt;

&lt;p&gt;k get ingress -n monitoring&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpf7co4p29stpo6lrabw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdpf7co4p29stpo6lrabw.png" alt=" " width="800" height="54"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add the Route53 records:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbde8fzt4e9zz1b8gw4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbde8fzt4e9zz1b8gw4w.png" alt=" " width="800" height="70"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5eszeewl7poj1bmrgoif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5eszeewl7poj1bmrgoif.png" alt=" " width="769" height="610"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Grafana Dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngjdnelgc4a8d08woljz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngjdnelgc4a8d08woljz.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Prometheus Dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8dyz08fo0j3xfk5io6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft8dyz08fo0j3xfk5io6a.png" alt=" " width="800" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AlertManager Dashboard:&lt;/p&gt;

&lt;p&gt;&lt;u&gt;&lt;strong&gt;Scope&lt;/strong&gt;&lt;/u&gt;&lt;/p&gt;

&lt;p&gt;This SOP covers the deployment and management of the AWS Load Balancer Controller in a Kubernetes cluster. It is intended for use in exposing cluster applications to the internet through AWS Elastic Load Balancers. The document applies to system administrators, DevOps engineers, and any team members responsible for maintaining Kubernetes clusters on AWS. It also includes best practices for managing Ingress resources and configuring load balancers to meet security and scalability requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Roles and Responsibilities&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DevOps Engineers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure the AWS Load Balancer Controller is installed and configured correctly in the Kubernetes cluster.&lt;/li&gt;
&lt;li&gt;Monitor load balancer performance and ensure compliance with the organization’s network and security policies.&lt;/li&gt;
&lt;li&gt;Define and apply Kubernetes Ingress or Service resources with proper annotations for load balancer configuration.&lt;/li&gt;
&lt;li&gt;Automate the deployment of load balancers using Infrastructure as Code (IaC) tools like Terraform or Helm.&lt;/li&gt;
&lt;li&gt;Configure AWS security groups to allow necessary traffic to and from the load balancers.&lt;/li&gt;
&lt;li&gt;Verify that DNS names and IP addresses assigned to the load balancers are correctly pointing to the Kubernetes resources.&lt;/li&gt;
&lt;li&gt;Ensure the setup adheres to security standards and regulatory requirements, including data protection and access control policies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Enforcement&lt;/u&gt;&lt;/strong&gt;&lt;br&gt;
This SOP is enforced by regular audits of the Kubernetes and AWS infrastructure to ensure compliance with defined configurations. Automated monitoring tools, such as Prometheus, should be used to track load balancer health, traffic patterns, and security group configurations. Any deviations from the established guidelines should be logged, reported, and addressed promptly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Conclusion&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Implementing the AWS Load Balancer Controller is a critical step in securely and efficiently managing application traffic in a Kubernetes cluster. By following the practices outlined in this SOP, organizations can ensure reliable application delivery while maintaining scalability and security. Proper configuration and monitoring of the controller and associated resources will significantly reduce the risks of downtime, unauthorized access, or performance bottlenecks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;u&gt;Ref Links&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kubernetes-sigs/aws-load-balancer-controller" rel="noopener noreferrer"&gt;https://github.com/kubernetes-sigs/aws-load-balancer-controller&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/lbc-helm.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/lbc-helm.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/aws/eks-charts/blob/master/stable/aws-load-balancer-controller/values.yaml" rel="noopener noreferrer"&gt;https://github.com/aws/eks-charts/blob/master/stable/aws-load-balancer-controller/values.yaml&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/lbc-manifest.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/lbc-manifest.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/guide/ingress/annotations/" rel="noopener noreferrer"&gt;https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/guide/ingress/annotations/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/iam-veeramalla/aws-devops-zero-to-hero/blob/main/day-22/2048-app-deploy-ingress.md" rel="noopener noreferrer"&gt;https://github.com/iam-veeramalla/aws-devops-zero-to-hero/blob/main/day-22/2048-app-deploy-ingress.md&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>networking</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
