What options do you have to autoscale Kubernetes?
- Cluster Autoscaler ---> Scales nodes
- HPA (Horizontal Pod Autoscaler) ---> Scales your Deployment/ReplicaSet up or down based on CPU utilization (or other metrics)
- VPA (Vertical Pod Autoscaler) ---> Automatically adjusts the CPU and memory requests for your pods
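As a quick illustration of the HPA option, a minimal HorizontalPodAutoscaler manifest could look like the sketch below. The deployment name my-app and the thresholds are illustrative, not part of this demo; on Kubernetes 1.20 (the version used later in this post) you would use apiVersion: autoscaling/v2beta2 instead, since autoscaling/v2 only became GA in 1.23.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app               # hypothetical deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add pods when average CPU exceeds 70% of requests
```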
What is Kubernetes Cluster Autoscaler?
- It adjusts the size of a Kubernetes cluster (scaling nodes up and down) to meet the current needs.
- It is supported by the major cloud platforms.
- Cluster Autoscaler typically runs as a Deployment in your cluster.
How does Kubernetes Cluster Autoscaler work?
Cluster Autoscaler checks the status of nodes and pods on a regular basis and takes action based on node utilization or pod scheduling status.
When Cluster Autoscaler finds pending (unschedulable) pods in the cluster, it adds nodes until the pending pods can be scheduled or the cluster reaches its maximum node limit.
If a node's utilization is low and its pods can be moved to other nodes, Cluster Autoscaler removes the excess node.
So its decisions are based on pod scheduling status and resource requests, not on actual CPU or memory utilization.
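A simple way to see this behaviour is to deploy a workload whose total CPU requests cannot fit on the current nodes: the pods go Pending, and Cluster Autoscaler reacts by adding a node. A sketch, where the name, image, and sizes are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-test             # hypothetical test workload
spec:
  replicas: 10
  selector:
    matchLabels:
      app: scale-test
  template:
    metadata:
      labels:
        app: scale-test
    spec:
      containers:
      - name: pause
        image: k8s.gcr.io/pause:3.5
        resources:
          requests:
            cpu: "500m"        # 10 x 500m quickly exceeds a small node group's capacity
```

Scaling this deployment back down later leaves nodes underutilized, and Cluster Autoscaler will eventually remove them.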
What should you have before deploying Cluster Autoscaler in a Kubernetes cluster?
- An IAM OIDC provider for your cluster.
Why?
Cluster Autoscaler requires AWS permissions to scale nodes up or down. These permissions are granted through IAM Roles for Service Accounts (IRSA). To support IAM roles for service accounts, your cluster needs an OIDC issuer URL. (The IAM roles for service accounts feature is available on Amazon EKS version 1.14 and later.)
Cluster Autoscaler requires the following tags on your Auto Scaling groups so that they can be auto-discovered:
k8s.io/cluster-autoscaler/enabled=true
k8s.io/cluster-autoscaler/<cluster-name>=owned
Demo
Lets create an EKS cluster with cluster autoscaling, the Terraform way:
1) Create the EKS cluster using Terraform.
locals {
  name            = "eks-scalable-cluster"
  cluster_version = "1.20"
  region          = "ap-southeast-1"
}
###############
# EKS Module
###############
module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  cluster_name    = local.name
  cluster_version = local.cluster_version
  vpc_id          = module.vpc.vpc_id
  subnets         = module.vpc.private_subnets

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true
  enable_irsa                     = true

  worker_groups = [
    {
      name                 = "worker-group-1"
      instance_type        = "t3.medium"
      asg_desired_capacity = 1
      asg_max_size         = 4
      # Cluster Autoscaler auto-discovery setup
      # https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup
      tags = [
        {
          "key"                 = "k8s.io/cluster-autoscaler/enabled"
          "propagate_at_launch" = "false"
          "value"               = "true"
        },
        {
          "key"                 = "k8s.io/cluster-autoscaler/${local.name}"
          "propagate_at_launch" = "false"
          "value"               = "owned"
        }
      ]
    }
  ]

  tags = {
    clustername = local.name
  }
}
data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

data "aws_availability_zones" "available" {
}
2) Create an IAM role that can be assumed by trusted resources using OpenID Connect federated users. (Cluster Autoscaler will use these permissions to access AWS services such as Auto Scaling and EC2.)
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  k8s_service_account_namespace = "kube-system"
  k8s_service_account_name      = "cluster-autoscaler-aws"
}
module "iam_assumable_role_admin" {
  # Creates a single IAM role which can be assumed by trusted resources using OpenID Connect federated users.
  source      = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version     = "~> 4.0"
  create_role = true
  role_name   = "cluster-autoscaler"

  provider_url                  = replace(module.eks.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns              = [aws_iam_policy.cluster_autoscaler.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:${local.k8s_service_account_namespace}:${local.k8s_service_account_name}"]
}

resource "aws_iam_policy" "cluster_autoscaler" {
  name_prefix = "cluster-autoscaler"
  description = "EKS cluster-autoscaler policy for cluster ${module.eks.cluster_id}"
  policy      = data.aws_iam_policy_document.cluster_autoscaler.json
}
data "aws_iam_policy_document" "cluster_autoscaler" {
  statement {
    sid    = "clusterAutoscalerAll"
    effect = "Allow"
    actions = [
      "autoscaling:DescribeAutoScalingGroups",
      "autoscaling:DescribeAutoScalingInstances",
      "autoscaling:DescribeLaunchConfigurations",
      "autoscaling:DescribeTags",
      "ec2:DescribeLaunchTemplateVersions",
    ]
    resources = ["*"]
  }

  statement {
    sid    = "clusterAutoscalerOwn"
    effect = "Allow"
    actions = [
      "autoscaling:SetDesiredCapacity",
      "autoscaling:TerminateInstanceInAutoScalingGroup",
      "autoscaling:UpdateAutoScalingGroup",
    ]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "autoscaling:ResourceTag/k8s.io/cluster-autoscaler/${module.eks.cluster_id}"
      values   = ["owned"]
    }
    condition {
      test     = "StringEquals"
      variable = "autoscaling:ResourceTag/k8s.io/cluster-autoscaler/enabled"
      values   = ["true"]
    }
  }
}
3) Install cluster-autoscaler using its Helm chart.
resource "helm_release" "cluster-autoscaler" {
  depends_on = [
    module.eks
  ]

  name             = "cluster-autoscaler"
  namespace        = local.k8s_service_account_namespace
  repository       = "https://kubernetes.github.io/autoscaler"
  chart            = "cluster-autoscaler"
  version          = "9.10.7"
  create_namespace = false

  set {
    name  = "awsRegion"
    value = data.aws_region.current.name
  }
  set {
    name  = "rbac.serviceAccount.name"
    value = local.k8s_service_account_name
  }
  set {
    name  = "rbac.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.iam_assumable_role_admin.iam_role_arn
    type  = "string"
  }
  set {
    name  = "autoDiscovery.clusterName"
    value = local.name
  }
  set {
    name  = "autoDiscovery.enabled"
    value = "true"
  }
  set {
    name  = "rbac.create"
    value = "true"
  }
}
Note:-
Make sure your public and private subnets are properly tagged. This enables automatic subnet discovery, so the Kubernetes Cloud Controller Manager (cloud-controller-manager) and the AWS Load Balancer Controller (aws-load-balancer-controller) can identify which subnets to use when provisioning an ELB for a LoadBalancer-type Service. If you are creating the VPC and subnets from scratch, you may use vpc.tf. Otherwise, tag your existing subnets accordingly.
public_subnet_tags = {
  "kubernetes.io/cluster/${local.name}" = "shared"
  "kubernetes.io/role/elb"              = "1"
}

private_subnet_tags = {
  "kubernetes.io/cluster/${local.name}" = "shared"
  "kubernetes.io/role/internal-elb"     = "1"
}
You can find my code at https://github.com/chathra222/tf-eks-autoscaling.
Deploy all the resources using Terraform:
terraform init
terraform apply
Once applied, you can also check the resources in the AWS Management Console.
Lets discover what Kubernetes resources have been provisioned:
kubectl get deploy -n kube-system
NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
cluster-autoscaler-aws-cluster-autoscaler   1/1     1            1           8h
coredns                                     2/2     2            2           8h
As you can see, there is a deployment called cluster-autoscaler-aws-cluster-autoscaler in the kube-system namespace.
kubectl get deploy -n kube-system cluster-autoscaler-aws-cluster-autoscaler -o yaml|grep -i serviceAccountName
serviceAccountName: cluster-autoscaler-aws
Lets investigate the service account cluster-autoscaler-aws:
kubectl describe sa cluster-autoscaler-aws -n kube-system
Name:                cluster-autoscaler-aws
Namespace:           kube-system
Labels:              app.kubernetes.io/instance=cluster-autoscaler
                     app.kubernetes.io/managed-by=Helm
                     app.kubernetes.io/name=aws-cluster-autoscaler
                     helm.sh/chart=cluster-autoscaler-9.10.7
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::272435851616:role/cluster-autoscaler
                     meta.helm.sh/release-name: cluster-autoscaler
                     meta.helm.sh/release-namespace: kube-system
Image pull secrets:  <none>
Mountable secrets:   cluster-autoscaler-aws-token-x7ds6
Tokens:              cluster-autoscaler-aws-token-x7ds6
Events:              <none>
You may notice the annotation eks.amazonaws.com/role-arn: arn:aws:iam::272435851616:role/cluster-autoscaler, which indicates that this service account can assume the role arn:aws:iam::272435851616:role/cluster-autoscaler.
Lets examine the cluster-autoscaler deployment:
kubectl get deploy -n kube-system -o yaml cluster-autoscaler-aws-cluster-autoscaler
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: cluster-autoscaler
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2021-11-07T01:10:29Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: cluster-autoscaler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: aws-cluster-autoscaler
    helm.sh/chart: cluster-autoscaler-9.10.7
  name: cluster-autoscaler-aws-cluster-autoscaler
  namespace: kube-system
  resourceVersion: "1292"
  uid: 9f0f7f3f-adfd-422f-a007-7c1aa20deb4e
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: cluster-autoscaler
      app.kubernetes.io/name: aws-cluster-autoscaler
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: cluster-autoscaler
        app.kubernetes.io/name: aws-cluster-autoscaler
    spec:
      containers:
      - command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=kube-system
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/eks-scalable-cluster
        - --logtostderr=true
        - --stderrthreshold=info
        - --v=4
        env:
        - name: AWS_REGION
          value: ap-southeast-1
        image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health-check
            port: 8085
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: aws-cluster-autoscaler
        ports:
        - containerPort: 8085
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: cluster-autoscaler-aws
      serviceAccountName: cluster-autoscaler-aws
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-11-07T01:11:42Z"
    lastUpdateTime: "2021-11-07T01:11:42Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-11-07T01:10:29Z"
    lastUpdateTime: "2021-11-07T01:11:42Z"
    message: ReplicaSet "cluster-autoscaler-aws-cluster-autoscaler-74977bcc47" has
      successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
There are various parameters for cluster-autoscaler. You may refer to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca to customize it according to your needs.
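For example, scale-down behaviour can be tuned by passing extra flags through the chart's extraArgs values. A sketch of additional set blocks for the helm_release above; the flags come from the CA parameter list linked here, but the values chosen are purely illustrative:

```hcl
set {
  name  = "extraArgs.scale-down-utilization-threshold"
  value = "0.5"           # node is a scale-down candidate below 50% requested utilization
}
set {
  name  = "extraArgs.scale-down-unneeded-time"
  value = "10m"           # how long a node must be unneeded before removal
}
set {
  name  = "extraArgs.expander"
  value = "least-waste"   # strategy for choosing which node group to scale up
}
```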
This link is really good if you want to understand cluster autoscaler further:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md
Lets save costs by using these autoscaling options wisely. :-)