FinOps EKS: 10 tips to reduce the bill up to 90% on AWS managed Kubernetes clusters

Benoît COUETIL (bcouetil) ・ 8 min read

Cloud computing makes it possible to rationalize infrastructure costs. That said, when you are starting out, the bill can climb quickly. EKS, the managed Kubernetes service from AWS, is no exception. Here are some tips to help you reduce your costs by up to 90% without lowering the level of service.

The first element to take into consideration is human: avoid manual infrastructure modifications. So, first of all, use automation tools such as Terraform; otherwise you may lower your AWS bill but pay far more in engineering time. With Terraform, installing all of the infrastructure associated with EKS, including the VPC, takes approximately 15 minutes. Each change advocated in this article, if it concerns the infrastructure, then takes only a few minutes.

Most of these tips are generic and can be applied, at least in spirit, to other cloud providers.

1. Install the Cluster Autoscaler

One of the most obvious ways to ensure that you only provision the necessary resources is to install the Cluster Autoscaler.

It runs mainly as a pod in the cluster that monitors the requested resources and provisions or deletes nodes (compute VMs) as needed.

The configuration has to be adjusted according to the context. Here is an example below. Note that nodes running system pods or pods with local storage are no longer skipped, so those pods can be evicted, which makes scale-down much more effective:

  # scale-down-utilization-threshold: 0.5 # default
  scan-interval: 30s # default 10s
  # scale-down-delay-after-add: 10m # default
  scale-down-unneeded-time: 20m # default 10m
  scale-down-unready-time: 5m # default 20m
  skip-nodes-with-local-storage: false # default true
  skip-nodes-with-system-pods: false # default true
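With both skip-nodes flags disabled, any pod may be evicted during scale-down. For a pod that must not be interrupted, the Cluster Autoscaler honors a per-pod annotation that prevents the removal of the node it runs on. The manifest below is only a minimal sketch: the pod name, image and volume are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker                      # hypothetical name
  annotations:
    # Tells the Cluster Autoscaler not to evict this pod when scaling down
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
    - name: worker
      image: registry.example.com/batch-worker:1.0   # hypothetical image
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}                        # local storage, lost if the pod is evicted
```

Use it sparingly: every annotated pod pins its node and reduces the savings from scale-down.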

2. Define Spot Instances nodes in a Launch Template

Spot Instances are EC2 instances (AWS VMs) sold at a discount of up to 90%. AWS is selling off its unused capacity, but these instances are preemptible: you risk losing the instance if AWS runs short of on-demand capacity for the same type. The price has not been set by bidding for several years; it now changes gradually, depending on long-term demand.

A Kubernetes cluster is ideal for this kind of preemptible resource: it tolerates failures and repairs itself by provisioning other VMs for its nodes. If your applications follow cloud patterns such as the 12-Factor App methodology, you can use Spot Instances safely.

The risk of losing your EC2 instances is low but real: over 4 months of using T3 in eu-west-1 (Ireland), this type was unavailable in one zone for 2 days. How to mitigate this problem? By creating a pool of EC2 instances through a Launch Template: you can ask for capacity to be spread over exactly 2 instance types picked from a set of types, which AWS orders by price. If an inexpensive type is unavailable, AWS automatically provisions a slightly more expensive one.

Example of a zonal worker_groups_launch_template under Terraform:

worker_groups_launch_template = [
  {
    name                     = "spot-az-a"
    subnets                  = [module.vpc.private_subnets[0]] # only one subnet to simplify PV usage
    on_demand_base_capacity  = "0"
    # on_demand_percentage_above_base_capacity = 0 # If not set, all new nodes will be spot instances
    override_instance_types  = ["t3a.xlarge", "t3.xlarge", "t2.xlarge", "m4.xlarge", "m5.xlarge", "m5a.xlarge"]
    spot_allocation_strategy = "lowest-price"
    # "Number of Spot pools per availability zone to allocate capacity. EC2 Auto Scaling selects the cheapest Spot pools and evenly allocates Spot capacity across the number of Spot pools that you specify."
    spot_instance_pools      = 2
    asg_desired_capacity     = "1"
    asg_min_size             = "0"
    asg_max_size             = "10"
    key_name                 = var.cluster_name
    kubelet_extra_args       = "--node-labels=lifecycle=spot" # label used to target spot nodes
  },
]
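Since the launch template labels these nodes with lifecycle=spot, a workload that tolerates interruption can be pinned to them with a node selector. Here is a minimal sketch of such a Deployment; the application name, image and resource requests are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api                  # hypothetical application name
spec:
  replicas: 2                   # more than one replica, so losing a spot node is tolerated
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      nodeSelector:
        lifecycle: spot         # matches --node-labels=lifecycle=spot set on the nodes
      containers:
        - name: my-api
          image: registry.example.com/my-api:1.0   # hypothetical image
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
```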

3. Automatically switch off outside working hours

The production environment is probably used 24/7, but the development environment is usually only used during business hours, about a third of the time. By default, unless there is heavy activity during the day, the cluster costs about the same around the clock, because RAM stays consumed even without activity.

To drastically reduce the number of nodes outside working hours, simply install kube-downscaler in the cluster. The principle is simple: at the configured times, it scales deployments and statefulsets down to 0 pods, except in certain configurable namespaces. The drastic reduction in the number of pods causes the Cluster Autoscaler to delete the unused nodes automatically.
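As an illustration, kube-downscaler is driven by annotations. The sketch below keeps a whole namespace up only during working hours and excludes one deployment from downscaling. The namespace, deployment, image and time window are hypothetical, and the annotation names should be checked against the kube-downscaler README for the version you install.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: develop                 # hypothetical namespace
  annotations:
    # Workloads in this namespace run only in this window, and are scaled to 0 otherwise
    downscaler/uptime: "Mon-Fri 08:00-19:00 Europe/Paris"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nightly-reports         # hypothetical workload that must keep running at night
  namespace: develop
  annotations:
    downscaler/exclude: "true"  # never downscaled
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nightly-reports
  template:
    metadata:
      labels:
        app: nightly-reports
    spec:
      containers:
        - name: nightly-reports
          image: registry.example.com/nightly-reports:1.0   # hypothetical image
```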

Another advantage: since nodes then live only a few hours, the disk space reserved per node can usually be reduced from 100 GB to 20 GB, for a very slight additional saving.

4. Reduce the number of zones in the region

Inside AWS, network traffic within a zone is free, but traffic between zones (between data centers) is charged.

There are 3 zones per region on average. In practice, using all of them is often more high availability than necessary. If 2 zones are unavailable at the same time, there is a good chance that the problem is larger and that the third is also unreachable.

For organizations of modest size, it is possible to go down to 2 zones in development, or even in production, depending on the high availability required, with substantial savings on network transfers.

5. Use the EC2s with the best performance / price ratio

Some information about EC2s (AWS VMs):

  • T2, T3, M4, M5, etc.: the number denotes the generation. Generation 5 instances are generally more efficient in benchmarks

  • T3 instances (unlike T2) run on AWS Nitro technology, which is supposed to deliver up to 60% more performance at equal specs, but in practice the benchmarks are not that convincing

  • T instances are burstable and work with CPU credits. By default, sustained CPU consumption over a long period increases the cost. Credits do not allow you to exceed 100% of the allocated CPU; it is prolonged usage that consumes credits.

Here is a price comparison from March 2021 for eu-west-3 (Paris), for instances that can host 58 pods, with type, Spot Instance price and discount:

4 CPU / 32 GB RAM
r5a.xlarge    $0.07/h  (-74% on $0.27/h)
r5ad.xlarge   $0.07/h  (-77% on $0.31/h)
r5d.xlarge    $0.07/h  (-79% on $0.34/h)
r5.xlarge     $0.09/h  (-70% on $0.30/h)

8 CPU / 32 GB RAM
t3a.2xlarge   $0.10/h  (-71% on $0.34/h)
t3.2xlarge    $0.11/h  (-71% on $0.38/h)
m5a.2xlarge   $0.13/h  (-67% on $0.40/h)
t2.2xlarge    $0.13/h  (-69% on $0.42/h)
m5.2xlarge    $0.13/h  (-71% on $0.45/h)
m5ad.2xlarge  $0.13/h  (-73% on $0.48/h)
m5d.2xlarge   $0.13/h  (-75% on $0.53/h)

8 CPU / 16 GB RAM
c5.2xlarge    $0.12/h  (-70% on $0.40/h)
c5d.2xlarge   $0.12/h  (-74% on $0.46/h)

Some resulting observations:

  • All these Spot Instances cost more or less the same, except the r5 family, which is cheaper

  • The number of CPUs is the variable that significantly raises the bill

  • If the CPU requirement is low (which is often the case in the development phase), it is better to use the r5 family

Note that if the infrastructure is provisioned by Terraform, changing the instance type is painless: applying the change does not remove the EC2 instances in place; only newly created ones get the new type.

6. Combine environments in the same cluster

If your software delivery is based on a flow involving multiple branches, such as Gitflow, several levels of environment will be necessary: possibly one environment per feature branch, one for the develop branch, one per release branch, and a production environment represented by the master branch.

If delivery is more mature and organized around trunk-based development, you need at least a production environment and a staging environment (pre-production / acceptance / production-like).

In either scenario, it is possible to get by with only two clusters (dev & prod), or three (dev, staging & prod).

Grouping several environments in the same cluster lets them share the monitoring tools while the applications stay separated in namespaces. Data stores (databases, message brokers) are better kept separate between environments. For instance, we can provision a managed DB outside the cluster for each of the most advanced environments (staging / production), and run the DB inside the Kubernetes cluster for the ephemeral environments of feature branches. Helm charts are now available for most DBs; they are easy to install and quick to instantiate.
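As a rough sketch of this separation, each environment of the dev cluster can get its own namespace. The namespace names and the environment label below are hypothetical conventions, not something Kubernetes imposes.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: develop                 # long-lived environment for the develop branch
  labels:
    environment: develop
---
apiVersion: v1
kind: Namespace
metadata:
  name: feature-login           # ephemeral environment for one feature branch
  labels:
    environment: feature
```

Deleting a feature namespace removes everything deployed in it, including an in-cluster DB installed from a Helm chart, which suits ephemeral environments.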

A word of warning, though: sharing clusters is an obvious source of cloud savings, but if centralization causes a daily waste of time, it is ultimately a false economy, since people's time is far more expensive than machine time nowadays.

7. Use Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) is a standard Kubernetes object that automatically manages the number of (identical) pods of an application according to actual activity (CPU, RAM or custom metrics), between n and m pods.

It is therefore more economical to define an HPA between 1 and 10 pods, knowing that peak activity requires 10 pods, than to set the number of replicas permanently to 10. This drastically reduces over-reservation in environments, especially outside production.
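A minimal sketch of such an HPA, scaling between 1 and 10 pods on CPU, could look like this. It assumes a metrics source such as metrics-server is installed in the cluster; the Deployment name and the 70% target are hypothetical values to adapt.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api                # hypothetical Deployment to scale
  minReplicas: 1                # idle environments cost a single pod
  maxReplicas: 10               # peak activity requires 10 pods
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative target, relative to the pods' CPU requests
```

The utilization target is computed against the CPU requests of the pods, so the containers of the Deployment must declare them.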

8. Use Vertical Pod Autoscaler

In some cases, horizontal pod scaling is not an option, particularly for database, search engine, message broker or cache statefulsets. Why not just give the pods a low reservation and an excessively high limit, and let them consume resources up to that limit? Because this creates uncontrolled competition between the pods of the same node and, above all, never causes pods to be rescheduled onto other nodes.

Rather than defining CPU / RAM requirements that correspond to peak loads, it makes more sense to use the Vertical Pod Autoscaler. There is then no need to reserve large resources for each pod of our database: reservations increase and decrease according to activity. This drastically reduces over-reservation in environments, especially outside production.
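Here is a minimal sketch of a Vertical Pod Autoscaler for a database StatefulSet. Unlike the HPA, the VPA is not built into Kubernetes: this assumes its components and custom resource definitions are installed in the cluster. The StatefulSet name and the min/max bounds are hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-db                   # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: my-db                 # hypothetical StatefulSet to resize
  updatePolicy:
    updateMode: "Auto"          # pods are recreated with the updated requests
  resourcePolicy:
    containerPolicies:
      - containerName: "*"      # applies to every container of the pod
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: "2"
          memory: 4Gi
```

In "Auto" mode, applying a new recommendation restarts the pod, which is worth keeping in mind for a single-replica database.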

Read about this in the excellent article Vertical Pod Autoscaling: The Definitive Guide.

9. Use a single load balancer with an Ingress Controller

When creating a Kubernetes service, the type can be ClusterIP, NodePort, LoadBalancer, or ExternalName. If it is of the LoadBalancer type, a load balancer is provisioned for that service to ensure load balancing.

To avoid this expensive (and luxurious) setup, it is better to define services as ClusterIP and to define ingresses, managed by an ingress controller such as Traefik (created by a former Zenika employee) or the NGINX-based one. The controller provisions a single load balancer for all ingresses, and therefore for all Kubernetes services.
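As a minimal sketch, a ClusterIP service exposed through an ingress could look like this. It assumes an NGINX ingress controller is already installed and registered under the ingress class name nginx; the application name, ports and host are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api                  # hypothetical service
spec:
  type: ClusterIP               # no dedicated load balancer for this service
  selector:
    app: my-api
  ports:
    - port: 80
      targetPort: 8080          # hypothetical container port
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-api
spec:
  ingressClassName: nginx       # assumes the controller uses this class name
  rules:
    - host: my-api.example.com  # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-api
                port:
                  number: 80
```

Each additional application only adds another Service and Ingress pair; the load balancer in front of the ingress controller stays the same.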

10. Maintain about 5-6 nodes on average

The more nodes there are in a cluster, the higher the availability, but the more system resources are consumed (incompressible overhead or daemonsets), and the greater the chance that no single node has enough free resources to schedule the next pod, even though the free resources of all nodes taken together would be sufficient. CPU is also shared less, and CPU is more prone to consumption peaks than memory.

The fewer nodes there are in a cluster, the more these problems are avoided. But availability is lower, and a new node, if underutilized, represents a large percentage of waste.

From experience, the middle ground between high availability and resource efficiency is around 5-6 nodes. So if you run 12 nodes on xlarge instances, switch to 2xlarge, which lets the Cluster Autoscaler settle at 6 nodes, or maybe 5 if resources were previously distributed unfavorably.


We have detailed 10 ways to shrink the AWS bill without compromising on resiliency and availability, thanks mostly to different autoscaling mechanisms. To continue exploring resiliency on AWS, Sebastien Stormacq gave a great talk at Devoxx Belgium: Resiliency and Availability Patterns for the Cloud.
