DEV Community

CAST AI

Posted on • Originally published at cast.ai

How to reduce your Amazon EKS costs by half in 15 minutes

Overprovisioning is the top reason why teams see their cloud bills constantly growing. But choosing the best instances from the hundreds of options AWS offers is a tough call.

How are you supposed to know which ones will deliver the performance you need?

Fortunately, you can find solutions on the market that can do that for you.

If you’re curious about how it all works, follow my journey and see how I slashed the costs of running my containers in EKS by half using CAST AI, in 15 minutes.


TL;DR

I started by provisioning an e-commerce app on an Amazon EKS cluster with six m5.large nodes (2 vCPU, 8 GiB each). I then deployed an AI engine to analyze my application and suggest optimizations. Finally, I activated the engine and watched the system optimize itself.

The cluster initially cost $414 per month. Within 15 minutes, and fully automatically, the cost dropped to $207 per month (a 50% reduction) as six nodes were consolidated into three. Five minutes later, it dropped further to $138 per month (a 66% reduction from the initial bill) thanks to Spot Instances.
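The percentages above can be checked with a few lines of arithmetic. This is only a sketch using the rounded dollar figures from this summary; the function name `reduction_pct` is made up for illustration.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage by which a monthly bill shrank."""
    return (before - after) / before * 100


# Rounded monthly bills quoted in the TL;DR (USD).
initial, after_evictor, after_spot = 414, 207, 138

print(f"{reduction_pct(initial, after_evictor):.1f}%")  # 50.0%
print(f"{reduction_pct(initial, after_spot):.1f}%")     # 66.7%
```

With the rounded figures the second reduction comes out at 66.7%, slightly above the 66% quoted here and the 66.5% the Savings Report projects, which is what you would expect from rounding the dollar amounts.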

 

Get your free CAST AI Savings Report to check how much you could potentially save. It’s the best starting point for any journey into cloud cost optimization.

Step 1: Deploying my app and running the Savings Report

I deployed my app on six EKS nodes. Here's what the cluster looked like before the deployment - all the nodes were empty:

The cluster was created via eksctl:

eksctl create cluster --name boutique-blog-lg -N 6 --instance-types m5.large --managed --region us-east-2

And after deployment - the green rectangles are the pods:

I’m using kube-ops-view, a useful open-source project to visualize the pods.

Notice that Kubernetes spreads the application's pods (each wrapping one or more containers) evenly across all the nodes by default. Kubernetes is a fair orchestration engine. CPU utilization on the nodes ranges between 40% and 50%.

Note: all EKS autoscaling has been disabled on purpose, as CAST AI will take over from EKS cluster autoscaling.

Now it’s time to connect my EKS cluster to CAST AI. I created a free account on https://cast.ai and selected the Connect your cluster option.

Click on ‘Connect cluster’:

I copied the script and ran it successfully in my terminal (I use Lens, another free tool, for this).

laurent@laurents-MacBook-Pro ~ % curl -H "Authorization: Token YOUR_API_TOKEN" "https://api.cast.ai/v1/agent.yaml?provider=eks" | kubectl apply -f -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2585    0  2585    0     0   2391      0 --:--:--  0:00:01 --:--:--  2391
namespace/castai-agent created
serviceaccount/castai-agent created
clusterrole.rbac.authorization.k8s.io/castai-agent created
clusterrolebinding.rbac.authorization.k8s.io/castai-agent created
secret/castai-agent created
deployment.apps/castai-agent created

The CAST AI agent scanned my EKS cluster in read-only mode and generated this Savings Report:

If I switch my six m5.large nodes to what CAST AI recommends - three c5a.large nodes - I could slash my bill by almost 60%. Sounds like a plan!

With Spot Instances, I can get even higher savings (66.5%).

Step 2: Activating the cost optimization

To get started with cost optimization, I need to add my AWS access key ID and Secret access key to the platform.

 Getting the access keys was easy. All it took was running this script:

Step 3: Enabling policies

I turn on all the policies available in CAST AI:

  • CPU Policy: I tell the engine to never go above my budget. I set the limit to 200 CPUs.
  • Node autoscaler: CAST AI will make a smart selection of nodes whenever I have unscheduled pods, starting with Spot Instances (if the unscheduled pods are Spot Instance-friendly) or On-Demand.
  • Node Deletion + Evictor: Evictor is a background process that continuously shrinks the cluster to the minimum number of nodes by bin-packing pods. Once a node becomes empty, it’s deleted automatically.
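To make the interplay of the first two policies concrete, here is a toy model in Python. It is not CAST AI's implementation: the `PendingPod` type, the `spot_friendly` flag, and the `plan_capacity` function are all hypothetical names, and the logic only mirrors the description above (prefer Spot for Spot-friendly pods, fall back to On-Demand, never exceed the 200-CPU limit).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PendingPod:
    name: str
    cpu: float           # CPUs requested by the unscheduled pod
    spot_friendly: bool  # hypothetical flag: pod tolerates Spot interruptions


def plan_capacity(pending: List[PendingPod], cluster_cpus: float,
                  cpu_limit: float = 200) -> List[Tuple[str, str]]:
    """Decide, pod by pod, which kind of capacity to add (toy model)."""
    decisions = []
    for pod in pending:
        # CPU policy: refuse to grow the cluster past the configured limit.
        if cluster_cpus + pod.cpu > cpu_limit:
            decisions.append((pod.name, "blocked-by-cpu-policy"))
            continue
        # Node autoscaler: Spot first when the pod allows it, else On-Demand.
        lifecycle = "spot" if pod.spot_friendly else "on-demand"
        decisions.append((pod.name, lifecycle))
        cluster_cpus += pod.cpu
    return decisions


pending = [
    PendingPod("cart", 1, True),
    PendingPod("db", 2, False),
    PendingPod("batch", 4, True),
]
print(plan_capacity(pending, cluster_cpus=195))
```

In this made-up scenario the cluster already uses 195 CPUs, so the first two pods get Spot and On-Demand capacity respectively, while the third would push the total to 202 CPUs and is held back by the CPU policy.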

Evictor is a CAST AI tool that constantly looks for inefficiencies in a setup and helps to maximize savings as fast as possible. Run this command to activate Evictor:

So, I installed Evictor and set it to work. Evictor is running!


This is what Evictor in action looks like:

  1. One node (in red below) is identified as a candidate for eviction.
  2. Evictor automatically moves the pods to other nodes (this is the "bin-packing" part).
  3. Once the node becomes empty, it’s deleted from the cluster.
  4. Go back to step 1.
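The loop above can be sketched as a small simulation. This is a simplified model and not Evictor's actual algorithm: the first-fit placement, the 80% utilization cap, and the pod sizes (four 200-millicore pods per 2-vCPU node, i.e. 40% utilization) are assumptions chosen to resemble the cluster in this post.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    capacity_m: int                                # CPU capacity in millicores
    pods: List[int] = field(default_factory=list)  # CPU request of each pod

    @property
    def used_m(self) -> int:
        return sum(self.pods)


def plan_eviction(candidate: Node, others: List[Node],
                  max_util_pct: int) -> Optional[List[Tuple[int, int]]]:
    """First-fit plan of (pod, target index), or None if some pod does not fit."""
    free = [n.capacity_m * max_util_pct // 100 - n.used_m for n in others]
    plan = []
    for pod in sorted(candidate.pods, reverse=True):
        for i, room in enumerate(free):
            if pod <= room:
                free[i] -= pod
                plan.append((pod, i))
                break
        else:
            return None  # no node has room for this pod
    return plan


def compact(nodes: List[Node], max_util_pct: int = 80) -> List[Node]:
    """Repeat steps 1-4 until no node can be emptied any more."""
    nodes = list(nodes)
    progress = True
    while progress and len(nodes) > 1:
        progress = False
        # Step 1: try the least-loaded node first.
        for candidate in sorted(nodes, key=lambda n: n.used_m):
            others = [n for n in nodes if n is not candidate]
            plan = plan_eviction(candidate, others, max_util_pct)
            if plan is not None:
                for pod, i in plan:       # Step 2: move the pods
                    others[i].pods.append(pod)
                nodes = others            # Step 3: delete the empty node
                progress = True
                break                     # Step 4: start over
    return nodes


cluster = [Node(f"node-{i}", 2000, [200] * 4) for i in range(1, 7)]
result = compact(cluster)
print(len(result), [n.used_m * 100 // n.capacity_m for n in result])  # 3 [80, 80, 80]
```

Under these assumptions, six nodes at 40% collapse into three nodes at 80%, after which no further node can be emptied without exceeding the cap.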

One node is deleted:

Here are the Evictor logs:

time="2021-06-14T16:08:27Z" level=debug msg="will try to evict node \"ip-192-168-66-41.us-east-2.compute.internal\""
time="2021-06-14T16:08:27Z" level=debug msg="annotating (marking) node \"ip-192-168-66-41.us-east-2.compute.internal\" with \"evictor.cast.ai/evicting\"" node_name=ip-192-168-66-41.us-east-2.compute.internal
time="2021-06-14T16:08:27Z" level=debug msg="tainting node \"ip-192-168-66-41.us-east-2.compute.internal\" for eviction" node_name=ip-192-168-66-41.us-east-2.compute.internal
time="2021-06-14T16:08:27Z" level=debug msg="started evicting pods from a node" node_name=ip-192-168-66-41.us-east-2.compute.internal
time="2021-06-14T16:08:27Z" level=info msg="evicting 9 pods from node \"ip-192-168-66-41.us-east-2.compute.internal\"" node_name=ip-192-168-66-41.us-east-2.compute.internal
I0614 16:08:28.831083 1 request.go:655] Throttling request took 1.120968056s, request: GET:https://10.100.0.1:443/api/v1/namespaces/default/pods/shippingservice-7cd7c964-dl54q
time="2021-06-14T16:08:44Z" level=debug msg="finished node eviction" node_name=ip-192-168-66-41.us-east-2.compute.interna
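The log lines use a `key="value"` layout, so the timestamps are easy to extract. Here is a minimal sketch that measures how long this eviction took; the regular expression and the `parse` helper are my own and only handle the `time=... level=... msg=...` lines shown above.

```python
import re
from datetime import datetime

# Three lines copied from the Evictor log above (the "I0614 ..." line is skipped).
LOG = r'''
time="2021-06-14T16:08:27Z" level=debug msg="will try to evict node \"ip-192-168-66-41.us-east-2.compute.internal\""
time="2021-06-14T16:08:27Z" level=info msg="evicting 9 pods from node \"ip-192-168-66-41.us-east-2.compute.internal\"" node_name=ip-192-168-66-41.us-east-2.compute.internal
time="2021-06-14T16:08:44Z" level=debug msg="finished node eviction" node_name=ip-192-168-66-41.us-east-2.compute.interna
'''

# msg may contain escaped quotes (\"), so allow backslash-escaped characters.
LINE = re.compile(
    r'^time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>(?:[^"\\]|\\.)*)"'
)


def parse(log: str):
    """Return (timestamp, level, message) tuples for every matching line."""
    events = []
    for line in log.strip().splitlines():
        m = LINE.match(line)
        if m:
            ts = datetime.strptime(m["time"], "%Y-%m-%dT%H:%M:%SZ")
            events.append((ts, m["level"], m["msg"]))
    return events


events = parse(LOG)
duration = (events[-1][0] - events[0][0]).total_seconds()
pods = int(re.search(r"evicting (\d+) pods", events[1][2]).group(1))
print(f"{pods} pods evicted in {duration:.0f}s")  # 9 pods evicted in 17s
```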

Then the second and third nodes were evicted - 3 nodes remain:

After about 10 minutes, Evictor had deleted three nodes and left three running. Note that CPU utilization is now at a much healthier 80%.

The cost of this cluster is now $207.36 per month - half the initial cost of $414.72 per month.
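The bill scales linearly with the node count, which is why halving the nodes halves the cost. A quick sketch, assuming a 720-hour (30-day) month and identical on-demand nodes; the hourly rate is not quoted in this post but is derived here from the unrounded $414.72 starting bill.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours


def monthly_cost(nodes: int, hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Monthly bill in USD for identical nodes at a flat hourly rate."""
    return round(nodes * hourly_rate * hours, 2)


# Per-node hourly rate implied by six nodes costing $414.72 per month.
implied_rate = 414.72 / (6 * HOURS_PER_MONTH)

print(round(implied_rate, 3))         # 0.096
print(monthly_cost(6, implied_rate))  # 414.72
print(monthly_cost(3, implied_rate))  # 207.36
```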

I managed to achieve 80% of the projected savings. This is what I saw in my CAST AI dashboard: 

Advanced savings

Step 4 (optional): Moving my app to new optimized nodes

Steps 1, 2, and 3 are fully automated. CAST AI gradually shrinks the cluster by eliminating waste and overprovisioning. It does so by bin-packing pods and emptying nodes one by one. From that moment, the cluster is optimized and Evictor will continuously look for further optimization opportunities over time.

Step 4 is an advanced, optional step where CAST AI actively replaces the current nodes with more optimized ones, such as Spot Instances. The concept is fairly simple: CAST AI cordons the nodes, drains them, and replaces them with more optimized nodes.

The nodes are cordoned:

The first two nodes are drained, and the AI engine selects the most appropriate instance types to replace them. This is what I saw in my CAST AI dashboard:

As you can see, my cluster now has only two nodes and costs $138 per month. It’s hard to imagine that I started out with a monthly EKS bill of $414.72!

Summary

Moving from a non-optimized setup to a fully optimized one was a breeze. CAST AI analyzed my setup, found opportunities for savings, and optimized my cluster in 15 minutes, cutting my EKS bill in half, from $414 to $207.

Then, I activated advanced savings by asking CAST AI to replace nodes with more optimized nodes and achieved further savings, ending up with a $138 bill.


Top comments (9)

bobrossthedude

Is that agent in read-only mode now or can it start optimizing my clusters?

CAST AI

As soon as you are ready - click Start saving and our AI will optimize your cluster.

timothystewarttech

I think it's GKE that can only read your cluster for now; the EKS agent did allow me to perform the displayed savings actions when I tested it.

CAST AI

You are right, @timothystewarttech. The GKE optimizer will be available in a few weeks, but you can already analyze your cluster and get the report about potential savings on GCP.

Andrei Dascalu

Pretty nice tool. I'll give it a shot soon! I wonder how it plays with gitops setups as well.

CAST AI

We use GitOps ourselves, so everything should be fine. If you have any questions, feel free to join our community on Slack.