Sameer Khanal

Optimizing Costs for Container Workloads on AWS EKS and ECS

Hey everyone! Let's talk about something I know all of us care about: saving money on our cloud bills. I recently dove deep into optimizing our container costs on AWS, and honestly, I wish I had known all of this earlier.

Why Container Cost Optimization Matters

The thing is, containers are a huge win for scaling and deployment, but they can quietly chip away at your budget if you're not keeping an eye on them. The best part? AWS offers a whole bunch of ways to cut those costs without hurting performance, and often while improving it.

Spot Instances: Your Secret Weapon

This is probably the biggest win AWS has given me. You can save up to 90% with Spot Instances compared to On-Demand. Yes, 90%! They're ideal for fault-tolerant applications that can withstand occasional interruptions.

EKS makes it relatively easy to run managed node groups backed by Spot Instances. You can even mix Spot and On-Demand capacity within the same EKS cluster, so critical applications run on On-Demand instances while interruption-tolerant workloads run on Spot.
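
As a rough illustration, here's a minimal eksctl ClusterConfig sketch (the cluster name, region, and instance types are placeholders, not from this article) that pairs an On-Demand node group for critical services with a Spot-backed node group for interruptible work:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: demo-cluster        # hypothetical cluster name
  region: us-east-1

managedNodeGroups:
  - name: on-demand-critical
    instanceTypes: ["m5.large"]
    minSize: 2
    maxSize: 4
  - name: spot-batch
    # Offering several instance types gives the Spot allocator more capacity pools to pick from
    instanceTypes: ["m5.large", "m5a.large", "m4.large"]
    spot: true              # request Spot capacity for this node group
    minSize: 0
    maxSize: 10

Spreading a Spot node group across several instance types also reduces the chance of all your capacity being reclaimed at once.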

Moving our batch-processing workloads and CI/CD pipelines to Spot Instances worked beautifully for us. These workloads are inherently interruptible, so the cost benefit was immediate. The key is making sure your applications can shut down gracefully when an instance is reclaimed, and you're good to go.

Fargate vs EC2: Choosing Wisely

I did wrestle with when to pick Fargate over EC2 and vice versa. Fargate costs more per unit of compute, but it removes the operational overhead of managing instances.

Here's what I learned: Fargate is great for small workloads, workloads with unpredictable traffic, or cases where you want zero infrastructure management. You're billed for what you use, down to the second, with no wasted capacity.

EC2 is more appropriate for production workloads where you can right-size instances and keep resource utilization high. For large deployments with predictable resource behaviour, EC2 with Reserved Instances and/or Savings Plans tends to be cheaper.

As for my current approach: I use Fargate for dev environments and infrequent workloads, while production workloads with steady traffic run on well-optimized EC2 instances. It's essentially about using the right tool for the right job.
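
On the EKS side, one way to wire that up (just a sketch; the cluster name and namespace are placeholders) is a Fargate profile that sends only the dev namespace to Fargate while everything else stays on your EC2 node groups:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: demo-cluster       # hypothetical cluster name
  region: us-east-1

fargateProfiles:
  - name: dev-on-fargate
    selectors:
      # Only pods created in the dev namespace land on Fargate;
      # production namespaces keep using the EC2 node groups.
      - namespace: dev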

Autoscaling: The Dynamic Duo

Two things that changed the way I think about resource allocation are Cluster Autoscaler and Horizontal Pod Autoscaler (HPA).

The Cluster Autoscaler automatically scales the number of nodes in your cluster based on how many pending pods there are. No more spending money on nodes just sitting there, doing nothing.
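
If you run the Cluster Autoscaler yourself rather than installing it from the official manifest or Helm chart, the core of the Deployment looks roughly like this. This is a trimmed sketch: the image tag, service account, and cluster name in the auto-discovery tag are placeholders, and the real manifest also needs RBAC and IAM permissions.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler   # needs IAM permissions to manage Auto Scaling groups
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0   # match your Kubernetes minor version
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --expander=least-waste
            # Discover Auto Scaling groups tagged for this (placeholder) cluster name
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/demo-cluster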

HPA scales your pods at the application level based on CPU, memory, or custom metrics. Together, the two autoscalers are a symphony of efficiency: your cluster scales up as traffic grows and back down as it quiets. That alone saved us about 30% because we stopped over-provisioning "just in case."

Setting up HPA is straightforward:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app             # the Deployment this HPA scales
  minReplicas: 2             # never drop below two pods
  maxReplicas: 10            # cap the scale-out
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add pods when average CPU utilization exceeds 70%

Rightsizing: Stop Wasting Resources

I was guilty of this too: setting pod resource requests way too high as a precaution. In reality, I was paying for resources we never used.

Start by understanding your actual resource consumption. Measure what your pods really use, not what you think they use. The Kubernetes metrics server (surfaced through kubectl top pods) can give you this data.

Next, adjust your resource requests and limits based on that data. For example, if your pod uses 100Mi of memory but requests 512Mi, you're wasting money. Be realistic about your application's resource requirements, and remember that limits matter too.
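
Here's a minimal sketch of what a right-sized spec might look like (the pod name, image, and numbers are hypothetical; base yours on the usage you actually measured):

apiVersion: v1
kind: Pod
metadata:
  name: my-app             # hypothetical pod name
spec:
  containers:
    - name: my-app
      image: my-app:1.0    # placeholder image
      resources:
        requests:
          memory: "128Mi"  # close to observed usage (~100Mi) plus headroom
          cpu: "100m"
        limits:
          memory: "256Mi"  # cap to protect the node from runaway memory
          cpu: "500m"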

Practical tip: start conservatively, watch for a week or two, and then tighten things up. It may feel slow, but the savings really add up across dozens or hundreds of pods.

Kubecost: Your Financial Visibility Partner

This tool was nothing short of revolutionary for me. Kubecost gives you real-time cost visibility for your Kubernetes workloads, showing you exactly where your dollars go, down to the namespace or even individual pods.

What I like best about Kubecost is how it breaks cost data down by team, application, or environment. Suddenly you can see that your staging environment somehow costs as much as production (oops), or that one microservice is eating 40% of your compute spend.

The community edition offers a lot and is ideal for getting started. Once you install it on your cluster, you get cost allocation data, optimization recommendations, and even alerts when spending crosses the limits you set. It's like having your own financial analyst for your Kubernetes clusters.

ECR Lifecycle Policies: Clean Up and Save

This one is easy to forget: the images sitting in your Amazon ECR repositories cost money to store. Ancient versions that nobody pulls anymore are effectively burning money.

ECR lifecycle policies let you automatically clean up images by age or count. One of the first things I did was set up a basic policy that keeps the last 10 images in a repository and expires untagged images more than 30 days after they were pushed.

{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images pushed more than 30 days ago",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": {
        "type": "expire"
      }
    },
    {
      "rulePriority": 2,
      "description": "Keep only the last 10 images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}

It's a small thing, but if you're running several repositories, the storage savings add up.

Bringing It All Together

Cost optimization isn't something you do once and forget. My advice is to start with the quick wins: autoscaling, Spot Instances, and Kubecost.

Next, focus on rightsizing, cleaning up old images, and making informed choices between Fargate and EC2. Monitor your progress and celebrate the wins along the way. We were able to cut our container costs by close to 45% in three months. My next areas to tackle are infrastructure improvements and IAM Roles for Service Accounts (IRSA).

Also, remember that every dollar you save is a dollar you can then invest in building better features or improving your infrastructure in different ways.

What cost optimization methods have you tried with success? I'd love to hear your experiences so I don't miss any tips. Thanks in advance, and let's keep learning from each other!

Share your “best tip on cutting costs” in the comment section below to help each other stay on top of those cloud bills!
