Hey everyone! Let's talk about something I know we all care about: saving money on our cloud bills. I recently did a deep dive into optimizing our container costs on AWS, and honestly, I wish I'd known all of this earlier.
Why Container Cost Optimization Matters
The thing is, containers are a huge win for scaling and deployment, but they can quietly chip away at your budget if you're not keeping an eye on them. The best part? AWS gives us a whole bunch of ways to make those costs shrink without hurting performance, and quite often while improving it.
Spot Instances: Your Secret Weapon
This is probably the biggest win AWS has handed me. Spot Instances can save you up to 90% compared to On-Demand pricing. Yes, 90%! They're ideal for fault-tolerant applications that can handle the occasional interruption.
EKS makes using Spot with managed node groups relatively easy. You can even mix Spot and On-Demand instances within the same EKS cluster, so critical applications run on On-Demand capacity while interruptible workloads run on Spot.
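If you use eksctl, a mixed setup is only a few lines of cluster config. Here's a minimal sketch of what that can look like; the cluster name, region, sizes, and instance types below are placeholders, not our actual setup:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: demo-cluster              # placeholder cluster name
  region: us-east-1               # placeholder region

managedNodeGroups:
  # On-Demand capacity for critical services
  - name: on-demand-critical
    instanceTypes: ["m5.large"]
    minSize: 2
    maxSize: 4

  # Spot capacity for interruptible workloads; diversifying instance
  # types reduces the chance of losing all capacity at once
  - name: spot-batch
    instanceTypes: ["m5.large", "m5a.large", "m4.large"]
    spot: true
    minSize: 0
    maxSize: 10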
We started by moving our batch-processing workloads and CI/CD pipelines to Spot, and it worked beautifully. These workloads are inherently interruptible, so the cost benefit was immediate. The key is making sure your applications can shut down gracefully when a Spot interruption notice arrives.
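To make "gracefully" concrete, here's a rough sketch of the kind of Deployment settings involved. The names, image, and preStop command are all illustrative; what your app actually does to drain work is up to you:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker                       # hypothetical workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      # EC2 Spot sends a two-minute interruption notice, so keep the
      # grace period comfortably inside that window
      terminationGracePeriodSeconds: 90
      containers:
        - name: batch-worker
          image: my-registry/batch-worker:latest   # placeholder image
          lifecycle:
            preStop:
              exec:
                # example drain hook: replace with whatever tells your
                # app to stop taking new work and finish in-flight jobs
                command: ["/bin/sh", "-c", "sleep 20"]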
Fargate vs EC2: Choosing Wisely
I did have questions about when to pick Fargate over EC2 and vice versa. Fargate costs more per unit of compute, but it removes most of the operational overhead of managing nodes.
Here's what I learned: Fargate is great for smaller workloads, unpredictable traffic, or when you simply don't want to manage any infrastructure. You pay only for what you use, billed by the second, with no wasted capacity.
EC2 is more appropriate for production workloads where you can right-size and keep resource utilization high. For large deployments with predictable resource behavior, EC2 combined with Reserved Instances or Savings Plans tends to be cheaper.
My current approach: Fargate for dev environments and occasional workloads, while production workloads with steady traffic run on well-tuned EC2 instances.
It’s essentially about using the right tool for the right job.
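On EKS, pointing an entire namespace at Fargate is just a profile in the cluster config. A minimal sketch, with made-up names (the namespace selector is the part that matters):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: demo-cluster          # placeholder
  region: us-east-1           # placeholder

fargateProfiles:
  # everything scheduled into the "dev" namespace runs on Fargate;
  # other namespaces keep landing on the EC2 node groups
  - name: fp-dev
    selectors:
      - namespace: dev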
Autoscaling: The Dynamic Duo
Two things that changed the way I think about resource allocation are Cluster Autoscaler and Horizontal Pod Autoscaler (HPA).
The Cluster Autoscaler automatically adds nodes when pods can't be scheduled and removes nodes that sit underutilized. No more spending money on nodes that are just sitting there doing nothing.
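If you run the Cluster Autoscaler yourself on EKS, your node groups need autoscaling IAM permissions plus the tags its auto-discovery mode looks for. Here's a rough eksctl-style sketch; the cluster name and sizes are placeholders, and depending on your eksctl version you may need an extra step to propagate these tags to the underlying Auto Scaling group:

managedNodeGroups:
  - name: general
    instanceTypes: ["m5.large"]
    minSize: 1
    maxSize: 10
    iam:
      withAddonPolicies:
        autoScaler: true      # gives the node role autoscaling permissions
    tags:
      k8s.io/cluster-autoscaler/enabled: "true"
      k8s.io/cluster-autoscaler/demo-cluster: "owned"   # replace demo-cluster with your cluster name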
HPA scales your pods at the application level based on CPU, memory, or custom metrics. Together, the two make an efficient pair: your cluster scales up as traffic grows and back down when it quiets. That alone saved us 30% because we stopped over-provisioning "just in case".
Setting up HPA is straightforward:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Rightsizing: Stop Wasting Resources
I was guilty of this too: setting pod resource requests far higher than needed, just to be safe. In reality, I was paying for resources we never used.
Start by understanding your actual resource consumption: measure what your pods really use, not what you believe they use. The Kubernetes metrics server (for example, via kubectl top pods) gives you this data.
Next, adjust your resource requests and limits based on that data. For example, if your pod uses 100MB of memory but requests 512MB, you're wasting money. Be realistic about your application's requirements, and remember that resource limits matter too.
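To make that concrete, here's the shape of the adjustment. The container name and numbers are purely illustrative; base yours on what the metrics actually show:

containers:
  - name: my-app                    # hypothetical container
    image: my-registry/my-app:latest
    resources:
      requests:
        cpu: "100m"                 # observed steady state was ~80m
        memory: "128Mi"             # observed steady state was ~100Mi
      limits:
        cpu: "500m"                 # room for short bursts
        memory: "256Mi"             # hard cap; the container is killed above this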
Practical tip: start conservatively, watch for a week or two, then tighten. It may feel slow, but the savings really add up across dozens or hundreds of pods.
Kubecost: Your Financial Visibility Partner
This tool was nothing short of revolutionary for me. Kubecost delivers real-time cost visibility for Kubernetes workloads, showing you exactly where your dollars are being spent, down to the namespace or even the individual pod.
What I like best about Kubecost is how it breaks cost data down by team, application, or environment. Suddenly you can see that your staging environment costs nearly as much as production (oops), or that one microservice is consuming 40% of your compute spend.
The community edition offers a lot and is ideal for getting started. Once you install it in your cluster, you get cost allocation, optimization recommendations, and even alerts when spending crosses a threshold. It's almost like having your own financial analyst for your Kubernetes clusters.
ECR Lifecycle Policies: Clean Up and Save
This one is easy to forget: the images sitting in your Amazon ECR repositories cost money to store. Those ancient versions nobody pulls anymore are quietly burning money.
ECR lifecycle policies let you automatically clean up images by age or count. One of the first things I did was set up a basic policy that keeps only the most recent 10 images in each repository and expires untagged images more than 30 days after they were pushed. Here's the count-based rule:
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Keep last 10 images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
It's a small thing, but if you're working with several repositories, the storage savings add up.
Bringing It All Together
Cost optimization isn't something you do once and forget. My advice is to start with the quick wins: autoscaling, Spot Instances, and Kubecost for visibility.
Next, move on to rightsizing, cleaning up old images, and making deliberate choices between Fargate and EC2. Track your progress and celebrate the wins along the way. We lowered our container costs by close to 45% in three months.
The next areas on my list to tackle are broader infrastructure optimization and IRSA (IAM Roles for Service Accounts).
Also, remember that every dollar you save is a dollar you can then invest in building better features or improving your infrastructure in different ways.
What cost optimization methods have worked for you? Share your best cost-cutting tip in the comments; I'd love to learn what I've missed, and it helps all of us stay on top of those cloud bills. Thanks in advance, and let's keep learning from each other!
