Gahl Saraf

Posted on • Originally published at raftt.io

Reducing Cloud Costs on Kubernetes Dev Envs

At Raftt, we’ve gone to great lengths to reduce the cloud costs for environments running on our development cluster, both for our internal usage and for our customers. In this blog post I’ll cover all the adaptations we made, so you can apply them to your own infrastructure. For each, I’ll note the approximate savings, and at the end I’ve included a summary.

We primarily use AWS and EKS for our managed cloud clusters, so certain parts of the post will be more relevant if you are using those (though the concepts carry over to any other cloud provider). We’ve covered the most significant factors, and through these we were able to save over 95%, but there is lots more you can do - for instance, reducing persistent volume sizes or trimming some of the CloudWatch logging. I’ll start with infrastructure-level modifications - sharing clusters, autoscaling, right-sizing nodes, and using spot instances:

Infrastructure Adaptations

Shared clusters

This one may be a bit obvious, but it has to be said - while our production runs on its own Kubernetes cluster, we do not want a cluster for each preview or dev environment. This is because:

  • It takes a long time to create Kubernetes clusters. In EKS, this is 10-15 minutes, including the time it takes for nodes to spin up. Other distributions are faster, but it still slows down development.
  • There are a lot of static costs associated with each cluster. For our EKS setup, this ends up being around $100 / month, including the EKS control plane (~$70), an NLB load balancer (~$21), and some CloudWatch logs (~$10).
  • Different clusters cannot share the same nodes, so resource sharing is impossible, and we end up spending much more on EC2 instances.

Instead, we will create a single long-lived cluster, and deploy our application in different namespaces. There are a bunch of ways to do that - ArgoCD, Flux, custom internal tooling, or other solutions (we use our own product); a sketch of the ArgoCD flavor follows the list below. That way, we:

  • Only set up the cluster and infra once, and only incur the costs once.
  • Are able to share the underlying resources (more on that below).
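For example, with ArgoCD each environment becomes an Application pointing at a branch and at its own namespace. Here is a minimal sketch of one environment (repository URL, chart path, branch, and namespace names are all illustrative):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-feature-x              # hypothetical environment name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/myapp   # hypothetical repository
    targetRevision: feature-x                   # the branch this environment tracks
    path: deploy/helm/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: env-feature-x                    # one namespace per environment
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true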

Autoscaling

The most significant single factor in the cost of most Kubernetes clusters is the compute powering the cluster’s nodes. Several factors affect their cost, and the first we will cover is autoscaling. In cloud-based clusters with different levels of utilization, autoscaling is a must. It can reduce costs by an order of magnitude. Specifically for a cluster used for development purposes, and assuming we have infrastructure that can bring up and scale down environments as needed, this means cloud instances can be taken down (automatically):

  • Over the weekends, saving 48 hours a week
  • Outside of working hours, saving 14 hours a day for working days (another 70 hours a week)
  • Holidays and other off days - another 15 days a year
  • Since we are talking about environments used for development, we can also scale down on days when people are on personal leave - another 20ish days a year.

All told, we get to (365-(52*2)-15-20)=226 working days per engineer, and with 10ish hours of work per day - around 2260 hours, or around 2260/(365*24)=1/4 of the total yearly time.

Autoscaling over EKS can be accomplished using either the cluster-autoscaler project or Karpenter. If you want to use Spot instances, consider using Karpenter, as it has better integrations with AWS for optimizing spot pricing and availability, minimizing interruptions, and falling back to on-demand nodes if no spot instances are available.
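Note that node autoscalers only remove nodes once they are (mostly) empty, so something still has to scale the environments themselves down outside working hours. We use our own product for this, but a minimal sketch of the idea is a CronJob that scales every Deployment in the dev namespaces to zero each evening (a mirror-image job, or on-demand wake-up, brings them back). The "env-type=dev" label and the "env-scaler" ServiceAccount are assumptions, and the RBAC it needs is not shown here:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev-envs
spec:
  schedule: "0 19 * * 1-5"      # weekday evenings, cluster (UTC) time
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: env-scaler   # needs RBAC to list namespaces and scale Deployments
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - /bin/sh
                - -c
                # scale every Deployment in namespaces labeled as dev environments to zero,
                # so the node autoscaler can remove the now-empty nodes
                - |
                  for ns in $(kubectl get ns -l env-type=dev -o name | cut -d/ -f2); do
                    kubectl -n "$ns" scale deployment --all --replicas=0
                  done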

Node selection

Autoscaling operates on the node types we have chosen, and there are several important factors to consider when choosing them:

The most important consideration is the node size. Kubernetes works well with clusters that have many smaller nodes, rather than a few large ones. This provides finer-grained autoscaling and reduces the impact of a single node becoming unavailable. However, there is a fixed overhead per node, consisting of daemons running on the VM itself (kubelet, containerd, …) and DaemonSet pods that run on each node. Make sure to take those into account, and choose a node size such that those services won’t be a significant percentage of its compute. We recommend working with nodes in the large-2xlarge sizes of the c and m families.

It is possible to further optimize your node sizes by creating multiple node groups. If your workloads have diverse resource requirements (some need high memory while others need high CPU), this might be a worthwhile optimization to allow flexibility and maximize resource utilization. For example, if you have a workload that requests 8 GiB memory and 1 vCPU, it may make sense to utilize r-type (high memory) instances. That said, for smaller clusters (fewer than 10 nodes), this can be a hassle to maintain.

When using a rare node type (for example, X1 high-memory instances), you may occasionally run into problems when scaling up. This is especially true if you are limited in your availability zones, or are trying to use Spot instances. Our solution was to specify a wide range of similar instances and allow Karpenter to choose between them. For different clusters we use the c or m instance families, and provide Karpenter with a list such as: c5a, c5, c5ad, c5d, c5n, c6a, c6i, c6id, and c6in.
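If you are on Karpenter, one way to express this family-plus-size range without enumerating every concrete instance type is through the well-known labels its AWS provider publishes. A sketch of the relevant requirements, which would slot into the Provisioner shown in the Spot section below:

spec:
  requirements:
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["c5a", "c5", "c5ad", "c5d", "c5n", "c6a", "c6i", "c6id", "c6in"]
    - key: karpenter.k8s.aws/instance-size
      operator: In
      values: ["large", "xlarge", "2xlarge"]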

Finally, since dev environments are bursty and can tolerate CPU disruption, you can choose burstable node types (in AWS, the t3/t3a and t4g families) and save some more. At the time of writing, in AWS Frankfurt, a t3a.2xlarge costs $252 per month while an m5a.2xlarge costs $303 - a 17% discount. This is not hugely significant and comes with some real downsides, so we will not assume you’ve made this jump.

Spot Instances

One of the best ways to save money on cloud instances is to use them through your cloud provider’s Spot (interruptible) program. This can save anywhere from 45-85% of the cost, without requiring any commitment.

Not all workloads perform well on Spot instances, since the chance of a node needing to be replaced is significantly higher. It is very worthwhile to check, though, since the savings are so significant.

We use Karpenter with a Provisioner that looks like this:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: node.kubernetes.io/instance-type
      operator: In
      values: # List of node types as described above
    - key: karpenter.sh/capacity-type
      operator: In
      values:
        - "spot"
        - "on-demand"

This allows Karpenter to choose the most suitable instances for us and, if none are available as Spot, fall back to on-demand. Since Karpenter itself should not run on the instances it manages, we spin up a small t3.medium instance for Karpenter and a select few other services that don’t tolerate interruptions well.
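To keep those interruption-sensitive workloads off the Spot nodes, it is enough to label the static node group and point the workloads at it with a nodeSelector (for Karpenter itself, through whatever scheduling knobs your installation method exposes). A minimal sketch, with a hypothetical "node-group: static" label:

# Deployment spec fragment for a workload that must stay on the static on-demand node
spec:
  template:
    spec:
      nodeSelector:
        node-group: static   # hypothetical label carried by the t3.medium node group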

We have seen an average reduction in EC2 cost of about 60%.

Application Adaptations

When using the same Infrastructure as Code (IaC) definitions for production, staging, preview, and dev (highly recommended!), it is important to remember to modify them for dev environments. There are three main differences between production and dev that are relevant for our purposes:

  • That dev environments are mostly idle
  • That we don’t necessarily need the entire environment
  • That we don’t need to be as resilient

In the following sections we’ll discuss applying these assumptions to the number of replicas, the resource requests, and the environment composition.

Workload Replicas

Since we don’t need to be resilient, and the environment is expected to be mostly idle, we can scale deployments down to a single replica. This is true not only for regular Deployments but also for StatefulSets, where scaling down translates into storage savings as well.

Note that for StatefulSets in particular, there may be business logic that needs to change to handle the different replica count, or edge cases that won’t reproduce with fewer replicas - think things like consistent routing of requests to the same stateful service.
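A minimal sketch of what this looks like as a kustomize dev overlay (workload names are illustrative; a Helm values-dev.yaml setting replicaCount: 1 per chart achieves the same thing):

# overlays/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base          # the production manifests
replicas:
  - name: api           # Deployment
    count: 1
  - name: worker        # Deployment
    count: 1
  - name: postgres      # StatefulSet - also cuts the number of PVCs created
    count: 1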

Workload Resource Requests

The main difference is that dev environments are idle 99% of the time. In our case, for one of our services, we are seeing the pods use 1-5 mCPU and 50 MiB memory, down from about 100 mCPU and 160 MiB memory - so around 1/20th the CPU and 1/3 the memory. In this case, we would be able to pack three times as many dev environments if we modify the resource requests accordingly.

One of the difficulties with developer environments is that their usage can be “bursty” - idle 99% of the time, but suddenly busy for short stretches while the developer is actively working with them. These spikes increase both CPU and memory usage. Because CPU is a compressible resource, nothing significantly bad happens if there is some contention, though things may run a bit more slowly. Memory, however, is a different matter: if your application takes a lot more memory when in use, you may need to increase the requests to match, or be susceptible to OOMs and evictions.
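A sketch of dev-sized requests for the service mentioned above (numbers are illustrative, based on the idle usage we measured): keep the CPU request tiny, since CPU contention only slows things down, but keep the memory request closer to the in-use footprint so bursts don’t end in OOM kills:

resources:
  requests:
    cpu: 10m          # prod requests ~100m; idle dev pods use 1-5m
    memory: 96Mi      # above the ~50Mi idle usage, below the 160Mi prod request
  limits:
    memory: 256Mi     # headroom for bursts while a developer is active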

Kubernetes is reasonably good at dispersing pods across nodes, and one thing that could happen is that some nodes will end up with the “burstier” pods, while others are quiet. You could try to bind the entire environment to a single node so all nodes behave similarly, though that adds more complexity to the deployment.
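If you do want to try co-locating an environment, preferred pod affinity is a gentler option than a hard requirement (it won’t block scheduling when no peer pods exist yet). A sketch, with a hypothetical per-environment "env" label:

# pod template spec fragment
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname   # prefer nodes already running this environment's pods
          labelSelector:
            matchLabels:
              env: feature-x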

Partial environments

A final possibility for resource reduction is to intelligently choose subsets of environments to bring up for dev purposes. For example, instead of bringing up all 20 microservices, you could choose a part of the environment to bring up. There are two main ways to accomplish that:

  • Automatically identify which code was changed in the branch, which services it belongs to, and bring up those plus their dependencies. Unfortunately, this is very hard to do well, and mistakes mean broken environments.
  • Pre-define several environment subsets, and bring those up depending on certain criteria or developer request. This is easier, but requires maintenance.

From our experience, investing in partial environments only makes sense if it is easy to do for your system, and you have a relatively large number of services - let’s say more than 20. If you do go this route, it should be possible to save an additional 30-60% of the resources in the environment.
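The pre-defined-subset approach maps naturally onto Helm values files, assuming your umbrella chart gates each service behind an enabled flag (a common convention, not something built in; service names here are illustrative):

# values-dev-backend-only.yaml
frontend:
  enabled: false
reporting:
  enabled: false
api:
  enabled: true
worker:
  enabled: true
postgres:
  enabled: true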

Putting it all together

We’ve gone through seven complementary strategies for reducing the cloud resource cost of running developer environments. Together, they can drastically reduce your developer-related cloud bill. Let’s bring this down to actual money. Say we have a project with 15 microservices, 3 replicas in production, and an overall production resource utilization of 80 GiB memory and 20 CPUs. This costs us (assuming AWS in Frankfurt):

Item                                  Count   Monthly cost   Subtotal
EKS cluster                           1       $70            $70
Load balancer                         1       $21            $21
CloudWatch                            1       $10            $10
Static management node (t3.medium)    1       $35            $35
Main EC2 nodes (m5a.2xlarge)          3       $303           $909
Total                                                        $1045

Our baseline is deploying a full copy of this infrastructure for each environment; if you already have some of these techniques in place, adjust the comparison accordingly. Let’s look at the savings we get from applying all of the above.

  1. Shared cluster - so we have 0 incremental costs for each environment. Assuming 10 environments on average using the cluster, let’s amortize the cost - (70+21+10+35)/10 = $13.6
  2. Autoscaling - Instead of keeping our 3 main nodes up all the time, we enable autoscaling and reduce their cost to (3*303)/4 = $227.25
  3. Node selection - as discussed in that section, while we could save another 17% and switch to t3 instances, we’ll skip this optimization for now.
  4. Spot instances - implementing spot instances, and using the up to date pricing, we reduce our cost per node to $134.46, and taking into account the autoscaling and node count, a total of (3*134.46)/4 = $100.845
  5. Workload replicas - reducing our replica count to 1 cuts our resource usage to a third, and leaves us with 1 node instead of 3. At this point it may make sense to choose smaller nodes, but since pricing is generally linear with node resources, let’s ignore that. Our updated price for the EC2 instances is 134.46/4 = $33.6
  6. Workload resource requests - cutting our resource requests to around one-third translates directly to fewer EC2 instances, so - 33.6/3 = $11.2
  7. Partial environments - if we can easily cut our environment and bring up only what is necessary, we can save another ~50%, and get to 11.2/2 = $5.6

To sum up, we have $13.6 in static costs and $5.6 in usage costs per env. We’ve gone down more than 50x, from $1045 for our production setup to around $19.2 for each dev env. And since our largest remaining portion is the cluster static costs, we scale really well - increasing this to 20 environments drops the price per env to about $12.4.

This is, of course, a rough approximation, and doesn’t account for things like EBS volumes, increased CloudWatch usage, or less-than-optimal node utilization. But even if we are off by 20%, we still end up over 40x cheaper.

Sounds great, right? What do I need to do to make this happen?

  1. Make sure you have some declarative way to define the infrastructure - Terraform, CloudFormation, CDK, Pulumi - whatever. This is crucial because making incremental improvements is significantly easier if you can see the exact effects of the changes you are making and roll back as needed.
  2. Next, adopt some kind of environment orchestration solution - while we don’t use Argo or Flux internally, I’ve heard great things about both of them.
  3. Once you have a working setup, start with the infrastructure changes, and adopt them one by one - autoscaling, right-sizing nodes, and switching to spot instances.
  4. Next, modify your application deployment (through Helm / Kustomize / whatever you are using) to reduce replicas and resource requests, and create partial configurations if possible.
  5. Finally, wait a day and go to your AWS Cost Explorer page and see how far you got 🙂.

To actually be able to use these environments for development, you will want a solution that gives developers access to these environments, and provides fast iterations, hot reloading and debugging. For that, (or if all of this seems like a lot of work), let’s get in touch 😉.

Top comments (3)

amossh

In my experience, a significant chunk of the cost optimization comes down to a good scale-down strategy, but such a strategy isn't always easy to define.
How do you decide when to scale down envs?

Gahl Saraf

That's a great question, and one I plan to address in a future blog post :)

There are a bunch of options-

  • Simple TTL
  • Metrics from the ingress
  • Upon PR close

We use our own product to accomplish this, and it takes a look at actual dev use of the environment (files changing, traffic over port forwards, API calls, etc.)

Ravi Kyada

Awesome content on cost savings. Spot instances and partial deployment of environments with shared clusters are the best ways to save cost in today's cloud-native infrastructures.