⚠️ Disclaimer: The following post may contain biased opinions.
At some point in your career, you may be required to provide an estimate related either to cost or capacity, without having the complete view of the solution's architecture and/or infrastructure.
My simple approach to guesstimates relies on four elements:
1) Do not be afraid to make assumptions...or educated guesses
The sweet spot between cost and capacity is rarely hit; as a result, architectures tend to be overprovisioned (most of the time) or underprovisioned (which typically reveals itself as system instability).
2) Pick your battle
Costs are driven by three factors: storage, networking (mostly egress, since ingress is often free or very cheap), and compute. Pick one and address it as best as you can.
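To make this concrete, here is a minimal back-of-envelope sketch. Every volume and unit price in it is an illustrative assumption, not real cloud pricing:

```python
# Back-of-envelope monthly cost guesstimate.
# All volumes and unit prices below are illustrative assumptions, not real cloud pricing.

storage_gb = 500          # provisioned storage
egress_gb = 2_000         # outbound traffic (ingress assumed free)
vcpu_hours = 4 * 730      # 4 vCPUs running for a full month

price_per_gb_storage = 0.10   # $/GB-month (assumed)
price_per_gb_egress = 0.09    # $/GB (assumed)
price_per_vcpu_hour = 0.04    # $/vCPU-hour (assumed)

storage_cost = storage_gb * price_per_gb_storage
network_cost = egress_gb * price_per_gb_egress
compute_cost = vcpu_hours * price_per_vcpu_hour

total = storage_cost + network_cost + compute_cost
print(f"storage={storage_cost:.2f} network={network_cost:.2f} "
      f"compute={compute_cost:.2f} total={total:.2f}")
```

In this made-up scenario egress is the biggest line item, so networking would be the battle to pick.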
3) Scaling is the solution
Whether using HPA, VPA, or Cluster Autoscaler, the objective is to scale down when resource utilization drops.
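For HPA specifically, the controller follows a documented rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal sketch of that calculation, with made-up utilization numbers:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    # HPA rule: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
    return math.ceil(current_replicas * current_metric / target_metric)

# CPU utilization against a 60% target (values are made up):
print(desired_replicas(6, 90, 60))  # utilization spikes -> 9 replicas
print(desired_replicas(6, 30, 60))  # utilization drops  -> 3 replicas
```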
4) Follow the money
There’s often a tendency to overcommit (exceed allocatable resources), resulting in excessive reservations, underutilized capacity, and ultimately, cost overruns. Make use of tools whenever necessary:
Goldilocks: Provides recommendations for a workload's resource requests and limits.
OpenCost: Vendor-neutral project that helps reduce cost overruns by monitoring cloud infrastructure and container costs in real time.
Karpenter: Open-source node lifecycle management project that automates provisioning and deprovisioning of nodes based on the specific scheduling needs of pods.
KEDA: Scales workloads based on events from sources such as message queues, databases, or APIs.
kube-green: Simple Kubernetes add-on that automatically shuts down (some of) your resources when you don't need them.
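What these tools surface ultimately boils down to a couple of ratios per node: how much of the allocatable capacity is reserved by requests, and how far the limits overcommit it. A rough sketch, with made-up pod names and values:

```python
# Reservation and overcommit ratios for a single node.
# Pod names and all numbers are made up for illustration.

pods_cpu_m = {              # millicores: (request, limit) per pod
    "api":    (500, 1000),
    "worker": (750, 2000),
    "cache":  (250, 500),
}
node_allocatable_cpu_m = 2000   # what the scheduler can actually hand out

requests = sum(req for req, _ in pods_cpu_m.values())
limits = sum(lim for _, lim in pods_cpu_m.values())

print(f"reserved  : {requests / node_allocatable_cpu_m:.0%} of allocatable")  # 75%
print(f"overcommit: {limits / node_allocatable_cpu_m:.0%} of allocatable")    # 175%
```

The gap between what is reserved and what is actually used is where these tools earn their keep.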
💡 As a final thought, remember:
Overcommitment is not a bad thing in itself, but it rests on the assumption that not all pods will claim their full resource limits at the same time.
Horizontal scaling is best suited for stateless workloads, and vertical scaling for stateful workloads.
HPA requires at least 1 Pod to be running at all times, so that it can collect the metrics used to inform future scale-up decisions.
Memory resource units: Mi is mebibytes and M is megabytes (computers use the binary system, therefore Mi usage is preferred); see the quick check below.
CPU limits are enforced by CPU throttling, and memory limits are enforced by the kernel with OOM kills.
Allocatable resources hold greater significance than capacity when it comes to workload placement.
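A quick check of how much the Mi vs. M distinction matters (the 512 figure is just an example):

```python
# Mebibytes (Mi, binary) vs. megabytes (M, decimal).
MI = 2**20    # 1 Mi = 1,048,576 bytes
M = 10**6     # 1 M  = 1,000,000 bytes

size = 512
print(f"{size}Mi = {size * MI:,} bytes")              # 536,870,912
print(f"{size}M  = {size * M:,} bytes")               # 512,000,000
print(f"difference ≈ {size * (MI - M) / MI:.0f} Mi")  # ~24 Mi
```

Requesting 512M when you meant 512Mi silently shaves off roughly 24 Mi per replica.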