Jensen Jose
Understanding Auto Scaling in Kubernetes

Welcome back to the CK2024 blog series! In this post, we'll dive into auto-scaling in Kubernetes, a crucial aspect of managing clusters efficiently, whether you're a beginner or looking to deepen your understanding.

What is Scaling?

Scaling refers to adjusting your servers or workloads to meet demand. This adjustment can be done manually or automatically. Scaling ensures that your applications can handle increased traffic or resource utilization without manual intervention.

In Kubernetes, we often talk about scaling in terms of Deployments and ReplicaSets. Deployments let us manage multiple replicas of a pod template, ensuring that our applications can handle varying loads.

Manual vs. Automatic Scaling

In a traditional setup, scaling might involve manually updating the number of replicas in a Deployment or ReplicaSet. This approach can be inefficient and impractical for large-scale applications running in production environments. Automatic scaling, on the other hand, adjusts the number of pods based on current demand and resource utilization, ensuring optimal performance and resource usage.
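To make the manual side concrete, here is a quick sketch of imperative scaling with `kubectl scale`. The deployment name `my-app` is a placeholder for illustration:

```shell
# Manually set a Deployment to 5 replicas.
# This is a one-off change - it does not react to demand.
kubectl scale deployment my-app --replicas=5

# Verify the new replica count
kubectl get deployment my-app
```

Every time demand changes, someone has to re-run this command, which is exactly the toil auto-scaling removes.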

Types of Auto Scaling in Kubernetes

  1. Horizontal Pod Autoscaling (HPA)
    Horizontal Pod Autoscaling automatically adds or removes pod replicas based on observed metrics such as CPU or memory utilization. For example, if the average CPU utilization exceeds a specified threshold, HPA will add more pods to handle the increased load.

  2. Vertical Pod Autoscaling (VPA)
    Vertical Pod Autoscaling adjusts the resource requests and limits of a pod, effectively resizing it to meet the demand. This approach can result in pod restarts, so it's suitable for non-mission-critical applications that can tolerate downtime.
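Unlike HPA, VPA is not built into core Kubernetes; it is installed separately (from the Kubernetes autoscaler project). As a rough sketch, assuming VPA is installed in the cluster, a VPA object targeting a Deployment might look like this (the names are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa          # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # the workload whose pods VPA resizes
  updatePolicy:
    updateMode: "Auto"      # VPA may evict pods and recreate them with new resource requests
```

The `"Auto"` update mode is what causes the pod restarts mentioned above; a `"Off"` mode exists that only produces recommendations without applying them.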

Practical Example: Horizontal Pod Autoscaling

Let's walk through a practical example to illustrate how HPA works.

  1. Prerequisites: Ensure that the metrics server is running in your cluster. The metrics server provides the necessary metrics for HPA to make scaling decisions.
kubectl get pods -n kube-system
# Ensure metrics-server is running
  2. Create Deployment and Service: We'll create a deployment and expose it via a service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      app: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  selector:
    app: php-apache
  ports:
  - port: 80
    targetPort: 80

Apply the YAML file:

kubectl apply -f deployment.yaml
  3. Create HPA: Now, we create an HPA object to scale our deployment based on CPU utilization.
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
  4. Generate Load: To see HPA in action, we'll generate load on the deployment.
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh
# Inside the pod, run the following command to generate load
while true; do wget -q -O- http://php-apache; done
  5. Monitor HPA: Monitor the HPA to see how it scales the deployment.
kubectl get hpa -w

As the load increases, HPA will add more replicas to handle the demand. Once the load decreases, HPA will scale down the replicas to the minimum specified.
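For reference, the `kubectl autoscale` command used in step 3 can also be expressed declaratively, which is the usual approach when manifests live in version control. A sketch of the equivalent object using the `autoscaling/v2` API:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # target 50% of the pods' requested CPU
```

Note that the utilization target is measured against the CPU *requests* set in the Deployment, which is why the `resources.requests` block in the manifest above is required for HPA to work.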

Conclusion

Understanding and implementing auto-scaling is essential for managing Kubernetes clusters efficiently. Horizontal and vertical scaling ensure that your applications can handle varying loads while optimizing resource usage. While HPA is built into Kubernetes, VPA and other advanced scaling features may require additional setup or managed cloud services.

In the next post, we'll explore liveness and readiness probes in Kubernetes, which are crucial for ensuring that your applications are running smoothly and are available to serve requests. Happy learning!

For further reference, check out the detailed YouTube video here:
