Scaling is one of the core features of Kubernetes, and the Horizontal Pod Autoscaler (HPA) makes it effortless by automatically adjusting the number of Pods based on resource usage or custom metrics.
In this article, we'll look at how HPA works in Google Kubernetes Engine (GKE), what the Metrics Server does, and how to visualize the autoscaling flow with an easy-to-understand diagram.
What is Kubernetes Horizontal Pod Autoscaling?
The Horizontal Pod Autoscaler (HPA) automatically increases or decreases the number of Pods in your Kubernetes workloads based on observed metrics such as CPU and memory usage.
It can scale your applications using:
- CPU and Memory Utilization (default metrics)
- Custom Metrics from within your Kubernetes cluster
- External Metrics (like Cloud Pub/Sub messages, HTTP requests, or load balancer metrics)
- Managed Service for Prometheus (for advanced custom metrics)
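As a sketch of the external-metrics case above: on GKE, the Custom Metrics Stackdriver Adapter can surface Cloud Monitoring metrics such as Pub/Sub backlog to the HPA. The Deployment name and subscription ID below are placeholder assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub-worker      # hypothetical queue-consumer Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              resource.labels.subscription_id: my-subscription  # placeholder ID
        target:
          type: AverageValue
          averageValue: "30"   # aim for ~30 undelivered messages per Pod
```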
HPA Works With These Workload Types:
- ReplicaSet
- ReplicationController
- Deployment
- StatefulSet
It ensures your app can:
- Scale out automatically to handle high demand
- Scale in when demand drops, freeing cluster resources and saving cost
How Horizontal Pod Autoscaler Works: Explained with the Diagram
Below is a simple visual representation of how HPA functions inside a GKE Cluster:
Let's break it down.
Step 1: Query for Metrics
Inside your GKE cluster, the Metrics Server (running in the kube-system namespace) continuously collects metrics like CPU and memory usage from all Pods via the kubelet on each node.
These metrics are then made available through the Metrics API, which HPA uses to make scaling decisions.
This process runs in a control loop, by default every 15 seconds.
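You can query the same Metrics API the HPA consumes; a quick way (assuming the Metrics Server is running, which it is by default on GKE) is kubectl's raw API access:

```bash
# Raw node metrics served by the Metrics Server (returns JSON)
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"

# Pod metrics for one namespace; replace "default" with yours
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"
```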
Step 2: Calculate the Required Replicas
The Horizontal Pod Autoscaler (HPA) controller compares the actual resource usage of your Pods (from the Metrics API) against the target utilization you defined.
For example:
If you set the CPU target to 50% and your Pods are running at 90%, HPA calculates how many additional Pods are needed to bring average utilization back toward the target.
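Under the hood, the HPA controller uses the formula documented by Kubernetes, always rounding the result up:

```
desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

# Worked example: 3 replicas at 90% average CPU, with a 50% target:
# ceil(3 * 90 / 50) = ceil(5.4) = 6 replicas
```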
Step 3: Scale the Deployment (MyApp1)
After calculating the new desired number of Pods, HPA automatically scales the corresponding Deployment, ReplicaSet, or StatefulSet.
This means:
- If demand increases → more Pods are created.
- If demand decreases → Pods are terminated automatically.
The scaling happens smoothly without any downtime.
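If you want to observe this loop on a live cluster, watching the autoscaler object shows current vs. target utilization and the replica count changing:

```bash
# Watch HPA status as load changes (TARGETS shows current/target utilization)
kubectl get hpa -w

# Inspect scaling events and conditions for a specific autoscaler
kubectl describe hpa my-app
```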
Example: Imperative Command
You can create an autoscaler using a simple kubectl command:
```bash
kubectl autoscale deployment my-app --min=4 --max=6 --cpu-percent=50
```
Explanation:
- my-app: Target Deployment name
- --min=4: Minimum of 4 Pods
- --max=6: Maximum of 6 Pods
- --cpu-percent=50: Target CPU utilization threshold
When the average CPU utilization of the Pods exceeds 50%, HPA increases the number of Pods (up to 6). When it's below 50%, it scales down (not below 4).
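The same autoscaler can also be written declaratively. Here is a minimal sketch using the autoscaling/v2 API (the file name hpa.yaml is an assumption; apply it with kubectl apply -f hpa.yaml):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:            # the workload this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 4             # never scale below 4 Pods
  maxReplicas: 6             # never scale above 6 Pods
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # target average CPU utilization (%)
```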
Kubernetes Metrics Server
The Metrics Server is the key component that enables autoscaling in Kubernetes.
What it Does:
- Collects resource metrics (CPU and memory) from the kubelet on each node
- Exposes them through the Metrics API
- Provides data for commands like:
```bash
kubectl top nodes
kubectl top pods
```
Key Points:
- Collects metrics every 15 seconds
- Lightweight: uses about 1 millicore of CPU and 2 MB of memory per node
- Optimized for autoscaling, not for monitoring dashboards
- Do not use it as a monitoring solution (use Prometheus or Cloud Monitoring for that)
- Also supports Vertical Pod Autoscaler recommendations
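On GKE the Metrics Server comes preinstalled, but it is easy to verify that it is healthy before relying on autoscaling (a quick sanity check, assuming the default component names):

```bash
# The Metrics Server runs as a Deployment in kube-system
kubectl get deployment metrics-server -n kube-system

# The Metrics API registration should report Available=True
kubectl get apiservice v1beta1.metrics.k8s.io
```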
Benefits of HPA
- Automatic Scaling: Kubernetes adjusts Pod count dynamically
- Cost Efficiency: uses only the resources you need
- Performance Stability: keeps workloads responsive during spikes
- Less Manual Work: no need to scale deployments manually
Quick Recap
| Component | Purpose |
| --- | --- |
| Metrics Server | Collects Pod resource usage |
| HPA Controller | Calculates replicas every 15 seconds |
| Deployment / StatefulSet | Scaled automatically by HPA |
| kubectl top | Shows live metrics for debugging |
| Target Metrics | CPU %, Memory %, or custom metrics |
Final Thoughts
The Horizontal Pod Autoscaler (HPA) in GKE makes your applications elastic: scaling up when demand increases and scaling down to save cost when demand drops.
It's one of the most powerful automation tools in Kubernetes, ensuring your workloads stay efficient, responsive, and cost-optimized without manual effort.
Example Use Cases
- Autoscaling web applications during traffic spikes
- Scaling data-processing jobs when queue length increases
- Optimizing microservices automatically based on load
Thanks for reading! If this post added value, a like, follow, or share would encourage me to keep creating more content.
- Latchu | Senior DevOps & Cloud Engineer
AWS | GCP | Kubernetes | Security | Automation
Sharing hands-on guides, best practices & real-world cloud solutions