In this guide, we’ll walk through how to implement and test Horizontal Pod Autoscaling (HPA) on a GKE cluster.
You’ll learn how to automatically scale your Pods based on CPU utilization — helping your applications handle increased load efficiently and reduce costs during idle time.
📘 Step 01: Introduction
In Kubernetes, the Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU/memory usage or custom metrics.
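The HPA reads CPU and memory figures from the Kubernetes metrics API; on GKE the Metrics Server that backs it is installed by default, so no extra setup is needed. A quick sanity check before starting (output will vary by cluster):

```bash
# The resource metrics API must be available for HPA to work
kubectl get apiservices v1beta1.metrics.k8s.io

# If this prints usage numbers instead of an error, metrics are flowing
kubectl top nodes
```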
In this demo, we’ll:
- Deploy a simple NGINX-based application
- Expose it internally via a ClusterIP Service
- Create an HPA that scales Pods between 1 and 10 replicas
- Generate CPU load to observe the scaling in action
📁 Step 02: Review Kubernetes Manifests
Let’s create a new working directory and prepare the required YAML files.
mkdir kube-manifests-autoscalerV2
cd kube-manifests-autoscalerV2
🧩 01-kubernetes-deployment.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp1-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp1
  template:
    metadata:
      name: myapp1-pod
      labels:
        app: myapp1
    spec:
      containers:
        - name: myapp1-container
          image: ghcr.io/stacksimplify/kubenginx:1.0.0
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: "5Mi"
              cpu: "25m"
            limits:
              memory: "50Mi"
              cpu: "50m"
```
💡 Note:
The CPU requests and limits are intentionally small so that autoscaling is easy to trigger during the demo. The HPA measures utilization against the request, so with a 25m request and the 30% target defined in the HPA manifest below, scale-out begins once a Pod averages only about 7.5m CPU.
🧩 02-kubernetes-cip-service.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp1-cip-service
spec:
  type: ClusterIP
  selector:
    app: myapp1
  ports:
    - name: http
      port: 80
      targetPort: 80
```
This creates an internal ClusterIP service so that other Pods (like our load generator) can access the app inside the cluster.
🧩 03-kubernetes-hpa.yaml
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp1-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 30
```
💡 Explanation:
- Scales myapp1-deployment between 1–10 Pods
- Target average CPU utilization: 30%
- If average CPU usage across Pods exceeds 30%, Kubernetes will create more Pods automatically.
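Under the hood, the HPA controller uses the standard replica formula from the Kubernetes docs; with a 30% target, a burst of load plays out roughly like this (numbers are illustrative):

```text
desiredReplicas = ceil( currentReplicas × currentUtilization / targetUtilization )

Example: 2 Pods averaging 90% CPU against the 30% target
         ceil( 2 × 90 / 30 ) = 6 Pods  (never more than maxReplicas: 10)
```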
⚙️ Step 03: Deploy the Sample App and Verify
Apply all manifests (run this from inside the kube-manifests-autoscalerV2 directory created above):
kubectl apply -f .
✅ Check the current Pods
kubectl get pods
👉 Observation: Only 1 Pod should be running initially.
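You can also confirm that the ClusterIP Service answers from inside the cluster with a throwaway Pod (the name curl-check is just an example, and the command assumes the default namespace):

```bash
# One-off BusyBox Pod that fetches the page once and is removed on exit
kubectl run curl-check --rm -it --image=busybox --restart=Never -- \
  wget -qO- http://myapp1-cip-service
```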
✅ Verify HPA
kubectl get hpa
You should see something like:
```
NAME   REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
cpu    Deployment/myapp1-deployment   0%/30%    1         10        1          1m
```
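For more detail than this one-line summary, kubectl describe shows the HPA's current metrics, conditions, and recent scaling events:

```bash
kubectl describe hpa cpu
```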
🔥 Run a Load Test (in a new terminal)
We’ll generate continuous load using a BusyBox container:
```bash
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://myapp1-cip-service; done"
```
This will repeatedly send requests to our service, increasing CPU usage.
✅ Observe Scale-Out Event
In another terminal:
kubectl get pods
You’ll gradually see new Pods being created as the load increases.
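To follow the scale-out live instead of re-running the command, watch just the app's Pods by label (app=myapp1 comes from the Deployment manifest):

```bash
kubectl get pods -l app=myapp1 --watch
```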
🧠 Check Pod Metrics
kubectl top pod
You can also watch how HPA reacts:
kubectl get hpa --watch
✅ Observe Scale-In Event
After stopping the load generator, HPA will scale down the Pods over a few minutes (usually 3–5 mins).
kubectl get pods
👉 Observation: Only 1 Pod should remain after scaling in.
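That delay comes from the HPA's scale-down stabilization window, which defaults to 300 seconds. If you want faster scale-in for demos, the autoscaling/v2 API exposes an optional behavior block you could add under spec: in 03-kubernetes-hpa.yaml; a minimal sketch (the 60-second value is just an example):

```yaml
  # Merged under spec: of the HorizontalPodAutoscaler
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60   # default is 300
```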
🧹 Step 04: Clean-Up
Delete the load generator (if it’s stuck in an error state):
kubectl delete pod load-generator
Delete the sample app and HPA:
kubectl delete -f 01-kubernetes-deployment.yaml
kubectl delete -f 02-kubernetes-cip-service.yaml
kubectl delete -f 03-kubernetes-hpa.yaml
⚡ Step 05: Create HPA Using Imperative Command
You can also create an HPA without writing any YAML by using kubectl autoscale.
🧾 Deploy the App Again
kubectl apply -f 01-kubernetes-deployment.yaml
Check running Pods:
kubectl get pods
🧰 Create HPA Imperatively
kubectl autoscale deployment myapp1-deployment --min=3 --max=10 --cpu-percent=30
This command creates an HPA resource that ensures:
- Minimum 3 replicas
- Maximum 10 replicas
- Scales based on 30% CPU target
🔍 Verify
kubectl get hpa
kubectl get pods
👉 You’ll see 3 Pods running as per --min=3.
📜 Review the Generated HPA YAML
kubectl get hpa myapp1-deployment -o yaml
You’ll notice it uses apiVersion: autoscaling/v2.
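The generated spec should look roughly like the declarative manifest from Step 02, with the flag values filled in (exact fields, defaults, and status will vary):

```yaml
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp1-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 30
```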
🧹 Step 06: Final Clean-Up
kubectl delete -f 01-kubernetes-deployment.yaml
kubectl delete hpa myapp1-deployment
📊 Step 07: Use kubectl top to Check Resource Usage
```bash
# View node metrics
kubectl top node

# View Pod metrics (e.g., in the kube-system namespace)
kubectl top pod -n kube-system
```
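You can also break Pod usage down per container, which is handy when a Pod runs sidecars:

```bash
kubectl top pod --containers
```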
🖥️ Step 08: Configure HPA via GKE Console (GUI Method)
You can also create and manage Horizontal Pod Autoscaling (HPA) directly from the Google Cloud Console — no YAML or kubectl needed.
🔹 Steps:
- Go to the Google Cloud Console
- Navigate to Kubernetes Engine → Clusters
- Select your Cluster
- Go to the Workloads tab
- Click on your Application (Pod/Deployment)
- Under the Details section, scroll down to Autoscaling
- Choose Horizontal Pod Autoscaler → Configure
⚙️ Configuration Options:
You can specify:
- Minimum number of Pods (e.g., 1)
- Maximum number of Pods (e.g., 10)
- Target CPU utilization percentage (e.g., 30%)
- Optionally, use custom or memory-based metrics
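If you prefer to stay declarative, the memory-based option above can also be expressed in the autoscaling/v2 manifest; a sketch of an additional entry under metrics: in 03-kubernetes-hpa.yaml (the 70% value is just an example):

```yaml
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
```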
🎯 Summary
| Step | Description |
|---|---|
| 1 | Create a Deployment and ClusterIP Service |
| 2 | Define HPA YAML or use `kubectl autoscale` |
| 3 | Run a load generator to test scaling |
| 4 | Observe Pods scale out/in |
| 5 | Clean up resources after the test |
💡 Key Takeaways
✅ Autoscaling is automatic — no manual intervention needed
✅ HPA uses the Metrics Server to fetch CPU/memory usage
✅ Use low resource limits in demos for visible scaling
✅ Imperative and declarative approaches both work
✅ Helps achieve cost efficiency and application resilience
🌟 Thanks for reading! If this post added value, a like ❤️, follow, or share would encourage me to keep creating more content.
— Latchu | Senior DevOps & Cloud Engineer
☁️ AWS | GCP | ☸️ Kubernetes | 🔐 Security | ⚡ Automation
📌 Sharing hands-on guides, best practices & real-world cloud solutions