In this guide, we’ll walk through how to implement and test the Horizontal Pod Autoscaler (HPA) on a GKE cluster.
You’ll learn how to automatically scale your Pods based on CPU utilization — helping your applications handle increased load efficiently and reduce costs during idle time.
📘 Step 01: Introduction
In Kubernetes, the Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU/memory usage or custom metrics.
In this demo, we’ll:
- Deploy a simple NGINX-based application
- Expose it internally via a ClusterIP Service
- Create an HPA that scales Pods between 1 and 10 replicas
- Generate CPU load to observe the scaling in action
📁 Step 02: Review Kubernetes Manifests
Let’s create a new working directory and prepare the required YAML files.
```shell
mkdir kube-manifests-autoscalerV2
cd kube-manifests-autoscalerV2
```
🧩 01-kubernetes-deployment.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp1-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp1
  template:
    metadata:
      name: myapp1-pod
      labels:
        app: myapp1
    spec:
      containers:
        - name: myapp1-container
          image: ghcr.io/stacksimplify/kubenginx:1.0.0
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: "5Mi"
              cpu: "25m"
            limits:
              memory: "50Mi"
              cpu: "50m"
```
💡 Note:
The CPU limits and requests are intentionally small to make autoscaling visible during the demo.
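A quick back-of-the-envelope check on those numbers (assuming the 30% utilization target we’ll set in the HPA below): HPA measures utilization against the Pod’s CPU *request*, not its limit.

```shell
# HPA compares observed CPU usage to the Pod's *request* (25m), not its limit.
# With a 30% utilization target, scale-out begins once average usage per Pod
# exceeds roughly 25m * 30% = 7m of CPU - easy to hit with a simple load test.
request_mcpu=25   # resources.requests.cpu from the Deployment
target_pct=30     # averageUtilization target from the HPA
echo "$(( request_mcpu * target_pct / 100 ))m"   # → 7m
```

This is why tiny requests make the demo snappy: even light traffic pushes utilization well past the target.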
🧩 02-kubernetes-cip-service.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp1-cip-service
spec:
  type: ClusterIP
  selector:
    app: myapp1
  ports:
    - name: http
      port: 80
      targetPort: 80
```
This creates an internal ClusterIP service so that other Pods (like our load generator) can access the app inside the cluster.
🧩 03-kubernetes-hpa.yaml
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp1-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30
```
💡 Explanation:
- Scales myapp1-deployment between 1–10 Pods
- Target average CPU utilization: 30%
- If average CPU usage across Pods exceeds 30%, Kubernetes will create more Pods automatically.
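Under the hood, the controller applies the standard HPA formula: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A minimal shell sketch of that arithmetic (example numbers, not live cluster data):

```shell
current_replicas=2
current_cpu=90   # observed average utilization across Pods (%)
target_cpu=30    # HPA target (%)
# ceiling division: ceil(a/b) == (a + b - 1) / b for positive integers
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"  # → 6
```

So with 2 Pods averaging 90% CPU against a 30% target, the HPA would ask for 6 replicas (capped by maxReplicas).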
⚙️ Step 03: Deploy the Sample App and Verify
Apply all manifests:
```shell
kubectl apply -f kube-manifests-autoscalerV2/
```
✅ Check the current Pods
```shell
kubectl get pods
```
👉 Observation: Only 1 Pod should be running initially.
✅ Verify HPA
```shell
kubectl get hpa
```
You should see something like:
```
NAME   REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
cpu    Deployment/myapp1-deployment   0%/30%    1         10        1          1m
```
🔥 Run a Load Test (in a new terminal)
We’ll generate continuous load using a BusyBox container:
```shell
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://myapp1-cip-service; done"
```
This will repeatedly send requests to our service, increasing CPU usage.
✅ Observe Scale-Out Event
In another terminal:
```shell
kubectl get pods
```
You’ll gradually see new Pods being created as the load increases.
🧠 Check Pod Metrics
```shell
kubectl top pod
```
You can also watch how HPA reacts:
```shell
kubectl get hpa --watch
```
✅ Observe Scale-In Event
After you stop the load generator (Ctrl+C in its terminal), the HPA scales the Pods back down after a delay — the downscale stabilization window defaults to 5 minutes.
```shell
kubectl get pods
```
👉 Observation: Only 1 Pod should remain after scaling in.
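If the default scale-down delay doesn’t suit your workload, autoscaling/v2 lets you tune it with an optional behavior block in the HPA spec — a sketch with illustrative values (not part of the demo manifests):

```yaml
# Optional scale-down tuning (illustrative values)
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60   # default is 300 (5 minutes)
    policies:
    - type: Pods
      value: 2          # remove at most 2 Pods...
      periodSeconds: 60 # ...per minute
```

A shorter window makes demos faster to observe, but in production a longer window guards against flapping when load is spiky.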
🧹 Step 04: Clean-Up
Delete the load generator (only needed if it’s stuck in an error state — with --rm it normally removes itself):
```shell
kubectl delete pod load-generator
```
Delete the sample app and HPA:
```shell
kubectl delete -f 01-kubernetes-deployment.yaml
kubectl delete -f 02-kubernetes-cip-service.yaml
kubectl delete -f 03-kubernetes-hpa.yaml
```
⚡ Step 05: Create HPA Using Imperative Command
You can also create HPA without YAML using kubectl autoscale.
🧾 Deploy the App Again
```shell
kubectl apply -f 01-kubernetes-deployment.yaml
```
Check running Pods:
```shell
kubectl get pods
```
🧰 Create HPA Imperatively
```shell
kubectl autoscale deployment myapp1-deployment --min=3 --max=10 --cpu-percent=30
```
This command creates an HPA resource that ensures:
- Minimum 3 replicas
- Maximum 10 replicas
- Scales based on 30% CPU target
🔍 Verify
```shell
kubectl get hpa
kubectl get pods
```
👉 You’ll see 3 Pods running as per --min=3.
📜 Review the Generated HPA YAML
```shell
kubectl get hpa myapp1-deployment -o yaml
```
You’ll notice it uses apiVersion: autoscaling/v2.
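The generated resource is roughly equivalent to the declarative manifest from Step 02, just with the imperative flags filled in — a sketch of what to expect (status and managed fields omitted):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp1-deployment   # kubectl autoscale names the HPA after the target
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp1-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30
```

This is a handy workflow: prototype imperatively, then export the YAML into version control for the declarative approach.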
🧹 Step 06: Final Clean-Up
```shell
kubectl delete -f kube-manifests-autoscalerV2/01-kubernetes-deployment.yaml
kubectl delete hpa myapp1-deployment
```
📊 Step 07: Use kubectl top to Check Resource Usage
```shell
# View node metrics
kubectl top node

# View pod metrics (e.g., in the kube-system namespace)
kubectl top pod -n kube-system
```
🖥️ Step 08: Configure HPA via GKE Console (GUI Method)
You can also create and manage Horizontal Pod Autoscaling (HPA) directly from the Google Cloud Console — no YAML or kubectl needed.
🔹 Steps:
- Go to the Google Cloud Console
- Navigate to Kubernetes Engine → Clusters
- Select your Cluster
- Go to the Workloads tab
- Click on your Application (Pod/Deployment)
- Under the Details section, scroll down to Autoscaling
- Choose Horizontal Pod Autoscaler → Configure
⚙️ Configuration Options:
You can specify:
- Minimum number of Pods (e.g., 1)
- Maximum number of Pods (e.g., 10)
- Target CPU utilization percentage (e.g., 30%)
- Custom or memory-based metrics (optional)
🎯 Summary
| Step | Description | 
|---|---|
| 1 | Create a Deployment and ClusterIP Service | 
| 2 | Define HPA YAML or use kubectl autoscale | 
| 3 | Run a load generator to test scaling | 
| 4 | Observe Pods scale out/in | 
| 5 | Clean up resources after the test | 
💡 Key Takeaways
✅ Autoscaling is automatic — no manual intervention needed
✅ HPA uses the Metrics Server to fetch CPU/memory usage
✅ Use low resource limits in demos for visible scaling
✅ Imperative and declarative approaches both work
✅ Helps achieve cost efficiency and application resilience
🌟 Thanks for reading! If this post added value, a like ❤️, follow, or share would encourage me to keep creating more content.
— Latchu | Senior DevOps & Cloud Engineer
☁️ AWS | GCP | ☸️ Kubernetes | 🔐 Security | ⚡ Automation
📌 Sharing hands-on guides, best practices & real-world cloud solutions