Part-116: 🚀Implement Horizontal Pod Autoscaling (HPA) in Google Kubernetes Engine (GKE)

In this guide, we’ll walk through how to implement and test Horizontal Pod Autoscaling (HPA) on a GKE cluster.

You’ll learn how to automatically scale your Pods based on CPU utilization — helping your applications handle increased load efficiently and reduce costs during idle time.


📘 Step 01: Introduction

In Kubernetes, the Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU/memory usage or custom metrics.
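For reference, the scaling rule the HPA controller applies is roughly the following (simplified from the Kubernetes documentation):

desiredReplicas = ceil( currentReplicas × currentMetricValue / desiredMetricValue )

So if the Pods are running at about twice the target utilization, the HPA roughly doubles the replica count (capped by maxReplicas).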

In this demo, we’ll:

  • Deploy a simple NGINX-based application
  • Expose it internally via a ClusterIP Service
  • Create an HPA that scales Pods between 1 and 10 replicas
  • Generate CPU load to observe the scaling in action

📁 Step 02: Review Kubernetes Manifests

Let’s create a new working directory and prepare the required YAML files.

mkdir kube-manifests-autoscalerV2
cd kube-manifests-autoscalerV2

🧩 01-kubernetes-deployment.yaml

apiVersion: apps/v1
kind: Deployment 
metadata: 
  name: myapp1-deployment
spec: 
  replicas: 1
  selector:
    matchLabels:
      app: myapp1
  template:  
    metadata:
      name: myapp1-pod
      labels:
        app: myapp1  
    spec:
      containers: 
        - name: myapp1-container
          image: ghcr.io/stacksimplify/kubenginx:1.0.0
          ports: 
            - containerPort: 80  
          resources:
            requests:
              memory: "5Mi"
              cpu: "25m"
            limits:
              memory: "50Mi"
              cpu: "50m"

💡 Note:
The CPU requests and limits are intentionally small so that autoscaling is easy to trigger and observe during the demo. Because HPA measures CPU utilization as a percentage of the Pod's request, with a 25m request even modest traffic pushes utilization past the 30% target we define below.


🧩 02-kubernetes-cip-service.yaml

apiVersion: v1
kind: Service 
metadata:
  name: myapp1-cip-service
spec:
  type: ClusterIP
  selector:
    app: myapp1
  ports: 
    - name: http
      port: 80
      targetPort: 80

This creates an internal ClusterIP service so that other Pods (like our load generator) can access the app inside the cluster.
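If you want to confirm the Service is reachable before running the full load test, a quick one-off request from inside the cluster looks like this (the Pod name curl-test is just an arbitrary choice for this check):

kubectl run curl-test -i --tty --rm --image=busybox --restart=Never -- \
  wget -q -O- http://myapp1-cip-service

You should get the app's HTML response back, and the temporary Pod is cleaned up automatically thanks to --rm.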


🧩 03-kubernetes-hpa.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp1-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30

💡 Explanation:

  • Scales myapp1-deployment between 1–10 Pods
  • Target average CPU utilization: 30%
  • If average CPU usage across the Pods exceeds 30% of their requested CPU, Kubernetes automatically adds more Pods (a worked example follows this list).
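As a rough worked example with this demo's numbers (utilization is measured against the 25m CPU request; the observed usage below is just an assumed figure):

Target per Pod      : 30% of 25m  = 7.5m CPU
Assumed observation : 22.5m/Pod   = 90% utilization
Desired replicas    : ceil(1 × 90 / 30) = 3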

⚙️ Step 03: Deploy the Sample App and Verify

Apply all manifests:

kubectl apply -f kube-manifests-autoscalerV2/

✅ Check the current Pods

kubectl get pods

👉 Observation: Only 1 Pod should be running initially.

✅ Verify HPA

kubectl get hpa

You should see something like:

NAME   REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
cpu    Deployment/myapp1-deployment   0%/30%     1         10         1         1m


🔥 Run a Load Test (in a new terminal)

We’ll generate continuous load using a BusyBox container:

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- \
/bin/sh -c "while sleep 0.01; do wget -q -O- http://myapp1-cip-service; done"

This will repeatedly send requests to our service, increasing CPU usage.
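If CPU usage climbs too slowly on your cluster, you can start one or two more generators in extra terminals; only the Pod name changes (load-generator-2 below is an arbitrary name):

kubectl run -i --tty load-generator-2 --rm --image=busybox --restart=Never -- \
/bin/sh -c "while sleep 0.01; do wget -q -O- http://myapp1-cip-service; done"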


✅ Observe Scale-Out Event

In another terminal:

kubectl get pods

You’ll gradually see new Pods being created as the load increases.


🧠 Check Pod Metrics

kubectl top pod

You can also watch how HPA reacts:

kubectl get hpa --watch
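To see the scaling decisions and events behind those numbers, you can also describe the HPA (named cpu in our manifest):

kubectl describe hpa cpu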


✅ Observe Scale-In Event

After stopping the load generator, HPA will scale down the Pods over a few minutes (usually 3–5 mins).

kubectl get pods

👉 Observation: Only 1 Pod should remain after scaling in.
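Scale-in is deliberately slower than scale-out: by default the HPA waits through a 5-minute downscale stabilization window before removing Pods. If you want faster scale-in for demos, autoscaling/v2 supports an optional behavior section; a minimal sketch you could add under spec in 03-kubernetes-hpa.yaml:

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60   # default is 300 seconds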


🧹 Step 04: Clean-Up

Delete the load generator Pod (only needed if it didn't exit cleanly and is stuck in an Error state):

kubectl delete pod load-generator

Delete the sample app and HPA:

kubectl delete -f 01-kubernetes-deployment.yaml
kubectl delete -f 02-kubernetes-cip-service.yaml
kubectl delete -f 03-kubernetes-hpa.yaml

⚡ Step 05: Create HPA Using Imperative Command

You can also create an HPA without writing any YAML by using kubectl autoscale.

🧾 Deploy the App Again

kubectl apply -f 01-kubernetes-deployment.yaml

Check running Pods:

kubectl get pods

🧰 Create HPA Imperatively

kubectl autoscale deployment myapp1-deployment --min=3 --max=10 --cpu-percent=30


This command creates an HPA resource that:

  • Keeps a minimum of 3 replicas
  • Allows a maximum of 10 replicas
  • Scales based on a 30% average CPU utilization target

🔍 Verify

kubectl get hpa
kubectl get pods


👉 You’ll see 3 Pods running as per --min=3.


📜 Review the Generated HPA YAML

kubectl get hpa myapp1-deployment -o yaml

You’ll notice it uses apiVersion: autoscaling/v2.
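If you'd rather keep the imperatively created HPA under version control, you can export that generated YAML to a file (the file name myapp1-hpa.yaml is arbitrary):

kubectl get hpa myapp1-deployment -o yaml > myapp1-hpa.yaml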


🧹 Step 06: Final Clean-Up

kubectl delete -f kube-manifests-autoscalerV2/01-kubernetes-deployment.yaml
kubectl delete hpa myapp1-deployment

📊 Step 07: Use kubectl top to Check Resource Usage

# View node metrics
kubectl top node

# View pod metrics (e.g., in kube-system namespace)
kubectl top pod -n kube-system
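A couple of useful variations, both standard kubectl top flags:

# Per-container CPU/memory for each Pod
kubectl top pod --containers

# Only the demo Pods, filtered by label
kubectl top pod -l app=myapp1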


🖥️ Step 08: Configure HPA via GKE Console (GUI Method)

You can also create and manage Horizontal Pod Autoscaling (HPA) directly from the Google Cloud Console — no YAML or kubectl needed.

🔹 Steps:

  1. Go to the Google Cloud Console
  2. Navigate to Kubernetes Engine → Clusters
  3. Select your Cluster
  4. Go to the Workloads tab
  5. Click on your Application (Pod/Deployment)
  6. Under the Details section, scroll down to Autoscaling
  7. Choose Horizontal Pod Autoscaler → Configure


⚙️ Configuration Options:

You can specify:

  • Minimum number of Pods (e.g., 1)
  • Maximum number of Pods (e.g., 10)
  • Target CPU utilization percentage (e.g., 30%)
  • Optionally, custom or memory-based metrics

🎯 Summary

Step   Description
1      Create a Deployment and ClusterIP Service
2      Define HPA YAML or use kubectl autoscale
3      Run a load generator to test scaling
4      Observe Pods scale out/in
5      Clean up resources after the test

💡 Key Takeaways

✅ Autoscaling is automatic — no manual intervention needed
✅ HPA uses the Metrics Server to fetch CPU/memory usage (a quick health check follows this list)
✅ Use low resource limits in demos for visible scaling
✅ Imperative and declarative approaches both work
✅ Helps achieve cost efficiency and application resilience
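If kubectl top or the HPA ever shows <unknown> targets, a quick way to confirm the metrics pipeline is healthy is to check the Metrics Server components (pre-installed on GKE; exact resource names can vary by cluster version):

kubectl get deployment metrics-server -n kube-system
kubectl get apiservices | grep metrics.k8s.io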


🌟 Thanks for reading! If this post added value, a like ❤️, follow, or share would encourage me to keep creating more content.


— Latchu | Senior DevOps & Cloud Engineer

☁️ AWS | GCP | ☸️ Kubernetes | 🔐 Security | ⚡ Automation
📌 Sharing hands-on guides, best practices & real-world cloud solutions
