In multiple projects, I have found myself asking the same question: why is the HPA not scaling my pods when it is clearly needed? The answer is actually very simple: out of the box, the HPA only scales based on CPU or memory utilization, and those signals do not always reflect real-world load. What happens when we need to scale an application using different types of metrics, such as requests per second, queue depth, or even the result of a database query?
That's where KEDA comes into play.
On June 14, 2025, I participated in KCD Guatemala 2025, where I spoke about scaling applications on Kubernetes using KEDA. My experience working on multiple data-intensive and highly irregular workloads has shown me that this is an especially relevant topic for anyone—whether beginner or experienced—who works with any flavor of Kubernetes.
KEDA
KEDA is an excellent tool that complements the Kubernetes ecosystem. It monitors external events—such as endpoints, message queues, Prometheus or Grafana metrics, SQL queries, and HTTP traffic—and transforms these signals into metrics that Kubernetes can understand and use for scaling decisions. The Horizontal Pod Autoscaler remains responsible for scaling the application, but it now receives metrics that more accurately represent the real application load.
This enables a very powerful capability: scaling workloads down to zero when there is no traffic and automatically scaling pods back up as events arrive. It is a simple pattern, yet one that can dramatically improve operational efficiency.
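To make this concrete, here is a minimal sketch of a ScaledObject that scales a Deployment based on request rate reported by Prometheus. The Deployment name, Prometheus address, query, and threshold are assumptions for illustration, not taken from the talk's demo:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler              # hypothetical name
spec:
  scaleTargetRef:
    name: my-app                   # assumed Deployment to scale
  minReplicaCount: 0               # allow scaling to zero when there is no traffic
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090      # assumed Prometheus address
        query: sum(rate(http_requests_total{app="my-app"}[2m]))   # assumed metric and query
        threshold: "50"            # target requests per second per replica
```

With `minReplicaCount` set to 0, KEDA removes all replicas while the query stays below the threshold and brings them back as soon as traffic returns.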
KEDA Architecture
KEDA's architecture is very interesting because it does not replace the HPA; instead, it enables a very elegant scaling flow.
KEDA integrates natively into a Kubernetes cluster to enable event-driven autoscaling without changing how Kubernetes fundamentally works. The process starts with a ScaledObject, a custom resource that defines which workload should scale and which external event source should drive that scaling. This definition is registered through the Kubernetes API server, just like any other native resource.

Inside the cluster, KEDA continuously observes the external trigger source, such as a message queue, HTTP traffic, or a monitoring system, using specialized scalers. When events are detected, KEDA's controller evaluates the workload demand and exposes the corresponding metric through its metrics adapter. These metrics are then consumed by the HPA, which remains the only component responsible for scaling pods. When no events are present, KEDA allows the workload to scale down to zero, and when events reappear, it scales the workload back up. The application itself remains unchanged, making KEDA a clean and powerful extension that connects Kubernetes autoscaling with real-world event sources.
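As an example of the queue-driven case in this flow, the sketch below scales a worker Deployment on RabbitMQ queue depth. The queue name, environment variable, and replica limits are placeholders, not from the original talk:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-worker-scaler     # hypothetical name
spec:
  scaleTargetRef:
    name: orders-worker          # assumed worker Deployment
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        queueName: orders        # assumed queue name
        mode: QueueLength        # scale on the number of pending messages
        value: "20"              # target messages per replica
        hostFromEnv: RABBITMQ_HOST  # connection string read from the pod's environment
```

Behind the scenes, KEDA creates and manages an HPA for this ScaledObject, so the actual scaling decision still flows through the standard HPA machinery described above.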
Proof of Concept
In the proof of concept presented at KCD Guatemala 2025, I showcased how Kubernetes workloads can be automatically scaled based on real HTTP traffic using KEDA. The PoC includes a lightweight sample application, the KEDA HTTP add-on configuration, ScaledObject definitions, and Kubernetes manifests that together demonstrate event-driven autoscaling from zero to multiple replicas and back down when traffic stops. It also includes scripts and instructions to generate load and observe scaling behaviour in real time, making it easy to understand how KEDA integrates with the Horizontal Pod Autoscaler under the hood. The objective is not to build a production-ready system, but to provide a clear, hands-on example of how event-based autoscaling works in practice within a Kubernetes cluster.
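For the HTTP case, the add-on introduces its own custom resource. The sketch below shows roughly what an HTTPScaledObject for a demo app could look like; the names, host, and port are assumptions, so refer to the repository for the actual manifests:

```yaml
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: sample-app               # hypothetical name; see the repo for the real one
spec:
  hosts:
    - sample-app.local           # assumed host the add-on's interceptor routes on
  scaleTargetRef:
    name: sample-app             # assumed Deployment
    kind: Deployment
    apiVersion: apps/v1
    service: sample-app          # assumed Service fronting the pods
    port: 8080                   # assumed service port
  replicas:
    min: 0                       # scale to zero when no requests arrive
    max: 10
```

Incoming requests hit the add-on's interceptor, which holds them while KEDA scales the Deployment up from zero, so traffic is not lost during cold starts.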
The complete setup, source code, and documentation are available on GitHub at keda-demo-kcd.
