Kubernetes Event-driven Autoscaling (KEDA) enables Kubernetes workloads (Deployments, StatefulSets, CRDs, and Jobs) to scale horizontally in response to real-world events, like the length of a RabbitMQ queue.
How KEDA works
KEDA can be described as an extension of Kubernetes's native autoscaling feature, the HorizontalPodAutoscaler (HPA). Instead of relying only on resource metrics (like CPU and memory) or custom metrics, we can now scale horizontally on events produced by an external source. This is a useful solution for use cases where native HPA behavior does not work as expected.
In today's cloud-native environments, which are mostly microservices based, integrating KEDA into Kubernetes is crucial for achieving responsiveness and efficient resource utilization by dynamically adjusting workloads in response to fluctuating traffic.
KEDA mainly covers two different use cases:
Scale Deployments, StatefulSets, and CRDs using ScaledObjects: this is KEDA's core scaling functionality. A ScaledObject tells KEDA how we want to scale in response to a defined event or metric. It's the suitable option for short-running processes, like a web application, where we can react quickly to high traffic by scaling the number of pods in or out.
Scale Jobs using ScaledJobs: a fit for batch or long-running processes. Scaling happens by running bursts of work as Jobs that can run one or more times.
Internally, KEDA's architecture includes the following components:
- Metrics Server (keda-operator-metrics-apiserver): Acts as a bridge between external event sources and Kubernetes' Horizontal Pod Autoscaler (HPA). It translates event metrics (for example, queue length, stream lag) into Kubernetes-compatible metrics. HPA consumes metrics from this server to scale pods beyond one replica (from one to n).
- KEDA Operator (keda-operator): Manages the lifecycle of KEDA's CRDs (for example, ScaledObject, ScaledJob) and acts as an agent for scaling deployments from zero to one when events are detected and to zero during idle periods. Uses Kubernetes controller loops to watch ScaledObject/ScaledJob resources and adjust replica counts via the Kubernetes API.
- Admission Webhooks (keda-admission-webhooks): Validates and mutates KEDA resources during creation/updates to prevent misconfigurations. Intercepts Kubernetes API requests for KEDA CRDs before persistence.
- Scalers: these are NOT CRDs; they are code-level components (Go implementations) within KEDA's control plane (the Operator and Metrics Server). They act as adapters or connectors that translate event-source logic (for example, "get Redis queue length") into metrics Kubernetes understands.
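Assuming KEDA is installed via its official Helm chart into the keda namespace (the chart's defaults), we can verify that the components described above are running:

# Install KEDA from the official Helm repo
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

# The three deployments described above should appear:
# keda-operator, keda-operator-metrics-apiserver, keda-admission-webhooks
kubectl -n keda get deployments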
Authentication with TriggerAuthentication
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-trigger-authentication
  namespace: spring-boot
spec:
  secretTargetRef:
    - parameter: host
      key: rabbitmq-uri
      name: tts-secrets
TriggerAuthentication is a KEDA CRD that securely provides credentials to scalers without hardcoding sensitive data (for example, passwords or tokens) in our scaling definitions (ScaledObject and ScaledJob YAMLs). It acts as a centralized credential bridge between event sources (for example, RabbitMQ, Kafka, AWS SQS) and scalers (for example, the rabbitmq scaler in a ScaledObject).
Key components of TriggerAuthentication YAML config:
+-------------------+-------------------+----------------------------------------------------------------------------+
| Field             | Role              | Purpose                                                                    |
+-------------------+-------------------+----------------------------------------------------------------------------+
| secretTargetRef   | Credential source | References a Kubernetes Secret (avoids hardcoding secrets in YAML).        |
| parameter: host   | Scaler parameter  | Injects the Secret's value into the scaler's host field.                   |
| key: rabbitmq-uri | Secret data key   | Identifies which entry in the Secret to use (for example, rabbitmq-uri:    |
|                   |                   | amqp://guest:password@localhost:5672/vhost).                               |
| name: tts-secrets | Secret name       | The Kubernetes Secret where credentials are stored.                        |
+-------------------+-------------------+----------------------------------------------------------------------------+
A TriggerAuthentication is injected into either a ScaledObject or a ScaledJob. When it is used by the ScaledObject in our example:
- The scaler (RabbitMQ scaler) needs a connection URI (host) to monitor queues.
- Instead of embedding credentials, the ScaledObject references the TriggerAuthentication to fetch host dynamically.
- Behind the scenes, KEDA reads the tts-secrets Secret, extracts rabbitmq-uri, and injects it as host into the scaler (a sketch of such a Secret follows this list).
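For reference, here is a minimal sketch of what the tts-secrets Secret could look like. The URI value is illustrative: it assumes the in-cluster RabbitMQ service we deploy later in this post, with the default guest credentials.

apiVersion: v1
kind: Secret
metadata:
  name: tts-secrets
  namespace: spring-boot
type: Opaque
stringData:
  # illustrative AMQP URI pointing at the in-cluster RabbitMQ service
  rabbitmq-uri: amqp://guest:guest@rabbitmq.rabbitmq.svc.cluster.local:5672/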
It is worth mentioning that TriggerAuthentication is namespace scoped, which means the Secret must be in the same namespace as the TriggerAuthentication CRD. For cluster-scoped use, KEDA provides ClusterTriggerAuthentication, where Secrets must live in the same namespace as the KEDA components (in most cases, the keda namespace). Another important feature of TriggerAuthentication and ClusterTriggerAuthentication is that we can resolve scaler parameters (like the host parameter of the rabbitmq scaler) from environment variables, Kubernetes Secrets (as in our example), HashiCorp Vault secrets, Azure Key Vault secrets, GCP Secret Manager secrets, and more.
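A cluster-scoped variant would look almost identical; a sketch (note there is no namespace in its metadata, and a ScaledObject referencing it must set kind: ClusterTriggerAuthentication in its authenticationRef):

apiVersion: keda.sh/v1alpha1
kind: ClusterTriggerAuthentication
metadata:
  name: rabbitmq-cluster-trigger-authentication
spec:
  secretTargetRef:
    - parameter: host
      name: tts-secrets    # must live in the KEDA install namespace
      key: rabbitmq-uri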
In summary, TriggerAuthentication decouples secrets from scaling logic, acting as a secure credential broker for KEDA scalers. By referencing Kubernetes Secrets, it ensures sensitive data never appears in ScaledObject and ScaledJob definitions, making our event-driven infrastructure both scalable and secure.
Expose a deployment with Gateway API
We expose the frontend application (tts) using the Gateway API, a set of API resources that help us manage traffic to different backends; it is considered the successor of Ingress.
First, we need to install one of the Gateway API implementations; please refer to NGINX's installation guide to install NGINX Gateway Fabric using Helm. They also have a nice, simple example to get familiar with it.
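As a rough sketch of those steps at the time of writing (versions and chart locations change, so double-check the Gateway API and NGINX docs), the Gateway API CRDs are installed first, then the NGINX Gateway Fabric chart:

# Install the Gateway API standard CRDs (pick the release matching your setup)
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.0.0/standard-install.yaml

# Install NGINX Gateway Fabric from its OCI Helm chart
helm install ngf oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric \
  --create-namespace -n nginx-gateway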
Then we define our Gateway instance, which is linked to our traffic-handling infrastructure, the NGINX Gateway Fabric. We define a network endpoint listening on port 80 for the HTTP protocol:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: tts-gateway
  namespace: spring-boot
spec:
  gatewayClassName: nginx
  listeners:
    - name: http
      port: 80
      protocol: HTTP
Next, we define the HTTP-specific rules that route the actual traffic from the gateway listener to our backend service:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: tts-route
  namespace: spring-boot
spec:
  parentRefs:
    - name: tts-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: tts-svc
          port: 8080
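To confirm everything is wired up, we can check that the Gateway was programmed and the route attached; the ADDRESS column should show the IP the implementation provisioned:

kubectl -n spring-boot get gateways,httproutes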
Autoscale deployment with KEDA
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: tts-analytics-rabbitmq-scaled-object
  namespace: spring-boot
spec:
  scaleTargetRef:
    name: tts-analytics
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        queueName: post-tts-analytics.analytics-group
        mode: QueueLength
        value: "5"
      authenticationRef:
        name: rabbitmq-trigger-authentication
The ScaledObject CRD is KEDA's primary scaling configuration: it links an event source (RabbitMQ) to a Kubernetes workload (Deployment, StatefulSet, CRD) and defines the scaling rules and behavior. It abstracts away HPA complexity while generating HPA resources under the hood.
Key components of ScaledObject YAML config:
+---------------+----------------------+---------------------------------------------+
| Section | Key Field | Function |
+---------------+----------------------+---------------------------------------------+
| Scale Target | scaleTargetRef.name | Identifies the Deployment (tts-analytics) |
| | | to scale |
| Max Replicas | maxReplicaCount | Sets max replicas count to prevent |
| | | over scaling |
| Min Replicas | minReplicaCount | Sets min replicas count |
| Event Source | type: rabbitmq | Uses KEDA's RabbitMQ scaler for queue |
| | | monitoring |
| Queue Config | queueName | Specific queue to monitor |
| | | (post-tts-analytics.analytics-group) |
| Scaling Logic | mode: QueueLength | Scales based on total messages in queue |
| | | (alternative: MessageRate) |
| Target Load | value: "5" | Aims for 5 messages per replica |
| | | (HPA will maintain this ratio) |
| Security | authenticationRef | Links to TriggerAuthentication |
| | | for secure credentials |
+---------------+----------------------+---------------------------------------------+
How does autoscaling work? With this configuration, KEDA will:
- Connect to RabbitMQ using credentials from rabbitmq-trigger-authentication, monitor the post-tts-analytics.analytics-group queue via the AMQP protocol, and check the queue length every 30 seconds (the default polling interval).
- Calculate the desired replicas with the following formula: desiredReplicas = ceil(currentQueueLength / value), where value is the target queue length per replica (see the worked example after this list).
- If current replicas ≠ desired replicas, KEDA updates the tts-analytics Deployment's replica count (under the hood, this is done with HPA).
- When the queue is empty, KEDA scales in to the minimum replicas (one). Scale-in (downscale) behavior is controlled by HPA's stabilizationWindowSeconds (default 5 minutes), since we have min replicas > 0.
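For example, suppose the queue holds 23 messages (an illustrative number): desiredReplicas = ceil(23 / 5) = 5, which is then clamped to the [minReplicaCount, maxReplicaCount] range of [1, 10], so the Deployment is scaled to 5 replicas.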
By setting minReplicaCount to 1 and maxReplicaCount to 10, KEDA sets the HPA's minimum replicas to 1, so the workload scales back to one replica when traffic dies down, while the cap of 10 replicas acts as a safety mechanism against over-scaling if the queue is flooded with messages.
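If the default 5-minute stabilization window is too conservative, the ScaledObject's advanced section lets us tune the behavior of the generated HPA; a minimal sketch (the 60-second window is an illustrative value, not what this lab uses):

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          # wait only 60s before acting on a lower metric value
          stabilizationWindowSeconds: 60
          policies:
            - type: Percent
              value: 100
              periodSeconds: 15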
Deploy RabbitMQ
In most cases, the RabbitMQ cluster will be a cloud-managed service, but for testing purposes we can deploy it inside our cluster using the bitnami/rabbitmq Helm chart with the following custom values:
auth:
  username: guest
  password: guest
persistence:
  enabled: false
env:
  - name: RABBITMQ_FORCE_BOOT
    value: "yes"
replicaCount: 2
clustering:
  enabled: true
resources:
  limits:
    memory: 1Gi
    cpu: 1
  requests:
    memory: 512Mi
    cpu: 500m
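Assuming the values above are saved as values.yaml, a typical install looks like this:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install rabbitmq bitnami/rabbitmq -n rabbitmq --create-namespace -f values.yaml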
After deploying it, the rabbitmq namespace should look like this:
student@control-plane:~/$ kubectl -n rabbitmq get all
NAME READY STATUS RESTARTS AGE
pod/rabbitmq-0 1/1 Running 0 9m5s
pod/rabbitmq-1 1/1 Running 0 10m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/rabbitmq ClusterIP 10.110.197.82 <none> 5672/TCP,4369/TCP,25672/TCP,15672/TCP 24h
service/rabbitmq-headless ClusterIP None <none> 4369/TCP,5672/TCP,25672/TCP,15672/TCP 24h
NAME READY AGE
statefulset.apps/rabbitmq 2/2 24h
Demo
We deploy the ScaledObject using the YAML above, and we should see something like:
student@control-plane:~/$ kubectl -n spring-boot get scaledobjects.keda.sh tts-analytics-rabbitmq-scaled-object
NAME SCALETARGETKIND SCALETARGETNAME MIN MAX READY ACTIVE FALLBACK PAUSED TRIGGERS AUTHENTICATIONS AGE
tts-analytics-rabbitmq-scaled-object apps/v1.Deployment tts-analytics 1 10 True False False Unknown rabbitmq rabbitmq-trigger-authentication 23m
Looking at the READY column, we can be sure that our ScaledObject is able to connect to our RabbitMQ cluster through the configured TriggerAuthentication and is polling the metric (queue length) every pollingInterval (30 seconds) or whenever the HPA controller asks for metrics (every 15 seconds).
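If READY ever shows False, describing the ScaledObject surfaces the scaler's connection errors in its Events section:

kubectl -n spring-boot describe scaledobject tts-analytics-rabbitmq-scaled-object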
For testing purposes, we are going to use the Siege load-testing tool. It calls the endpoint exposed by the tts Deployment; in this lab we use the Gateway API to manage traffic from outside the cluster to tts-svc, which exposes the tts Deployment. For a local deployment, you can use the cluster-internal service IP instead of a load balancer's public IP.
~$ siege -c 4 -t 60S --header="Content-Type: application/json" "http://$CLUSTER_LOAD_BALANCER_PUBLIC_IP/tts POST < ./payload.json"
~$ cat payload.json
{"text":"Hi this is a test!!"}
Below we can see how the corresponding HPA updates the replica count of the target Deployment based on the metric values shipped by KEDA:
student@control-plane:~/$ kubectl -n spring-boot get hpa keda-hpa-tts-analytics-rabbitmq-scaled-object -w
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics <unknown>/5 (avg) 1 10 1 32s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 0/5 (avg) 1 10 1 76s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 180/5 (avg) 1 10 1 3m16s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 103500m/5 (avg) 1 10 4 3m32s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 74500m/5 (avg) 1 10 8 3m47s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 78700m/5 (avg) 1 10 10 4m2s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 79800m/5 (avg) 1 10 10 4m17s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 77300m/5 (avg) 1 10 10 4m32s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 74900m/5 (avg) 1 10 10 4m47s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 72400m/5 (avg) 1 10 10 5m2s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 69900m/5 (avg) 1 10 10 5m17s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 67400m/5 (avg) 1 10 10 5m32s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 64900m/5 (avg) 1 10 10 5m47s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 62400m/5 (avg) 1 10 10 6m2s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 60/5 (avg) 1 10 10 6m17s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 57500m/5 (avg) 1 10 10 6m32s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 55/5 (avg) 1 10 10 6m47s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 52400m/5 (avg) 1 10 10 7m3s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 49900m/5 (avg) 1 10 10 7m18s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 47400m/5 (avg) 1 10 10 7m33s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 44900m/5 (avg) 1 10 10 7m48s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 42500m/5 (avg) 1 10 10 8m3s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 39900m/5 (avg) 1 10 10 8m19s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 37400m/5 (avg) 1 10 10 8m34s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 34900m/5 (avg) 1 10 10 8m49s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 32400m/5 (avg) 1 10 10 9m4s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 29900m/5 (avg) 1 10 10 9m19s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 27400m/5 (avg) 1 10 10 9m34s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 25/5 (avg) 1 10 10 9m49s
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 22500m/5 (avg) 1 10 10 10m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 20/5 (avg) 1 10 10 10m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 17500m/5 (avg) 1 10 10 10m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 14500m/5 (avg) 1 10 10 10m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 12400m/5 (avg) 1 10 10 11m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 9900m/5 (avg) 1 10 10 11m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 7300m/5 (avg) 1 10 10 11m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 4900m/5 (avg) 1 10 10 11m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 2400m/5 (avg) 1 10 10 12m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 0/5 (avg) 1 10 10 12m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics <unknown>/5 (avg) 1 10 10 13m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics <unknown>/5 (avg) 1 10 10 13m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 0/5 (avg) 1 10 10 13m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 0/5 (avg) 1 10 10 16m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 0/5 (avg) 1 10 10 16m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 0/5 (avg) 1 10 5 17m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics 0/5 (avg) 1 10 1 17m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics <unknown>/5 (avg) 1 10 1 21m
keda-hpa-tts-analytics-rabbitmq-scaled-object Deployment/tts-analytics <unknown>/5 (avg) 1 10 1 21m
We can see the HPA adjusting the number of replicas in response to the load increase until it reaches maxReplicaCount (10). When the metric drops below the target value of 5, the HPA does not scale down immediately; this prevents aggressive downscaling under fluctuating traffic. The HPA waits for the 5-minute stabilization window and then scales down to minReplicaCount, since the queue is empty.
Summary
Autoscaling in Kubernetes usually works based on CPU or memory usage, but sometimes that's not enough. What if your app is working with queues, messages, or events? That's where KEDA comes in.
KEDA is a tool that helps your apps scale up or down based on events, like the number of messages in a queue. It's especially useful for apps that don't always need to be running full-time, like background jobs or workers.
It works with many event sources like Kafka, RabbitMQ, AWS SQS, and others. You just define a ScaledObject in your YAML file, tell it what to watch (like a queue), and KEDA handles the rest.
In short, KEDA makes your Kubernetes apps smarter by scaling based on real-world events, not just system resources.