Saul Fernandez

HPA vs. KEDA in Kubernetes - The Autoscaling Guide to Know When and Where to Use Them

Introduction

Hey there! Remember the first time you got your hands on Kubernetes? Ah, the good ol' days. I was so green back then that I thought Horizontal Pod Autoscaling (HPA) was the be-all and end-all for scaling in Kubernetes. I mean, it was like discovering fire; it felt like I had this incredible tool that could solve all my scalability problems.

Fast forward a bit, and I landed roles where KEDA was the star of the show, especially in event-driven machine learning applications. We were using RabbitMQ queue metrics to scale our ML consumers, and it worked like a charm. It was like going from a bicycle to a sports car in the world of autoscaling.

Now, in my current gig, we started off with HPA, just like old times. But as we scaled and our needs evolved, we found ourselves hitting the same limitations I'd discovered years ago. That's when we decided to bring KEDA into the mix, and let me tell you, it's been a game-changer.

So why am I telling you all this? Because I want to share these hard-earned lessons with you. In this article, we're going to dissect HPA and KEDA, compare their strengths and weaknesses, and dive into real-world scenarios. My goal is to arm you with the knowledge to make informed decisions right from the get-go, so you know exactly when to use HPA and when to switch gears to KEDA.

What is HPA?

The Horizontal Pod Autoscaler automatically adjusts the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed metrics like CPU or memory usage. You set a target, say 70% average CPU utilization, and HPA does the rest, scaling pods in or out to maintain that level. It's like putting your scaling operations on cruise control.
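Here's what that looks like in practice, as a minimal sketch. The Deployment name and replica bounds are placeholders, and it assumes the metrics-server is running in your cluster, since HPA needs it to read resource metrics:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app              # placeholder Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # the 70% target from above
```

Apply it with kubectl apply -f and HPA will keep average CPU across the pods hovering around that 70% mark, adding or removing replicas within the bounds you set.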

Why Was HPA Devised?

Back in the day, before the cloud-native era, scaling was often a manual and painful process. You'd have to provision new servers, configure them, and then deploy your application. This was time-consuming, error-prone, and not very agile.

When Kubernetes came along, it revolutionized how we think about deploying and managing applications. But Kubernetes needed a way to handle automatic scaling to truly make the platform dynamic and responsive to the actual needs of running applications. That's where HPA comes in.

  1. Simplicity: HPA is designed to be simple and straightforward. You don't need a Ph.D. in distributed systems to set it up. Just specify the metric and the target, and you're good to go.

  2. Resource Efficiency: Before autoscaling, you'd often over-provision resources to handle potential spikes in traffic, which is wasteful. HPA allows you to use resources more efficiently by scaling based on actual needs.

  3. Operational Ease: With HPA, the operational burden is reduced. You don't have to wake up in the middle of the night to scale your application manually; HPA has got your back.

  4. Built-In Metrics: Initially, HPA was designed to work with basic metrics like CPU and memory, which are often good enough indicators for many types of workloads.

So, in a nutshell, HPA was devised to make life easier for DevOps folks like us, allowing for more efficient use of resources and simplifying operational complexities. It's the everyday workhorse of Kubernetes scaling for straightforward use-cases.

So... When to Use HPA?

  1. Predictable Workloads: If you're dealing with an application that has a fairly predictable pattern—like a web app that gets more traffic during the day and less at night—HPA is a solid choice. You can set it to scale based on CPU or memory usage, which are often good indicators of load for these types of apps.

  2. Simple Metrics: HPA is great when you're looking at straightforward metrics like CPU and memory. If you don't need to scale based on more complex or custom metrics, HPA is easier to set up and manage.

  3. Quick Setup: If you're in a situation where you need to get autoscaling up and running quickly, HPA is your friend. Being a native Kubernetes feature, it's well-documented and supported, making it easier to implement.

  4. Stateless Applications: HPA is particularly well-suited for stateless applications where each pod is interchangeable. This makes it easier to scale pods in and out without worrying about maintaining state.

  5. Built-In Kubernetes Support: Since HPA is a built-in feature, it comes with the advantage of native integration into the Kubernetes ecosystem, including monitoring and logging through tools like Prometheus and Grafana.

What is KEDA?

KEDA stands for Kubernetes Event-Driven Autoscaling. Unlike HPA, which is more about scaling based on system metrics like CPU and memory, KEDA is designed to scale your application based on events. These events could be anything from the length of a message queue to the number of unprocessed database records.

KEDA works by running an operator and a metrics server in your Kubernetes cluster, and it adds custom resources (most notably the ScaledObject) that describe how a workload should scale. It then integrates with various event sources like Kafka, RabbitMQ, Azure Event Hubs, and many more, allowing you to scale your application based on metrics from these systems.
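To make that concrete, here's a rough sketch of a ScaledObject for the RabbitMQ setup I mentioned in the intro. Every name in it (ml-consumer, inference-jobs, rabbitmq-auth) is a placeholder, and it assumes KEDA is already installed along with a TriggerAuthentication that holds the RabbitMQ connection details:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ml-consumer-scaler
spec:
  scaleTargetRef:
    name: ml-consumer              # placeholder Deployment name
  minReplicaCount: 0               # KEDA can scale all the way to zero
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        queueName: inference-jobs  # placeholder queue name
        mode: QueueLength
        value: "10"                # target messages per replica
      authenticationRef:
        name: rabbitmq-auth        # TriggerAuthentication with host/credentials
```

Notice the minReplicaCount of 0; that scale-to-zero trick is one of KEDA's headline features, and we'll come back to it below.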

Why Was KEDA Devised?

  1. Event-Driven Architectures: Modern applications are increasingly adopting event-driven architectures, where services communicate asynchronously through events. Traditional autoscalers like HPA aren't designed to handle this kind of workload.

  2. Complex Metrics: While HPA is great for simple metrics, what if you need to scale based on the consumer lag on a Kafka topic or the number of messages in an Azure queue? That's where KEDA comes in.

  3. Zero to N Scaling: One of the coolest features of KEDA is its ability to scale your application back to zero when there are no events to process. This can lead to significant cost savings.

  4. Extensibility: KEDA is designed to be extensible, allowing you to write your own scalers or use community-contributed ones. This makes it incredibly flexible and adaptable to various use-cases.

  5. Multi-Cloud and On-Premises: KEDA supports a wide range of event sources, making it suitable for both cloud and on-premises deployments.

The Gap that KEDA Fills Over HPA

While HPA is like your reliable sedan, KEDA is more like a tricked-out sports car with all the bells and whistles. It was devised to fill the gaps left by HPA, particularly for applications that are event-driven or that require scaling based on custom or external metrics.

So, if you're dealing with complex, event-driven architectures, or if you need to scale based on metrics that HPA doesn't support out of the box, KEDA is your go-to. It's like the next evolution in Kubernetes autoscaling, designed for the complexities of modern, cloud-native applications.

Real-World Scenarios

Real Cases for Using HPA Over KEDA

1. Basic Web Application

Scenario: You're running a simple web application that serves static content and has predictable spikes in traffic, like during a marketing campaign.

In this case, the scaling needs are straightforward and based on CPU or memory usage. HPA is easier to set up and manage for this kind of scenario. You don't need the event-driven capabilities that KEDA offers.

2. Internal Business Application

Scenario: You have an internal application used by employees for tasks like data entry, which sees higher usage during business hours and lower usage otherwise.

Again, the load pattern is predictable and can be managed easily with simple metrics like CPU and memory. HPA's native integration with Kubernetes makes it a straightforward choice, without the need for the more complex setup that KEDA might require.

3. Stateless Microservices

Scenario: You're running a set of stateless microservices that handle tasks like authentication, logging, or caching. These services have a consistent load and don't rely on external events.

These types of services often scale well based on system metrics, making HPA a good fit. Since they're stateless, scaling in and out is less complex, and HPA can handle it easily.

4. Traditional RESTful API

Scenario: You have a RESTful API that serves mobile or web clients. The API has a steady rate of requests but might experience occasional spikes.

In this case, you can set up HPA to scale based on CPU usage, or on request rate if you expose it through the custom metrics API. Both are good indicators of load for this type of application, and KEDA's event-driven scaling would be overkill for this scenario.
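For the request-rate flavor, here's a hedged sketch. The http_requests_per_second metric is an assumption: something like prometheus-adapter has to serve it through the custom metrics API before HPA can see it, and the names here are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rest-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rest-api               # placeholder Deployment name
  minReplicas: 3
  maxReplicas: 15
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumed metric, served by your adapter
        target:
          type: AverageValue
          averageValue: "100"              # aim for ~100 req/s per pod
```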

Why Choose HPA in These Cases?

  1. Simplicity: HPA is easier to set up and manage for straightforward scaling needs. If you don't need to scale based on complex or custom metrics, HPA is the way to go.

  2. Native Support: Being a built-in Kubernetes feature, HPA has native support and a broad community, making it easier to find help or resources.

  3. Resource Efficiency: For applications with predictable workloads, HPA allows you to efficiently use your cluster resources without the need for more complex scaling logic.

  4. Operational Ease: HPA requires less ongoing maintenance and has fewer components to manage compared to KEDA, making it a good choice for smaller teams or simpler applications.

Real Cases for Using KEDA Over HPA

1. Event-Driven ML Inference

Scenario: You have a machine learning application for real-time fraud detection. Transactions are events funneled into an AWS SQS queue.

Why KEDA Over HPA: With KEDA, you can dynamically adjust the number of inference pods based on the SQS queue length, ensuring timely fraud detection. HPA's system metrics like CPU or memory wouldn't be as effective for this use-case.
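To make this concrete, here's a rough sketch of what that ScaledObject could look like. The queue URL, names, and thresholds are placeholders, and the AWS credentials would live in a separate KEDA TriggerAuthentication:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: fraud-inference-scaler
spec:
  scaleTargetRef:
    name: fraud-inference        # placeholder Deployment of inference pods
  minReplicaCount: 0
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/transactions  # placeholder
        queueLength: "5"         # target messages per replica
        awsRegion: us-east-1
      authenticationRef:
        name: aws-credentials    # TriggerAuthentication with your AWS credentials
```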

2. IoT Data Processing

Scenario: Your IoT application collects sensor data that's sent to an Azure Event Hub for immediate processing.

Why KEDA Over HPA: Here, KEDA's strength lies in its ability to adapt to the number of unprocessed messages in the Azure Event Hub, ensuring real-time data processing. Traditional HPA scaling based on CPU or memory wouldn't be as responsive to these event-driven requirements.

3. Real-time Chat Application

Scenario: You manage a chat application where messages are temporarily stored in a RabbitMQ queue before being delivered to users.

Why KEDA Over HPA: KEDA excels in this scenario by dynamically adjusting resources based on the RabbitMQ queue length, ensuring prompt message delivery. This is a level of granularity that HPA, with its focus on system metrics, can't offer.

4. Stream Processing with Kafka

Scenario: Your application consumes messages from a Kafka topic, and the rate of incoming messages can fluctuate significantly.

Why KEDA Over HPA: In this case, KEDA's ability to scale based on the consumer group lag on the Kafka topic allows it to adapt to varying loads effectively. HPA, which isn't designed for such custom metrics out of the box, wouldn't be as agile.
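A hedged sketch of that setup, with the broker address, consumer group, topic, and threshold all as placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: stream-processor-scaler
spec:
  scaleTargetRef:
    name: stream-processor         # placeholder consumer Deployment
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc.cluster.local:9092   # placeholder broker address
        consumerGroup: stream-processor                  # placeholder consumer group
        topic: events                                    # placeholder topic
        lagThreshold: "50"                               # target lag per replica
```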

Why Choose KEDA in These Cases?

  1. Event-Driven Flexibility: KEDA is tailored for scenarios where system metrics aren't the best indicators for scaling, offering a more nuanced approach.

  2. Custom Metrics Support: Unlike stock HPA, KEDA ships with ready-made scalers for dozens of external systems, making it versatile for complex scaling needs.

  3. Resource Optimization: KEDA's ability to scale down to zero pods when idle can lead to significant cost savings.

  4. Adaptability: The platform's extensible design allows for custom scalers, making it adaptable to a wide range of use-cases.

Conclusion

So there you have it, folks! We've journeyed through the world of Kubernetes autoscaling, dissecting both HPA and KEDA to understand their strengths, limitations, and ideal use-cases. From my early days of being enamored with HPA's simplicity to discovering the event-driven magic of KEDA, it's been a ride full of lessons.

If you're dealing with predictable workloads and need a quick, straightforward solution, HPA is your reliable workhorse. It's like your trusty old hammer; it might not have all the bells and whistles, but it gets the job done efficiently.

On the flip side, if your application lives in the fast-paced realm of event-driven architectures or requires scaling based on custom metrics, KEDA is your Swiss Army knife. It's built for the complexities and nuances of modern, cloud-native applications.

Remember, choosing between HPA and KEDA isn't about which is better overall, but which is better for your specific needs. So take stock of your application's requirements, your team's expertise, and your long-term scaling strategy before making the call.

As you venture into your next Kubernetes project, I hope this guide serves as a useful roadmap for your autoscaling decisions. And hey, since you're all about diving deeper, maybe explore setting up these autoscaling strategies in a hands-on way. Trust me, there's no better teacher than experience.

Happy scaling!

Bonus Track: Meet VPA

While we've focused on HPA and KEDA, let's not forget about the Vertical Pod Autoscaler (VPA). Unlike HPA and KEDA, which scale the number of pod replicas, VPA adjusts the CPU and memory resources for your existing pods. Think of it as making your pods beefier or leaner based on their actual needs.
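If you want to kick the tires, here's a minimal VerticalPodAutoscaler manifest. Fair warning: VPA isn't built into Kubernetes, so you'd install its components first, and the StatefulSet name and resource bounds below are just placeholders:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: stateful-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: stateful-service       # placeholder workload name
  updatePolicy:
    updateMode: "Auto"           # VPA may evict pods to apply new resource requests
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 4Gi
```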

Why Consider VPA?

  1. Resource Optimization: VPA fine-tunes the CPU and memory allocated to each pod, helping you use cluster resources more efficiently.

  2. Complementary: VPA can work alongside HPA or KEDA, offering another layer of autoscaling. While HPA and KEDA scale out, VPA scales up. Just don't point VPA and HPA at the same CPU or memory metrics for the same workload, or they'll fight each other.

  3. Stateful Apps: For applications that can't be easily scaled horizontally, like stateful services, VPA can be a better fit.

So, as you ponder your autoscaling strategy, keep VPA in your back pocket. It offers a different angle on scalability that might just be what your project needs.
