How to Implement Autoscaling in the Cloud: Practical Experience and Lessons from AutoMQ

Background
Elasticity serves as the bedrock of cloud-native and Serverless architectures. From its inception, AutoMQ has prioritized elasticity as a fundamental aspect of its offering. In contrast, Apache Kafka was developed during the data center era and tailored for physical hardware setups, relying heavily on local storage—a design less adaptable to today's cloud-centric environments. Yet, this does not imply Kafka should be discarded. Thanks to its robust ecosystem, Kafka has cemented a formidable position in the stream processing domain, with the Kafka API emerging as the standard protocol for stream processing. In light of this, AutoMQ has enthusiastically adopted the Kafka ecosystem. While maintaining compatibility with the computational aspects of Kafka, AutoMQ has adapted its underlying storage architecture to be cloud-native, thus maximizing the cloud's scalability, cost efficiency, and technological advances.

AutoMQ employs object storage and cloud disks to construct a core that facilitates rapid elasticity, thereby enabling automatic elasticity (hereinafter referred to as Autoscaling) within the cloud. This article will explore AutoMQ's implementation of Autoscaling in cloud environments and share insights and lessons learned from the endeavor.

What AutoMQ Aims for in Autoscaling
In streaming systems, the essence of Autoscaling lies in the system's capacity to dynamically scale its resources in response to fluctuating write workloads. As the write traffic increases, the cluster can swiftly expand to manage the increased demand; conversely, when the write traffic diminishes or even ceases, the cluster can contract, reducing resource expenditures and potentially scaling down to zero, thereby utilizing no resources whatsoever.

We believe that products with optimal autoscaling capabilities must possess the following characteristics:

Built on public clouds or on sizable private clouds: The essence of cloud technology lies in pooling and reusing resources, which yields both technical and cost benefits. Public clouds, operating at the largest scale, offer the most significant advantages. The value of autoscaling lies in releasing resources quickly when they are no longer needed, which avoids unnecessary expense, and in quickly reclaiming reserved resources from the pool when they are needed again. Here the vast scale of public clouds provides the greatest benefit. Private clouds can achieve similar results, but a 10% reserved capacity might equate to 100 machines in a private cloud and 10,000 machines on AWS, which highlights the difference in scalability. Tip: Now and in the future, some scenarios will still require deployment in non-cloud environments. However, given trends such as the rise of Kubernetes, the technical foundations of private infrastructure are expected to increasingly align with those of public clouds. Private environments can also offer equivalents of cloud disks (OpenEBS) and object storage (MinIO).
Capable of fully leveraging cloud services: The core philosophy of AutoMQ is to build its product capabilities on mature, scalable, and technically superior cloud services. Regarding elasticity, after thorough research across multiple clouds, we observed that elastic compute instance groups (also known as node groups) have become a standard feature. AutoMQ therefore relies on the clouds' elastic scaling group services as much as possible so that production-grade elasticity can be delivered quickly. Tip: Because elastic scaling groups and their associated capabilities are becoming standardized across clouds, the rest of this article uses AWS cloud services as the example.

From a technical perspective, the Autoscaling that AutoMQ pursues is:

Fast Scaling: "Fast scaling" here refers primarily to scaling out. In production environments, we usually follow the best practice of "scale out fast, scale in slow" to ensure a smooth autoscaling experience for the business. The sooner the AutoMQ cluster responds to a sudden surge in write traffic and finishes scaling out so that write throughput meets the target, the more efficient the scaling is considered.
Precise Scaling: Precise scaling has two meanings. First, the capacity adjustment should settle at the desired target as quickly as possible, without oscillating because of how the elastic strategy is configured. Second, the target capacity should match the actual demand closely, to prevent both over-scaling, which wastes resources, and under-scaling, which hurts end-to-end message latency.
Cost-efficient Scaling: Autoscaling largely depends on monitoring data to determine appropriate times to scale out or in. The storage, management, and application of metrics all involve additional costs.

Autoscaling Technology Architecture
Leveraging cloud capabilities simplifies AutoMQ's autoscaling architecture significantly. It includes the following components:

Auto Scaling Group (abbreviated as ASG): AWS provides the Auto Scaling Group, which organizes EC2 computing instances into logical groups. It manages capacity at the group level and includes additional features such as machine monitoring, elasticity, and lifecycle hooks. This service is available at no cost across various cloud platforms.
CloudWatch: AWS's monitoring service, which can set up metrics and alarms that initiate capacity adjustments in the ASG. AWS offers free basic machine monitoring for EC2 at a granularity of 5 minutes [3]. When rapid elasticity is not required, this free service can be fully leveraged to minimize costs.
AutoMQ Control Panel: The control panel of AutoMQ interfaces with the cloud's API, creates the ASG elastic policies, and connects the alarm modules in CloudWatch to those policies. This integration ensures that reaching an alarm threshold triggers an adjustment of the ASG's capacity. Once elastic policies are linked to the appropriate metric thresholds, the ASG adjusts capacity automatically whenever a threshold is met.
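The wiring between an alarm and an elastic policy can be sketched roughly as follows. This is only an illustration, not AutoMQ's actual control-panel code: the helpers merely build request parameters that we assume would be passed to boto3's `put_scaling_policy` (Auto Scaling) and `put_metric_alarm` (CloudWatch). The group name, policy and alarm names, cooldown, period, threshold, and the choice of the `NetworkIn` metric are all hypothetical values chosen for the example.

```python
def build_scale_out_policy(asg_name: str) -> dict:
    """Parameters for a simple scaling policy (all values are illustrative)."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-scale-out",   # hypothetical name
        "PolicyType": "SimpleScaling",
        "AdjustmentType": "PercentChangeInCapacity",
        "ScalingAdjustment": 10,                 # grow by 10% of current capacity
        "MinAdjustmentMagnitude": 1,             # but by at least one instance
        "Cooldown": 300,                         # assumed cooldown in seconds
    }


def build_alarm(asg_name: str, policy_arn: str, threshold_bytes: float) -> dict:
    """Parameters for an alarm whose action is the scaling policy above."""
    return {
        "AlarmName": f"{asg_name}-network-in-high",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "NetworkIn",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 300,                # matches 5-minute basic monitoring
        "EvaluationPeriods": 1,
        "Threshold": threshold_bytes,  # bytes per period, assumed unit
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],  # this link is what triggers the policy
    }


policy = build_scale_out_policy("automq-brokers")
alarm = build_alarm("automq-brokers", "arn:placeholder", 24e9)
```

The key design point is the `AlarmActions` entry: the alarm itself carries no scaling logic and only points at the policy, so thresholds and step sizes can be tuned independently.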

Challenges of Autoscaling in the Cloud
Understanding the characteristics and combined effects of various elasticity strategies offered by cloud providers
Cloud providers typically provide a range of standardized elasticity strategies that enable AutoMQ to quickly develop its own autoscaling capabilities. However, our experience has shown that the implementation is not always straightforward. Without a deep understanding of these strategies, there is a risk of misapplying them, which can lead to not achieving the intended results.

Here, we provide insights into how several elasticity strategies from AWS ASG (similar to those of other cloud providers) are applied by AutoMQ.
Simple Strategy
The Simple Strategy[1] activates based on metric-based alerts. When an alert is triggered, possible actions include scaling the number of compute instances up or down by x. The primary advantage of this strategy is its simplicity; however, it lacks the flexibility to dynamically fine-tune various steps based on different scenarios. Additionally, it's crucial to recognize that simple scaling requires a wait for either the scaling operation or a health check replacement to complete, and for the cooldown period to expire, before it can react to further alerts. The cooldown period is designed to prevent the initiation of additional scaling activities before the effects of the previous one are fully realized.
Elastic Policy Step Size: When an elastic policy is activated, necessitating an increase or decrease by x instances, x denotes the step size.
Cooldown Period: This period is the time required to wait after a previous scaling operation has completed. It's designed to allow the application to stabilize post-scaling before further capacity adjustments are made, thereby smoothing the scaling transitions and minimizing impact on the application.
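To make the interaction between step size and cooldown concrete, here is a small model of a simple scaling policy. It is a sketch under simplifying assumptions: the scaling activity is treated as instantaneous, so the cooldown starts at the moment the step is applied, and the class and field names are ours, not AWS's.

```python
from dataclasses import dataclass


@dataclass
class SimpleScalingPolicy:
    step: int                # instances added each time the alarm fires
    cooldown_seconds: float  # wait time after a scaling activity
    capacity: int            # current number of instances
    _last_scaled_at: float = float("-inf")

    def on_alarm(self, now: float) -> bool:
        """Apply one step unless the cooldown is still running.

        Returns True if capacity changed, False if the alarm was ignored.
        """
        if now - self._last_scaled_at < self.cooldown_seconds:
            return False  # still cooling down: the alert is not acted on
        self.capacity += self.step
        self._last_scaled_at = now
        return True


policy = SimpleScalingPolicy(step=2, cooldown_seconds=300, capacity=3)
first = policy.on_alarm(now=0)     # scales: 3 -> 5
second = policy.on_alarm(now=100)  # ignored, cooldown has 200 s left
third = policy.on_alarm(now=300)   # cooldown expired, scales: 5 -> 7
```

In this model a sustained surge can only grow the group by one step per cooldown period, which is why the choice of step size matters so much for fast scale-out.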
Step Scaling Policy
The Step Scaling Policy[1] can be viewed as an advanced version of the Simple Strategy that permits different step sizes for different monitoring thresholds. For instance, if CPU utilization is between 75% and 85%, add 2 instances; if it is between 85% and 95%, add 3 instances; and if it is over 95%, add 4 instances. This offers finer control over capacity adjustments and helps prevent both over-scaling and under-scaling.
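The CPU example above can be written as a small lookup table. This is a sketch: the treatment of the boundaries (lower bound inclusive, upper bound exclusive) is our assumption, since the example does not say which band a value of exactly 85% or 95% falls into.

```python
# (lower bound inclusive, upper bound exclusive, instances to add)
STEP_ADJUSTMENTS = [
    (75.0, 85.0, 2),
    (85.0, 95.0, 3),
    (95.0, float("inf"), 4),
]


def step_adjustment(cpu_percent: float) -> int:
    """Return how many instances to add for a given CPU utilization."""
    for lower, upper, instances in STEP_ADJUSTMENTS:
        if lower <= cpu_percent < upper:
            return instances
    return 0  # below the first threshold: no scaling


adjustments = [step_adjustment(cpu) for cpu in (60, 80, 90, 99)]
```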

Target Tracking Policy
The main objective is to optimize capacity usage and prevent resource waste. The Target Tracking Policy[2] achieves this by setting a target, such as CPU utilization, and letting AWS decide how many instances to add or remove; the step size is chosen by AWS rather than by the user. What does it mean to maintain a value close to the target? AWS generally takes a capacity-first approach. For example, suppose a target CPU utilization of 50% is set and the Auto Scaling group exceeds it. Adding 1.5 instances might bring CPU utilization back to about 50%, but since adding 1.5 instances isn't possible, AWS rounds up to two instances. This might push CPU utilization slightly below 50%, but it ensures that the application has ample resources. Conversely, if removing 1.5 instances would push CPU utilization above 50%, only one instance is removed.
When AutoMQ first adopted the Target Tracking Policy, the goal was to have the step size adjusted dynamically so that the target capacity would be reached more accurately and quickly. However, it proved less effective than anticipated. In practice, combining simple strategies often offers more flexibility than the Target Tracking Policy, which does not permit customizing the step size.
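The capacity-first rounding can be illustrated with a short calculation. This is a sketch, not AWS's actual algorithm: it assumes the metric scales inversely with the number of instances, so the ideal capacity is `current * metric / target`, and the function name is ours.

```python
import math


def target_tracking_capacity(current: int, metric: float, target: float) -> int:
    """Capacity-first estimate of the new group size.

    Assumes load is spread evenly, so the ideal (fractional) capacity is
    current * metric / target. Rounding up covers both cases in the text:
    adding 1.5 instances becomes adding 2, and removing 1.5 instances
    becomes removing only 1.
    """
    ideal = current * metric / target
    return max(1, math.ceil(ideal))


# 3 instances at 75% CPU, target 50%: ideal is 4.5, so add 2 instances.
scale_out = target_tracking_capacity(current=3, metric=75, target=50)
# 5 instances at 35% CPU, target 50%: ideal is 3.5, so remove only 1.
scale_in = target_tracking_capacity(current=5, metric=35, target=50)
```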

Predictive Scaling
Predictive scaling applies to periodic loads and requires at least 24 hours of historical data, which AWS fits with machine learning to forecast the load. It can be used in conjunction with other scaling strategies. AutoMQ did not initially adopt this strategy: as a general-purpose stream processing system, AutoMQ is not used only for periodic loads, and we cannot predict what kind of workload users will run.
Scheduled Scaling
Scheduled scaling lets you set up timed tasks that adjust capacity. It suits scenarios such as major promotions, where the target capacity is known in advance.

How do multiple elastic policies work in the event of a conflict?
Different cloud vendors have varying methods for handling conflicts between elasticity policies. Proper use of these policies requires a thorough understanding of how they behave during conflicts. For instance, on Alibaba Cloud, when there is a conflict between elasticity policies, the results of the two policies are cumulatively applied. For example, if one policy calls for a scale-out of four instances, and another calls for a scale-in of two, the final result would be a scale-out of two instances. However, AWS's approach to elasticity policies primarily prioritizes maintaining capacity to ensure availability. When multiple elasticity policies conflict, AWS prioritizes the execution of the policy that results in a larger capacity.
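The two conflict-handling behaviors described above can be compared side by side. This is a simplified sketch of the rules as stated in this section, with function names of our own choosing; real providers apply further constraints such as minimum and maximum group size.

```python
from typing import Sequence


def resolve_cumulative(current: int, adjustments: Sequence[int]) -> int:
    """Alibaba Cloud style as described above: sum the conflicting adjustments."""
    return max(0, current + sum(adjustments))


def resolve_largest_capacity(current: int, adjustments: Sequence[int]) -> int:
    """AWS style as described above: the policy yielding more capacity wins."""
    return max((current + a for a in adjustments), default=current)


# One policy asks for +4 instances, another for -2, on a group of 10.
cumulative = resolve_cumulative(10, [4, -2])              # 10 + 4 - 2
largest_capacity = resolve_largest_capacity(10, [4, -2])  # max(14, 8)
```

The same pair of policies therefore ends at different capacities on different clouds, which is why policies tuned on one provider should not be copied to another without re-checking.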

Seeking the golden metrics for triggering elastic execution
Elastic policies are merely logical execution plans. Deciding when to trigger the execution of these policies is a crucial challenge in practice. The triggering conditions for the execution of elastic policies are based on monitored data. Identifying a golden metric that triggers elasticity accurately is key. However, in real-world production applications, factors such as deployment models and workload can affect the choice of this golden metric.

Ideally, we hope the application kernel can provide a golden metric. Any external environment bottlenecks, such as high CPU Load or network traffic congestion, can ultimately be reflected in this unique golden metric. Unfortunately, Kafka itself does not provide such a metric at the kernel side. Currently, AutoMQ determines the timing of automatic elasticity based on network traffic. According to our judgment, the golden metric for elasticity cannot be a single metric, but a composite metric combining multiple factors and weights. Key factors can include the network uplink and downlink traffic of broker machines, CPU usage, memory usage, disk IOPS and bandwidth, etc. The weight of these factors will vary under different loads and hardware environments. The ideal situation in the future is for AutoMQ to provide a default multi-factor metric to guide the triggering of elasticity, and users can also customize the factors and weights involved in the composite metric.
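As an illustration of what such a composite metric could look like, the sketch below combines several normalized factors with weights. It is purely hypothetical: AutoMQ currently triggers on network traffic only, and the factor names, weights, and the weighted-average form are our assumptions.

```python
from typing import Mapping


def composite_load(utilization: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Weighted average of per-factor utilization, each in the range 0..1.

    Both the factors and the weights are hypothetical; in practice they
    would vary with workload and hardware.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[name] * utilization[name] for name in weights)
    return weighted / total_weight


# A network-bound broker: network is weighted three times as heavily as CPU.
score = composite_load(
    utilization={"network": 0.9, "cpu": 0.5},
    weights={"network": 3.0, "cpu": 1.0},
)
```

A weighted average is only one possible choice; taking the maximum over factors would react to whichever resource saturates first, at the cost of ignoring the weights.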

AutoMQ's final application of elasticity policies
Scheduled Elasticity
The core of AutoMQ's elasticity strategy is a target-tracking strategy based on simple rules, augmented by an optional scheduled elasticity policy. The default target-tracking strategy utilizes moderate scaling steps to ensure a smooth application of elasticity and to minimize resource waste. However, in scenarios like e-commerce promotions or food delivery services, where traffic spikes occur during specific periods, relying solely on the default elasticity policy may prove inadequate. Thus, integrating an optional scheduled elasticity policy is essential for effective elasticity management in production environments. Scheduled elasticity involves proactive capacity planning by humans—a heuristic approach—where the cluster automatically downscales to a predetermined capacity post-peak traffic periods. The scheduled elasticity policy leverages cloud infrastructure capabilities, setting execution times and target capacities based on cron expressions. For instance, the scheduled elasticity strategy below is well-suited for the food service industry, scaling up at 11 AM to a specified capacity of 20 and then scaling down at 2 PM to a lower target capacity.
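The food-service schedule can be expressed as two cron-based actions. This is a sketch: the helper only builds parameters that we assume would be passed to boto3's `put_scheduled_update_group_action`, the group and action names are hypothetical, and the post-peak capacity of 5 is an invented placeholder because the text only says "a lower target capacity".

```python
def scheduled_action(asg_name: str, action_name: str,
                     cron: str, desired_capacity: int) -> dict:
    """Parameters for one recurring scheduled scaling action."""
    return {
        "AutoScalingGroupName": asg_name,
        "ScheduledActionName": action_name,
        "Recurrence": cron,              # cron expression: min hour dom mon dow
        "DesiredCapacity": desired_capacity,
    }


# Scale out to 20 brokers at 11:00 every day, ahead of the lunch peak.
lunch_peak = scheduled_action("automq-brokers", "lunch-scale-out",
                              "0 11 * * *", 20)
# Scale back in at 14:00; the capacity 5 is a hypothetical value.
after_peak = scheduled_action("automq-brokers", "lunch-scale-in",
                              "0 14 * * *", 5)
```

Cron recurrences on AWS are evaluated in UTC unless a time zone is specified, so the hours would need adjusting for the local peak.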

Custom Target Tracking Strategy
AutoMQ has developed a custom target tracking strategy founded on a straightforward policy. This strategy, now the default, is activated by network traffic and meets the demands of most standard scenarios. Offering more flexibility than typical cloud default target tracking policies, it allows for swift scaling up and gradual scaling down, enhancing the robustness of the elasticity effect in real-world applications. The custom target tracking strategy employs a simple policy for scaling up and another for scaling down.

In the custom target tracking strategy, the step sizes for scaling up and down are proportionally adjusted, ensuring uniform scaling efficiency across varying cluster sizes. The elasticity policies displayed on AWS ASG are as follows.

Since most clouds already provide default metrics collection, AutoMQ's default elasticity strategy does not necessitate independent metrics collection and management. Leveraging these cloud capabilities can significantly simplify our implementation. Let's first define the variables involved in the elasticity strategy expressions:

network-in bytes (nin): the cumulative number of bytes of network traffic incoming during each metric reporting interval.
network-in bytes per second (nins): AWS calculates the bytes per second using the formula nins = nin / DIFF_TIME(nin), which determines the rate of network inbound bytes per second.
network-out (nout): Cumulative network outbound bytes during each metric reporting interval.
network-out bytes per second (nouts): AWS calculates the bytes per second using the formula nouts = nout / DIFF_TIME(nout), which determines the rate of network outbound bytes per second.
active instance count in ASG (acount): Number of active instances in an ASG, with AWS typically aggregating metrics for the group, necessitating a division by the number of broker machines in the ASG to calculate the traffic per broker.
upper: Network traffic threshold for scaling up, generally set at 80% of the instance type's network bandwidth cap, though this value can be customized by users.
lower: Network traffic threshold for scaling down, typically set at 50% of the instance type's network bandwidth minimum, with the option for user customization.

The simple scaling policy for scaling out is shown below. It means: if the average inbound or outbound network traffic per broker exceeds the upper threshold, scale out by the configured step (by default 10% of the current capacity, and at least one instance). Note that when a cloud provider states that a compute instance has a network bandwidth of 100 MB/s, this means 100 MB/s each for inbound and outbound traffic.
max(nins/acount,nouts/acount) > upper
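The scale-out expression and the default step can be sketched as follows. The variable names follow the definitions above; the units (MB/s), the sample numbers, and rounding the 10% step down before applying the one-instance minimum are our assumptions.

```python
import math


def per_second(bytes_in_interval: float, interval_seconds: float) -> float:
    """nins = nin / DIFF_TIME(nin): convert a per-interval total to a rate."""
    return bytes_in_interval / interval_seconds


def should_scale_out(nins: float, nouts: float,
                     acount: int, upper: float) -> bool:
    """max(nins/acount, nouts/acount) > upper"""
    return max(nins / acount, nouts / acount) > upper


def scale_out_step(acount: int, percent: float = 10.0) -> int:
    """Default step: 10% of current capacity, but at least one instance."""
    return max(1, math.floor(acount * percent / 100))


# Three brokers with a 100 MB/s cap each, so upper = 80 MB/s.
# The group receives 270 MB/s in total, i.e. 90 MB/s per broker.
triggered = should_scale_out(nins=270, nouts=150, acount=3, upper=80)
step_small = scale_out_step(3)    # 10% of 3 is below 1, so add 1
step_large = scale_out_step(25)   # 10% of 25 is 2.5, rounded down to 2
```

Because the step is a percentage of current capacity, large clusters grow by more instances per alarm than small ones, keeping the relative scaling speed roughly uniform.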

The simple elastic strategy for scaling down is as follows, meaning: Scale down only when the following three conditions are met:

The number of live brokers must remain at least 1; scaling down to zero is not allowed.
A scale-down is permitted only when the average inbound and outbound network traffic per broker are both below the lower threshold.
The third condition assumes that one broker has been removed from the current count, recomputes the per-broker value from the scale-up formula, and requires that it still stays below the upper threshold. This primarily prevents small clusters from scaling down and then immediately scaling up again, since frequent scaling activity can significantly impact the cluster.

acount>1 && ( max(nins/acount,nouts/acount) < lower ) && ( max(nins/(acount-1),nouts/(acount-1)) < upper )

AutoMQ Elasticity Effect Display
The figure below shows the relationship between cluster size and network traffic under a varying load in AutoMQ, demonstrating how well the broker count adapts to changes in traffic and achieves effective automatic elasticity. For frequently varying loads, enabling automatic elasticity can save significant cost and achieve a pay-as-you-go effect. For the specific experimental tests, please refer to our cost report [4].
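The way broker count follows traffic can also be reproduced with a toy simulation of the scale-out and scale-in rules from the previous section. This is a sketch with invented traffic numbers, not the experiment behind the cost report: it assumes one evaluation per sample, no cooldown, a scale-out step of 10% (at least one broker), and a scale-in step of one broker. The scale-in check divides by `acount - 1`, which is how we read the third condition.

```python
def should_scale_out(nins: float, nouts: float,
                     acount: int, upper: float) -> bool:
    return max(nins, nouts) / acount > upper


def should_scale_in(nins: float, nouts: float, acount: int,
                    lower: float, upper: float) -> bool:
    if acount <= 1:
        return False  # never scale down to zero brokers
    peak = max(nins, nouts)
    # Below `lower` now, and still below `upper` with one broker fewer.
    return peak / acount < lower and peak / (acount - 1) < upper


def simulate(traffic, acount: int, lower: float, upper: float) -> list:
    """Return the broker count after each (nins, nouts) sample."""
    history = []
    for nins, nouts in traffic:
        if should_scale_out(nins, nouts, acount, upper):
            acount += max(1, acount // 10)  # 10% step, at least one
        elif should_scale_in(nins, nouts, acount, lower, upper):
            acount -= 1                     # assumed: one broker at a time
        history.append(acount)
    return history


# Group-level traffic in MB/s (hypothetical): ramp up, plateau, then drop.
samples = [(100, 100), (200, 150), (400, 300), (400, 300),
           (100, 80), (100, 80), (40, 30)]
brokers = simulate(samples, acount=2, lower=50, upper=80)
```

With two brokers and 90 MB/s of traffic, the first two conditions hold but the third does not (one broker would carry 90 MB/s, above `upper`), so the cluster stays at two brokers instead of flapping.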

Looking Towards the Future of AutoMQ Autoscaling
The current automatic elasticity capabilities still have many areas that could be optimized, including:

More effective golden metrics for triggering elasticity strategies: providing a default set of metrics for elasticity strategies, together with the product capabilities to configure them. The default metric set allows the elasticity strategies to adapt to a wider range of scenarios, while the product capabilities let users adjust the composition and weights of the metrics for their specific scenarios and so achieve more precise elasticity.
Multi-cloud Auto-Scaling Adaptation: Currently, some cloud platforms still lack support for automatic scaling, and cloud monitoring, alerting, and automatic machine metrics collection vary significantly across cloud providers. Adapting auto-scaling to more clouds is crucial for developing a robust multi-cloud auto-scaling framework.
Custom Monitoring Collection and Reporting: During our implementation, we observed disparities in the monitoring capabilities and SLAs offered by various cloud providers. In demanding scenarios, the default monitoring collection and reporting mechanisms provided by cloud vendors may prove inadequate. For instance, AWS's free basic machine monitoring operates at five-minute intervals [3]. For more immediate scaling requirements, AutoMQ needs to collect and report monitoring data itself. This makes the monitoring data more flexible and controllable, and it also allows the collection and storage of monitoring metrics to be optimized over time, ultimately reducing infrastructure costs.
Auto-Scaling on Kubernetes: On Kubernetes, we have begun experimenting with AutoScaler [5]. As a major player in the current cloud-native landscape, Kubernetes has a substantial user base. AutoMQ is committed to staying current, enabling users to leverage AutoMQ's auto-scaling capabilities on Kubernetes platforms as well.

References
[1] Step and simple scaling policies for Amazon EC2 Auto Scaling: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-simple-step.html
[2] Target tracking scaling policies for Amazon EC2 Auto Scaling: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html
[3] Basic monitoring and detailed monitoring: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch-metrics-basic-detailed.html
[4] AutoMQ Cost Analysis Report: https://docs.automq.com/zh/docs/automq-s3kafka/EJBvwM3dNic6uYkZAWwc7nmrnae
[5] AutoScaler: https://github.com/kubernetes/autoscaler
