Prasad Dorbala

Why We Built Smart Scaler

In the rapidly evolving world of cloud computing, managing resource scalability in response to service demand has emerged as a critical challenge. To address this challenge, we developed Smart Scaler, a tool designed to automate infrastructure and application resource scaling. By predicting service demand in advance, Smart Scaler ensures that resources are precisely aligned with needs, optimizing both performance and cost.

The idea for Smart Scaler came out of my experience running SaaS platforms. Over the years, I’ve collaborated closely with various teams to keep systems functioning properly. I’ve also led a Cloud Ops team whose primary responsibility was to ensure that systems consistently met their service-level objectives while keeping cloud spend aligned with the cost of sales. Balancing those two goals exposed a significant friction point in current operating models.

Operations teams struggle to balance service assurance with cost control, which impedes their ability to sustain the development velocity the business requires.

Teams need to deploy new versions of microservices swiftly to maintain the momentum of developing new or improved features, and the complexity of managing interactions between services owned by different teams adds another layer of difficulty. Even though most deployments follow a structured pipeline to production, from development to pre-production, performance engineering, and finally production, replicating real-world performance conditions remains a daunting task. Companies need to maintain performance standards that mirror real-world scenarios, especially as new or enhanced APIs are introduced. This puts immense pressure on the performance engineering team to continuously update its test data to reflect production and to keep the performance stack robust.

Scaling Is a Complex Balancing Act

Navigating the balance between cost efficiency and service reliability presents a significant challenge for site reliability engineering (SRE) teams. Companies are rewarded for delivering top-performing services to end customers, while executives prioritize improving the profit margins of those services through cost-saving strategies. It’s vital for companies to find an equilibrium between those two objectives.

Employee wellness is also a key indicator of team health and productivity, underscoring the importance of achieving balance not only for optimal service performance but also for fostering a productive and healthy work environment. As a result, scaling resources is a difficult balancing act that requires weighing three outcomes: customers must receive high-quality services, costs must be contained, and employees must have a productive and healthy work environment. Automating the scaling process helps ensure that companies deploy the right amount of resources while limiting the need for employees to micromanage the scaling aspect of the deployment process.

Technical Challenges in Scaling

When determining how best to scale an application, I often ask whether application behavior can be fully modeled by infrastructure metrics like CPU and memory. It’s a question that’s frequently overlooked, but it’s increasingly relevant in today’s diverse development environments, where teams choose programming languages based on their specific business needs. Each language has different memory and runtime characteristics, so relying solely on infrastructure metrics like CPU or memory to model complex application behaviors is frequently inadequate.

The Kubernetes Horizontal Pod Autoscaler (HPA) attempts to mitigate scaling challenges by allowing the inclusion of custom metrics through API calls. However, it overlooks crucial aspects such as the temporal dynamics of these metrics and the intricate network of service dependencies. This singular approach to metrics, treating each as isolated to a specific service, fails to account for the interconnectedness of services. Additionally, HPA's reliance on infrastructure metrics does not capture the full picture, as it does not consider the nuanced behaviors of applications hosted within the pods, including the programming languages and specific application behaviors.
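
To make that limitation concrete, here is a minimal Python sketch of the core scaling rule the Kubernetes documentation gives for HPA: desired replicas are derived from the ratio of a metric's current value to its target, one metric at a time, with no notion of metric history or of upstream and downstream services.

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Core HPA rule: scale replicas by the ratio of observed to target metric.

    Mirrors the documented formula
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
    evaluated per metric in isolation: no history, no service dependencies.
    """
    return math.ceil(current_replicas * current_metric / target_metric)

# Example: 4 pods averaging 180 RPS each against a 100 RPS target -> 8 pods,
# regardless of whether the surge actually originated two services upstream.
print(hpa_desired_replicas(4, 180, 100))  # 8
```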

The diversity in application statistics across various deployments adds another layer of complexity to scalability. While solutions like Istio attempt to standardize metrics, those efforts are not yet integrated into scaling decisions. Additionally, Istio's adoption is not universal, partly due to the operational hurdles of its sidecar deployment model and its management overhead. Critical information on service failures is often buried in application logs, which cannot easily be modeled within scaling solutions.

The application context is pivotal to scaling pods for service assurance. Metrics such as queue depth, service-to-service latency, API error rates, and requests per second (RPS) on the APIs that form the backbone of microservices should all be factored into scaling decisions. In the microservices landscape, understanding service chains and traffic proportionality is fundamental to effective scaling strategies, as the sketch below illustrates.
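
As an illustration of why service chains and traffic proportionality matter, the sketch below (hypothetical service names and capacities, not Smart Scaler code) propagates incoming RPS through a simple dependency graph using fixed fan-out ratios, then converts the per-service load into replica counts.

```python
import math

# Hypothetical service chain: each edge carries the ratio of downstream calls
# generated per upstream request.
CHAIN = {
    "gateway":   [("checkout", 0.4), ("catalog", 1.0)],
    "checkout":  [("payments", 1.0), ("inventory", 2.0)],  # 2 calls per request
    "catalog":   [],
    "payments":  [],
    "inventory": [],
}
ORDER = ["gateway", "checkout", "catalog", "payments", "inventory"]

# Assumed per-replica capacity (RPS) for each service.
CAPACITY = {"gateway": 500, "checkout": 200, "catalog": 400,
            "payments": 150, "inventory": 300}

def propagate_load(entry_rps: float) -> dict[str, float]:
    """Accumulate RPS per service by walking the chain in topological order."""
    load = {svc: 0.0 for svc in ORDER}
    load["gateway"] = entry_rps
    for svc in ORDER:
        for downstream, ratio in CHAIN[svc]:
            load[downstream] += load[svc] * ratio
    return load

def replicas_needed(load: dict[str, float]) -> dict[str, int]:
    """Convert per-service RPS into replica counts, at least one replica each."""
    return {svc: max(1, math.ceil(rps / CAPACITY[svc])) for svc, rps in load.items()}

if __name__ == "__main__":
    # A surge at the gateway implies proportional surges several hops downstream.
    print(replicas_needed(propagate_load(entry_rps=2_000)))
```

A per-service CPU threshold would only react after each downstream service is already saturated; reasoning over the chain lets all of them scale together.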

There are also projects like KEDA that let developers define which factors trigger scaling, but teams still need to set up the scaling trigger points manually. That approach still misses the end-to-end service chain, highlighting the limits of infrastructure metrics in ensuring service reliability.

How Smart Scaler Improves the Scaling Process

Smart Scaler leverages advanced machine learning and reinforcement learning techniques to automate the scaling process, making it both efficient and cost-effective. Machine learning stands out in its ability to analyze application behaviors, including crucial metrics such as requests per second on APIs, API error rates, service-to-service latency, service chain graphs, and CPU and memory usage. Machine learning can process this vast amount of data efficiently and derive meaningful insights from it.
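
As a rough sketch of what that can look like in practice (illustrative numbers and feature names, with scikit-learn standing in for whatever models Smart Scaler actually uses), application metrics can be assembled into feature vectors and a regressor trained to map them to the replica count that kept a service within its objectives.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each row is one observation window for a service (illustrative values):
# [RPS on its APIs, API error rate, upstream latency (ms), CPU %, memory %]
X = np.array([
    [1200, 0.002,  35, 48, 61],
    [2400, 0.004,  52, 70, 66],
    [4800, 0.015, 110, 88, 72],
    [ 600, 0.001,  30, 25, 58],
])
# Label: replica count that kept the service within its latency objective.
y = np.array([4, 7, 14, 2])

model = GradientBoostingRegressor().fit(X, y)

# Predict the replica count for a new traffic profile.
print(round(float(model.predict([[3000, 0.006, 60, 75, 68]])[0])))
```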

Unlike traditional analytic methods, machine learning models excel at analyzing diverse datasets and synthesizing outcomes tailored to specific objectives. They can also automate these tasks across varying datasets, simplifying day-to-day operations for individuals and organizations.

Smart Scaler also incorporates reinforcement learning, which offers a dynamic approach to data analysis without the need for constant retraining. Combining predictive modeling with reinforcement learning makes it even more powerful: predictive modeling estimates future or unseen demand based on patterns learned from the environment. This predictive approach is key to alleviating the challenges Kubernetes environments face when cluster capacity is nearly exhausted and more nodes must be added during a traffic surge.
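
A minimal sketch of the predictive side, using a simple linear trend over recent demand (Smart Scaler's actual models are not public, so treat the forecaster and the headroom figures as placeholders): forecast RPS a few minutes ahead and flag when predicted load would exceed what the current node pool can absorb, so new nodes can be requested before the surge lands.

```python
import numpy as np

def forecast_rps(history: list[float], steps_ahead: int) -> float:
    """Fit a linear trend to recent RPS samples and extrapolate forward."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, deg=1)
    return float(slope * (len(history) - 1 + steps_ahead) + intercept)

# Illustrative numbers: one sample per minute, headroom of the current node pool.
recent_rps = [900, 1050, 1300, 1600, 1950, 2400]
cluster_headroom_rps = 2800

predicted = forecast_rps(recent_rps, steps_ahead=5)   # ~5 minutes out
if predicted > cluster_headroom_rps:
    # Node provisioning takes minutes; request capacity before the surge lands.
    print(f"Predicted {predicted:.0f} RPS in 5 min; add nodes now.")
```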

Smart Scaler can also help define policies for which type of node to bring into the cluster, based on the application deployment manifest, scaling needs, and external policies such as cost-saving initiatives. Predictive models help preposition infrastructure before it is needed to absorb traffic surges. While scaling up is critical during traffic bursts, scaling down is equally critical for cost containment.
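
The node-selection piece might look something like the following (an entirely hypothetical node catalog and policy, for illustration only, not Smart Scaler's actual logic): given the resource requests from the deployment manifest and a cost policy, pick the node type that serves the pending replicas at the lowest total cost.

```python
import math

# Hypothetical node catalog: allocatable CPU (cores), memory (GiB), $/hour.
NODE_TYPES = [
    {"name": "m5.large",   "cpu": 2,  "mem": 8,  "cost": 0.096},
    {"name": "m5.2xlarge", "cpu": 8,  "mem": 32, "cost": 0.384},
    {"name": "c5.4xlarge", "cpu": 16, "mem": 32, "cost": 0.680},
]

def cheapest_fit(pending_pods: int, cpu_per_pod: float, mem_per_pod: float) -> dict:
    """Pick the node type that serves the pending pods at the lowest total cost."""
    best = None
    for node in NODE_TYPES:
        pods_per_node = min(node["cpu"] // cpu_per_pod, node["mem"] // mem_per_pod)
        if pods_per_node < 1:
            continue
        nodes_needed = math.ceil(pending_pods / pods_per_node)
        total_cost = nodes_needed * node["cost"]
        if best is None or total_cost < best["total_cost"]:
            best = {"node": node["name"], "count": nodes_needed, "total_cost": total_cost}
    return best

# Example: 10 pending replicas, each requesting 0.5 CPU and 1 GiB of memory.
print(cheapest_fit(pending_pods=10, cpu_per_pod=0.5, mem_per_pod=1.0))
```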

Conclusion

Avesha’s Smart Scaler integrates reinforcement learning and predictive modeling to dynamically adjust HPA parameters. This ensures optimal performance while minimizing costs, enhancing decision-making, and improving resource utilization. With Smart Scaler providing predictive scaling for a collection of services and taking service chains into account, deployments can now automate their scaling process, allowing for more effective and cost-efficient cloud-native infrastructure management.
