The SRE Guide to Hyperscale for Cloud-Native Applications

#sre #devops #arm #apm

In my previous post, I discussed the advantages of using Instana Enterprise Observability for achieving hyper-resiliency for applications, particularly cloud-native applications. Hyper-resiliency is usually defined as 99.99% system and application availability, or four 9s. Essentially, it is the ability to perform non-stop computing.
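To make the four 9s figure concrete, here is a minimal sketch that converts an availability target into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration, not part of any Instana tooling.

```python
# Downtime allowed per period for a given availability target.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year, for simplicity


def downtime_budget_seconds(availability_pct: float,
                            period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Return the maximum downtime (in seconds) that still meets the target."""
    return period_seconds * (1 - availability_pct / 100)


for target in (99.9, 99.99, 99.999):
    minutes = downtime_budget_seconds(target) / 60
    print(f"{target}% -> {minutes:.1f} minutes of downtime per year")
```

At 99.99%, the budget works out to roughly 52.6 minutes per year, which is why a single slow-to-detect incident can consume most of it.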

In the cloud, high availability can be difficult to achieve, even with the ubiquitous use of cluster technology. Meanwhile, hyperscale for cloud-native applications occurs when infrastructure resources are properly allocated to applications as they scale. If resources are misallocated, especially if they’re under-allocated, application performance can degrade or the application can stop altogether.

Instana Enterprise Observability helps keep applications available by notifying app teams when problems begin. Granular metrics, events, and traces with context enable teams to rapidly identify issues.

If the availability or performance issues are caused by under-allocated or unbalanced resources (CPU, memory, network, and storage), Instana can pass that data to Turbonomic, another IBM company. Turbonomic provides Application Resource Management (ARM), which automatically and dynamically manages and allocates infrastructure resources for applications.

Combining Turbonomic ARM with Instana Enterprise Observability keeps application resource allocation optimized to meet Service Level Objectives (SLOs) for both performance and availability. ARM procedures can be fully or partially automated to make server resource adjustments that enhance application resiliency and performance and optimize resource allocation cost.

How ARM and observability work together

Instana monitors application metrics, events, traces, and logs to provide a rich mosaic of application health information. It captures these measurements at unmatched one-second intervals. At this frequency, Instana can observe and identify any issues, either application or infrastructure, and match them with upstream and downstream dependencies in real time.

One-second monitoring granularity is one of the most critical attributes for hyper-resiliency because sampling intervals of 10 seconds or longer are not adequate for detecting short-lived anomalies. Events in microservice applications and the surrounding infrastructure take place in microseconds, so with coarse sampling they can go undetected for a long time.
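The effect of sampling interval can be illustrated with a small simulation. This is only a sketch with made-up numbers (a 3-second CPU spike in a 60-second window, a 90% alert threshold, simple point sampling); it does not represent how Instana collects data.

```python
def build_cpu_series(duration_s=60, baseline=40.0,
                     spike_start=23, spike_len=3, spike_value=98.0):
    """Per-second CPU utilization (%) with one short spike. All values are made up."""
    return [
        spike_value if spike_start <= t < spike_start + spike_len else baseline
        for t in range(duration_s)
    ]


def sample(series, interval_s):
    """Point-sample the series every interval_s seconds, starting at t=0."""
    return series[::interval_s]


def spike_detected(samples, threshold=90.0):
    """True if any sampled value exceeds the alert threshold."""
    return any(value > threshold for value in samples)


series = build_cpu_series()
print("1 s sampling sees the spike: ", spike_detected(sample(series, 1)))
print("10 s sampling sees the spike:", spike_detected(sample(series, 10)))
```

With these assumed values, the 10-second sampler reads the series at t = 0, 10, 20, ... and never lands inside the spike at t = 23 to 25, while the 1-second sampler catches it.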

Instana’s Enterprise Observability powers rapid anomaly recognition so Turbonomic can remediate the problem and keep the application within its SLOs. If it’s a code issue, Instana’s Auto Profiler identifies the problematic code within a few clicks.

The combination of Instana + Turbonomic creates a seamless and automatic remediation path for any issues that are attributable to mismatched application resources.

For cloud-native applications, those mismatches happen frequently. One moment your applications are starved for resources due to a sudden surge in activity; moments later, they’re over-allocated as the demand surge drops.

When infrastructure resources run low for any microservice, performance degrades – or worse, the service crashes. Instana identifies the slow application response time, highlights constrained resources that may be the root cause of the disruption, and passes that data to Turbonomic.

Turbonomic knows exactly why the resources are constrained and the right adjustment to remediate the disruption. These actions are illustrated in the diagram below, which highlights how Turbonomic adjusts constrained resources based on a target response time.

Proper resource allocation is critical

Turbonomic acts when resources are under-allocated to make sure that performance degradation (or worse) does not occur. Turbonomic automatically adjusts application resources to avoid resource contention or under-allocation that can negatively impact SLOs.

Conversely, when resources are over-allocated, Turbonomic automatically makes adjustments based on thresholds you define. This helps dramatically reduce cloud overspend, which is equally problematic.
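The two cases in this section, scaling up when resources are under-allocated and scaling down when they are over-allocated, can be sketched as a simple threshold rule. This is a hypothetical illustration only: the `ServiceState` type, the `recommend_cpu_limit` function, and every threshold value are assumptions, and the sketch is not Turbonomic's actual decision logic or API.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Hypothetical snapshot of one microservice (not a real Instana/Turbonomic type)."""
    name: str
    cpu_limit_millicores: int
    cpu_utilization: float   # fraction of the limit in use, 0.0 to 1.0
    response_time_ms: float


def recommend_cpu_limit(state, target_response_ms=200.0,
                        high_util=0.80, low_util=0.30,
                        step=0.25, min_limit=100):
    """Return a new CPU limit, or the current one if no change is needed."""
    slow = state.response_time_ms > target_response_ms
    if slow and state.cpu_utilization >= high_util:
        # Under-allocated: the response-time target is missed and CPU is constrained.
        return int(state.cpu_limit_millicores * (1 + step))
    if not slow and state.cpu_utilization <= low_util:
        # Over-allocated: the target is met with plenty of headroom, so reclaim some.
        return max(min_limit, int(state.cpu_limit_millicores * (1 - step)))
    return state.cpu_limit_millicores


starved = ServiceState("checkout", 1000, 0.95, 450.0)
idle = ServiceState("catalog", 1000, 0.10, 80.0)
healthy = ServiceState("search", 1000, 0.55, 120.0)

for svc in (starved, idle, healthy):
    print(svc.name, svc.cpu_limit_millicores, "->", recommend_cpu_limit(svc))
```

In this sketch the starved service is raised from 1000 to 1250 millicores, the idle one is lowered to 750, and the healthy one is left unchanged. A real system would weigh many more signals (memory, network, storage, placement) than this single-resource rule does.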

Instana + Turbonomic is a power combo that will rapidly become an SRE’s best friend. The combination enables hyperscale with hyper-resiliency, cost effectively. It paves the path to automated SLO compliance and continuous performance consistency, especially for your cloud-native applications.

Try out Instana with a guided tour in our Play With environment.

The post The SRE Guide to Hyperscale for Cloud-Native Applications appeared first on Instana.