Kenta Takeuchi

Posted on • Originally published at bmf-tech.com

SLIs・SLOs・SLAs

This article is a translation of SLI・SLO・SLAについて.

About SLIs, SLOs, and SLAs

This post summarizes what I learned while researching SLIs, SLOs, and SLAs.

What are SLOs, SLIs, and SLAs?

SLIs, SLOs, and SLAs are, respectively, the indicators, objectives, and agreements related to service levels.
A service level is a measure of the service delivered over a given period of time.

  • SLI (Service Level Indicator)
    • A metric used to measure the service level
    • ex. availability, latency, error rate, throughput
  • SLO (Service Level Objective)
    • The target value, quantitative or qualitative, for a service level
    • Take external dependencies into account
      • ex. communication with external services, and the SLOs of the managed services you depend on
  • SLA (Service Level Agreement)
    • An agreement or guarantee on the service level between the service provider and its users
    • The SLA target is best set looser than the SLO
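To make the relationship between the three terms concrete, here is a minimal Python sketch. It assumes an availability SLI measured as the share of successful requests. The class and function names (ServiceLevel, availability_sli, evaluate) and the target values are hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    """Hypothetical container tying one SLI to its SLO and SLA targets."""
    name: str
    slo_target: float  # internal objective, e.g. 0.999
    sla_target: float  # external commitment, set looser than the SLO

    def __post_init__(self) -> None:
        # The SLA should not be stricter than the SLO.
        if self.sla_target > self.slo_target:
            raise ValueError("SLA target should not be stricter than the SLO")


def availability_sli(good_requests: int, total_requests: int) -> float:
    """Availability SLI as the fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_requests / total_requests


def evaluate(level: ServiceLevel, sli: float) -> str:
    """Classify a measured SLI against the SLO and SLA targets."""
    if sli >= level.slo_target:
        return "meets SLO"
    if sli >= level.sla_target:
        return "misses SLO, within SLA"
    return "violates SLA"


if __name__ == "__main__":
    checkout = ServiceLevel(name="checkout", slo_target=0.999, sla_target=0.995)
    measured = availability_sli(good_requests=9970, total_requests=10000)
    print(checkout.name, measured, evaluate(checkout, measured))
```

Keeping the SLA looser than the SLO leaves a band in which the team can see the objective being missed before the agreement with users is actually broken.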

How to set SLI/SLO

I find the best practices proposed by New Relic easy to work with.

newrelic.com - Best practices for setting SLI/SLO in modern systems

The article describes how to formulate SLIs and SLOs in four steps: define the system boundaries, define the functions within each boundary, define what availability means for each function, and define the SLIs that measure that availability.
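The hierarchy of boundary, function, availability definition, and SLI can be written down as plain data. The following Python sketch is my own illustration and is not taken from the New Relic article. All class names, field names, and example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Sli:
    description: str   # what is measured
    slo_target: float  # objective for this SLI, e.g. 0.99


@dataclass
class Function:
    name: str
    availability_definition: str  # what "available" means for this function
    slis: list[Sli] = field(default_factory=list)


@dataclass
class Boundary:
    name: str
    functions: list[Function] = field(default_factory=list)


def list_slos(boundaries: list[Boundary]) -> list[tuple[str, str, str, float]]:
    """Flatten the hierarchy into (boundary, function, SLI, target) rows."""
    return [
        (b.name, f.name, s.description, s.slo_target)
        for b in boundaries
        for f in b.functions
        for s in f.slis
    ]


if __name__ == "__main__":
    # Example values are made up for illustration.
    web_api = Boundary(
        name="web API",
        functions=[
            Function(
                name="login",
                availability_definition="login requests return a non-5xx response",
                slis=[Sli("share of non-5xx login responses", 0.99)],
            )
        ],
    )
    for row in list_slos([web_api]):
        print(row)
```

Starting with one or two coarse functions per boundary keeps the list short. Entries can be split later if the granularity turns out to be too coarse.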

When you first start operating SLIs and SLOs, it is recommended to keep them as simple as possible and to start with loose target values.

cf. sre.google - Chapter 4 - Service Level Objectives

When I formulated SLIs and SLOs at work, I followed this New Relic practice, but I adjusted the functional units so that they were not too fine-grained.

If the functions are split too finely from the start, operating the SLOs becomes difficult, so I think it is better to adjust the granularity as needed during operation.

Tips

Notes on terms related to SLIs and SLOs.

The difference between reliability and availability

  • Reliability
    • A characteristic of a system: the degree to which it tolerates failures
  • Availability
    • The degree to which the system can continue to operate

Availability and the corresponding allowed downtime

Availability   Annual downtime   Monthly downtime
99.0%          87.6 hours        7.3 hours
99.5%          43.8 hours        3.65 hours
99.9%          8.76 hours        43.8 minutes
99.95%         4.38 hours        21.9 minutes
99.99%         52.56 minutes     4.38 minutes
99.999%        5.256 minutes     26.28 seconds
99.9999%       31.536 seconds    2.628 seconds
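The figures in this table follow from multiplying the length of the period by the unavailable fraction. Here is a small Python sketch of that calculation. It assumes a 365-day year (8,760 hours) and an average month of 730 hours. The function and constant names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24              # 8760, ignoring leap years
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730, an average month


def allowed_downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Downtime in hours that still satisfies the given availability."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return period_hours * unavailable_fraction


def format_duration(hours: float) -> str:
    """Render a duration in hours, minutes, or seconds, whichever reads best."""
    if hours >= 1:
        return f"{hours:.2f} hours"
    minutes = hours * 60
    if minutes >= 1:
        return f"{minutes:.2f} minutes"
    return f"{minutes * 60:.2f} seconds"


if __name__ == "__main__":
    for percent in (99.0, 99.5, 99.9, 99.95, 99.99, 99.999, 99.9999):
        yearly = allowed_downtime_hours(percent, HOURS_PER_YEAR)
        monthly = allowed_downtime_hours(percent, HOURS_PER_MONTH)
        print(f"{percent}%  {format_duration(yearly)}  {format_duration(monthly)}")
```

A 30-day month (720 hours) gives slightly smaller monthly figures, so the period length should be stated whenever such a table is shared.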

What is an error budget?

An error budget is the amount of unreliability that is acceptable, calculated from the SLO as 100% minus the SLO target.
ex. SLO 99.99% → error budget 0.01% (errors are acceptable up to that amount)
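As a rough sketch, the error budget and the share of it that is still unspent can be computed as follows. This assumes a request-based SLO where each request is either good or bad. The function names and the sample numbers are hypothetical.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of events allowed to fail, e.g. 0.9999 -> about 0.0001."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def remaining_budget_fraction(slo_target: float, total_events: int, bad_events: int) -> float:
    """Share of the error budget still unspent; negative means overspent."""
    if total_events == 0:
        return 1.0  # assumption: no events means nothing has been spent
    allowed_bad = error_budget(slo_target) * total_events
    return 1.0 - bad_events / allowed_bad


if __name__ == "__main__":
    # With a 99.99% SLO and 1,000,000 requests, about 100 may fail.
    print(remaining_budget_fraction(0.9999, total_events=1_000_000, bad_events=25))
```

A negative result means the budget for the period is already used up. Teams often treat that as a signal to prioritize reliability work over new releases.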

Impression

Making the service level measurable makes it possible to observe whether service users (people or systems) are being served satisfactorily, and I think it also gives service providers an indicator of whether the service level needs to be improved.
