Shiva Charan

🎯 SLI, SLO, SLA Explained 🎯

  • SRE is not about hope-based reliability.
  • It is about numbers, thresholds, and consequences.
  • At the core of SRE are SLIs, SLOs, and SLAs.
  • If you cannot measure it, you cannot make it reliable.

🎯 Big Picture (Mental Model)

  • 🟒 SLIs β†’ What we measure
  • 🟑 SLOs β†’ What we aim for
  • πŸ”΄ SLAs β†’ What we promise legally

SLIs feed both SLOs and SLAs, and each SLO is set stricter than the SLA it protects


📏 Service Level Indicators (SLIs)

✔ Raw measurements of service behavior

SLIs are quantitative metrics that describe how a service behaves from the user's perspective.

🔍 Common SLI Types

| SLI Type | What It Measures |
|---|---|
| 🟢 Availability | Was the service reachable |
| ⚡ Latency | How fast responses are |
| ❌ Error Rate | How many requests failed |
| 📦 Throughput | Requests per second |
| 🔄 Freshness | Data staleness |

✅ Example SLIs (API Service)

Availability SLI = Successful requests / Total requests
Latency SLI = % of requests under 300ms
Error Rate SLI = 5xx responses / Total requests

📌 SLIs do not define targets.
They only provide truthful signals.
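The example SLIs above can be sketched in a few lines of Python. This is a minimal illustration, not a production collector: the `Request` record and `compute_slis` function are hypothetical names, and it assumes that any non-5xx response counts as "successful".

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int         # HTTP status code returned to the user
    duration_ms: float  # time taken to serve the request


def compute_slis(requests: list[Request], latency_threshold_ms: float = 300.0) -> dict[str, float]:
    """Compute the three example SLIs as ratios between 0 and 1."""
    total = len(requests)
    if total == 0:
        raise ValueError("no requests in the measurement window")
    # Assumption: only 5xx responses count as failures.
    server_errors = sum(1 for r in requests if r.status >= 500)
    fast = sum(1 for r in requests if r.duration_ms < latency_threshold_ms)
    return {
        "availability": (total - server_errors) / total,
        "latency": fast / total,
        "error_rate": server_errors / total,
    }


sample = [Request(200, 120), Request(200, 350), Request(503, 40), Request(404, 80)]
slis = compute_slis(sample)
```

Note that the function only reports ratios. It has no opinion on whether they are good enough.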


🎯 Service Level Objectives (SLOs)

✔ Target reliability goals

SLOs define how reliable the service must be.

They are engineering targets, not legal contracts.


🔢 Example SLOs

Availability SLO: 99.9% monthly uptime
Latency SLO: 95% of requests under 300ms
Error Rate SLO: Less than 0.1% failed requests

📌 SLOs are based on user expectations, not perfection.
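An SLO check is just a comparison between a measured SLI and a target. Here is a rough sketch using the example targets above. `SLO_TARGETS` and `evaluate_slos` are illustrative names, and the input is assumed to be a dict of SLI ratios between 0 and 1.

```python
# Each target is (kind, threshold): "min" means the SLI must be at least the
# threshold, "max" means the SLI must stay strictly below it.
SLO_TARGETS = {
    "availability": ("min", 0.999),  # 99.9% monthly uptime
    "latency": ("min", 0.95),        # 95% of requests under 300ms
    "error_rate": ("max", 0.001),    # less than 0.1% failed requests
}


def evaluate_slos(slis: dict[str, float], targets=SLO_TARGETS) -> dict[str, bool]:
    """Return True per SLO when the measured SLI meets its target."""
    results = {}
    for name, (kind, threshold) in targets.items():
        value = slis[name]
        results[name] = value >= threshold if kind == "min" else value < threshold
    return results


healthy = evaluate_slos({"availability": 0.9995, "latency": 0.97, "error_rate": 0.0005})
degraded = evaluate_slos({"availability": 0.998, "latency": 0.97, "error_rate": 0.002})
```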


🔥 Error Budget (Why SLOs Matter)

99.9% uptime = at most 43.2 minutes of downtime per 30-day month

That downtime is your error budget.
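The arithmetic behind that number is simple enough to script. A small sketch, assuming a 30-day month (the function name `error_budget_minutes` is made up for illustration):

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO allows within the window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a ratio in (0, 1]")
    total_minutes = window_days * 24 * 60  # 43,200 minutes in 30 days
    return total_minutes * (1.0 - slo)


budget = error_budget_minutes(0.999)  # roughly 43.2 minutes
```

A 31-day month gives a slightly larger budget, so it helps to state the window explicitly.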

| If error budget exists | If error budget is exhausted |
|---|---|
| 🚀 Ship features | 🛑 Freeze releases |
| 🧪 Experiment | 🔧 Focus on stability |

This is SRE discipline in action.
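The policy in the table can be expressed as a tiny gate, for example in a release pipeline. This is only a sketch: `release_decision` is a hypothetical helper, and real teams often use softer thresholds than a hard zero.

```python
def release_decision(budget_minutes: float, downtime_minutes: float) -> str:
    """Return "ship" while error budget remains, otherwise "freeze"."""
    remaining = budget_minutes - downtime_minutes
    return "ship" if remaining > 0 else "freeze"


early_month = release_decision(budget_minutes=43.2, downtime_minutes=10.0)
after_outage = release_decision(budget_minutes=43.2, downtime_minutes=50.0)
```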


📜 Service Level Agreements (SLAs)

✔ Legal and business commitments

SLAs are contracts with customers.

They reference SLIs and define:

  • Acceptable performance
  • Measurement windows
  • Penalties or credits

🧾 Example SLA Clause

The service will maintain 99.5% monthly availability.
If availability falls below 99.5%, customers receive a 10% service credit.
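To make the clause concrete, here is a minimal sketch of how the credit could be calculated. The function name `service_credit` and the `monthly_fee` parameter are assumptions for illustration. Real SLAs often use tiered credits and exclusions.

```python
def service_credit(
    availability: float,
    monthly_fee: float,
    sla_target: float = 0.995,  # 99.5% monthly availability
    credit_rate: float = 0.10,  # 10% service credit
) -> float:
    """Credit owed to the customer for a month with the given availability."""
    if availability < sla_target:
        return monthly_fee * credit_rate
    return 0.0


breached = service_credit(availability=0.993, monthly_fee=200.0)
met = service_credit(availability=0.997, monthly_fee=200.0)
```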

📌 SLAs are intentionally less strict than SLOs.

Why?
Because breaking an SLA costs money and trust.


🔗 How SLIs, SLOs, and SLAs Connect

SLI → Measured data
SLA → Contractual minimums using SLIs
SLO → Internal reliability target set above SLA

🧠 Visual Flow

📊 SLIs (metrics)
      ↓
📜 SLAs (legal promises)
      ↓
🎯 SLOs (engineering goals)

🏗️ Real-World Example (E-commerce App)

📊 SLIs

  • Availability: Successful HTTP responses
  • Latency: Request duration
  • Error Rate: 5xx responses

📜 SLA (Customer-Facing)

99.5% monthly availability

🎯 SLO (Engineering Target)

99.9% monthly availability
95% of requests < 250ms
Error rate < 0.1%

Why higher than SLA?

✔ Buffer for incidents
✔ Protect customer trust
✔ Avoid financial penalties
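The size of that buffer is easy to quantify. A quick sketch, assuming a 30-day month (`allowed_downtime_minutes` is an illustrative helper, not part of any library):

```python
def allowed_downtime_minutes(target: float, window_days: int = 30) -> float:
    """Downtime an availability target permits over the window, in minutes."""
    return window_days * 24 * 60 * (1.0 - target)


sla_minutes = allowed_downtime_minutes(0.995)  # about 216 minutes
slo_minutes = allowed_downtime_minutes(0.999)  # about 43.2 minutes
buffer_minutes = sla_minutes - slo_minutes     # about 172.8 minutes
```

In other words, the team can miss its own SLO by almost three hours in a month before the customer-facing SLA is breached.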


❌ Common Mistakes (Callout)

🚫 Setting SLOs without SLIs
🚫 100% uptime targets
🚫 SLAs tighter than SLOs
🚫 Measuring system metrics instead of user experience
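Two of these mistakes can be caught mechanically. The following is a rough sketch of such a sanity check. `find_target_mistakes` is a hypothetical helper and it only looks at availability targets expressed as ratios.

```python
from typing import Optional


def find_target_mistakes(slo: float, sla: Optional[float] = None) -> list[str]:
    """Flag a 100% SLO and an SLA that is stricter than the SLO."""
    problems = []
    if slo >= 1.0:
        problems.append("SLO targets 100%, leaving no error budget")
    if sla is not None and sla > slo:
        problems.append("SLA is tighter than the SLO")
    return problems


ok = find_target_mistakes(slo=0.999, sla=0.995)
inverted = find_target_mistakes(slo=0.99, sla=0.995)
```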


βœ… SRE Golden Rules

  • Measure what users feel
  • Target less than perfect
  • Use error budgets to guide decisions
  • Protect engineers from endless firefighting

🏁 Final Takeaway

SLIs tell the truth
SLOs define reliability goals
SLAs define consequences

This trio is what turns reliability from wishful thinking into engineering discipline 💪

