Samson Tanimawo

Why SLIs Matter More Than SLOs

SLOs get all the attention. I want to argue that your SLIs are more important.

Here's the thing: an SLO is a number you pick. 99.9% uptime. 300ms p95 latency. Whatever. It's a decision.

An SLI is what you're actually measuring. Is it the right signal? Does it reflect user experience? Can it be gamed by caching? Those questions matter more than the target.

Bad SLI, good SLO

'99.9% of healthcheck requests return 200.' Looks great on paper. Means nothing to users. Your healthcheck endpoint can be up while your actual API is broken.
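To make the gap concrete, here is a minimal sketch with made-up traffic. The `Request` record, the `/healthz` and `/checkout` paths, and the request counts are all assumptions for illustration, not data from a real system. It computes the same "fraction of 200s" SLI for two endpoints and shows how far apart they can be.

```python
from dataclasses import dataclass


@dataclass
class Request:
    path: str    # hypothetical endpoint path
    status: int  # HTTP status code


def success_ratio(requests, path):
    """Fraction of requests to `path` that returned HTTP 200."""
    matching = [r for r in requests if r.path == path]
    if not matching:
        return None  # no traffic: the SLI is undefined, not 100%
    good = sum(1 for r in matching if r.status == 200)
    return good / len(matching)


# Made-up traffic: the healthcheck is fine, the checkout API is failing.
traffic = (
    [Request("/healthz", 200)] * 1000
    + [Request("/checkout", 500)] * 40
    + [Request("/checkout", 200)] * 60
)

healthcheck_sli = success_ratio(traffic, "/healthz")
checkout_sli = success_ratio(traffic, "/checkout")

print(f"healthcheck SLI: {healthcheck_sli:.1%}")  # 100.0%
print(f"checkout SLI:    {checkout_sli:.1%}")     # 60.0%
```

In this toy data the healthcheck SLI clears a 99.9% target while 40% of checkout requests fail.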

Good SLI, any SLO

'99.x% of user-initiated checkout requests complete successfully within 5 seconds.' That signal tells you if the product works. Whatever target you pick, you're measuring the right thing.
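As a rough sketch of how that SLI could be computed, assume each checkout request is logged with a flag for whether a user initiated it, whether it succeeded, and how long it took. The `CheckoutRequest` fields and the sample values below are hypothetical. The target is passed in separately, which mirrors the point that the SLI definition does not depend on the SLO you pick.

```python
from dataclasses import dataclass


@dataclass
class CheckoutRequest:
    user_initiated: bool  # False for synthetic probes and other non-user traffic
    succeeded: bool
    duration_s: float


LATENCY_THRESHOLD_S = 5.0  # "within 5 seconds" from the SLI definition


def checkout_sli(requests):
    """Fraction of user-initiated checkouts that succeeded within the threshold."""
    eligible = [r for r in requests if r.user_initiated]
    if not eligible:
        return None  # no user traffic in the window: SLI is undefined
    good = sum(
        1 for r in eligible
        if r.succeeded and r.duration_s <= LATENCY_THRESHOLD_S
    )
    return good / len(eligible)


def meets_slo(sli, target):
    """Compare a measured SLI against whatever target was chosen."""
    return sli is not None and sli >= target


# Made-up sample window.
window = [
    CheckoutRequest(user_initiated=True, succeeded=True, duration_s=1.2),
    CheckoutRequest(user_initiated=True, succeeded=True, duration_s=4.9),
    CheckoutRequest(user_initiated=True, succeeded=True, duration_s=6.3),   # too slow
    CheckoutRequest(user_initiated=True, succeeded=False, duration_s=0.4),  # failed
    CheckoutRequest(user_initiated=False, succeeded=True, duration_s=0.1),  # probe, excluded
]

sli = checkout_sli(window)
print(f"checkout SLI: {sli:.1%}")             # 50.0%
print("meets 99.9%:", meets_slo(sli, 0.999))  # False
```

A slow success counts as a failure here, and the synthetic probe is excluded from both numerator and denominator, so the number tracks what users experienced.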

The test

When you miss your SLO, does on-call get paged because real users are suffering? If yes, your SLI is good. If no, you're measuring the wrong thing.

Pick the SLI first. Pick the SLO second. Most teams do it backwards and wonder why their reliability work doesn't move the needle.


Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com
