Originally published in the LaunchDarkly Docs
Most Service Level Objectives, or SLOs, sit in dashboards gathering dust. Because SLOs are measurable performance targets, they're essential to understanding the quality of a service or system. Yet you define them, you measure them, and when your conditions aren't met, there's no follow-up.
That's the biggest drawback: SLOs are created to add value, but if they're never reinforced, they can't drive decisions, influence roadmaps, or help during incidents.
When defining SLOs, many folks start at the top of the funnel by picking general metrics to measure, but to create SLOs that work, it's important to understand how the roots impact the leaves.
In this post, we'll cover the pitfalls that lead to out-of-sync SLOs, along with a few tips and tricks to ensure what you measure produces business value. You'll also see an example of how to set real-time SLOs for a flag evaluation feature that you can implement in your own planning process.
But first, we'll explore a tree metaphor to recap key observability components and how their influence extends from the roots all the way to the leaves. What if the popular Rockefeller Center tree in NYC represented the relationship between telemetry data and SLOs?
Understanding the Observability Tree
SLOs would essentially be the leaves on the Response branch shown in the image above: visible, measurable targets that everyone can see. The leaves would not be possible without the support of the trunk.
The trunk is your telemetry data: the traces, logs, and events you collect from your system. That data acts as the foundation and support for the branches and leaves.
The roots represent the things you cannot see but that are still vital to the overall health of your system. This is the hard part: understanding system behavior, debugging unknown unknowns, and making data-driven decisions.
Most teams skip the roots entirely. They define SLOs using only the trunk (logs, traces, events), measuring the things that can already be measured. But the business outcomes and user behaviors are buried in the roots, and sometimes you have to dig through the soil to ensure your SLOs don't end up technically accurate yet strategically ineffective.
What Makes an SLO Decision-Worthy
So what makes a good SLO? The goal of an SLO is to bridge engineering and business needs to support a high-quality user experience. A good SLO depends on three things: business clarity (asking the right questions), measurability (whether these components can actually be measured), and actionable targets (the game plan for when things go wrong).
First, you need business clarity, the roots of the observability tree. This means articulating why something matters in concrete terms like dollars, users, and retention, and avoiding vague statements like "uptime is important." For instance, if I were measuring the impact of downtime on a checkout feature, I could establish the SLO scope with "each minute of checkout downtime costs us $12,000 in lost revenue based on our average transaction volume." It is essential to be able to explain the business impact in one clear sentence.
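To make that kind of statement concrete, it helps to show the arithmetic behind the number. Here's a minimal sketch in Python, assuming hypothetical figures for checkout volume and average order value (neither comes from real data):

```python
# Sketch: estimating the revenue impact of checkout downtime.
# Both inputs are hypothetical placeholders, not real figures.

TRANSACTIONS_PER_MINUTE = 150   # assumed average checkout completions per minute
AVERAGE_ORDER_VALUE = 80.0      # assumed average revenue per transaction, in dollars

def revenue_lost(minutes_of_downtime: float) -> float:
    """Estimated revenue lost while checkout is unavailable."""
    return minutes_of_downtime * TRANSACTIONS_PER_MINUTE * AVERAGE_ORDER_VALUE

print(revenue_lost(1))   # 12000.0 -- the "$12,000 per minute" statement above
print(revenue_lost(15))  # 180000.0 over a 15-minute outage
```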
Second, you need measurability. This is like the trunk of the tree. Your SLO must connect to your golden signals, such as latency, traffic, errors, and saturation. This is where a lot of aspirational SLOs fall apart. Upper management might want to measure user happiness, but how can engineering translate this into actual metrics? Try to express the business impact in one clear sentence. If that's difficult, it's usually a sign the problem definition needs a bit more shaping before defining the SLO.
Third, you need actionable targets, which represent the leaves on the observability tree. This is where most SLOs fail even when they get the first two right. There's a number, maybe even a threshold, but no clear action plan. What happens when you miss it? Who gets paged? What gets paused? Decision-worthy SLOs specify exactly what happens at different levels of degradation, and more importantly, they give everyone the confidence to make decisions based on those levels.
Building production-resilient SLOs: LaunchDarkly's flag evaluation example
We can apply these same principles to build a production-worthy SLO for LaunchDarkly's flag evaluation feature.
The flag evaluation feature in the monitoring tab is an extension of observability: it tracks how often each flag variation is served to different contexts over time and highlights flag changes that might affect evaluation patterns.
Now, let's build an SLO.
Step 1: Start with the business question
What would be impacted if the flag evaluations monitoring feature broke? Customers use these charts to understand rollout progress, debug targeting issues, and verify that their flags are working as expected. If evaluation data is delayed or missing, they can't trust what they're seeing. They might roll back a working feature thinking it's broken, or fail to catch a real problem because the charts show stale data. This undermines confidence in the platform and increases support load.
Step 2: Translate to user experience terms
What does "working well" look like? When a customer makes a flag change and checks the monitoring tab, they see updated evaluation counts within a couple minutes. The charts load quickly (under 3 seconds). The data is accurate meaning evaluation counts match what's actually happening in their application. If there's a delay, we tell them explicitly rather than showing stale data as if it's current.
Step 3: Connect to telemetry
We track several golden signals for this feature.
- Data pipeline latency: time from evaluation event to appearing in charts.
- Chart load time: how long it takes to render the monitoring page.
- Data accuracy: comparing our recorded evaluations against a known sample.
- Error rate: failed queries or chart rendering errors.
For the sake of this example, we'll set arbitrary numbers for these signals. Let's say you had a median pipeline latency of 45 seconds, with p95 at 2 minutes and p99 at 5 minutes. Chart load time averages 1.2 seconds. Data accuracy is 99.7 percent (some evaluations drop due to sampling), and the error rate is 0.3 percent.
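If you were collecting these signals yourself, the percentiles would come straight from raw latency samples. Here's a minimal sketch with a small, made-up set of event-to-chart delays in seconds; in practice you'd have many thousands of samples per window:

```python
# Sketch: computing pipeline-latency percentiles from raw samples.
# `latency_samples` is illustrative only, not real measurement data.
import statistics

latency_samples = [30.2, 38.5, 41.0, 44.8, 45.6, 52.0, 118.7, 290.4]

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    index = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[index]

print("median:", statistics.median(latency_samples), "s")
print("p95:", percentile(latency_samples, 95), "s")
print("p99:", percentile(latency_samples, 99), "s")
```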
Using this data, we can set the target.
Step 4: Set the target
Based on that data, here's our SLO: 98 percent of flag evaluation events will appear in monitoring charts within 3 minutes, with chart load times under 3 seconds at p95.
Why these numbers? Customer research shows they expect "near real-time" monitoring, which they define as 2-3 minutes. Anything longer feels like stale data. Three seconds for chart loading is the threshold where users perceive delay and start questioning if something's broken.
We chose 98 percent instead of 99.9 percent because some evaluation events get sampled out intentionally for cost reasons, and occasional data pipeline delays from third-party dependencies are acceptable.
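With the target written down, checking compliance over a measurement window is simple arithmetic. Here's a minimal sketch using hypothetical counters that, in practice, would come from your telemetry store:

```python
# Sketch: checking the flag-evaluation monitoring SLO over a measurement window.
# The counters below are hypothetical placeholders.

events_total = 1_000_000          # evaluation events emitted in the window
events_within_3_min = 985_000     # events that appeared in charts within 3 minutes
chart_load_p95_seconds = 2.4      # p95 chart load time over the same window

freshness_ratio = events_within_3_min / events_total

slo_met = freshness_ratio >= 0.98 and chart_load_p95_seconds < 3.0
error_budget_remaining = freshness_ratio - 0.98   # headroom before the SLO is breached

print(f"freshness: {freshness_ratio:.2%}, SLO met: {slo_met}")
print(f"error budget remaining: {error_budget_remaining:.2%}")
```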
Now that we have our targets, we can use those thresholds to set conditional responses based on alerts or indicators.
Step 5: Define operational responses
Responses for Green, Yellow, or Red indicators in production (sketched in code after the list):
- If Green (>98%, <3 min, <3 sec load), continue normal operations.
- If Yellow (95-98%, or 3-5 min, or 3-5 sec load), alert on-call and investigate within 4 hours.
- If Red (<95%, or >5 min, or >5 sec load), page immediately and update the status page if widespread.
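Those thresholds map directly onto a small piece of classification logic. A minimal sketch, where the input values are hypothetical measurements for the current window:

```python
# Sketch: mapping current measurements to the Green/Yellow/Red response levels.
# Thresholds mirror the SLO above; the example inputs are hypothetical.

def classify(freshness_pct: float, pipeline_p95_min: float, chart_load_p95_sec: float) -> str:
    """Return the operational response level for the current window."""
    if freshness_pct < 95 or pipeline_p95_min > 5 or chart_load_p95_sec > 5:
        return "red"     # page immediately, update status page if widespread
    if freshness_pct < 98 or pipeline_p95_min > 3 or chart_load_p95_sec > 3:
        return "yellow"  # alert on-call, investigate within 4 hours
    return "green"       # continue normal operations

print(classify(98.5, 2.0, 2.4))  # green
print(classify(96.0, 4.0, 2.4))  # yellow
print(classify(94.0, 6.0, 5.5))  # red
```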
Step 6: Drive decisions
Now the SLO becomes your decision-making framework. When engineering proposes adding a new feature like "evaluations by SDK" breakdown, the first question is: "Will this keep us within our 3-second chart load SLO?" If the answer is no, we either optimize the implementation or push back on the feature.
Infrastructure changes get evaluated the same way. Before migrating the data pipeline to a new system, we run load tests against both our latency and accuracy targets. If the migration risks our SLO, we either fix the architecture or delay the migration. Another way I've seen SLOs used is in planning future work: if a team knows it's in the yellow this month, it may avoid picking up other risky work.
The SLO transforms from a monitoring target into a decision filter, helping to determine what gets shipped and what doesn’t.
Bringing it all together
Great SLOs aren't just leaves you pluck and add to dashboards. They're connected to everything below them from the trunk of solid telemetry to the roots of understanding what actually matters to your business and users. If you skip those foundational layers, your SLOs become technically accurate but strategically useless.
Start with the roots. Ask what would be impacted if this feature were to break. Work your way up through user experience and technical measurement. Build SLOs that bridge engineering and business with clear thresholds and clear consequences. And finally, make them specific enough to drive real decisions.

