didiViking

Posted on May 30 • Originally published at Medium on May 11

Observability Has a Data Hoarding Problem

#sustainability #observability #softwaredevelopment #womenintech

Somewhere in Greece

At the end of 2025, I gave a talk at a local meetup in Spain about something that, at the time, still felt a bit niche: Green Observability.

Most conversations around sustainability in tech tend to focus on infrastructure efficiency, cloud cost optimization, or greener hardware choices. Observability rarely enters that discussion. It is usually treated as a purely operational concern — something essential for reliability, debugging, and incident response — but not something we associate with environmental impact.

My talk explored a simple but uncomfortable idea: observability itself has a footprint.

Every metric we collect, every log we retain, every trace we store, and every dashboard query we run consumes compute, storage, and energy somewhere. At small scale, this impact feels negligible. But at the scale modern cloud-native systems operate today, observability becomes infrastructure in its own right — and infrastructure has an environmental cost.

A few months later, I expanded this topic into a talk at FOSDEM, focusing specifically on the energy and carbon footprint of modern observability systems. What stood out to me after that talk was the reaction from peers in the observability space. SREs, platform engineers, and observability practitioners came up to continue the discussion, and many strongly agreed with the core message. The feedback was consistent: telemetry growth is rarely questioned, observability systems are becoming increasingly expensive to operate, and the industry has normalized a level of data collection that often exceeds what teams actually use.

That conversation made one thing very clear to me: this is not a theoretical concern anymore. It is already happening in production systems everywhere.

The Observability Sustainability Paradox

Modern software systems are complex, distributed, and highly dynamic. Observability — collecting metrics, logs, and traces — is essential for understanding these systems. Without it, operating large-scale infrastructure would be nearly impossible.

But the same mechanisms that make observability powerful also make it expensive.

High-cardinality metrics, verbose logging, long retention periods, frequent scraping intervals, and always-on tracing pipelines significantly increase storage and compute requirements. Kubernetes environments amplify this further by generating telemetry at every layer: clusters, nodes, containers, services, and applications.

This creates what I call the observability sustainability paradox: the more data we collect to gain insight, the more energy and resources we consume to store, process, and query that data. I first talked about it at the Scale23x conference, in a presentation I delivered for the Cloud Native Days LA track.

At first, this trade-off feels acceptable. Visibility is critical. But over time, many systems quietly drift into excess:

metrics that are never queried,
logs retained far beyond their usefulness,
traces stored “just in case,”
dashboards nobody maintains anymore,
and pipelines that exist simply because they were never re-evaluated.

Eventually, observability stops being just a tool for understanding systems and becomes a system of its own — with its own operational overhead and environmental footprint.

Applying Green Software Principles to Observability

This is where green software thinking becomes relevant.

The goal is not to reduce observability or sacrifice reliability. The goal is to design telemetry with intention — to treat observability systems as software that should also be efficient, not just functional.

In practice, this means shifting the default mindset from “collect everything” to “collect what is useful.”

It starts with simple but important questions:

Do we actually use this metric in decision-making?
Does this dashboard influence operational behavior?
Is this alert actionable, or just informational noise?
Can we reduce retention without losing critical insight?
Are we collecting data because it is valuable, or because it is easy?

These questions are often uncomfortable because they challenge default engineering habits. But they are necessary if we want sustainable systems.

The Green Software Foundation has been driving similar thinking in broader software design: minimize waste, optimize resource usage, and consider environmental impact as a first-class constraint. Observability systems should not be excluded from that conversation.

Lessons From Practice

From a practitioner’s perspective, the most interesting insight is that reducing telemetry does not reduce observability quality — it often improves it.

In multiple systems I’ve worked with or observed, teams saw better outcomes after deliberately reducing noise in their observability stack. Not by removing critical signals, but by removing unnecessary ones.

Common improvements included:

reducing high-cardinality metrics to meaningful dimensions only,
sampling traces instead of capturing every request,
shortening retention periods for non-critical data,
simplifying dashboards that had grown organically over time,
and removing alerts that did not lead to action.

The result was consistent: less noise, faster debugging, and clearer operational signals.

One of the most counterintuitive outcomes is that observability becomes more effective when it is more constrained. When everything is monitored, nothing stands out. When telemetry is intentional, anomalies become easier to detect.

Technical Strategies for Sustainable Observability

At the technical level, sustainable observability is about rethinking the full telemetry lifecycle: generation, collection, storage, and querying.

A few practical approaches stand out.

Metrics should prioritize signal over dimensional explosion. High-cardinality labels should be used carefully, not by default. Logs should be structured but not overly verbose, and retention policies should reflect actual usage patterns rather than theoretical requirements.
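To make the "dimensional explosion" concrete, here is a minimal sketch of the arithmetic behind it. The label names and their cardinalities are hypothetical numbers chosen for illustration, and the function computes a worst case in which every combination of label values actually occurs.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Worst-case number of time series for one metric: the product of
    the number of distinct values each label can take."""
    return prod(label_cardinalities.values())


# Hypothetical label sets for a single HTTP request counter.
with_user_id = {"service": 20, "endpoint": 50, "status": 5, "user_id": 10_000}
without_user_id = {"service": 20, "endpoint": 50, "status": 5}

print(series_count(with_user_id))     # 50000000
print(series_count(without_user_id))  # 5000
```

Under these assumed numbers, a single per-user label turns five thousand series into fifty million, which is why high-cardinality labels deserve a deliberate decision rather than a default.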

Tracing benefits significantly from sampling strategies. Capturing every request is rarely necessary at scale, and intelligent sampling can preserve visibility into system behavior while dramatically reducing overhead.
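As an illustration of the simplest form of sampling, the sketch below keeps a fixed fraction of traces by hashing the trace ID. It is plain Python and does not use any tracing library's API; the `keep_trace` function, the ID format, and the 10% rate are assumptions made for the example. More selective strategies (for instance, always keeping traces with errors) would build on the same idea.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether to keep a trace.

    Hashing the trace ID means every service that sees the same ID
    makes the same decision, so kept traces stay complete.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and
    # keep the trace if it falls below the proportional threshold.
    value = int.from_bytes(digest[:8], "big")
    threshold = int(sample_rate * 2**64)
    return value < threshold


# Hypothetical trace IDs, sampled at an assumed 10% rate.
trace_ids = [f"trace-{i}" for i in range(10_000)]
kept = sum(keep_trace(t, 0.10) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, the kept fraction is roughly the configured rate and the same trace is never half-recorded across services.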

In Kubernetes environments, there are additional optimization opportunities:

tuning scraping intervals to match actual needs,
avoiding duplicate instrumentation across layers,
aggregating metrics closer to ingestion,
reducing redundant exporters and sidecars,
and optimizing collector pipelines for efficiency.
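The effect of the first item, scrape interval tuning, can be estimated with simple arithmetic. The sketch below assumes a hypothetical one million active series and that every series is scraped on every interval; real numbers will differ per environment.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(active_series: int, scrape_interval_s: int) -> int:
    """Samples ingested per day if every series is scraped on each interval."""
    return active_series * (SECONDS_PER_DAY // scrape_interval_s)


series = 1_000_000  # hypothetical number of active series
for interval in (15, 30, 60):
    print(f"{interval:>2}s -> {samples_per_day(series, interval):,} samples/day")
```

With these assumptions, moving from a 15-second to a 60-second interval cuts daily ingestion from 5.76 billion to 1.44 billion samples, a fourfold reduction before any other optimization is applied.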

Another emerging pattern is shifting some aggregation earlier in the pipeline, which reduces the volume of data that needs to be stored or queried later. These kinds of architectural decisions often have an outsized impact on both performance and cost.
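A minimal sketch of what early aggregation means in practice: drop a label that is not needed downstream and sum the values that collapse together. The data shape, the `aggregate_without` helper, and the service and pod names are all hypothetical, and summing only makes sense for additive values such as request counts.

```python
from collections import defaultdict


def aggregate_without(samples, drop_labels):
    """Sum sample values after removing the given labels.

    `samples` is an iterable of (labels_dict, value) pairs.
    Returns a dict keyed by the sorted remaining label pairs.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple(
            sorted((k, v) for k, v in labels.items() if k not in drop_labels)
        )
        totals[key] += value
    return dict(totals)


# Hypothetical per-pod request counts.
samples = [
    ({"service": "checkout", "pod": "checkout-a"}, 120.0),
    ({"service": "checkout", "pod": "checkout-b"}, 80.0),
    ({"service": "search", "pod": "search-a"}, 50.0),
]

per_service = aggregate_without(samples, drop_labels={"pod"})
print(per_service)
```

Three per-pod series become two per-service series here; in a cluster with many short-lived pods, the same step removes most of the series before they ever reach storage.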

Importantly, these optimizations do not just reduce resource consumption — they usually improve usability. Cleaner dashboards, faster queries, and reduced alert fatigue tend to follow naturally.

Open Source Tools Driving Change

Open source ecosystems are increasingly important in making sustainable observability practical.

Projects like OpenTelemetry provide the foundation for standardized telemetry generation and allow teams to implement sampling, filtering, and aggregation strategies consistently across systems.

At the infrastructure level, tools like Kepler are starting to make energy consumption measurable in cloud-native environments. This is a critical step, because what cannot be measured is rarely optimized.

These tools make it possible to connect observability decisions directly to resource and energy impact, rather than treating sustainability as an abstract concept.

However, most of the ecosystem is still evolving toward this awareness. Many default configurations still favor high-volume telemetry collection, and optimization is often left entirely to individual teams.

Benefits Beyond Sustainability

While environmental impact is an important motivation, sustainable observability also delivers very practical engineering benefits.

Reducing telemetry noise improves system clarity. Engineers spend less time filtering irrelevant signals and more time focusing on meaningful ones. Incident response becomes faster because dashboards are simpler and more relevant. Alert fatigue decreases because signals are more intentional.

Operational costs also drop — not just in storage, but in compute, query performance, and pipeline complexity.

In many cases, sustainability and operational excellence reinforce each other rather than conflict. Efficient observability systems tend to be easier to maintain, debug, and scale.

Culture, Measurement, and Accountability

Technology alone is not enough. Sustainable observability also requires cultural change.

Teams need to treat observability systems as first-class software systems that deserve regular review and optimization. Telemetry should not accumulate indefinitely without ownership or evaluation.

One of the most useful shifts is introducing visibility into observability itself:

How much telemetry are we generating?
What is actually being queried?
Which dashboards are still relevant?
Which metrics are never used?
What is the cost of our monitoring stack?

Even simple awareness of these questions changes behavior over time.
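One way to start answering "which metrics are never used?" is to compare the metric names you store against the queries your dashboards and alerts actually run. The sketch below is a rough approximation under stated assumptions: the metric names and queries are made up, and it simply treats any identifier-like token in a query as a reference, without parsing the query language.

```python
import re


def unused_metrics(stored_metrics, query_log):
    """Return stored metric names that appear in no logged query."""
    referenced = set()
    for query in query_log:
        # Collect identifier-like tokens; a stored name among them counts as usage.
        referenced.update(re.findall(r"[a-zA-Z_:][a-zA-Z0-9_:]*", query))
    return sorted(set(stored_metrics) - referenced)


# Hypothetical inventory of stored metrics and recently executed queries.
stored = [
    "http_requests_total",
    "http_request_duration_seconds_bucket",
    "legacy_cache_evictions_total",
    "node_cpu_seconds_total",
]
queries = [
    'sum(rate(http_requests_total{status="500"}[5m])) by (service)',
    "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))",
]

print(unused_metrics(stored, queries))
```

A metric showing up in this list is a candidate for review, not automatic deletion: it may still be needed for rare incidents or compliance.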

Sustainability also requires challenging assumptions. Many systems grow telemetry by default rather than by design. Revisiting those defaults regularly is essential to avoid long-term accumulation of waste.

The Role of the Community

The observability community has a key role to play in making this shift real.

Too often, the focus is on scale and volume rather than efficiency and signal quality. Instrumentation tends to grow faster than governance. And while tools exist to help manage telemetry, best practices around reducing waste are still not consistently applied.

The conversations I had after speaking at FOSDEM reinforced this. Many practitioners are already thinking in this direction, but there is still a gap between awareness and standard practice.

Community-driven efforts — whether through open source projects, shared patterns, or better defaults — can help close that gap. This includes encouraging smarter sampling strategies, better retention defaults, and more explicit thinking about the cost of telemetry.

If observability is becoming a core part of system architecture (which it is), then its environmental and operational footprint should be part of the design conversation from the beginning.

Rethinking What We Actually Need to Observe

I still believe observability is one of the most important disciplines in modern software engineering.

The best observability systems I’ve seen are not the ones with the most metrics or the most dashboards. They are the ones where every signal exists for a reason, where telemetry is intentional, and where engineers actively understand what they are collecting — and why.

Sustainable observability is not about collecting less for the sake of it. It is about collecting better. It is about ensuring that what we observe is meaningful enough to justify its cost.

Because in the end, the goal was never to build the biggest telemetry pipeline. The goal was to understand our systems well enough to operate them confidently and responsibly.

You can find my previous presentations on this topic, along with my full portfolio of talks, at https://github.com/didiViking/Conferences_Talks.

Peace from Spain