Takuya Kajiwara

Datadog, Reframed: A Simple Way to Think About Agents, Pipelines, Indexes, and More

This is Part 2 of 3 in the Lenses series.


Datadog has a lot of products. Agents, tracing libraries, RUM SDKs, the AWS integration, log pipelines, retention filters… the list goes on.

It’s not easy to remember how all of these pieces fit together.

That’s where the Lenses idea comes in.

In my previous post, I introduced this mental model and zoomed in on the Presentation Lens.

Now, let’s move on to the second one: the Collection Lens.


What’s the Collection Lens?

A Lens isn’t about a whole product; it’s about the slice of a product’s functionality that plays a particular role.

The Collection Lens is about how data makes its way into Datadog.

It covers the features that gather, refine, and shape signals before they’re stored and shown.

I like to think of data collection in three phases:

  • Ingestion → bringing signals in.
  • Enrichment → shaping context so the data makes sense (adding what’s missing, removing what doesn’t belong).
  • Reduction → trimming or transforming data to keep things practical.

Phase 1: Ingestion

Responsibility: get signals from your systems into Datadog.

Examples: the Datadog Agent on hosts and containers, tracing libraries inside applications, the Lambda Forwarder, RUM SDKs, the AWS integration, Synthetic Monitoring tests (and the private locations that run them), Workflow Automation (when it’s used to gather data), or even custom API calls.

Why it matters: without ingestion, there’s nothing to observe. These components sit close to your systems and act as the first touchpoint for your telemetry.
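To make “custom API calls” concrete, here’s a minimal sketch of ingestion at its most basic: pushing one gauge point through the v2 metrics API with plain HTTP. The metric name and tags are made up for illustration, and the API key is assumed to live in a `DD_API_KEY` environment variable:

```python
# Minimal sketch: ingesting a custom metric via the Datadog v2 metrics API.
# The metric name and tags are illustrative; set DD_API_KEY in your environment.
import os
import time

import requests

payload = {
    "series": [
        {
            "metric": "myapp.checkout.latency",  # hypothetical metric name
            "type": 3,  # 3 = gauge in the v2 series schema
            "points": [{"timestamp": int(time.time()), "value": 0.42}],
            "tags": ["env:prod", "service:checkout"],  # illustrative tags
        }
    ]
}

resp = requests.post(
    "https://api.datadoghq.com/api/v2/series",
    headers={"DD-API-KEY": os.environ["DD_API_KEY"]},
    json=payload,
)
resp.raise_for_status()
```

In practice the Agent or a tracing library does this for you continuously; the raw API is the escape hatch when no off-the-shelf collector fits.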


Phase 2: Enrichment

Responsibility: adjust telemetry so it carries the right context — by adding what’s missing and removing what doesn’t belong.

Examples: log pipelines, event pipelines, and Sensitive Data Scanner, which strips out credentials or personal information so only meaningful context remains.

Why it matters: telemetry without context is just noise. Enrichment makes the data actionable by adding what was missing, and tools like Sensitive Data Scanner make it safe and clean by removing details that aren’t needed. Together, they ensure the data is complete, relevant, and trustworthy.
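To show what enrichment looks like in configuration terms, here’s a hedged sketch of a log pipeline: a Grok parser that pulls a level out of the raw message, plus a status remapper that promotes it, posted to the logs pipelines API. The pipeline name, filter query, and Grok rule are all invented for this example:

```python
# Sketch: defining a log pipeline (Grok parser + status remapper) via the
# logs pipelines API. Names, filter query, and the Grok rule are illustrative.
import os

import requests

pipeline = {
    "name": "checkout service logs",          # hypothetical pipeline name
    "is_enabled": True,
    "filter": {"query": "service:checkout"},  # which logs this pipeline shapes
    "processors": [
        {
            # Add context: parse a level out of the raw message.
            "type": "grok-parser",
            "name": "parse level and message",
            "is_enabled": True,
            "source": "message",
            "grok": {
                "support_rules": "",
                "match_rules": "line %{word:level} %{data:msg}",
            },
        },
        {
            # Promote the parsed attribute to the log's official status.
            "type": "status-remapper",
            "name": "remap level to status",
            "is_enabled": True,
            "sources": ["level"],
        },
    ],
}

resp = requests.post(
    "https://api.datadoghq.com/api/v1/logs/config/pipelines",
    headers={
        "DD-API-KEY": os.environ["DD_API_KEY"],
        "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    },
    json=pipeline,
)
resp.raise_for_status()
```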


Phase 3: Reduction

Responsibility: manage volume and cost by filtering, sampling, or transforming data.

Examples: log indexes, metrics generated from logs and spans (Generate Metrics), APM retention filters, RUM retention filters, and tag configurations for metrics.

Why it matters: collecting everything forever is ideal in theory, but rarely possible in practice. Reduction features give you levers to control scale and budget while still keeping the insights you need.

Trade-off: reduction always means some loss of fidelity. The art is to reduce smartly, without cutting away the signals you’ll need later.
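And to ground “reduce smartly,” here’s a sketch of a log index whose exclusion filter samples out 90% of debug logs while indexing everything else, sent to the log indexes API. The index name and queries are illustrative:

```python
# Sketch: a log index that keeps everything but samples out 90% of debug
# logs. Index name and queries are made up for this example.
import os

import requests

index = {
    "name": "main",                    # hypothetical index name
    "filter": {"query": "*"},          # everything flows into this index...
    "exclusion_filters": [
        {
            "name": "sample debug logs",
            "is_enabled": True,
            # ...except 90% of debug logs, dropped before indexing.
            "filter": {"query": "status:debug", "sample_rate": 0.9},
        }
    ],
}

resp = requests.post(
    "https://api.datadoghq.com/api/v1/logs/config/indexes",
    headers={
        "DD-API-KEY": os.environ["DD_API_KEY"],
        "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    },
    json=index,
)
resp.raise_for_status()
```

Excluded logs are dropped at indexing time, not at intake, so they can still flow to Live Tail and archives — the fidelity trade-off in miniature.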


Why This Model Helps

Viewed product by product, features like pipelines, indexes, and SDKs can feel scattered.

But through the Collection Lens, they form a simple flow: ingest → enrich → reduce.

This also highlights priorities:

  • Ingestion and enrichment are essential. They make observability possible.
  • Reduction is optional. It’s the set of knobs you turn to keep data useful and affordable.

So instead of asking, “Which product does what?”, you can ask:

  • What data do I need to know my system is healthy, and to find the root cause when it isn’t?

Once that’s clear, you can decide which ingestion, enrichment, and reduction features to use.


Closing Thoughts

This is the second post in the Lenses series.

We’ve now covered Presentation and Collection — how data is shown, and how data gets in.

Next up is the Execution Lens, which focuses on how Datadog takes action on your systems.

That one will be shorter, but it completes the picture.

I’d love to hear your thoughts: does the Collection Lens help make Datadog easier to navigate?
