Part 1 of a series on building a metrics pipeline into ClickHouse
Collecting metrics is easy.
Shipping them to an analytical database without losing your mind is the hard part.
## The Goal
At the outset, the task seemed straightforward:
Collect system metrics (CPU, memory, GPU) and store them in ClickHouse for analysis.
This is a common observability use case.
You collect metrics, send them somewhere, and run queries on top.
Simple enough.
But in practice, it didn’t go as planned.
## The Initial Approach: Telegraf
I started with Telegraf.
It’s widely used for collecting system metrics and has a plugin-based architecture, which makes it a natural first choice.
This was also where I first came across TOML.
At first, it felt like I just needed to “write a config and run it.”
But very quickly, I realized:
Configuration isn’t just syntax; it defines how your system behaves.
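To make that concrete, here is the kind of minimal Telegraf config involved. Treat it as a hedged sketch, not a working recipe: the ClickHouse URL, the `metrics` table name, and the choice of the generic `http` output are illustrative assumptions. In fact, the mismatch between Telegraf’s JSON output envelope and the `JSONEachRow` format ClickHouse expects is exactly the kind of friction described below.

```toml
# Sketch of a minimal Telegraf config: collect CPU and memory,
# then try to push into ClickHouse over its HTTP interface.
[agent]
  interval = "10s"

[[inputs.cpu]]
  percpu = false
  totalcpu = true

[[inputs.mem]]

# No native ClickHouse output, so fall back to the generic HTTP output.
# The URL-encoded INSERT query and table name here are assumptions.
[[outputs.http]]
  url = "http://localhost:8123/?query=INSERT%20INTO%20metrics%20FORMAT%20JSONEachRow"
  method = "POST"
  data_format = "json"
```

The catch: `data_format = "json"` wraps metrics in an envelope that `JSONEachRow` won’t parse as-is, so this wiring needs extra massaging on one side or the other.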
## What I Was Trying to Build
The idea was simple:
- Collect host-level metrics (CPU, memory, etc.)
- Collect GPU metrics
- Push everything into ClickHouse
- Run analytical queries on top
Essentially, a basic observability pipeline.
## Where Things Started Breaking
On paper, Telegraf looked like it should work.
In reality, I ran into a few issues:
- No native ClickHouse output plugin, and no straightforward workaround for pushing data in
- Debugging wasn’t very intuitive
- Configurations became rigid as complexity increased
At some point, I was spending more time trying to make the tool fit the use case than actually solving the problem.
## A Shift in Perspective
This is where something important clicked.
Up until this point, I was thinking in terms of:
Write config → Run tool → Expect output
But that approach wasn’t working.
What I needed instead was a clearer understanding of how data actually flows:
Data source → Transformation → Destination
The problem wasn’t just the tool; it was the lack of control over how data moved through the system.
## Why I Decided to Move Away
At this stage, it became clear that I needed:
- More control over data transformations
- Better visibility into how data flows
- A system that is easier to debug
Telegraf, while powerful, didn’t give me that level of flexibility for this use case.
## What’s Next
That’s when I decided to try a different approach using Vector.
Instead of treating configuration as static setup, Vector treats it as a pipeline.
In the next part, I’ll walk through:
- How Vector pipelines work
- Why the sources → transforms → sinks model made a difference
- And what changed when I adopted that approach
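As a preview, a Vector pipeline for this use case looks roughly like the following. This is a sketch under assumptions: the component names (`host`, `tag_env`, `ch`), the endpoint, and the table are placeholders, while the option names follow Vector’s documented `host_metrics` source, `remap` transform, and `clickhouse` sink.

```toml
# Sketch of a Vector pipeline: source → transform → sink.
# Component names, endpoint, and table are illustrative placeholders.

[sources.host]
type = "host_metrics"          # collects CPU, memory, disk, etc.

[transforms.tag_env]
type = "remap"                 # VRL transform; add a tag to each metric
inputs = ["host"]
source = '.tags.env = "dev"'

[sinks.ch]
type = "clickhouse"            # Vector ships a native ClickHouse sink
inputs = ["tag_env"]
endpoint = "http://localhost:8123"
table = "metrics"
```

Notice how the `inputs` fields make the data flow explicit; that explicitness is the point of the next post.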
## Series Overview
This post is part of a series:
- Part 1: Telegraf struggles and initial setup (this post)
- Part 2: Moving to Vector and understanding pipelines
- More parts in this series will be published soon.
## Final Thought
What started as a simple setup turned into a deeper lesson:
Tools don’t solve problems; understanding systems does.
Once that became clear, the direction forward was much easier.