Part 2 of a series on building a metrics pipeline into ClickHouse
Read Part 1: Why my metrics pipeline with Telegraf didn’t work
Picking Up Where Things Broke
In the previous part, I talked about trying to build a metrics pipeline using Telegraf - and why that approach didn’t work for my use case.
The biggest issue wasn’t just tooling.
It was this:
I didn’t have enough control over how data moved through the system.
That’s what led me to explore a different approach.
Why Vector
I came across Vector while looking for something more flexible.
At a glance, it felt different.
Instead of thinking in terms of plugins and configs, Vector is built around a pipeline model.
And that changes everything.
The Core Idea: Pipelines
At the center of Vector is a simple concept:
Sources → Transforms → Sinks
That’s it.
But this model makes the flow of data explicit.
- Sources → where data comes from
- Transforms → how data is modified
- Sinks → where data is sent
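In Vector's YAML config, those three stages map directly to top-level sections. A minimal sketch of what that looks like (component names like `host_metrics` and `tag_env` are my own placeholders, not anything Vector requires):

```yaml
sources:
  host_metrics:
    type: host_metrics        # collect CPU, memory, disk, etc. from the host

transforms:
  tag_env:
    type: remap               # VRL transform; covered in the next part
    inputs: [host_metrics]
    source: |
      .tags.env = "dev"       # add a static tag to every metric

sinks:
  console_out:
    type: console             # print events to stdout for inspection
    inputs: [tag_env]
    encoding:
      codec: json
```

Each component names its upstream stages via `inputs`, which is what makes the flow explicit: the pipeline graph is right there in the config.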
Compared to my earlier approach, this immediately felt clearer.
What This Actually Means
Instead of writing a config and hoping everything connects correctly, you define:
- What data you are collecting
- How that data should be shaped
- Where that data should go
That shift sounds small - but it changes how you think about the system.
From Config Files to Data Flow
With Telegraf, my thinking looked like this:
Write config → Run → Debug errors
With Vector, it started becoming:
Collect → Transform → Route → Store
The focus moved from:
- “What config do I write?”
to:
- “How does data move through each stage?”
The New Learning Curve
Of course, switching tools didn’t magically solve everything.
There were new challenges.
Vector uses YAML for configuration, which was different from the TOML I was used to.
And more importantly:
The pipeline only works if every stage is defined correctly.
Some of the early issues I ran into:
- Incorrect source definitions
- Misconfigured sinks
- Data not flowing as expected
- Silent failures when something didn’t connect properly
At times, it felt like nothing was happening, even though everything looked “correct.”
First Realization: Everything Is Connected
One important thing I learned quickly:
If one stage breaks, the entire pipeline breaks.
Unlike simpler setups, you can’t treat components independently.
- A bad transform can stop data entirely
- A misconfigured sink can drop everything silently
- A source that doesn’t emit correctly makes debugging harder
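One habit that helped with that end-to-end mindset: temporarily attaching a console sink to whichever stage I suspected, to confirm events were actually reaching it (here `suspect_transform` is a placeholder for your own component):

```yaml
sinks:
  debug_console:
    type: console
    inputs: [suspect_transform]  # point at the stage you want to inspect
    encoding:
      codec: json
```

Running `vector validate` against the config before starting also catches a lot of the wiring mistakes above before anything runs.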
This forced me to start thinking in terms of end-to-end flow, not individual pieces.
What Improved Immediately
Despite the challenges, a few things became better compared to before:
- Clear visibility into how data moves
- Better control over transformations
- More flexibility in shaping data before sending it to ClickHouse
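The ClickHouse side of that flexibility comes from Vector's built-in `clickhouse` sink, which writes over ClickHouse's HTTP interface. A rough sketch, where the endpoint, database, and table names are placeholders for your own setup:

```yaml
sinks:
  clickhouse_out:
    type: clickhouse
    inputs: [shape_metrics]          # the transform that prepares rows
    endpoint: http://localhost:8123  # ClickHouse HTTP interface
    database: metrics
    table: host_metrics
    skip_unknown_fields: true        # drop fields the table doesn't define
```

The catch, as the next section gets into, is that events have to match the table's column names and types before this sink will accept them.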
Even though things weren’t fully working yet, I finally felt like I was closer to solving the actual problem.
What Was Still Missing
At this stage, the pipeline structure made sense.
But one part was still unclear, and it turned out to be the hardest:
How to correctly transform the data so that ClickHouse would accept it.
This is where most of the complexity showed up.
What’s Next
In the next part, I’ll dive into the most challenging part of this setup:
- Writing transforms using Vector Remap Language (VRL)
- Handling strict data types
- Fixing timestamp issues
- And shaping metrics into a format that ClickHouse can actually ingest
Series Overview
This post is part of a series:
- Part 1: Telegraf struggles and initial setup
- Part 2: Moving to Vector and understanding pipelines (this post)
- Part 3: Writing transforms and handling data correctly
- More parts in this series will be published soon
Final Thought
Switching tools didn’t solve the problem immediately.
But it did something more important:
It made the system visible.
Once I could see how data moved through each stage, debugging stopped being guesswork and became structured.