Rotich Kelly


From Podcasts to Pipelines: Building a YouTube Analytics Engine Inspired by Mic Cheque

"Now when you watch it you'll understand!"

Ever been so hooked on a podcast that you ended up building a full-blown data pipeline because of it? No? Just me? Cool. Let me tell you the story anyway.

It all started with the Mic Cheque Podcast—a brilliant blend of humor, deep takes, and real talk that kept popping up on my YouTube feed. As a data engineering enthusiast and a fan of the pod, I had one question buzzing in my head:

What makes some podcast episodes go viral while others stay in the shadows?

The Idea
What if I could track, analyze, and visualize the performance of the podcast episodes using actual YouTube data?
Boom, project idea locked. I decided to build a fully automated YouTube data pipeline that feeds a live analytics dashboard and answers burning questions like:

Which episodes are going viral?

What days do high-performing episodes drop?

Is there a pattern between guest appearances and views?

The Stack
To make it real, I pulled out the big guns:

Python + Airflow: For automating the entire pipeline from extract to load

YouTube API: For fetching episode metadata and stats

PostgreSQL (Aiven): As the data warehouse

Apache Spark: For heavy lifting (a.k.a. transforming the raw data)

Grafana: For visualizing performance trends that even a podcast guest would appreciate

The Flow
Here’s what the pipeline does:

Extract data from YouTube using the YouTube Data API

Transform it using Spark (cleaning, enriching with time-based insights, classifying performance)

Load it into a PostgreSQL instance hosted on Aiven

Visualize the trends using Grafana—complete with charts showing view counts, likes, comments, publishing patterns, and a "performance class" metric

And the best part? All this runs automatically thanks to Airflow.
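To make the extract and transform steps a bit more concrete, here's a minimal sketch in plain Python. The real pipeline does this work in Spark; this is only a stand-in to show the idea. It assumes the input is one item from a YouTube Data API v3 videos.list response requested with the snippet and statistics parts. The function names, output field names, and view thresholds are all hypothetical, since the post doesn't say how the "performance class" is actually defined.

```python
from datetime import datetime

# Hypothetical thresholds: the real cut-offs for "performance class" are not given.
VIRAL_VIEWS = 100_000
AVERAGE_VIEWS = 10_000

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def flatten_video(item: dict) -> dict:
    """Turn one videos.list item into a flat row (hypothetical field names)."""
    snippet = item["snippet"]
    stats = item.get("statistics", {})
    # The API returns counts as strings, and a count can be missing
    # (for example when likes or comments are hidden), so default to 0.
    return {
        "video_id": item["id"],
        "title": snippet["title"],
        "published_at": snippet["publishedAt"],
        "views": int(stats.get("viewCount", 0)),
        "likes": int(stats.get("likeCount", 0)),
        "comments": int(stats.get("commentCount", 0)),
    }


def enrich(row: dict) -> dict:
    """Add time-based fields and a performance class to a flat row."""
    # publishedAt looks like "2024-05-03T15:00:00Z"; swap the trailing Z
    # for an explicit offset so fromisoformat accepts it on older Pythons.
    published = datetime.fromisoformat(row["published_at"].replace("Z", "+00:00"))
    if row["views"] >= VIRAL_VIEWS:
        performance = "viral"
    elif row["views"] >= AVERAGE_VIEWS:
        performance = "average"
    else:
        performance = "low"
    return {
        **row,
        "publish_weekday": WEEKDAYS[published.weekday()],
        "publish_month": published.strftime("%Y-%m"),
        "performance_class": performance,
    }
```

Calling enrich(flatten_video(item)) on each API item gives rows that are ready to load into a table. Keeping the thresholds as named constants makes it easy to tune what counts as "viral" for a given channel.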

The Result
After plugging it into Grafana, the dashboard popped! Pie charts for performance class, time series of views by month, and even weekday publishing trends. It's like giving a brain to your favorite podcast channel.
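For a feel of what sits behind those panels, here's a rough sketch of the two aggregations in plain Python. In the actual setup Grafana would most likely run queries against PostgreSQL instead, so treat this as an illustration only. The row fields (publish_weekday, views, performance_class) and the function names are assumptions, not names taken from the repo.

```python
from collections import Counter, defaultdict
from statistics import mean


def class_counts(rows: list[dict]) -> Counter:
    """Count episodes per performance class (the data behind a pie chart)."""
    return Counter(row["performance_class"] for row in rows)


def average_views_by_weekday(rows: list[dict]) -> dict:
    """Average view count per publishing weekday."""
    views = defaultdict(list)
    for row in rows:
        views[row["publish_weekday"]].append(row["views"])
    return {day: mean(counts) for day, counts in views.items()}


# Made-up sample rows, only to show the expected input shape.
sample_rows = [
    {"publish_weekday": "Friday", "views": 150_000, "performance_class": "viral"},
    {"publish_weekday": "Friday", "views": 50_000, "performance_class": "average"},
    {"publish_weekday": "Tuesday", "views": 8_000, "performance_class": "low"},
]

pie_data = class_counts(sample_rows)
weekday_data = average_views_by_weekday(sample_rows)
```

With real data, the weekday averages are what would answer the question about which days high-performing episodes tend to drop.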

What I Learned
Airflow can be your best friend or your worst enemy (don’t fight it—configure it properly!)

PostgreSQL on Aiven is smooth, but Grafana’s port configs can mess you up if you're not careful

Data pipelines are lit when they bring your passions and skills together

The Wrap-Up
This started as a fun side project but turned into a powerful learning experience. Whether you're a Mic Cheque stan, a data nerd, or both—I'd highly recommend building a pipeline around something you genuinely love.

Your data has a story. You just need to build the mic for it.

Want to see the code? Check out the GitHub repo: https://github.com/KellyKiprop/Youtube-Data-Pipeline
