DEV Community

Cover image for Youtube API Project
Bob Otieno Okech
Bob Otieno Okech

Posted on

Youtube API Project

📊 YouTube API – Data Warehouse & Analytics Solution

This repository demonstrates a complete data pipeline that extracts data from the YouTube Data API, models it using the Medallion Architecture, and delivers business-ready insights via Grafana dashboards.


📦 Project Summary

This project implements a modern analytics pipeline with:

  • Medallion Architecture: Structured into Bronze, Silver, and Gold layers for scalable data processing.
  • ETL Workflows: Automated extraction, transformation, and loading using Apache Airflow.
  • Data Modeling: Dimensional modeling in PostgreSQL for optimized querying.
  • Dashboards: Real-time reporting using Grafana, powered by SQL.

🧰 Tech Stack

  • PostgreSQL – Central data warehouse
  • Apache Airflow – Workflow orchestration
  • Grafana – Real-time data visualization
  • Linux VM – Compute environment for pipeline execution
  • Python – API ingestion & transformation logic

🎯 Project Objectives

Build a production-ready analytics solution to analyze YouTube channel and video performance:

  • Source structured data from the YouTube Data API
  • Clean, validate, and model for business intelligence
  • Persist historical metrics (views, likes, etc.) for trend analysis
  • Deliver actionable insights via dashboards and SQL queries

🗃️ Data Architecture (Medallion Model)

This project follows a Bronze → Silver → Gold pipeline:

Architecture

🔹 Bronze Layer

Raw ingestion from the YouTube API (JSON format)

🔸 Silver Layer

Cleaned, validated, and structured data (see data flow and model below)

  • Data Flow
    DataFlow

  • Data Model
    Data Model

🟡 Gold Layer

Aggregated data used to generate KPIs and dashboards in Grafana

  • Visualization Sample Visualization

📈 BI Use Cases

Dashboards and SQL queries answer key questions such as:

  • What are the top-performing videos per channel?
  • How is each channel performing over time?
  • What are the daily trends for views and engagement?

📁 Repository Structure

├── README.md
├── channel_lists.py
├── channel_overview.py
├── channel_videos.py
├── __pycache__/                        # Compiled Python files
├── project_files/
│   ├── Architecture/                  # Draw.io and PNG files for architecture
│   └── ddl_update_scripts/           # SQL DDLs and procedures
│       ├── dim_channels.sql
│       ├── dim_videos.sql
│       ├── fct_subscribers_views_video_count.sql
│       └── fct_video_statistics.sql
└── requirements.txt                   # Python dependencies
Enter fullscreen mode Exit fullscreen mode

🔗 Access the Code

Browse the full codebase here


Top comments (1)

Collapse
 
aswet_natex_293e25ade4c28 profile image
Aswet Natex

The YouTube API Project is a comprehensive data warehouse and analytics solution designed to extract, process, and visualize YouTube Data API metrics using a Medallion Architecture comprising Bronze (raw data), Silver (cleaned data), and Gold (aggregated insights) layers. It leverages technologies like PostgreSQL, Apache Airflow, and Grafana to automate ETL workflows, support dimensional modeling, and create real-time dashboards for analyzing video and channel performance over time. With Python handling API ingestion and a Linux VM for execution, the project delivers actionable business intelligence through historical metric tracking and trend analysis. Check youtubegold.net for details on similar YouTube-focused tools and services.