📊 YouTube API – Data Warehouse & Analytics Solution
This repository demonstrates a complete data pipeline that extracts data from the YouTube Data API, models it using the Medallion Architecture, and delivers business-ready insights via Grafana dashboards.
📦 Project Summary
This project implements a modern analytics pipeline with:
- Medallion Architecture: Structured into Bronze, Silver, and Gold layers for scalable data processing.
- ETL Workflows: Automated extraction, transformation, and loading using Apache Airflow.
- Data Modeling: Dimensional modeling in PostgreSQL for optimized querying.
- Dashboards: Real-time reporting using Grafana, powered by SQL.
🧰 Tech Stack
- PostgreSQL – Central data warehouse
- Apache Airflow – Workflow orchestration
- Grafana – Real-time data visualization
- Linux VM – Compute environment for pipeline execution
- Python – API ingestion & transformation logic
🎯 Project Objectives
Build a production-ready analytics solution to analyze YouTube channel and video performance:
- Source structured data from the YouTube Data API
- Clean, validate, and model for business intelligence
- Persist historical metrics (views, likes, etc.) for trend analysis
- Deliver actionable insights via dashboards and SQL queries
🗃️ Data Architecture (Medallion Model)
This project follows a Bronze → Silver → Gold pipeline:
🔹 Bronze Layer
Raw ingestion from the YouTube API (JSON format)
🔸 Silver Layer
Cleaned, validated, and structured data (see data flow and model below)
🟡 Gold Layer
Aggregated data used to generate KPIs and dashboards in Grafana
-
Visualization Sample
📈 BI Use Cases
Dashboards and SQL queries answer key questions such as:
- What are the top-performing videos per channel?
- How is each channel performing over time?
- What are the daily trends for views and engagement?
📁 Repository Structure
├── README.md
├── channel_lists.py
├── channel_overview.py
├── channel_videos.py
├── __pycache__/ # Compiled Python files
├── project_files/
│ ├── Architecture/ # Draw.io and PNG files for architecture
│ └── ddl_update_scripts/ # SQL DDLs and procedures
│ ├── dim_channels.sql
│ ├── dim_videos.sql
│ ├── fct_subscribers_views_video_count.sql
│ └── fct_video_statistics.sql
└── requirements.txt # Python dependencies
🔗 Access the Code
Browse the full codebase here
Top comments (1)
The YouTube API Project is a comprehensive data warehouse and analytics solution designed to extract, process, and visualize YouTube Data API metrics using a Medallion Architecture comprising Bronze (raw data), Silver (cleaned data), and Gold (aggregated insights) layers. It leverages technologies like PostgreSQL, Apache Airflow, and Grafana to automate ETL workflows, support dimensional modeling, and create real-time dashboards for analyzing video and channel performance over time. With Python handling API ingestion and a Linux VM for execution, the project delivers actionable business intelligence through historical metric tracking and trend analysis. Check youtubegold.net for details on similar YouTube-focused tools and services.