Just completed my Data Engineering Zoomcamp 2026 capstone project — the Tech Ecosystem Observatory
Built a full cloud-native batch data pipeline from scratch that answers: which industries are shedding the most jobs, and how does that correlate with YC startup activity?
Here's what the pipeline looks like end to end:
✅ Terraform — provisioned GCS bucket and BigQuery datasets as infrastructure as code
✅ Docker — containerized all ingestion scripts into a portable image
✅ Kestra — orchestrated a 4-task batch DAG running every Monday at 6 AM UTC
✅ Google Cloud Storage — raw JSONL data lake storing layoffs and YC company data
✅ BigQuery — tables partitioned by date (monthly) and clustered by industry/country to cut query scan costs
✅ dbt Cloud — built staging views and mart tables (mart_monthly_layoffs + mart_tech_ecosystem)
✅ Looker Studio — 2-page interactive dashboard with layoff trends, geo maps, ecosystem stress ratios
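The "ecosystem stress ratio" in the dashboard pairs layoff volume with YC company counts per industry. A minimal sketch of that metric in plain Python — the field names (`industry`, `laid_off`) and sample rows are hypothetical; the real join happens in the dbt mart `mart_tech_ecosystem`:

```python
from collections import Counter

# Hypothetical sample rows; real data lives in the BigQuery marts.
layoffs = [
    {"industry": "Fintech", "laid_off": 120},
    {"industry": "Fintech", "laid_off": 80},
    {"industry": "Healthcare", "laid_off": 50},
]
yc_companies = [
    {"industry": "Fintech"},
    {"industry": "Fintech"},
    {"industry": "Fintech"},
    {"industry": "Healthcare"},
]

def stress_ratio(layoffs, yc_companies):
    """Layoffs per YC-backed company, grouped by industry (assumed metric shape)."""
    laid_off = Counter()
    for row in layoffs:
        laid_off[row["industry"]] += row["laid_off"]
    yc_count = Counter(row["industry"] for row in yc_companies)
    # Only industries present on both sides of the join get a ratio.
    return {
        ind: laid_off[ind] / yc_count[ind]
        for ind in laid_off
        if yc_count[ind]
    }

print(stress_ratio(layoffs, yc_companies))
```

In the actual pipeline this aggregation runs as SQL inside dbt, so BigQuery does the heavy lifting over the partitioned, clustered tables; the sketch just shows the shape of the calculation.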
📊 Data: 4,317 layoff events (2023–2024) + 5,690 YC-backed companies
🔗 Live dashboard: https://lookerstudio.google.com/reporting/b1620cae-97cb-4911-82b8-dd0c46ee8acb
💻 GitHub: https://github.com/Derrick-Ryan-Giggs/tech-ecosystem-observatory
Huge thanks to @DataTalksClub and Alexey Grigorev for building and maintaining this incredible free course. If you're serious about data engineering, this is where you start 👇
https://github.com/DataTalksClub/data-engineering-zoomcamp/
Who else is building data pipelines? Drop your projects below 👇