Just completed my Data Engineering Zoomcamp 2026 capstone project — the Tech Ecosystem Observatory
Built a full cloud-native batch data pipeline from scratch that answers: which industries are shedding the most jobs, and how does that correlate with YC startup activity?
Here's what the pipeline looks like end to end:
✅ Terraform — provisioned GCS bucket and BigQuery datasets as infrastructure as code
✅ Docker — containerized all ingestion scripts into a portable image
✅ Kestra — orchestrated a 4-task batch DAG running every Monday at 6 AM UTC
✅ Google Cloud Storage — raw JSONL data lake storing layoffs and YC company data
✅ BigQuery — tables partitioned by date (monthly) and clustered by industry/country to cut query scan costs
✅ dbt Cloud — built staging views and mart tables (mart_monthly_layoffs + mart_tech_ecosystem)
✅ Looker Studio — 2-page interactive dashboard with layoff trends, geo maps, ecosystem stress ratios
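The "ecosystem stress ratio" in the dashboard pairs layoff volume with YC company counts per industry. A minimal sketch of that metric in plain Python — the field names (`industry`, `laid_off`) and sample rows are hypothetical; the real join happens in the dbt mart `mart_tech_ecosystem`:

```python
from collections import Counter

# Hypothetical sample rows; real data lives in the BigQuery marts.
layoffs = [
    {"industry": "Fintech", "laid_off": 120},
    {"industry": "Fintech", "laid_off": 80},
    {"industry": "Healthcare", "laid_off": 50},
]
yc_companies = [
    {"industry": "Fintech"},
    {"industry": "Fintech"},
    {"industry": "Fintech"},
    {"industry": "Healthcare"},
]

def stress_ratio(layoffs, yc_companies):
    """Layoffs per YC-backed company, grouped by industry (assumed metric shape)."""
    laid_off = Counter()
    for row in layoffs:
        laid_off[row["industry"]] += row["laid_off"]
    yc_count = Counter(row["industry"] for row in yc_companies)
    # Only industries present on both sides of the join get a ratio.
    return {
        ind: laid_off[ind] / yc_count[ind]
        for ind in laid_off
        if yc_count[ind]
    }

print(stress_ratio(layoffs, yc_companies))
```

In the actual pipeline this aggregation runs as SQL inside dbt, so BigQuery does the heavy lifting over the partitioned, clustered tables; the sketch just shows the shape of the calculation.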
📊 Data: 4,317 layoff events (2023–2024) + 5,690 YC-backed companies
🔗 Live dashboard: https://lookerstudio.google.com/reporting/b1620cae-97cb-4911-82b8-dd0c46ee8acb
💻 GitHub: https://github.com/Derrick-Ryan-Giggs/tech-ecosystem-observatory
Huge thanks to @DataTalksClub and Alexey Grigorev for building and maintaining this incredible free course. If you're serious about data engineering, this is where you start 👇
https://github.com/DataTalksClub/data-engineering-zoomcamp/
Who else is building data pipelines? Drop your projects below 👇