Just completed Module 2 of the Data Engineering Zoomcamp 2026. Built production-ready data pipelines using Kestra to process 26 million NYC taxi trip records.
What I accomplished:
Orchestrated ETL workflows with Kestra
Ingested data from GitHub into GCS, then loaded it into BigQuery
Implemented partitioned tables for query optimization
Built MERGE operations for data deduplication
Automated monthly data loads with schedule triggers
Completed all homework questions (6/6)
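To illustrate the deduplication idea behind the MERGE step, here is a minimal Python sketch of "insert only when not matched" logic. It is not the SQL from the project: the field names, the sample rows, and the choice of an MD5 hash as the row identifier are all assumptions made for illustration.

```python
import hashlib

def row_id(row, key_fields):
    """Hash the chosen fields into a stable identifier for the row."""
    joined = "|".join(str(row[field]) for field in key_fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def merge_not_matched(target, staging, key_fields):
    """Insert staging rows whose id is absent from target.

    Mirrors the 'WHEN NOT MATCHED THEN INSERT' branch of a SQL MERGE.
    Returns the number of rows inserted.
    """
    inserted = 0
    for row in staging:
        rid = row_id(row, key_fields)
        if rid not in target:
            target[rid] = row
            inserted += 1
    return inserted

# Hypothetical key fields and sample rows, for illustration only.
KEY_FIELDS = ["pickup", "dropoff", "distance"]
staging = [
    {"pickup": "2021-01-01 00:15", "dropoff": "2021-01-01 00:30", "distance": 2.1},
    {"pickup": "2021-01-01 00:40", "dropoff": "2021-01-01 01:05", "distance": 7.4},
]

target = {}
first = merge_not_matched(target, staging, KEY_FIELDS)   # initial load
second = merge_not_matched(target, staging, KEY_FIELDS)  # same file loaded again
print(first, second, len(target))
```

Loading the same batch twice leaves the target unchanged the second time, which is the property that makes re-running a monthly load safe.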
Key learnings:
Template rendering with Pebble syntax
Handling trigger.date for manual vs scheduled runs
GCS storage class compatibility (REGIONAL vs STANDARD)
IANA timezone format for DST handling
BigQuery partitioning strategies
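The trigger.date point can be sketched in plain Python. As far as I understand it, a scheduled run carries a trigger date while a manually started run does not, so a flow needs a fallback. The helper names and the file-name pattern below are hypothetical, and this is a simplified model of the behavior rather than Kestra's own implementation.

```python
from datetime import datetime

def resolve_run_date(trigger_date, execution_start):
    """Prefer the schedule's trigger date; fall back to the execution start.

    trigger_date is None for a manually started run in this model.
    """
    return trigger_date if trigger_date is not None else execution_start

def monthly_file_name(taxi, run_date):
    """Build a per-month file name (hypothetical naming pattern)."""
    return f"{taxi}_tripdata_{run_date:%Y-%m}.csv"

started = datetime(2026, 2, 1, 12, 0)

# Scheduled (or backfilled) run: the trigger date decides which month is loaded.
scheduled = resolve_run_date(datetime(2021, 3, 1, 9, 0), started)
# Manual run: no trigger date exists, so the execution start is used instead.
manual = resolve_run_date(None, started)

print(monthly_file_name("yellow", scheduled))
print(monthly_file_name("yellow", manual))
```

Inside a flow, the same fallback would be written in the template expression itself instead of in Python.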
Cost efficiency:
Processed 2 GB of data for less than one dollar, staying well within GCP's free tier.
Check out my project on GitHub: https://github.com/Derrick-Ryan-Giggs/module-2-workflow-orchestration
Thanks to DataTalksClub for this amazing free course.