Overview:
- Course Edition: Fourth edition of the Data Engineering Zoomcamp.
- Purpose of Stream: Introduction to the course, syllabus, logistics, team members, and Q&A session.
-
Key Topics Covered:
- Course structure and syllabus.
- Introduction to team members.
- Tools and platforms for learning and communication.
- Logistics and homework submissions.
- Importance of community learning.
Course Team:
-
Victoria:
- Works at DT Hub and was part of the first Data Engineering Zoomcamp cohort.
- Covers Analytics Engineering in the course.
-
Alexey:
- Founder of DataTalks Club.
- Previously a data scientist with significant experience in data engineering tools.
- Covers Docker and Spark modules.
-
Michael:
- Senior Data Analyst and teaching assistant for the past two years.
- Content creator and runs a YouTube channel called "Data Slinger."
- Assists with content and troubleshooting.
-
Bruno:
- Senior Data Engineer at Intuition Machines.
- Extensive experience in data engineering.
- Provides guidance and support to participants.
-
Will and Anna:
- Work at Kestra.
- Cover Workflow Orchestration (Module 2).
-
Zach:
- Staff Data Engineer and instructor in another data engineering bootcamp.
- Focuses on advanced topics like Flink.
- Founder of dataexpert.io
-
Ankush and Sejal:
- Part of the original team that launched the Zoomcamp initiative.
- Ankush remains active in supporting the community.
Course Syllabus:
-
Module 1: Introduction to Docker and Google Cloud setup.
- Google Cloud offers $300 in free credits for first-time users.
- AFAIK, GCP offers two types of free trial for its cloud services:
- Free Trial with $300 in credit, which requires billing details (all features available).
- Sandbox option, which does not require billing details (limited features).
- Since the Sandbox option allows use of the services required by this course, I will start with it and will consider the Free Trial if the Sandbox's limited features don't meet the course's full requirements.
-
Module 2: Workflow Orchestration.
- Covers orchestration tools such as Prefect and Airflow.
-
Module 3: Data Warehousing.
- Emphasis on BigQuery and PostgreSQL.
-
Module 4: Analytics Engineering.
- Introduction to dbt (data build tool) for SQL transformations.
-
Module 5: Spark.
- Focus on distributed data processing.
-
Module 6: Stream Processing.
- Includes tools like Kafka and Flink (TBA by Zach).
- Workshop: Data Ingestion with dlt (data load tool).
Logistics:
-
Content Delivery:
- Most of the content is pre-recorded and available in the DataTalks Club GitHub repo and in the YouTube channel playlists (2025 cohort and general).
- Office hours and live sessions are available for interaction.
-
Homework:
- Provided as markdown files in the GitHub repo.
- Multiple-choice format with answers submitted via the course management platform.
-
Community Support:
- Slack is the primary communication platform.
- Participants are encouraged to use threads for organized discussions.
- Learning in public (e.g., posting progress on LinkedIn) is recommended.
Learning in Public:
-
Benefits:
- Helps participants build their personal brand.
- Encourages networking and community engagement.
- Demonstrates growth and dedication.
-
Examples:
- Sharing project updates or lessons learned on LinkedIn.
Tools and Recommendations:
-
Google Cloud Platform (GCP):
- Recommended for the course due to its ease of use and free credits.
- AWS and Azure are also options, but GCP is more straightforward.
-
Additional Tools:
- Participants are encouraged to explore other platforms and tools beyond the syllabus, such as data governance and scripting with Makefiles and Bash.
Q&A Highlights:
-
Career Preparation:
- Prepares participants for roles in data engineering and analytics engineering.
- Emphasizes project-based learning.
-
AI and Data Engineering:
- AI is unlikely to replace data engineers but may enhance productivity.
- LLM Zoomcamp is pretty much AI for Data Engineering, highly recommended after completing DE Zoomcamp.
-
Key Advice for Success:
- Consistency in learning and building projects.
- Active participation in the community.
- Sharing work publicly to stand out.
-
Beginner-Friendly:
- Suitable for those new to data engineering, even without prior software engineering experience. Chicken and egg problem.
Final Notes:
-
Contributions:
- Participants are encouraged to contribute to open-source projects via the "Open Source Spotlight" on the YouTube channel.
-
Focus on Projects:
- Priority is given to delivering projects rather than homework for certification.
-
Future Learning Opportunities:
- Check out related courses like the LLM Zoomcamp for AI-focused topics.
Motivational Message:
- Stay consistent, actively participate, and leverage the community for support.