DEV Community

Pizofreude


Data Engineering Zoomcamp 2025 Cohort: Introduction - Self-Study Notes

Overview

Course Duration and Structure

  • Duration: 6 modules plus 2 workshops.
  • Format: Weekly modules covering key data engineering topics.
  • Interactivity: Q&A sessions, Slack discussions, and GitHub contributions.

Key Resources

  • Slack: Main platform for discussions.
  • GitHub Repository: Contains all course materials.
  • Telegram Channel: For announcements and updates.
  • Environment Setup: Use GitHub Codespaces or a cloud virtual machine for an easier start than a local installation, which comes with some hurdles, especially on Windows. Linux FTW! OpenSUSE Tumbleweed 😉

Key Topics and Modules

Week 1: Environment Setup and Basics

  • Key Tools: Docker, Terraform, GitHub Codespaces.
  • Focus: Preparing the environment for the course.
  • Skills Required: Basic command-line knowledge, Docker commands, Python basics.

Week 2: Workflow Orchestration

  • Tool: Kestra (orchestration tool).
  • Content:
    • Simplify scripts created in Week 1.
    • Convert CSV files to Parquet format and upload to Google Cloud Storage.
  • Notes:
    • The NYC Taxi and Limousine Commission dataset was used as an example.
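The CSV-to-Parquet-and-upload step can be sketched in plain Python, outside of Kestra. This is a minimal sketch, not the course's actual flow: it assumes pandas with pyarrow (or fastparquet) and the google-cloud-storage package are installed, and the function names are made up for illustration.

```python
from pathlib import Path


def parquet_path_for(csv_path: str) -> Path:
    """Return the sibling .parquet path for a .csv file."""
    return Path(csv_path).with_suffix(".parquet")


def csv_to_parquet(csv_path: str) -> Path:
    """Read a CSV file and write it next to the original as Parquet."""
    # Imported here so the path helper works without pandas installed.
    # pandas needs pyarrow (or fastparquet) available to write Parquet.
    import pandas as pd

    out_path = parquet_path_for(csv_path)
    df = pd.read_csv(csv_path)
    df.to_parquet(out_path, index=False)
    return out_path


def upload_to_gcs(local_path: Path, bucket_name: str) -> str:
    """Upload a local file to a GCS bucket and return its gs:// URI."""
    # Requires the google-cloud-storage package and valid GCP credentials.
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(local_path.name)
    blob.upload_from_filename(str(local_path))
    return f"gs://{bucket_name}/{local_path.name}"
```

A typical call would be `upload_to_gcs(csv_to_parquet("yellow_tripdata_2021-01.csv"), "my-zoomcamp-bucket")`, where both the file name and the bucket name are placeholders. In the course itself, this kind of step is wrapped in an orchestrated workflow rather than run by hand.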

Week 3: Data Warehousing

  • Tool: Google BigQuery.
  • Focus: Storing and querying large datasets.

Week 4: dbt (data build tool)

  • Content:
    • Transform data for analysis.
    • Build visualizations and dashboards.

Week 5: Batch Processing

  • Tool: Apache Spark.
  • Focus:
    • Batch processing, similar in role to the transformations done with dbt.
    • Spark provides finer control over data pipelines.

Week 6: Streaming and Real-Time Data Processing

  • Tools: Kafka, RisingWave (open-source SQL streaming tool).
  • Focus:
    • Stream processing using SQL.
    • Introduction to stream-based architectures.
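The core idea of windowed stream processing can be illustrated without Kafka or RisingWave. The sketch below is an assumption-laden stand-in, not how either tool is implemented: it groups events into fixed, non-overlapping (tumbling) time windows and counts them per key, roughly what a streaming SQL `GROUP BY` over a time window expresses. The function name and event shape are hypothetical, and a real stream is unbounded, whereas this processes a finite iterable.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, key) for fixed, non-overlapping windows.

    Each event is a (timestamp_in_seconds, key) pair.
    """
    counts: Counter = Counter()
    for timestamp, key in events:
        # Align the timestamp to the start of the window it falls into.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)
```

A streaming engine keeps such counts up to date continuously as new events arrive, instead of computing them once over a finished batch.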

Workshops

  1. Workflow Orchestration:
    • Practical session to consolidate Week 2 content.
  2. Streaming Data with SQL:
    • Hands-on workshop focusing on real-time data pipelines.

Final Project

  • Objective: Create a comprehensive data engineering project. Completing it is required to graduate with a certificate.
  • Guidelines:
    • Use any tools and concepts covered in the course.
    • Option to partner with nonprofits or work independently.
    • Focus on practical, real-world data use cases.
  • Submission:
    • Homework files available in the GitHub cohort repository.
    • Submit projects via a new automated platform (replacing Google Forms): Course Management Platform. Note: If signing up with GitHub Auth fails (e.g. Server Error 500), users are advised to use Slack Auth or GAuth instead. This is a known bug and a fix is still in progress.

Expectations and Requirements

  • Prerequisites:
    • Familiarity with Python and basic programming concepts.
    • Command-line proficiency.
  • Time Commitment: Flexible; follow your own pace.
  • Certificates: Awarded upon successful completion of the final project. Homework submissions count toward an internal ranking system that serves as a motivational instrument for participants.

Additional Tips

  • GitHub Contributions:
    • Star the course repository to help it trend.
    • Solve some tickets from the GitHub issues as open-source contributions.
    • Engage with the community by sharing insights or asking questions.
  • Slack:
    • Check the FAQ document before posting queries.
    • Use relevant channels to interact with peers, or ask @ZoomcampQABot directly in the #course-data-engineering channel, before reaching out to instructors as a last resort.
  • Environment:
    • Codespaces offers a simple setup with pre-installed tools like Docker and Python.
    • Cloud virtual machines provide flexibility for advanced setups.

Career Insights and Recommendations

  • Job Outlook:
    • Despite tech layoffs, demand for data engineers remains strong.
    • Skills in platforms like GCP, AWS, and Azure are valuable.
  • Certifications:
    • Beneficial, especially for early-career professionals and consultants.
  • Applications:
    • Data engineering techniques are foundational for ML and analytics roles.

Why This Course is Free

  • Motivation: Sharing knowledge with the community.
  • Support: Funded by sponsors like Kestra, dlt, Mage, DTHub, and RisingWave.
  • Community Contribution:
    • Participants can support the course by sharing it, contributing feedback, or donating to DataTalksClub through training budgets.

Miscellaneous Notes

  • Consider Learning in Public to stay motivated; it also earns extra points toward the ranking.
  • Office Hours: Scheduled for specific topics like Kestra and project guidance.
  • FAQ Document: Comprehensive guide available for common queries.
  • Past Student Contributions:
    • Many alumni have shared tools and insights to improve the course.
  • Data Architect Path:
    • Consider learning about Kimball methodologies.

By following this structured approach, we can maximize our learning experience in the Data Engineering Zoomcamp 2025 Cohort.

Good luck everyone!
