DEV Community

Pizofreude

Notes on Data Engineering Zoomcamp 2025 - Launch Stream

Overview:

  • Course Edition: Fourth edition of the Data Engineering Zoomcamp.
  • Purpose of Stream: Introduction to the course, syllabus, logistics, team members, and Q&A session.
  • Key Topics Covered:
    • Course structure and syllabus.
    • Introduction to team members.
    • Tools and platforms for learning and communication.
    • Logistics and homework submissions.
    • Importance of community learning.

Course Team:

  1. Victoria:
    • Works at DT Hub and was part of the first Data Engineering Zoomcamp cohort.
    • Covers Analytics Engineering in the course.
  2. Alexey:
    • Founder of DataTalks Club.
    • Previously a data scientist with significant experience in data engineering tools.
    • Covers Docker and Spark modules.
  3. Michael:
    • Senior Data Analyst and teaching assistant for the past two years.
    • Content creator and runs a YouTube channel called "Data Slinger."
    • Assists with content and troubleshooting.
  4. Bruno:
    • Senior Data Engineer at Intuition Machines.
    • Extensive experience in data engineering.
    • Provides guidance and support to participants.
  5. Will and Anna:
    • Work at Kestra.
    • Cover Workflow Orchestration (Module 2).
  6. Zach:
    • Staff Data Engineer and instructor in another data engineering bootcamp.
    • Focuses on advanced topics like Flink.
    • Founder of dataexpert.io
  7. Ankush and Sejal:
    • Part of the original team that launched the Zoomcamp initiative.
    • Ankush remains active in supporting the community.

Course Syllabus:

  1. Module 1: Introduction to Docker and Google Cloud setup.
    • Google Cloud offers $300 in free credits for first-time users.
    • AFAIK, GCP offers two types of free trial for its cloud services:
      • Free Trial with $300 credit, which requires billing details (all features available)
      • Sandbox option, which does not require billing details (limited features)
    • Since the Sandbox option allows users to access the services required by this course, I will start with it and will consider the Free Trial if the Sandbox's limited features turn out not to meet the course's full requirements.
  2. Module 2: Workflow Orchestration.
    • Covers orchestration tools such as Prefect and Airflow.
  3. Module 3: Data Warehousing.
    • Emphasis on BigQuery and PostgreSQL.
  4. Module 4: Analytics Engineering.
    • Introduction to dbt (data build tool) for SQL transformations.
  5. Module 5: Spark.
    • Focus on distributed data processing.
  6. Module 6: Stream Processing.
    • Includes tools like Kafka and Flink (TBA by Zach).
  7. Workshop: Data Ingestion with dlt (data load tool).
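
To make the idea behind Module 2 a bit more concrete, here is a minimal sketch of what "workflow orchestration" boils down to: running tasks in dependency order. It uses only Python's standard library (`graphlib`, Python 3.9+). The task names and the `run_pipeline` helper are made up for illustration and are not taken from the course material; the actual orchestration tools add scheduling, retries, and logging on top of this.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "load_to_warehouse": {"extract"},
    "transform": {"load_to_warehouse"},
    "report": {"transform"},
}


def run_pipeline(dependencies):
    """Walk the tasks in dependency order and return the order used."""
    executed = []
    for task in TopologicalSorter(dependencies).static_order():
        # A real orchestrator would execute the task here.
        executed.append(task)
    return executed


order = run_pipeline(PIPELINE)
print(order)
```

Each task only runs after everything it depends on has finished, which is the core guarantee an orchestrator gives you.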

Logistics:

  • Content Delivery:
  • Homework:
  • Community Support:
    • Slack is the primary communication platform.
    • Participants are encouraged to use threads for organized discussions.
    • Learning in public (e.g., posting progress on LinkedIn) is recommended.

Learning in Public:

  • Benefits:
    • Helps participants build their personal brand.
    • Encourages networking and community engagement.
    • Demonstrates growth and dedication.
  • Examples:
    • Sharing project updates or lessons learned on LinkedIn.

Tools and Recommendations:

  • Google Cloud Platform (GCP):
    • Recommended for the course due to its ease of use and free credits.
    • AWS and Azure are also options, but GCP is more straightforward.
  • Additional Tools:
    • Participants are encouraged to explore other platforms and tools beyond the syllabus, such as data governance and scripting with Makefiles and Bash.

Q&A Highlights:

  1. Career Preparation:
    • Prepares participants for roles in data engineering and analytics engineering.
    • Emphasizes project-based learning.
  2. AI and Data Engineering:
    • AI is unlikely to replace data engineers but may enhance productivity.
    • LLM Zoomcamp is pretty much AI for Data Engineering and is highly recommended after completing DE Zoomcamp.
  3. Key Advice for Success:
    • Consistency in learning and building projects.
    • Active participation in the community.
    • Sharing work publicly to stand out.
  4. Beginner-Friendly:
    • Suitable for those new to data engineering, even without prior software engineering experience. Chicken and egg problem 😉

Final Notes:

  • Contributions:
    • Participants are encouraged to contribute to open-source projects via the "Open Source Spotlight" on the YouTube channel.
  • Focus on Projects:
    • For certification, priority is given to delivering projects rather than homework.
  • Future Learning Opportunities:
    • Check out related courses like the LLM Zoomcamp for AI-focused topics.

Motivational Message:

  • Stay consistent, actively participate, and leverage the community for support.
