This year, one of my key goals is to level up my professional skills, especially in areas that align with my career growth. Last week, I took a big step toward that goal by joining the DataTalks.Club Data Engineering Zoomcamp 2025 cohort, a free nine-week program that dives deep into the essentials and practical applications of data engineering.
Just a week in, and I've already learned so much! Here's a quick rundown of what we've covered so far:
1. Docker Basics: Containerizing Applications
We started with Docker, a powerful tool for creating, deploying, and running applications in containers. Containers are lightweight, isolated environments that package an application together with its dependencies, making it easy to run the application consistently across different machines.
Key Concepts:
- Images: Read-only templates used to create containers
- Containers: Running instances of Docker images
- Dockerfile: A script that defines how to build a Docker image (see the small example after this list)
- Volumes: Persistent storage for containers, ensuring data isn't lost when a container is deleted
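To make the Dockerfile idea concrete, here's a minimal sketch; the `ingest_data.py` script name is just a placeholder for illustration, not a file from the course:

```dockerfile
FROM python:3.12-slim

# Install the libraries the (placeholder) ingestion script needs
RUN pip install pandas sqlalchemy psycopg2-binary

WORKDIR /app

# ingest_data.py is a placeholder name, not the course's actual script
COPY ingest_data.py .

ENTRYPOINT ["python", "ingest_data.py"]
```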
Hands-On: We containerized a PostgreSQL database and ran it using Docker. This allowed us to set up a fully functional database environment in minutes!
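Here's roughly what that can look like; the container name, credentials, and volume are illustrative placeholders rather than the course's exact values:

```bash
# Run PostgreSQL 13 in a container, publishing port 5432 and using a
# named volume so the data survives container removal
# (root/root and ny_taxi are placeholder credentials for illustration)
docker run -d \
  --name pg-database \
  -e POSTGRES_USER=root \
  -e POSTGRES_PASSWORD=root \
  -e POSTGRES_DB=ny_taxi \
  -v pg_data:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:13
```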
2. Docker Compose: Managing Multi-Container Applications
Next, we explored Docker Compose, a tool for defining and running multi-container Docker applications. Using a docker-compose.yaml file, we configured services, networks, and volumes for our application.
Example Setup (sketched in the compose file below):
- A PostgreSQL database container
- A pgAdmin container for database management
- Both containers connected via a custom Docker network
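A minimal docker-compose.yaml along these lines might look like the sketch below; the credentials, ports, and network name are placeholders, not the course's exact file:

```yaml
services:
  pgdatabase:
    image: postgres:13
    environment:
      POSTGRES_USER: root        # placeholder credentials
      POSTGRES_PASSWORD: root
      POSTGRES_DB: ny_taxi
    volumes:
      - pg_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    networks:
      - pg-network

  pgadmin:
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@admin.com   # placeholder login
      PGADMIN_DEFAULT_PASSWORD: root
    ports:
      - "8080:80"
    networks:
      - pg-network

networks:
  pg-network:

volumes:
  pg_data:
```

With this file in place, one command starts both containers on the shared network, so pgAdmin can reach PostgreSQL by its service name.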
Commands:
- `docker-compose up -d`: Start services in detached mode
- `docker-compose down`: Stop and remove services
3. Terraform: Infrastructure as Code (IaC)
On the cloud side, we dove into Terraform, an open-source Infrastructure-as-Code (IaC) tool. Terraform allows you to define and provision infrastructure using a declarative configuration language.
Key Concepts:
- State Management: Terraform tracks the state of your infrastructure in a .tfstate file
- Providers: Plugins that interact with cloud APIs (e.g., GCP, AWS)
- Resources: Components of your infrastructure (e.g., VMs, databases)
Hands-On: We used Terraform to automate the setup of cloud resources on Google Cloud Platform (GCP), including storage buckets and virtual machines.
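To give a flavor of the declarative style, here's a minimal sketch for a GCS bucket; the project ID, bucket name, and region are placeholders, not the course's actual values:

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
  }
}

# Provider pointing at a placeholder GCP project
provider "google" {
  project = "my-demo-project"   # placeholder project ID
  region  = "us-central1"
}

# A simple storage bucket resource
resource "google_storage_bucket" "data_lake" {
  name          = "my-demo-project-data-lake" # placeholder; must be globally unique
  location      = "US"
  force_destroy = true
}
```

After `terraform init` and `terraform plan`, running `terraform apply` provisions the bucket, and `terraform destroy` tears it down again.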
4. Real-World Application: New York TLC Datasets
To tie everything together, we worked with the New York TLC (Taxi & Limousine Commission) trip record datasets, real-world data used for taxi and ride-sharing analysis. We applied the concepts we learned (Docker, PostgreSQL, and Terraform) to ingest, store, and analyze the data.
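To give a feel for the ingestion step, here's a rough sketch using pandas and SQLAlchemy; the download URL, table name, and connection string are placeholders, and this isn't the course's exact script:

```python
# Requires: pip install pandas sqlalchemy psycopg2-binary
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string matching the Dockerized Postgres above
engine = create_engine("postgresql://root:root@localhost:5432/ny_taxi")

# Placeholder URL standing in for one month of NYC TLC trip data
url = "https://example.com/yellow_tripdata_2021-01.csv.gz"

# Stream the CSV in chunks so large files don't exhaust memory
for chunk in pd.read_csv(url, compression="gzip", chunksize=100_000):
    chunk.to_sql("yellow_taxi_data", engine, if_exists="append", index=False)
    print(f"Inserted {len(chunk)} rows")
```

Reading in chunks keeps memory usage flat even when a single month of trip data runs to millions of rows.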
5. What's Next?
This is just the beginning! Over the next eight weeks, we'll dive deeper into data pipelines, workflow orchestration, and more. I'm excited to continue this journey and share my learnings along the way.
A big thank you to @alexeygrigorev and the entire DataTalksClub team for their guidance and support. This journey has been both challenging and rewarding, and I can't wait to see where it takes me next!
What about you? Are you working on up-skilling in data engineering or cloud technologies? Let me know in the comments!