Abdulqudus Oladega

Week 1 of DataZoomCamp

This month, I began the second phase of my data engineering journey with the DataTalks.Club Data Engineering Zoomcamp. I first heard about the program from a seasoned data engineering professional, but initially struggled to navigate it, as I was distracted and still updating my skill set. When the 2026 cohort started, I was ready to take advantage of learning in an open-source, collaborative setting.

The first week focuses on setting up our tooling and familiarizing ourselves with Docker, GCP, and Terraform, as well as refreshing our knowledge of Python and SQL, so that we can gradually build the pieces that will become a data pipeline. Here are some of the things I've learned:

Containerization and Docker:

Containerization is a software deployment method that packages an application along with its dependencies, libraries, and other necessary components in a lightweight and efficient single container. This ensures the application can run smoothly, independent of where it is deployed. This is the principle behind Docker. Docker is an open-source platform that enables developers to build, deploy, run, update and manage containerized applications.
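As a quick illustration of that idea (the image tag below is just an example), Docker can run a Python one-liner in an isolated container, with nothing but Docker itself installed on the host:

```bash
# Pulls the python:3.13-slim image if absent, runs the one-liner in a
# throwaway container, and removes the container afterwards (--rm)
docker run --rm python:3.13-slim python -c "print('hello from a container')"
```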

I learned the process of assembling a Docker image using a Dockerfile. A Dockerfile is a text document, read top to bottom, that contains the instructions used to configure an image. Instructions are not case-sensitive; however, convention dictates writing them in uppercase to distinguish them from arguments more easily. Dockerfiles can start simple and grow to support more complex scenarios as your needs evolve.
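Here is a sketch of that Dockerfile, reconstructed from the walkthrough that follows (the base-image tag and file paths are taken from that description rather than verified against the original):

```dockerfile
# Base image: a lightweight (slim) release of Python 3.13.11
FROM python:3.13.11-slim

# Copy the uv binary from the official uv image into /bin/
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/

# Working directory for all subsequent instructions (created if absent)
WORKDIR /code

# Put the project's virtual environment on the PATH
ENV PATH="/code/.venv/bin:$PATH"

# Copy dependency metadata first so this layer can be cached
COPY pyproject.toml .python-version uv.lock ./

# Install dependencies exactly as pinned in uv.lock
RUN uv sync --locked

# Copy the ingestion script into the image
COPY ingest_data.py .

# Run the script whenever a container starts from this image
ENTRYPOINT ["python", "ingest_data.py"]
```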

In the Dockerfile example above:

  • The FROM instruction sets the base image to a lightweight (slim) release of Python 3.13.11.
  • The COPY instruction copies files from an image, a build stage, or a named context. In this instance, the uv binary (/uv) from the Docker image ghcr.io/astral-sh/uv:latest is copied into /bin/ inside the image I'm currently building.
  • The WORKDIR instruction sets the working directory for all subsequent instructions. Here, I am setting it to /code. If the directory doesn't exist, it will be created.
  • The ENV instruction sets the environment variable PATH to /code/.venv/bin:$PATH. Environment variables set using ENV persist when a container is run from the resulting image.
  • The next COPY instruction copies the files pyproject.toml, .python-version, and uv.lock from my current local working directory into the WORKDIR (/code).
  • The RUN uv sync --locked instruction uses uv to install dependencies exactly as pinned in the uv.lock file, and fails if the lockfile is out of sync.
  • The next COPY instruction copies the ingest_data.py file from my local machine into the image being built.
  • Finally, ENTRYPOINT ["python", "ingest_data.py"] configures the container to run as an executable: ingest_data.py is executed whenever a container is started from the image.

A Docker image is subsequently used to create and run a Docker container.
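For instance (the image tag ingest-data:v1 is my own placeholder), building the image and starting a container from it looks like this, run from the directory containing the Dockerfile:

```bash
# Build an image from the Dockerfile in the current directory
docker build -t ingest-data:v1 .

# Create and start a container from that image; the ENTRYPOINT
# runs ingest_data.py, and --rm removes the container when it exits
docker run --rm ingest-data:v1
```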

I also explored setting up and running an interactive multi-container Docker application. This included learning about persistence in Docker, using volume mapping to ensure data is retained even when containers are stopped or removed.
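As a minimal sketch of volume mapping (the postgres image, credentials, and names below are illustrative placeholders, not necessarily the actual setup), a Compose file can attach a named volume so the database's data outlives any individual container:

```yaml
# docker-compose.yaml -- illustrative example
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: root        # placeholder credentials
      POSTGRES_PASSWORD: root
      POSTGRES_DB: demo
    volumes:
      # Named volume: data survives container stop/removal
      - db_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

volumes:
  db_data:
```

Stopping or removing the db container leaves db_data intact; a new container mounting the same volume picks up the existing data.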

Additionally, I have itemized the other things I learned and hope to write about them in the coming days:

  • Dockerfile vs. Docker Compose

  • Terraform

  • uv

  • GitHub Codespaces
