This month, I began the second phase of my data engineering journey with the DataTalks.Club Data Engineering Zoomcamp. I had learned about the program earlier from a seasoned data engineering professional, but initially struggled to navigate it, as I was distracted and still updating my skill set. When the 2026 cohort started, I was ready to take advantage of learning in an open-source, collaborative setting.
The first week focuses on setting up our tooling and familiarizing ourselves with Docker, GCP, and Terraform, as well as refreshing our knowledge of Python and SQL, to gradually build the pieces that will become a data pipeline. Here are some of the things I’ve learnt:
Containerization and Docker:
Containerization is a software deployment method that packages an application along with its dependencies, libraries, and other necessary components in a single lightweight, efficient container. This ensures the application can run smoothly, independent of where it is deployed. This is the principle behind Docker. Docker is an open-source platform that enables developers to build, deploy, run, update, and manage containerized applications.
I learned the process of assembling a Docker image using a Dockerfile. A Dockerfile is a top-down text document that contains the instructions a user would use to configure an image. Instructions are not case-sensitive; however, convention dictates that they be written in uppercase to distinguish them from arguments more easily. Dockerfiles can start simple and grow to support more complex scenarios as your needs evolve.
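Here's a minimal sketch of the Dockerfile (the `-slim` suffix on the base image tag is my assumption for "a lightweight release"):

```dockerfile
# Lightweight Python base image (the "slim" tag is an assumption)
FROM python:3.13.11-slim

# Pull the uv binary from the official astral-sh image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/

# All following instructions run relative to /code (created if missing)
WORKDIR /code

# Put the project's virtual environment on the PATH
ENV PATH="/code/.venv/bin:$PATH"

# Copy the dependency manifests first so this layer caches well
COPY pyproject.toml .python-version uv.lock ./

# Install dependencies exactly as pinned in uv.lock
RUN uv sync --locked

# Copy the ingestion script itself
COPY ingest_data.py .

# Run the script whenever a container starts from this image
ENTRYPOINT ["python", "ingest_data.py"]
```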
In the Dockerfile example above:

- The `FROM` instruction sets the base image to a lightweight release of Python 3.13.11.
- The first `COPY` instruction copies files from another image, a build stage, or a named context. In this instance, the `uv` binary at `/uv` in the Docker image `ghcr.io/astral-sh/uv:latest` is copied into `/bin/` inside the image I'm currently building.
- The `WORKDIR` instruction sets the working directory for the instructions that follow. Here, I am setting it to `/code`. If the WORKDIR doesn't exist, it will be created.
- The `ENV` instruction sets the environment variable `PATH` to `/code/.venv/bin:$PATH`. Environment variables set using `ENV` persist when a container is run from the resulting image.
- The next `COPY` instruction copies the files `pyproject.toml`, `.python-version`, and `uv.lock` from my current local working directory into the WORKDIR (`/code`).
- The `RUN uv sync --locked` instruction uses `uv` to install dependencies exactly as pinned in the `uv.lock` file, and fails if the lockfile is out of sync.
- The next `COPY` instruction copies the `ingest_data.py` file from my local machine into the image.
- Finally, `ENTRYPOINT ["python", "ingest_data.py"]` runs the ingestion script whenever a container starts from the image. An ENTRYPOINT allows you to configure a container that will run as an executable.
A Docker image is subsequently used to create and run a Docker container.
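As a quick illustration of that flow, the commands below build an image from the Dockerfile and start a container from it (the `ingest:latest` tag is just an example name):

```bash
# Build an image from the Dockerfile in the current directory;
# the "ingest:latest" tag is illustrative
docker build -t ingest:latest .

# Start a container from that image; the ENTRYPOINT runs ingest_data.py
# (--rm removes the container once it exits)
docker run --rm ingest:latest
```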
I also explored setting up and running an interactive multi-container Docker application. This included learning about persistence in Docker: using volume mapping to ensure data is retained even when containers are stopped or removed.
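Here's a sketch of what such a setup can look like, assuming a Postgres database alongside pgAdmin (the service names, credentials, and database name are illustrative, not my exact configuration):

```yaml
# docker-compose.yaml - an illustrative two-container setup
services:
  pgdatabase:
    image: postgres:16
    environment:
      POSTGRES_USER: root            # example credentials only
      POSTGRES_PASSWORD: root
      POSTGRES_DB: ny_taxi
    volumes:
      # Named volume mapped to Postgres's data directory, so the data
      # survives stopping or removing the container
      - pg-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
  pgadmin:
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@admin.com    # example values
      PGADMIN_DEFAULT_PASSWORD: root
    ports:
      - "8080:80"

volumes:
  pg-data:
```

Running `docker compose up` starts both containers on a shared network; `docker compose down` stops and removes them, but the named volume keeps the database files intact.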
Additionally, I have itemized the other things I learned and hope to write about them in the coming days:

- Dockerfile vs Docker Compose
- Terraform
- uv
- GitHub Codespaces
