In my last post, we broke down the core concepts of Docker and why packaging your code into standardized containers is ideal for avoiding "dependency hell."
In this article, we build an ETL project and run it with Docker Compose. The project uses the Coinpaprika API to fetch and normalize cryptocurrency ticker data.
We will also push the entire project to GitHub using atomic commits to keep our version control history clean. Let's dive in!
Why Docker Compose?
Instead of managing containers individually, Docker Compose allows us to define a multi-container application in a single file: docker-compose.yml.
Our project requires two distinct services/containers:
- db (PostgreSQL): The data warehouse destination where our normalized coin data will be loaded.
- etl_script (Python): Our custom application container that sends requests to the Coinpaprika API, transforms the JSON response using pandas, and pushes it to our database.
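The transform step described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: it assumes each Coinpaprika ticker is a JSON object with id, name, symbol, rank, last_updated and a nested quotes.USD object holding price, volume_24h and market_cap, so check those field names against a real API response. The post itself uses pandas for this step (pandas.json_normalize flattens nested JSON in a similar way); normalize_ticker and normalize_tickers are hypothetical helper names.

```python
from typing import Any, Optional


def _to_float(value: Any) -> Optional[float]:
    """Convert a numeric JSON value to float, keeping missing values as None."""
    return None if value is None else float(value)


def normalize_ticker(raw: dict[str, Any]) -> dict[str, Any]:
    """Flatten one ticker object into a single-level row for loading.

    Assumed input shape (verify against the real API response):
    {"id": ..., "name": ..., "symbol": ..., "rank": ..., "last_updated": ...,
     "quotes": {"USD": {"price": ..., "volume_24h": ..., "market_cap": ...}}}
    """
    usd = raw.get("quotes", {}).get("USD", {})
    return {
        "coin_id": raw["id"],
        "name": raw.get("name"),
        "symbol": raw.get("symbol"),
        "rank": raw.get("rank"),
        # Cast numeric fields so the database receives consistent types.
        "price_usd": _to_float(usd.get("price")),
        "volume_24h_usd": _to_float(usd.get("volume_24h")),
        "market_cap_usd": _to_float(usd.get("market_cap")),
        "last_updated": raw.get("last_updated"),
    }


def normalize_tickers(raw_tickers: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Normalize every ticker, skipping entries that have no id."""
    return [normalize_ticker(t) for t in raw_tickers if t.get("id")]
```

Flattening into plain dicts keeps each row one level deep, which maps directly onto table columns in the load step.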
Step 1: Writing the Configuration Files
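To build this, we keep all the project files (Dockerfile, requirements.txt, etl_script.py, .env, and docker-compose.yml) together in one project directory. First, write a Dockerfile for the Python ETL script so the image has all the necessary packages installed: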
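# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Copy requirements first so Docker can cache the dependency layer
# and skip reinstalling packages when only the script changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY etl_script.py .
CMD ["python", "etl_script.py"]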
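The Dockerfile's CMD runs etl_script.py, which the post doesn't reproduce. Here is a minimal sketch of its extract step using only the standard library (urllib and json) instead of a third-party HTTP client. The endpoint https://api.coinpaprika.com/v1/tickers and the fetch_tickers name are assumptions for illustration; check the Coinpaprika API documentation for current endpoints and rate limits. The opener parameter is a hypothetical hook that lets the function be tested without network access.

```python
import json
import urllib.request
from typing import Any, Callable

# Assumed Coinpaprika endpoint returning all tickers as a JSON list.
TICKERS_URL = "https://api.coinpaprika.com/v1/tickers"


def fetch_tickers(
    url: str = TICKERS_URL,
    timeout: float = 30.0,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> list[dict[str, Any]]:
    """Download the ticker list and return the parsed JSON.

    `opener` defaults to urllib.request.urlopen, which raises HTTPError on
    non-2xx responses; it can be swapped out for a fake in tests.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener(request, timeout=timeout) as response:
        payload = json.load(response)
    # Fail loudly if the API returns an error object instead of a list.
    if not isinstance(payload, list):
        raise ValueError(f"Expected a JSON list of tickers, got {type(payload).__name__}")
    return payload
```

In the full script, a main() function called from an if __name__ == "__main__": guard would chain this call with the transform and load steps.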
Step 2: Keeping Secrets Secret with a .env File
Next, create a .env file: a plain text file that holds sensitive environment variables.
In this case, it holds our database credentials.
This file must never be pushed to GitHub, so list .env in the project's .gitignore.
Here is what the .env file looks like:
# .env
DB_USER=crypto_admin
DB_PASSWORD=SuperSecretPassword123
DB_NAME=crypto_warehouse
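Inside the etl_script container, these values arrive as ordinary environment variables (DB_HOST is set in docker-compose.yml in Step 3; the others are substituted from .env). A minimal sketch of reading them in Python and failing fast when one is missing; load_db_config and DbConfig are hypothetical names, not taken from the post.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Variables the ETL container expects (names taken from docker-compose.yml).
REQUIRED_VARS = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD")


@dataclass(frozen=True)
class DbConfig:
    host: str
    name: str
    user: str
    # repr=False keeps the password out of logs if the config is printed.
    password: str = field(repr=False)


def load_db_config(env: Optional[Mapping[str, str]] = None) -> DbConfig:
    """Read database settings from the environment, failing fast if any are missing."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return DbConfig(
        host=source["DB_HOST"],
        name=source["DB_NAME"],
        user=source["DB_USER"],
        password=source["DB_PASSWORD"],
    )
```

Failing at startup with a list of missing names is easier to debug than a cryptic authentication error from the database driver later on.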
Step 3: Composing the Multi-Container Network
Next, write the docker-compose.yml file to define the database and the script together.
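Notice how it references variables from our .env file using the ${VARIABLE_NAME} syntax. Docker Compose automatically reads a .env file in the same directory as docker-compose.yml and substitutes its values into the configuration when it parses the file, so the credentials never appear in the YAML itself: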
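# The top-level "version" key is omitted: Compose V2 treats it as obsolete.
services:
  db:
    image: postgres:15
    container_name: crypto_postgres
    restart: always
    environment:
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: ${DB_NAME}
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    # Report "healthy" only once Postgres accepts connections.
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER} -d ${DB_NAME}"]
      interval: 5s
      timeout: 5s
      retries: 5

  etl_script:
    build: .
    container_name: crypto_etl_runner
    # Wait for the healthcheck to pass, not just for the container to start.
    depends_on:
      db:
        condition: service_healthy
    environment:
      - DB_HOST=db
      - DB_NAME=${DB_NAME}
      - DB_USER=${DB_USER}
      - DB_PASSWORD=${DB_PASSWORD}

volumes:
  postgres_data: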
Networking in Docker:
Under etl_script, the DB_HOST environment variable is set to db instead of localhost. Because Docker Compose spins up both containers on a shared default network, they can find each other using their service names as hostnames!
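To make the hostname point concrete, here is a minimal sketch of building a PostgreSQL connection URL from those values; build_postgres_url is a hypothetical helper. The port is the container's own 5432: the "5432:5432" mapping only matters for connections from the host machine. A URL in this form can be passed to SQLAlchemy's create_engine or to psycopg2.connect.

```python
from urllib.parse import quote


def build_postgres_url(host: str, dbname: str, user: str, password: str, port: int = 5432) -> str:
    """Build a postgresql:// connection URL.

    Inside the Compose network, `host` should be the service name ("db"):
    "localhost" inside the ETL container refers to that container itself.
    """
    # Percent-encode credentials so characters like "@" or "/" don't break the URL.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )
```

Inside the etl_script container this would typically be called with os.environ["DB_HOST"], which Compose sets to db.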
Step 4: Spinning it Up
With our files defined, launching the entire multi-container architecture requires just one command in the terminal:
docker compose up --build
Docker Compose pulls the Postgres image, builds our custom Python image, substitutes the values from our .env file, creates a shared network for the two services, and starts db before etl_script, as declared by depends_on.
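Starting db first does not guarantee Postgres is ready: the server can take a few seconds to accept connections after its container starts, so the script's first connection attempt may fail. A Compose healthcheck is one way to handle this; another common safeguard is retrying the connection in the script itself. A minimal sketch, where connect_with_retry is a hypothetical helper and connect stands for whatever opens your connection (for example, lambda: psycopg2.connect(url)).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or the attempts run out."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:  # narrow this to your driver's error type
            last_error = exc
            print(f"Database not ready (attempt {attempt}/{attempts}): {exc}")
            # Don't sleep after the final failed attempt.
            if attempt < attempts:
                sleep(delay_seconds)
    raise RuntimeError(f"Could not connect after {attempts} attempts") from last_error
```

The sleep function is injectable so tests can run instantly; in the container the default time.sleep applies.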
Step 5: Practicing Clean Version Control (Atomic Commits)
Instead of working for three hours and then writing one giant, generic commit message like "fixed code and added files", let's practice writing atomic commits.
An atomic commit means each commit does exactly one logical thing. It makes your GitHub commit history readable, and if something breaks, it's very easy to roll back to the exact step where things went wrong.
Here is an example of our commit timeline for this project:
- idx: Starting point, Initializing project scaffolding
- infra: Added Docker Infrastructure: Dockerfile, docker-compose.yml and env vars
- feature: Add ETL script with CoinPaprika API integration and PostgreSQL connection handler
Key Takeaways
- Volumes prevent data loss: Adding the volumes entry (the named volume postgres_data) to the Postgres container ensures that even if we stop and remove our containers using docker compose down, the crypto data persists on the host. Only docker compose down -v, which also deletes named volumes, would remove it.
- Environments are modular: Our configuration is now decoupled from the code. If someone clones the repository from GitHub, the project won't leak any secrets, and they can plug in their own database credentials by creating their own local .env file.
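Because the volume keeps data across restarts, every rerun of the ETL container writes into the same tables, so a plain INSERT would pile up duplicate rows. A minimal sketch of an idempotent load that upserts on the coin id, using the standard library's sqlite3 as a stand-in for PostgreSQL so it runs anywhere; the coin_tickers table and its columns are hypothetical. INSERT ... ON CONFLICT DO UPDATE is valid in both PostgreSQL and SQLite 3.24+, though with psycopg2 the :name placeholders would become %(name)s.

```python
import sqlite3
from typing import Any, Iterable

# Hypothetical schema; the post does not show its table definition.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS coin_tickers (
    coin_id TEXT PRIMARY KEY,
    name TEXT,
    symbol TEXT,
    price_usd REAL,
    last_updated TEXT
)
"""

# Insert new coins; update existing ones in place when coin_id already exists.
UPSERT = """
INSERT INTO coin_tickers (coin_id, name, symbol, price_usd, last_updated)
VALUES (:coin_id, :name, :symbol, :price_usd, :last_updated)
ON CONFLICT (coin_id) DO UPDATE SET
    name = excluded.name,
    symbol = excluded.symbol,
    price_usd = excluded.price_usd,
    last_updated = excluded.last_updated
"""


def load_rows(conn: sqlite3.Connection, rows: Iterable[dict[str, Any]]) -> None:
    """Upsert rows so rerunning the ETL refreshes coins instead of duplicating them."""
    with conn:  # commits on success, rolls back on error
        conn.execute(CREATE_TABLE)
        conn.executemany(UPSERT, rows)
```

Running the load twice with the same coin_id leaves a single row holding the latest values.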
All the best as you continue learning Docker!