1. Introduction
- This video continues the Data Engineering Zoomcamp series, focusing on Docker and SQL.
- Previous lessons covered packaging an ingestion script into a Docker container.
- This lesson builds on that by running PostgreSQL and pgAdmin using Docker Compose instead of separate Docker commands.
2. Challenges with Running PostgreSQL and pgAdmin Separately
- Previously, PostgreSQL and pgAdmin were run in one network using two Docker commands.
- Manually configuring each container is cumbersome and requires multiple command-line arguments.
- A more convenient approach is to define configurations in a single YAML file using Docker Compose.
3. What is Docker Compose?
- Docker Compose is a utility that allows defining multi-container configurations in a YAML file.
- It simplifies the process by allowing multiple services to be launched with a single command.
- Containers defined in the docker-compose.yaml file automatically belong to the same network.
4. Installing Docker Compose
- Mac/Windows: Comes pre-installed with Docker Desktop.
- Linux: Needs to be downloaded separately and placed in the system path.
5. Creating the docker-compose.yaml File
- Defines two services: PostgreSQL (pg_database) and pgAdmin (pg_admin).
- The top-level version key is optional. Older docker-compose releases treated a file without it as the legacy version 1 format, while current Docker Compose releases ignore the key.
- Services include environment variables, volume mapping, and port configurations.
5.1 PostgreSQL Service (pg_database)
- Specifies the PostgreSQL image (postgres:13).
- Defines the necessary environment variables:
    environment:
      POSTGRES_USER: root
      POSTGRES_PASSWORD: root
      POSTGRES_DB: ny_taxi
- Volume mapping is used to persist data. In Docker Compose, a relative path is enough; the full path does not need to be specified:
    volumes:
      - ./ny_taxi:/var/lib/postgresql/data
- Port mapping:
    ports:
      - "5432:5432"
5.2 pgAdmin Service (pg_admin)
- Uses the pgAdmin image.
- Defines environment variables:
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@admin.com
      PGADMIN_DEFAULT_PASSWORD: root
- Maps port 8080 on the host to port 80 inside the container:
    ports:
      - "8080:80"
- Volume mapping is not configured initially.
Because both services, pg_database and pg_admin, are defined in the same Docker Compose file, there is no need to create a Docker network manually; Compose creates one automatically.
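Putting the fragments from sections 5.1 and 5.2 together, the complete file might look like the sketch below. The pgAdmin image name dpage/pgadmin4 is an assumption (the text above only says "the pgAdmin image"), and the version key is left out on the assumption that a recent Docker Compose release is in use.

```yaml
# docker-compose.yaml (sketch assembled from sections 5.1 and 5.2)
services:
  pg_database:
    image: postgres:13
    environment:
      POSTGRES_USER: root
      POSTGRES_PASSWORD: root
      POSTGRES_DB: ny_taxi
    volumes:
      # Relative host path, resolved against the directory of this file
      - ./ny_taxi:/var/lib/postgresql/data
    ports:
      - "5432:5432"   # host:container

  pg_admin:
    # Assumption: the source does not name the pgAdmin image
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@admin.com
      PGADMIN_DEFAULT_PASSWORD: root
    ports:
      - "8080:80"     # pgAdmin UI served at http://localhost:8080
```

Each block under services replaces one of the earlier docker run commands: image takes the place of the image argument, while environment, volumes, and ports correspond to the -e, -v, and -p flags.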
6. Running Services with Docker Compose
6.1 Starting Services
- First, stop any running PostgreSQL and pgAdmin containers:
    docker stop <container_id>
- Check that none are still running:
    docker ps
- Start services with Docker Compose:
    docker-compose up
  This creates and starts the PostgreSQL and pgAdmin containers and makes them accessible on the mapped ports.
6.2 Accessing Services
- Open a web browser and go to http://localhost:8080 to access pgAdmin.
- Use the defined credentials (admin@admin.com, root) to log in.
- Add a new connection:
  - Hostname: pg_database
  - Username: root
  - Password: root
- Verify that the database is accessible and contains the ny_taxi dataset.
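The hostname pg_database works because Compose attaches every service to a project-wide default network and makes each service reachable there under its service name. As a rough illustration, that implicit behavior corresponds to something like the explicit configuration below. The networks entries are spelled out only to show the idea and are not needed in the actual file; environment variables and ports are omitted for brevity, and the pgAdmin image name is an assumption.

```yaml
# Illustration only: spells out the default network that Compose sets up implicitly.
# Not a complete file; environment and ports are left out.
services:
  pg_database:
    image: postgres:13
    networks:
      - default             # reachable from other services as "pg_database"

  pg_admin:
    image: dpage/pgadmin4   # assumption: image name not given in the source
    networks:
      - default             # connects to the database at pg_database:5432

networks:
  default:
    driver: bridge          # the usual driver for a single-host setup
```

Inside this network pgAdmin talks to the container port (5432) directly, so the connection would still work even if the database port were not published to the host.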
6.3 Stopping Services
- Use Ctrl + C to stop services.
- Alternatively, shut down services properly using:
    docker-compose down
6.4 Running in Detached Mode
- Run in detached mode (-d) to keep services running in the background:
    docker-compose up -d
  This frees the terminal for other commands.
- Shut down services when needed using:
    docker-compose down
7. Benefits of Using Docker Compose
- Eliminates long and complex docker run commands.
- Ensures services are automatically networked.
- Simplifies local development and integration testing.
- Makes managing multi-container applications easier.
8. Next Steps
- Now that PostgreSQL is set up with data ingestion, the next step is to write SQL queries.
- Future lessons will cover SQL queries and data manipulation within PostgreSQL.
Summary:
- Docker Compose simplifies running multiple containers with a single YAML configuration file.
- PostgreSQL and pgAdmin can be easily set up and managed using docker-compose.yaml.
- Using detached mode (-d) keeps services running in the background.
- This setup is useful for local development and testing before deploying services in production.