Reesav Gupta

Mastering Docker: A Beginner's Guide to Containerization

Why Containerisation?

  • Everyone has a different operating system
  • The steps to run a project can vary based on the operating system
  • Keeping track of dependencies gets much harder as a project grows
  • What if there was a way to describe your project's configuration in a single file?
  • What if that could be run in an isolated environment?
  • Makes local setup of open-source projects a breeze
  • Makes installing auxiliary services very simple

Definition

Containerization involves building self-sufficient software packages that perform consistently, regardless of the machines they run on.
Think of it as taking a snapshot of a machine's filesystem and letting you use and deploy that snapshot as a single construct.

Note: Containerization also allows for container orchestration, which makes deployment a breeze.

Docker has 3 parts

  1. CLI

    • The CLI is where Docker commands are executed.
  2. Engine

    • Docker Engine is the heart of Docker and is responsible for running and managing containers. It includes:
      • Docker Daemon: Runs on the host, managing images, containers, networks, and storage.
  3. Registry

    • Docker Registry is a system for storing and sharing Docker images. It can be public or private, allowing users to upload and download images for easy collaboration and deployment.
      • Docker Hub: The default public registry with millions of images.
      • Private Registries: Custom, secure repositories organizations use to control their images.
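
To see how these three pieces interact, here's a small sketch (the nginx image and the <your-username> placeholder are only illustrative, not something from this post):

# The CLI talks to the Docker daemon (the Engine); `docker version` reports both sides
docker version

# Pulling fetches an image from a registry (Docker Hub by default)
docker pull nginx:latest

# Pushing publishes an image you've built to a registry
# (assumes you've already run `docker login` and tagged the image accordingly)
docker push <your-username>/my-node-app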

Images vs. Containers

A Docker image behaves like a template from which consistent containers can be created.

If Docker were a traditional virtual machine, the image could be likened to the ISO used to install your VM. This isn't a robust comparison, as Docker differs from VMs in concept and implementation, but it's a useful starting point nonetheless.

Images define the initial filesystem state of new containers. They bundle your application's source code and its dependencies into a self-contained package that's ready to run. Within the image, filesystem content is represented as multiple independent layers.
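
As a quick illustration, one image can be used to start any number of independent containers (node:20 is used here just because it appears later in this guide):

# Download the image (the template)
docker pull node:20

# Each `docker run` creates a fresh container from that same image
docker run -d --name c1 node:20 sleep 300
docker run -d --name c2 node:20 sleep 300

# List images, and the running containers created from them
docker image ls node
docker ps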

How to Containerize an App

Below is an example of a simple Dockerfile for a Node.js backend application:

# Use Node.js version 20 as the base image
FROM node:20

# Set up a working directory inside the container
WORKDIR /usr/src/app

# Copy the contents of the current directory to the working directory in the container
COPY . .

# Install dependencies specified in package.json
RUN npm install

# Expose the container's port 3000 to the host machine
EXPOSE 3000

# Define the command to run the application when the container starts
CMD ["node", "index.js"]


The first four instructions, i.e.,
FROM node:20
WORKDIR /usr/src/app
COPY . .
RUN npm install

run while the image is being built, but the line

CMD ["node", "index.js"]

executes only when the container starts. EXPOSE 3000 merely documents the port the app listens on, so we won't count it here.
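
One way to convince yourself that CMD is just a run-time default is to override it when starting a container; the image itself doesn't change (my-node-app is the tag we build in the next section):

# Replace the default CMD for this one run
docker run --rm my-node-app node --version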

Build and Run the Docker Image

# Build the Docker image
docker build -t my-node-app .

# Run the Docker container
docker run -p 3000:3000 my-node-app


docker build -t my-node-app .

  • here the -t flag tags the image with a name (my-node-app)

docker run -p 3000:3000 my-node-app

  • the -p 3000:3000 flag routes all requests arriving at port 3000 on the host machine to port 3000 of the container
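
Assuming the app inside the container listens on port 3000, you can check the mapping from the host:

# Requests to the host's port 3000 are forwarded to port 3000 in the container
curl http://localhost:3000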

Caching and Layers

 FROM node:20                #layer1
 WORKDIR /usr/src/app        #layer2
 COPY . .                    #layer3
 RUN npm install             #layer4
 EXPOSE 3000
 CMD ["node", "index.js"]

When building Docker images, each command in the Dockerfile creates a new layer. Docker caches these layers to speed up future builds. However, if one layer changes, all layers after it must be rebuilt.

Why layers?

  1. Caching
  2. Re-using layers
  3. Faster build times
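
You can see these layers for yourself on the image built above (each row roughly corresponds to one Dockerfile instruction):

# Show the layers that make up the image, newest first
docker history my-node-app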

Problem: Layer Dependency in Docker Images
Layer 3: The COPY . . command copies your entire project into the container, so this layer depends on every file in your project.
Issue: If you update any files (like index.js), Docker detects this change and rebuilds Layer 3 and all layers after it, such as RUN npm install. This can slow down the build, especially if later steps are time-consuming.

Solution to the above-mentioned problem

 FROM node:20                #layer1
 WORKDIR /usr/src/app        #layer2
 COPY package*.json ./       #layer3
 RUN npm install             #layer4
 COPY . .                    #layer5
 EXPOSE 3000
 CMD ["node", "index.js"]


How Reordering Solves the Problem

  1. Layer 1: FROM node:20

    • Base Image: Sets up the environment. Rarely changes, so it’s cached.
  2. Layer 2: WORKDIR /usr/src/app

    • Working Directory: Stable and rarely changes.
  3. Layer 3: COPY package*.json ./

    • Copy Dependencies: Copies package.json. Rebuilt only if dependencies change.
  4. Layer 4: RUN npm install

    • Install Dependencies: Installs Node.js packages. Cached unless dependencies change.
  5. Layer 5: COPY . .

    • Copy Project Files: Copies the rest of the project. Rebuilt only if files change.

Benefits:

  • Faster Rebuilds: Only the final layer (COPY . .) rebuilds on code changes.
  • Dependency Isolation: Keeps npm install cached unless package.json changes.
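
A rough sketch of what this looks like in practice; the comments describe the expected caching behaviour, assuming only application code changed between the two builds:

# First build: every layer is built and cached
docker build -t my-node-app .

# ...edit index.js (package.json left untouched)...

# Second build: FROM, WORKDIR, COPY package*.json and RUN npm install are reused
# from the cache; only the final COPY . . (and anything after it) is rebuilt
docker build -t my-node-app .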

Volumes & Networks

  1. Docker is used to run DBs/Redis/Auxiliary services locally.
  2. This is useful when we don't want to pollute our filesystem with unnecessary dependencies.
  3. We can bring those services up or down at will, keeping our machine clean.

There is a problem:

  • We want local databases to retain information across restarts (this can be achieved using volumes).
  • We want one Docker container to be able to talk to another Docker container (this can be achieved using networks).

Let's discuss each of these below:

Volumes :

  • Used for persisting data across container restarts.
  • Specifically useful for things like a database.
docker volume create volume_db

docker run -v volume_db:/data/db -p 27017:27017 mongo

docker run -v volume_db:/data/db -p 27017:27017 mongo

  • Purpose: Runs a MongoDB container.

  • Volume: Mounts the volume_db volume at /data/db inside the container, which is where MongoDB stores its data.

  • Port: Maps port 27017 on the host to port 27017 in the container, allowing access to MongoDB from the host machine.
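
A quick way to convince yourself the data really persists (the <mongo-container-id> placeholder is whatever `docker ps` shows for your MongoDB container):

# List volumes and inspect where volume_db lives on the host
docker volume ls
docker volume inspect volume_db

# Remove the MongoDB container entirely...
docker rm -f <mongo-container-id>

# ...then start a new one against the same volume: the data is still there
docker run -v volume_db:/data/db -p 27017:27017 mongo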

Networks

The backend service needs to be on the same network as the mongo service (and vice versa) for the two to communicate.

  • Each container has its own localhost, so localhost inside a container refers to the container itself, not the host machine.

  • Containers get their own isolated network stack.

  • By default, one container can't reach the host's localhost or resolve other containers by name, so we create a user-defined network for them to communicate over.

docker network create my-custom-network

docker run -p 3000:3000 --name backend --network my-custom-network <image_tag>

docker run -v volume_name:/data/db --name mongo --network my-custom-network -p 27017:27017 mongo

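On a user-defined network, containers can reach each other by container name. So the backend could be pointed at MongoDB like this (assuming MONGO_URI is the environment variable your app reads, as in the examples later in this post):

# "mongo" resolves to the MongoDB container because both containers share my-custom-network
docker run -p 3000:3000 --name backend --network my-custom-network \
  -e MONGO_URI=mongodb://mongo:27017/my_db <image_tag>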

Multi-Stage Builds

What if we want the development backend to hot-reload, but the production build not to?

Hot Reloading: Ensure your npm run dev script in package.json uses a tool like nodemon for hot reloading.
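
A minimal sketch of what that assumes (nodemon as a dev dependency, and script names matching the CMDs in the Dockerfile below):

# Install nodemon as a development-only dependency
npm install --save-dev nodemon

# package.json "scripts" would then look something like:
#   "dev":   "nodemon index.js",
#   "start": "node index.js"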

FROM node:20 AS base
WORKDIR /usr/src/app
COPY . .
RUN npm install

FROM base AS development
CMD ["npm", "run", "dev"]

FROM base AS production
RUN npm prune --production
CMD ["npm", "run", "start"]


While building for dev:

docker build . --target development -t myapp:dev
docker run -e MONGO_URI=mongodb://127.0.0.1:27017/my_db -p 3000:3000 -v .:/usr/src/app myapp:dev


While building for prod:

docker build . --target production -t myapp:prod
docker run -e MONGO_URI=mongodb://127.0.0.1:27017/my_db -p 3000:3000 myapp:prod


Docker Compose & YAML Files

Docker Compose

Docker Compose is a tool for defining and running multi-container Docker applications. With Docker Compose, you can use a YAML file to configure your application’s services, networks, and volumes. Then, with a single command, you can create and start all the services from your configuration.

Key commands:

  1. Start services: docker compose up
  2. Stop and remove services: docker compose down
  3. View logs: docker compose logs
  4. List services: docker compose ps

Example of a docker-compose.yml file:

version: '3'
services:
  web:
    build: .
    ports:
      - "3000:3000"
    networks:
      - frontend
      - backend
    depends_on:
      - db
    environment:
      DB_HOST: db
      DB_PORT: 5432
      REDIS_HOST: redis
      REDIS_PORT: 6379
  db:
    image: postgres
    volumes:
      - db_data:/var/lib/postgresql/data
    networks:
      - backend
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: mypassword
  redis:
    image: redis
    networks:
      - backend
  nginx:
    image: nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    networks:
      - frontend

volumes:
  db_data:

networks:
  frontend:
  backend:

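With that file in place, the whole stack can be managed with a few commands (a typical workflow rather than the only one):

# Build the images and start every service in the background
docker compose up -d --build

# Follow the logs of just the web service
docker compose logs -f web

# Stop everything and also remove the named volumes (db_data)
docker compose down --volumes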
