Why Containerization?
- Everyone has a different operating system
- Steps to run a project can vary based on the operating system
- It becomes much harder to keep track of dependencies as a project grows
- What if there was a way to describe your project's configuration in a single file?
- What if that configuration could be run in an isolated environment?
- Makes local setup of open-source projects a breeze
- Makes installing auxiliary services very simple
Definition
Containerization involves building self-sufficient software packages that perform consistently, regardless of the machines they run on.
It's like taking a snapshot of a machine's filesystem and letting you use and deploy that snapshot as a single construct.
Note: this also allows for container orchestration, which makes deployment a breeze.
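A quick illustration (assuming Docker is already installed): the same prebuilt image runs identically on any machine, because everything it needs is packaged inside it.
# Pull the hello-world image from Docker Hub and run it; --rm removes the container afterwards
docker run --rm hello-world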
Docker has three parts
- CLI: where Docker commands are executed.
- Engine: the heart of Docker, responsible for running and managing containers. It includes the Docker Daemon, which runs on the host and manages images, containers, networks, and storage.
- Registry: a system for storing and sharing Docker images. It can be public or private, allowing users to upload and download images for easy collaboration and deployment.
- Docker Hub: the default public registry with millions of images.
- Private Registries: custom, secure repositories organizations use to control their images.
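You can see the first two parts on your own machine (assuming the Docker daemon is running): the CLI talks to the Engine over an API, and each reports itself separately.
# "Client" is the Docker CLI; "Server" is the Docker Engine (daemon)
docker version

# Daemon-level details: storage driver, networks, and the configured registry
docker info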
Images vs. Containers
A Docker image behaves like a template from which consistent containers can be created.
If Docker were a traditional virtual machine, the image could be likened to the ISO used to install your VM. This isn't a robust comparison, as Docker differs from VMs in concept and implementation, but it's a useful starting point nonetheless.
Images define the initial filesystem state of new containers. They bundle your application's source code, its dependencies, and its runtime into a self-contained package. Within the image, filesystem content is represented as multiple independent layers.
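To make the image/container distinction concrete (assuming the node:20 image can be pulled from Docker Hub), you can inspect an image's layers and then start multiple independent containers from that single image.
# Show the layers that make up the image, one per Dockerfile instruction
docker image history node:20

# Two containers from the same image, each with its own isolated filesystem
docker run --rm node:20 node --version
docker run --rm node:20 node -e "console.log('a second, independent container')"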
How to Containerize an App
Below is an example of a simple Dockerfile for a Node.js backend application:
# Use Node.js version 20 as the base image
FROM node:20
# Set up a working directory inside the container
WORKDIR /usr/src/app
# Copy the contents of the current directory to the working directory in the container
COPY . .
# Install dependencies specified in package.json
RUN npm install
# Document that the application inside the container listens on port 3000
EXPOSE 3000
# Define the command to run the application when the container starts
CMD ["node", "index.js"]
The first four instructions, i.e.,
FROM node:20
WORKDIR /usr/src/app
COPY . .
RUN npm install
run while the image is being built, but the line
CMD ["node", "index.js"]
executes only when the container starts. EXPOSE 3000 merely documents the port the application listens on, so we won't consider it here.
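One way to see that CMD is a start-time instruction (assuming the image has been built as my-node-app, as shown in the next section) is to override it at run time without rebuilding anything.
# The image is unchanged; only the command run at container start is overridden
docker run --rm my-node-app node --version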
Build and Run the Docker Image
# Build the Docker image
docker build -t my-node-app .
# Run the Docker container
docker run -p 3000:3000 my-node-app
docker build -t my-node-app .
- Here the -t flag tags the image with a name, and the trailing . tells Docker to use the current directory as the build context.
docker run -p 3000:3000 my-node-app
- The -p 3000:3000 flag maps port 3000 on the host to port 3000 in the container, so all requests arriving at the host on port 3000 are routed to port 3000 of the container.
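Once the container is running, a couple of quick checks confirm the mapping (the curl path assumes your index.js responds on /).
# List running containers and the ports they publish
docker ps

# Requests to the host's port 3000 are forwarded to the container
curl http://localhost:3000/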
Caching and Layers
FROM node:20 #layer1
WORKDIR /usr/src/app #layer2
COPY . . #layer3
RUN npm install #layer4
EXPOSE 3000
CMD ["node", "index.js"]
When building Docker images, each instruction in the Dockerfile creates a new layer. Docker caches these layers to speed up future builds. However, if one layer changes, all layers after it must be rebuilt (see the build commands after the list below).
Why layers?
- caching
- Re-using layers
- Faster build time
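You can watch the cache at work by building the same image twice in a row (a sketch; the exact output depends on your Docker version).
# First build: every instruction runs and each resulting layer is cached
docker build -t my-node-app .

# Second build with nothing changed: Docker reuses the cached layers and finishes almost instantly
docker build -t my-node-app .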
Problem: Layer Dependency in Docker Images
Layer 3: The COPY . . command copies your entire project into the container, so this layer depends on every file in your project.
Issue: If you update any files (like index.js), Docker detects this change and rebuilds Layer 3 and all layers after it, such as RUN npm install. This can slow down the build, especially if later steps are time-consuming.
# Solution to the above-mentioned problem statement
FROM node:20 #layer1
WORKDIR /usr/src/app #layer2
COPY package*.json ./ #layer3
RUN npm install #layer4
COPY . . #layer5
EXPOSE 3000
CMD ["node", "index.js"]
How Reordering Solves the Problem
- Layer 1 (FROM node:20): Base image. Sets up the environment; it rarely changes, so it's cached.
- Layer 2 (WORKDIR /usr/src/app): Working directory. Stable and rarely changes.
- Layer 3 (COPY package*.json ./): Copy dependency manifests. Copies package.json and package-lock.json; rebuilt only if dependencies change.
- Layer 4 (RUN npm install): Install dependencies. Installs Node.js packages; cached unless dependencies change.
- Layer 5 (COPY . .): Copy project files. Copies the rest of the project; rebuilt only when application files change.
Benefits:
- Faster Rebuilds: Only the final layer (COPY . .) rebuilds on code changes.
- Dependency Isolation: Keeps npm install cached unless package.json changes.
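With the reordered Dockerfile above, editing application code no longer invalidates the dependency layers (a sketch, assuming index.js is part of your project).
# Change only application code, then rebuild:
# layers 1-4 (base image, workdir, package*.json, npm install) come from the cache,
# and only COPY . . and the instructions after it are re-executed
touch index.js
docker build -t my-node-app .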
Volumes & Networks
- Docker is commonly used to run databases, Redis, and other auxiliary services locally.
- This is useful when we don't want to pollute our filesystem with unnecessary dependencies.
- We can bring those services up or down whenever we want, keeping our machine clean.
There are two problems, though:
- We want the local databases to retain data across restarts (this can be achieved using volumes).
- We want one Docker container to be able to talk to another Docker container (this can be achieved using networks).
Let's discuss each of these.
Volumes
- Used for persisting data across restarts.
- Specifically useful for things like a database.
docker volume create volume_db
docker run -v volume_db:/data/db -p 27017:27017 mongo
docker run -v volume_db:/data/db -p 27017:27017 mongo
- Purpose: Runs a MongoDB container.
- Volume: Mounts the volume_db volume to store MongoDB data at /data/db inside the container.
- Port: Maps port 27017 on the host to port 27017 in the container, allowing access to MongoDB from the host machine.
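To check that the data really persists (a sketch; <container_id> stands for whatever ID docker ps shows for the mongo container):
# Inspect the named volume
docker volume ls
docker volume inspect volume_db

# Remove the container entirely, then start a fresh one against the same volume;
# the data written by the old container is still there
docker rm -f <container_id>
docker run -v volume_db:/data/db -p 27017:27017 mongo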
Networks
Each container has its own localhost, so we need to create a network for the containers to communicate over.
Containers get their own isolated network by default.
A container can't talk to the host machine or to other containers by name without a shared network.
docker network create my-custom-network
docker run -p 3000:3000 --name backend --network my-custom-network <image_tag>
docker run -v volume_name:/data/db --name mongo --network my-custom-network -p 27017:27017 mongo
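Once both containers are attached to my-custom-network, they can reach each other by container name; for example, the backend can connect to MongoDB at mongodb://mongo:27017 instead of localhost (a sketch; MONGO_URI is just an example variable name, use whatever your app reads).
# Confirm both containers are attached to the network
docker network inspect my-custom-network

# Inside the network, the hostname "mongo" resolves to the MongoDB container
docker run -p 3000:3000 --name backend --network my-custom-network -e MONGO_URI=mongodb://mongo:27017/my_db <image_tag>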
Multi-Stage Builds
What if we want the development backend to hot reload, but the production build not to?
Hot Reloading: Ensure your npm run dev script in package.json uses a tool like nodemon for hot reloading.
# Shared base stage: install all dependencies once
FROM node:20 AS base
WORKDIR /usr/src/app
COPY . .
RUN npm install

# Development stage: run with hot reloading (npm run dev -> nodemon)
FROM base AS development
COPY . .
CMD ["npm", "run", "dev"]

# Production stage: drop dev dependencies and run the start script
FROM base AS production
COPY . .
RUN npm prune --production
CMD ["npm", "run", "start"]
While building and running the development image:
docker build . --target development -t myapp:dev
docker run -e MONGO_URI=mongodb://127.0.0.1:27017/my_db -p 3000:3000 -v .:/usr/src/app myapp:dev
While building and running the production image:
docker build . --target production -t myapp:prod
docker run -e MONGO_URI=mongodb://127.0.0.1:27017/my_db -p 3000:3000 myapp:prod
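The bind mount (-v .:/usr/src/app) in the development command is what makes hot reloading work: local edits appear inside the container and nodemon restarts the app; the production command deliberately omits it. To compare the two builds side by side (assuming both tags from above exist):
# Both images come from the same Dockerfile but were built from different target stages
docker image ls myapp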
Docker Compose & YAML Files
Docker Compose
Docker Compose is a tool for defining and running multi-container Docker applications. With Docker Compose, you can use a YAML file to configure your application’s services, networks, and volumes. Then, with a single command, you can create and start all the services from your configuration.
Key commands:
- Start services: docker compose up
- Stop and remove services: docker compose down
- View logs: docker compose logs
- List services: docker compose ps
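A few commonly used variants of these commands (the flags shown are standard Docker Compose options; "web" matches the service in the example file below):
# Rebuild images if needed and start every service in the background
docker compose up -d --build

# Follow the logs of a single service
docker compose logs -f web

# Stop everything and also remove the named volumes declared in the file
docker compose down -v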
Example of a docker-compose.yml file:
version: '3'
services:
  web:
    build: .
    ports:
      - "3000:3000"
    networks:
      - frontend
      - backend
    depends_on:
      - db
    environment:
      DB_HOST: db
      DB_PORT: 5432
      REDIS_HOST: redis
      REDIS_PORT: 6379
  db:
    image: postgres
    volumes:
      - db_data:/var/lib/postgresql/data
    networks:
      - backend
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: mypassword
  redis:
    image: redis
    networks:
      - backend
  nginx:
    image: nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    networks:
      - frontend
volumes:
  db_data:
networks:
  frontend:
  backend:
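In this file, Compose creates the frontend and backend networks and the db_data volume automatically, and services reach each other by service name, which is why DB_HOST and REDIS_HOST are set to db and redis. A couple of handy variations (standard Compose usage, assuming you are in the directory containing this file):
# Start only the data services in the background while developing the app locally
docker compose up -d db redis

# Stop and remove containers and networks; the named db_data volume is kept by default
docker compose down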