Building Images: From Manual Commits to the Dockerfile Revolution

In our previous posts, we learned the commands to manage images and containers. Now, let's dive into image creation. There are two main ways to build an image, and understanding the older method shows you why the modern standard is essential.

1. The Manual Way: Building Images with docker commit
Before Dockerfiles became standard, the most straightforward way to create an image was by modifying a running container and then "committing" those changes to a new image.

A. How docker commit Works
Start a Base Container: Run an image, often with shell access for interaction.

docker run -it --name my_sandbox ubuntu:latest bash

Make Manual Changes: Inside the container, perform installations or configurations. For example, installing Nginx.

apt-get update && apt-get install -y nginx

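Exit the Container: Type exit to leave the shell. Because bash is the container's main process, this also stops the container. Your changes remain in the container's read-write layer until the container is removed.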

Commit the Changes: Use docker commit to save the container's current state as a new, immutable image.

docker commit <container_name> <new_image_name>:<tag>

Example:
docker commit my_sandbox my_custom_nginx:v1.0

B. Upgrading a Committed Image
To "upgrade" this image, you would repeat the process:

Start a new container from your my_custom_nginx:v1.0 image.
Make further manual changes (e.g., update the Nginx configuration).
Commit the changes again, tagging it with v2.0.

docker commit <new_container_name> my_custom_nginx:v2.0

2. The Limitations of docker commit
While docker commit is simple, it quickly becomes unmanageable for production use due to several major drawbacks:

Limitation	Description	Impact
No Traceability	There is no record of the commands that were run inside the container. You don't know why a file exists or how a package was installed.	Makes auditing, debugging, and security checks nearly impossible.
Non-Reproducible	If you delete the image, you have to manually repeat the exact shell commands in the exact order to rebuild it.	Prevents consistent development and deployment across different environments.
Large Images	`docker commit` captures everything written to the container's filesystem, including leftovers such as package-manager caches and temporary files.	Leads to bloated images that are slow to pull and consume more disk space.
Security Risk	You cannot easily verify the contents or history of the image layers.	Increases the risk of hidden vulnerabilities.

3. The Dockerfile Revolution
The Dockerfile addresses the limitations of docker commit. A Dockerfile is a plain text file containing a series of instructions that Docker executes in order to build an image.

Why Use a Dockerfile?
Automation: The entire build process is fully automated.

Traceability: Every command is explicitly listed, creating a transparent, auditable history of the image's creation.

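Reproducibility: Anyone with the Dockerfile and its build context can rerun the same build steps. For the result to be truly identical, base image tags and package versions also need to be pinned.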

How it Removes the Limitations:
The Dockerfile is the source code for the image. Because it is plain text, it can be version-controlled and reviewed like any other code, and the build can be rerun automatically. It also makes size and security best practices easy to apply, such as cleaning up caches in the same step that created them.
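As a rough illustration, the manual my_sandbox session from section 1 could be written as a Dockerfile along these lines. This is a sketch, not the only way to do it: the pinned ubuntu:22.04 tag, the cache cleanup, and the foreground CMD are assumptions added here and were not part of the original manual steps.

```dockerfile
# Hypothetical Dockerfile reproducing the manual Nginx install from section 1.
# Assumption: the base image is pinned to ubuntu:22.04 instead of ubuntu:latest
# so that rebuilds start from a known release.
FROM ubuntu:22.04

# Install Nginx and delete the apt package lists in the same RUN step,
# so the lists are never stored in an image layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends nginx \
    && rm -rf /var/lib/apt/lists/*

# Document the port Nginx listens on by default.
EXPOSE 80

# Run Nginx in the foreground so the container keeps running.
CMD ["nginx", "-g", "daemon off;"]
```

Assuming the file is saved as Dockerfile in an otherwise empty directory, running `docker build -t my_custom_nginx:v1.0 .` in that directory would build it. Every step that shaped the image is now readable in the file itself.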

4. Essential Dockerfile Instructions (Part 1)
While the full building process is covered in Part 2, let's introduce the core instructions that form the backbone of nearly every Dockerfile.

Instruction	Purpose	Creates Layer?	Example
`FROM`	Specifies the base image for the build (the starting point). Must be the first instruction; only comments, parser directives, and `ARG` may precede it.	No (reuses the base image's layers)	`FROM node:18-alpine`
`RUN`	Executes a command in a new layer on top of the current image. Used for installing packages, making directories, etc.	Yes	`RUN apk add --no-cache git`
`WORKDIR`	Sets the working directory for any subsequent `RUN`, `CMD`, `ENTRYPOINT`, `COPY`, or `ADD` instructions, creating the directory if it does not exist.	Mostly metadata (may add a small layer if the directory has to be created)	`WORKDIR /app`
`COPY`	Copies files or directories from the build context (the directory passed to the build) into the image filesystem.	Yes	`COPY package.json /app/`
`CMD`	Provides the default command that runs when a container starts. Arguments passed to `docker run` after the image name override it. If several `CMD` instructions are present, only the last one takes effect.	No (metadata only)	`CMD ["node", "server.js"]`
`EXPOSE`	Informs Docker that the container listens on the specified network ports at runtime. It does not actually publish the port.	No	`EXPOSE 8080`
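Put together, the instructions in the table might form a Dockerfile like the one below. Treat it as a minimal sketch: package.json and server.js are hypothetical files assumed to exist in the build context, and the app is assumed to listen on port 8080.

```dockerfile
# Minimal sketch combining the example instructions from the table.
# Assumption: package.json and server.js exist next to this Dockerfile.
FROM node:18-alpine

# Install an extra tool; this adds a new filesystem layer.
RUN apk add --no-cache git

# Relative paths in the instructions below resolve against /app.
WORKDIR /app

# Copy the manifest first so the dependency layer can be reused from cache
# when only the application source changes.
COPY package.json ./
RUN npm install

# Copy the application entry point.
COPY server.js ./

# Document the port the app is assumed to listen on.
EXPOSE 8080

# Default command when a container starts from this image.
CMD ["node", "server.js"]
```

The order is a design choice: steps that change rarely (base image, tools, dependencies) come before steps that change often (application code), so rebuilds can reuse cached layers.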

Understanding CMD vs. RUN

RUN executes a command during the image build (e.g., installing software).
CMD executes a command when the container is started (e.g., launching the application).
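The difference can be seen in a tiny Dockerfile such as the following sketch. The alpine:3.19 base image and the /build-info.txt path are arbitrary choices made for illustration.

```dockerfile
# Sketch contrasting build-time RUN with run-time CMD.
FROM alpine:3.19

# RUN executes once, while the image is being built.
# The file it writes becomes part of the image.
RUN echo "written at build time" > /build-info.txt

# CMD is only recorded as metadata during the build.
# It executes each time a container is started from the image.
CMD ["cat", "/build-info.txt"]
```

Starting a container from this image with no extra arguments runs the cat command and prints the line written during the build. Passing a different command after the image name, as in `docker run <image> ls /`, replaces the CMD for that one container and leaves the image unchanged.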

Brief Notes on Omitted Topics
We haven't fully covered Networking and Volumes, but here is how they relate to the Dockerfile:

Networking: The EXPOSE instruction is the only networking configuration typically done in a Dockerfile. It simply documents which ports the application inside the container uses. The actual port mapping (e.g., -p 8080:80) is done using the docker run command, not the Dockerfile.
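A short sketch of that split between documenting and publishing a port is shown here. The nginx:alpine base image is an assumption chosen because it serves on port 80.

```dockerfile
# Sketch: EXPOSE records the port in the image metadata and nothing more.
# Assumption: nginx:alpine is used as a base image that serves on port 80.
FROM nginx:alpine

# Documentation for whoever runs the image; no host port is opened by this line.
EXPOSE 80
```

Publishing still happens at run time, for example with `docker run -d -p 8080:80 <image>`, which maps host port 8080 to container port 80. Without `-p` (or `-P`), no host port is published, regardless of what EXPOSE says.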

Volumes: Volumes are for data persistence and are usually defined using the docker run -v command or Docker Compose. Occasionally, the VOLUME instruction is used in a Dockerfile to mark a mount point, but it's often better practice to manage volumes outside the image.
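For completeness, this is roughly what the VOLUME instruction looks like in a Dockerfile. It is a sketch only: the /data path and the alpine:3.19 base image are placeholders.

```dockerfile
# Sketch: marking a mount point inside the image.
FROM alpine:3.19

# Declares /data as a mount point. If nothing is mounted there at run time,
# Docker creates an anonymous volume for it when the container starts.
VOLUME /data

# List the mount point so the effect is visible when the container runs.
CMD ["ls", "-la", "/data"]
```

Supplying a named volume at run time, for example `docker run -v app_data:/data <image>`, keeps the storage decision outside the image. That is the approach the paragraph above recommends.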

What's Next?
You now understand why the Dockerfile is the standard way to build images. In Part 2 of this topic, we will walk through the full build process with the docker build command, cover advanced instructions like ENTRYPOINT and best practices like multi-stage builds, and push our image to a registry.
