Kostas Kalafatis

Posted on Jun 24

Getting Started with Dockerfiles

#beginners #docker #devops #tutorial

Introduction

In the previous posts, we discussed how to run your first Docker container by pulling pre-built Docker images from Docker Hub. Pre-built images are useful, but we can't rely on them alone. To run our own applications on Docker, we need to create custom images by installing new packages and customizing the settings of pre-built images.

We do this with a text file called a Dockerfile. This file consists of commands that Docker executes to create a Docker image. Docker images are created from a Dockerfile using the docker build or docker image build command.

A Docker image consists of multiple layers, each layer representing commands provided in the Dockerfile. These read-only layers are stacked on top of one another to create the final Docker image. Docker images can be stored in a Docker registry, such as Docker Hub, which stores and distributes Docker images.

A Docker container is a running instance of the Docker image. One or more Docker containers can be created from a single Docker image using the docker run or docker container run command. Once a Docker container is created from an image, a new writable layer will be added on top of the read-only layers from the image.

To summarize, a Docker image is made up of one or more read-only layers, generated from the commands in the Dockerfile during the image build. Commands that change the filesystem, such as RUN, add a new layer, while commands such as LABEL or CMD only record metadata. Once a container is created, a new read-write layer (known as the container layer) is added on top of the image layers and holds all changes made in the running container.
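As a rough preview (the directives themselves are explained later in this post), the sketch below annotates which lines of a small Dockerfile add a filesystem layer and which only record metadata. The alpine:3.20 tag and the /greeting.txt file name are assumptions chosen for illustration.

```dockerfile
# The parent image contributes its own read-only layers
FROM alpine:3.20

# Metadata only: recorded in the image configuration, no filesystem change
LABEL version="1.0"

# Changes the filesystem, so it produces a new read-only layer
RUN echo "hello" > /greeting.txt

# Metadata only: the default command for containers started from the image
CMD ["cat", "/greeting.txt"]
```

After building it, docker image history <image> lists one entry per directive, and the metadata-only entries should show a size of 0B.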

What is a Dockerfile?

A Dockerfile is a text file that contains instructions on how to create a Docker image. In this post these commands are called directives (the official Docker documentation calls them instructions). A Dockerfile is a way of creating a custom Docker image based on our requirements.

The format of a Dockerfile is as follows:

# This is a comment
DIRECTIVE argument

A Dockerfile can contain multiple lines of comments and directives. The Docker Engine executes the directives in order, from top to bottom, while building the Docker image.

All statements starting with the # symbol are treated as comments. Currently, Dockerfiles only support single-line comments.

Directive names are case-insensitive, but it is considered a best practice to write them in uppercase to distinguish them from their arguments.
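The format rules above can be sketched in a few lines. This is only an illustration: the alpine:3.20 parent image and the echoed text are assumptions, and the directives used here are covered in the sections that follow.

```dockerfile
# A comment occupies a whole line and starts with '#'
FROM alpine:3.20

# Directive names are conventionally uppercase; arguments keep their own case
RUN echo "built from an uppercase directive"

# Lowercase also parses, but it is harder to tell apart from the arguments:
# run echo "built from a lowercase directive"
```

Note that # only starts a comment at the beginning of a line; anywhere else on a line it is treated as part of the directive's arguments.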

Common Dockerfile Directives

A directive is a command that is used to create a Docker image. In this section we are going to discuss the following five basic Dockerfile directives:

The FROM directive.
The LABEL directive.
The RUN directive.
The CMD directive.
The ENTRYPOINT directive.

The FROM Directive

A Dockerfile generally starts with a FROM directive, which specifies the parent image of our custom Docker image. The parent image is our starting point: all the customization that we do is applied on top of it. The parent image can be an image from Docker Hub, such as Ubuntu or Nginx. The FROM directive takes a valid image name and a tag as arguments. If the tag is not specified, the latest tag is used.

A FROM directive has the following format:

FROM <image>:<tag>

The following FROM directive uses the ubuntu parent image with the 20.04 tag:

FROM ubuntu:20.04

We can also use a special base image if we need to build a Docker image from scratch. The base image, known as the scratch image, is an empty image mostly used to build other parent images.

In the following FROM directive, we are going to use the scratch image to build a custom Docker image from scratch:

FROM scratch

The LABEL Directive

A LABEL is a key-value pair that can be used to add metadata to a Docker image. These labels can be used to organize the Docker images properly. Usually this includes the name of the author, or the version of the Dockerfile.

A LABEL directive has the following format:

LABEL <key>=<value>

A Dockerfile can have multiple labels:

LABEL maintainer=somerandomguy@somerandomdomain.com
LABEL version=1.0
LABEL environment=dev

Or you can write them as a one-liner, separated by spaces:

LABEL maintainer=somerandomguy@somerandomdomain.com version=1.0 environment=dev

I prefer one LABEL directive per key-value pair, but each to their own I guess.

Labels can be viewed using the docker image inspect command:

docker image inspect ubuntu:latest

...
"Labels": {
    "org.opencontainers.image.ref.name": "ubuntu",
    "org.opencontainers.image.version": "24.04"
}
...
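To try this on an image of your own, a sketch along the following lines should work. The description label and the alpine:3.20 parent image are assumptions added for illustration; the other values are the ones used above.

```dockerfile
FROM alpine:3.20

# Quote values that contain spaces
LABEL description="Demo image for the LABEL directive"

# Several pairs in one directive, continued across lines with a backslash
LABEL maintainer="somerandomguy@somerandomdomain.com" \
      version="1.0" \
      environment="dev"
```

Once the image is built, docker image inspect --format '{{json .Config.Labels}}' <image> should print only the labels instead of the full inspect output.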

The RUN Directive

The RUN directive is used to execute commands during the image build time. This will create a new layer on top of the existing layer, execute the specified command, and commit the results to the newly created layer. The RUN directive can be used to install the required packages, create users and groups, and so on.

The RUN directive takes the following format:

RUN <command>

<command> specifies the shell command you want to execute as part of the image build process. A Dockerfile can have multiple RUN directives adhering to the preceding format.

Below, we run three commands on top of the parent image.

The apt-get update command refreshes the list of available packages and their versions, but it does not install or upgrade anything. It ensures that the package manager has the latest information about available software.

The apt-get upgrade command installs the newest versions of all packages currently installed on the system, using the sources enumerated in the sources list. It never removes installed packages, and it never installs packages that are not already present.

The apt-get install nginx command installs the nginx package, a high-performance web server and reverse proxy server.

The -y flag automatically answers "yes" to any prompts. An image build has no terminal to answer them, so both apt-get upgrade and apt-get install need the flag to proceed without user intervention.

RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install nginx -y

Alternatively, you can add multiple shell commands to a single RUN directive by separating them with the && symbol. In the following example, we are going to use the same commands, but this time in a single RUN directive, separated by the && symbol:

RUN apt-get update && apt-get upgrade -y && apt-get install nginx -y
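Long chains like this are easier to read when split across lines with backslashes. The following is a sketch rather than a recommended recipe: the DEBIAN_FRONTEND setting and the final cleanup step are additions that do not appear in the commands above, included on the assumption that you want a prompt-free build and a smaller layer.

```dockerfile
FROM ubuntu:20.04

# One RUN directive, so one layer: refresh the package index, upgrade,
# install nginx, then delete the cached package lists to keep the layer small.
# DEBIAN_FRONTEND=noninteractive is a precaution against configuration
# prompts (for example, time zone selection) that would stall the build.
RUN export DEBIAN_FRONTEND=noninteractive \
    && apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y nginx \
    && rm -rf /var/lib/apt/lists/*
```

Because the commands are joined with &&, the build stops at the first command that fails instead of continuing with a half-configured layer.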

The CMD Directive

A Docker container is generally expected to run one process. The CMD directive provides the default command that is executed when a container is created from the Docker image. Only one CMD directive takes effect per Dockerfile: if you add several, Docker uses only the last one.

The CMD directive has the following format:

CMD ["executable", "param1", "param2", "param3", ...]

For example, we can use the following command to echo "Hello World" as the output of a Docker container:

CMD ["echo", "Hello World"]

The command produces the following output when we run the image with docker container run <image>:

docker container run hello-world-image
Hello World

However, if we pass any command line arguments after the image name in docker container run <image>, they override the command defined by CMD:

docker container run hello-world-image echo "Hello Docker"
Hello Docker

So, what is the difference between RUN and CMD?

Both the RUN and CMD directives can be used to execute a shell command. The main difference between the two is that the command provided with the RUN directive will be executed during the image build process, while the command provided with the CMD directive will be executed once a container is launched from the built image.

Another notable difference is that there can be multiple RUN directives in a Dockerfile, but there can be only a single CMD directive. If there are multiple CMD directives, only the last one will be executed.
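The timing difference can be seen in a small sketch like the one below. The alpine:3.20 parent image and the /built-at.txt file name are assumptions picked for the example.

```dockerfile
FROM alpine:3.20

# Build time: executed once by `docker image build`; the timestamp is
# frozen into an image layer
RUN date > /built-at.txt

# Run time: executed each time a container starts; it prints the same
# frozen timestamp on every run
CMD ["cat", "/built-at.txt"]
```

Every container started from this image should print the moment the image was built, not the moment the container was started, because the date command ran only during the build.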

The ENTRYPOINT Directive

Similar to the CMD directive, the ENTRYPOINT directive provides a default command that is executed when a container is created. The difference is that arguments passed on the docker container run command line do not replace the ENTRYPOINT command; they are appended to it as arguments.

You can still override the ENTRYPOINT directive explicitly with the --entrypoint flag of docker container run.

The ENTRYPOINT directive has the following format:

ENTRYPOINT ["executable", "param1", "param2", "param3", ...]

When both ENTRYPOINT and CMD are used together in a Dockerfile, the CMD directive provides additional arguments to the ENTRYPOINT executable. This combination allows for a more flexible and modular setup.

For example:

ENTRYPOINT ["echo", "Hello"]
CMD ["World"]

The output of the echo command will differ based on how we execute the docker container run command.

If we launch the Docker image without any additional parameters, it will output the message Hello World:

docker run test-image
Hello World

But if we provide a command line parameter, the message will change:

docker container run test-image "Docker"
Hello Docker
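For reference, here is a minimal complete Dockerfile that should reproduce the behavior above. It is a sketch that assumes the image is built as test-image; the alpine:3.20 parent image is an arbitrary choice, since any image that ships an echo command would do.

```dockerfile
FROM alpine:3.20

# Fixed executable; arguments given after the image name are appended to it
ENTRYPOINT ["echo", "Hello"]

# Default argument, used only when none are passed on the command line
CMD ["World"]
```

Running docker container run --entrypoint date test-image should replace echo Hello entirely and print the current date instead. When --entrypoint is given, the image's default CMD is not applied.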

Creating our First Dockerfile

We are going to create a Docker image that, when run, prints any arguments passed to it, preceded by the text "You are reading". If no arguments are provided, it should print "You are reading Awesome posts in dev.to".

First, let's create a new directory named test-docker-image using the mkdir command. This directory will be the context for our Docker image. The context is the directory that contains all the files needed to build the image:

mkdir test-docker-image

Now, navigate to the newly created directory:

cd test-docker-image

Within the test-docker-image directory, create a file named Dockerfile. I am going to use VS Code but feel free to use whatever editor you feel comfortable with.

code Dockerfile

Let's build the contents of our Dockerfile. I will add comments and explain every step as we create the Dockerfile. However, if you prefer to copy the entire content (though I recommend against it), the final Dockerfile will be provided below.

We'll start with the FROM directive to specify our base image. We are going to use the Alpine Linux distribution. Alpine Linux is used because it is a lightweight, security-oriented distribution. Its small size (around 5 MB) reduces the attack surface and download time, making it ideal for building minimal and efficient Docker images.

# Use the lightweight Alpine Linux image as the base image
FROM alpine:latest

Next, let's add some LABEL directives. Adding LABEL directives for maintainer, version, and environment provides essential metadata, aiding in documentation and maintainability. They help identify the image maintainer, track the image version, and specify the intended environment, making it easier to manage and support the image.

# LABEL directives only add metadata; they do not increase the image size
LABEL maintainer="someguy@someorganization.com"
LABEL version="1.0"
LABEL environment="dev"

We are now going to update and upgrade our image OS. Running apk update and apk upgrade in your Docker image ensures that you have the latest package lists and the most recent security patches and bug fixes. This helps keep the image secure and up-to-date with the latest improvements, reducing potential vulnerabilities and improving stability.

RUN apk update
RUN apk upgrade

Next, we are going to use the CMD directive to provide the default text that follows our "You are reading" message:

CMD ["Awesome posts in dev.to"]

Finally, we are going to add the ENTRYPOINT directive to define the default executable of the container:

ENTRYPOINT ["echo", "You are reading"]

The final Dockerfile should look something like the following:

FROM alpine:latest
LABEL maintainer="someguy@someorganization.com"
LABEL version="1.0"
LABEL environment="dev"
RUN apk update && apk upgrade
CMD ["Awesome posts in dev.to"]
ENTRYPOINT ["echo", "You are reading"]

Save, and exit your editor.

In the next post, we'll discuss building a Docker image from a Dockerfile, but for now, let's give our image a try.

Run the following command inside the directory where you created your Dockerfile:

docker image build .

This will build your image. Next, we need to find the ID of the image we just built, so run the following:

docker image list

You should see a list of the Docker images stored on your local machine. We are looking for an image with no tag and no repository. This is the image we created:

REPOSITORY                                      TAG       IMAGE ID       CREATED              SIZE
<none>                                          <none>    0b2db1f06f71   About a minute ago   16.5MB

Finally, use docker run <IMAGE ID> to run our image:

docker run 0b2db1f06f71

It should display the following:

You are reading Awesome posts in dev.to

Now let's pass some arguments. Run the following command to override the CMD argument:

docker run 0b2db1f06f71 "hello world"

It should display the following:

You are reading hello world

Summary

In this post, we explored how to use a Dockerfile to create custom Docker images. We began by explaining what a Dockerfile is and its syntax. We then covered some common Dockerfile directives: FROM, LABEL, RUN, CMD, and ENTRYPOINT. Finally, we created our first Dockerfile using the directives we discussed.

In the next post, we are going to take a deep dive into building images from a Dockerfile.

Top comments (2)

Ahmed Atwa • Jul 11

Great post!
Would be helpful to provide some resources to explain CMD vs ENTRYPOINT for those (like me) who would ask what's the difference (in more details).

Kostas Kalafatis • Jul 11

Thank you for your feedback! I thought I had added the difference between CMD and ENTRYPOINT, but apparently I skipped this part.

In Docker, CMD and ENTRYPOINT are both instructions used in Dockerfiles to define what command should run when a container starts, but they serve slightly different purposes.

The CMD instruction specifies the default command and/or parameters for the container. It can be defined in two forms: as a JSON array or as a string. If multiple CMD instructions are present in a Dockerfile, only the last one takes effect. If no CMD is specified, Docker will use the command from the base image. Importantly, CMD can be overridden by specifying a different command when starting the container with docker run.

For example, imagine that you have the following Dockerfile, which creates the myimage image:

FROM alpine:latest

# Set the default command to execute when the container starts
CMD ["echo", "Hello, World!"]

If no command is provided when starting the container, i.e., when running docker run myimage, the output will be "Hello, World!"

But, if a command is specified at runtime, e.g., docker run myimage echo Hello, Ahmed! then the output will be "Hello, Ahmed!".

On the other hand, ENTRYPOINT sets the main command and parameters that will be executed when a container runs. Like CMD, it can be defined as a JSON array or a string. If multiple ENTRYPOINT instructions exist, only the last one is effective. If no ENTRYPOINT is provided, Docker uses the default entry point from the base image. Unlike CMD, ENTRYPOINT does not get overridden when a command is specified at runtime with docker run; instead, additional parameters passed during runtime are treated as arguments to the ENTRYPOINT command.

For example, suppose you have the following Dockerfile and rebuild the myimage image:

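FROM python:3.9-slim

# Copy the script into the image (assumes app.py sits next to the Dockerfile)
WORKDIR /app
COPY app.py .

# Set the main executable to run when the container starts
ENTRYPOINT ["python", "app.py"]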

If you run the container without any argument, i.e., docker run myimage, then the app.py will be executed.

But if a command is specified at runtime, e.g., docker run myimage echo Hello, Ahmed!, the ENTRYPOINT is not replaced. Instead, echo, Hello, and Ahmed! are appended as arguments, so the container runs python app.py echo Hello, Ahmed! and it is up to app.py to interpret them.
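One caveat about the two forms mentioned above, sketched here with an assumed alpine:3.20 parent image and an echo command standing in for a real application: the appending behavior applies to the JSON array (exec) form. With the string (shell) form, the command is wrapped in a shell, and arguments from CMD or from the docker run command line are not passed to it.

```dockerfile
FROM alpine:3.20

# Exec form (JSON array): the executable is run directly, and arguments
# given after the image name are appended to it
ENTRYPOINT ["echo", "Hello,"]

# Shell form (plain string) would be run through `/bin/sh -c`; arguments
# from CMD or the command line would then be ignored:
# ENTRYPOINT echo "Hello,"
```

With the exec form as written, docker run <image> Ahmed should print Hello, Ahmed.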

For flexibility and best practices, it's common to use both ENTRYPOINT and CMD together in a Dockerfile. ENTRYPOINT defines the main executable or script that serves as the container's primary process, while CMD provides default arguments or options for that executable. This approach allows for a balance of consistency in defining the container's main functionality (ENTRYPOINT) and flexibility in customizing its behavior at runtime (CMD).

For example,

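FROM node:14-alpine

# Copy the script into the image (assumes server.js sits next to the Dockerfile)
WORKDIR /app
COPY server.js .

# Set the main entrypoint script
ENTRYPOINT ["node", "server.js"]

# Provide some default arguments for the entrypoint
CMD ["--port", "8080"]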

In this example, ENTRYPOINT sets node server.js as the main executable, and CMD provides default arguments (--port 8080) for server.js. When you start the container without any additional arguments, it runs node server.js --port 8080. Arguments passed at runtime replace the CMD defaults, so docker run <image> --port 3000 runs node server.js --port 3000. The ENTRYPOINT stays node server.js unless you replace it with the --entrypoint flag.
