Kostas Kalafatis

Posted on Jul 19, 2024

Understanding CMD, ENTRYPOINT and RUN in Docker - An Interlude Post

#beginners #devops #docker #tutorial

Disclaimer: This post is heavily inspired by this post on the Docker Blog. It also seeks to answer questions raised by the community in previous posts in this series.

Docker is undeniably a powerful tool for containerization, but its versatility and extensive feature set can be overwhelming, especially for newcomers. The platform offers multiple ways to achieve similar goals, making it crucial for users to carefully weigh the advantages and disadvantages of each option to determine the best approach for their specific projects. This decision-making process requires a solid understanding of Docker's underlying mechanisms and the trade-offs involved in different strategies.

Docker can be a bit of a maze, especially when it comes to the nitty-gritty of instructions like RUN, CMD, and ENTRYPOINT. These three instructions often cause confusion, as they seem to have overlapping functions. But fear not! In this article, we'll unravel the mystery, clearly explaining the differences between them and highlighting when to use each one effectively. By the end, you'll be a Dockerfile ninja, wielding these instructions with confidence and precision!

The RUN Directive

In Docker, the RUN directive is a key instruction used in Dockerfiles to run commands while building your Docker image. It's your go-to tool for setting up the environment inside the image, allowing you to install packages, update dependencies, configure settings, and basically do anything you need to make your image ready to use. Think of it as the stage where you get everything in place before the curtain goes up on your running container.

Here is an example of the RUN directive in practice:

FROM ubuntu:20.04

# Update package lists and install necessary packages
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    && rm -rf /var/lib/apt/lists/*

In this example, the RUN instruction updates the package lists inside the image and installs curl and wget. These are handy tools for downloading files and interacting with websites from the command line.

The && rm -rf /var/lib/apt/lists/* part is added for cleanup. It removes the cached package lists after the installation, making your final Docker image smaller and leaner. Because each RUN instruction produces its own image layer, the cleanup only saves space when it happens in the same RUN instruction as the install. Since these lists aren't needed to run your application, it's good practice to get rid of them.

The CMD Directive

The CMD instruction in your Dockerfile sets the default command that will run when you start a container from your image. It's like a pre-set option for what your container should do when it "boots up." But here's the key: you can easily change this default behavior by providing different command-line arguments when you run the container using docker run.

CMD is perfect for situations where you want to provide a sensible default behavior for your container, but also give users the flexibility to customize it. Think of it as setting up a suggested starting point, but allowing users to take the wheel if they want to go in a different direction.

It's a common practice to use CMD in Docker images to define default parameters or configurations that can be easily overridden by the user when running the container.

For example, you might want the container to start a web server by default, but also allow users to override this and run a shell instead:

FROM node:alpine

[...dockerfile truncated...]

CMD ["node", "server.js"]

The ENTRYPOINT Directive

The ENTRYPOINT instruction sets the main executable for your container. It's similar to CMD but with a key difference: when you run a container with docker run, the command you provide doesn't replace the ENTRYPOINT command. Instead, your command gets added to the ENTRYPOINT command as arguments.

Think of it like this: ENTRYPOINT sets the core command your container is designed to run, while any arguments you provide with docker run become extra instructions for that command.

For example, the following Dockerfile will run the webserver no matter what the users provide as arguments:

FROM node:alpine

[...dockerfile truncated...]

ENTRYPOINT ["node", "server.js"]

So CMD or ENTRYPOINT?

CMD and ENTRYPOINT are special instructions in a Dockerfile. While other instructions execute when you build the image, these two come into play when you actually run a container from that image.

Essentially, when a container starts, Docker needs to know what it should do – what program to run, and how to run it. That's where CMD and ENTRYPOINT step in. They tell Docker what the main process inside the container is and how to start it.

Now, the difference between them is a bit tricky, and many people don't fully grasp it. Luckily, most of the time, your container will work fine even if you don't use them perfectly. However, understanding the nuances can make things a lot smoother and less confusing.

To get a better handle on this, let's break down a typical Linux command:

ping -c 10 127.0.0.1

Here, ping is the command itself, and the rest (-c 10 127.0.0.1) are the parameters or arguments we're giving to that command.

Now, back to Docker:

ENTRYPOINT: This is where you define the command part of the expression. It's the core thing you want your container to do when it starts.
CMD: This is where you define the parameters for that command. They're the additional instructions you want to give it.

So, a Dockerfile that uses Alpine Linux as the base image and wants to run the ping command could look like this:

FROM alpine:latest

ENTRYPOINT ["ping"]
CMD ["-c", "10", "127.0.0.1"]

We can now build an image called pinger from the preceding Dockerfile, as follows:

docker image build -t pinger .

Now we can run a container from the pinger image we just created like this:

docker container run --rm -it pinger

PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.047 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.056 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.038 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.055 ms
64 bytes from 127.0.0.1: seq=4 ttl=64 time=0.037 ms
64 bytes from 127.0.0.1: seq=5 ttl=64 time=0.036 ms
64 bytes from 127.0.0.1: seq=6 ttl=64 time=0.053 ms
64 bytes from 127.0.0.1: seq=7 ttl=64 time=0.058 ms
64 bytes from 127.0.0.1: seq=8 ttl=64 time=0.048 ms
64 bytes from 127.0.0.1: seq=9 ttl=64 time=0.053 ms

--- 127.0.0.1 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 0.036/0.048/0.058 ms

The great thing about this setup is that you can easily override the default CMD parameters you've set in the Dockerfile. If you remember, we originally defined CMD ["-c", "10", "127.0.0.1"] to ping the loopback address ten times.

But now, when you create a new container, you can simply add different values at the end of your docker run command to change the ping target or the number of pings. This gives you a lot of flexibility while still keeping a consistent base command.

docker container run --rm -it pinger -w 5 127.0.0.1

PING 127.0.0.1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=64 time=0.034 ms
64 bytes from 127.0.0.1: seq=1 ttl=64 time=0.053 ms
64 bytes from 127.0.0.1: seq=2 ttl=64 time=0.040 ms
64 bytes from 127.0.0.1: seq=3 ttl=64 time=0.038 ms
64 bytes from 127.0.0.1: seq=4 ttl=64 time=0.056 ms

This will cause the container to ping the loopback IP address (127.0.0.1) for 5 seconds.
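
The rule behind the two runs above can be modelled in a few lines of shell. This is only a sketch of the behaviour described in this section, not how Docker is implemented: effective_command is a hypothetical helper, and it treats ENTRYPOINT and CMD as plain strings rather than JSON arrays.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): models how an exec-form
# ENTRYPOINT and CMD combine into the command line a container runs.
#   $1    = ENTRYPOINT, as one string
#   $2    = default CMD, as one string
#   $3... = arguments typed after the image name in `docker run`
effective_command() {
    entrypoint=$1
    default_cmd=$2
    shift 2
    if [ "$#" -gt 0 ]; then
        # Run-time arguments replace CMD; ENTRYPOINT stays in place.
        printf '%s %s\n' "$entrypoint" "$*"
    else
        # No run-time arguments: CMD supplies the defaults.
        printf '%s %s\n' "$entrypoint" "$default_cmd"
    fi
}

# docker container run --rm -it pinger
effective_command "ping" "-c 10 127.0.0.1"
# docker container run --rm -it pinger -w 5 127.0.0.1
effective_command "ping" "-c 10 127.0.0.1" -w 5 127.0.0.1
```

The first call prints ping -c 10 127.0.0.1 and the second prints ping -w 5 127.0.0.1, matching the two container runs shown above.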

Directive Description and Use Cases

The following table provides an overview of these commands and use cases.

Directive	Description	Syntax Example	Use Cases
`RUN`	Executes commands during the image build process. It is used to install software packages, configure settings, or perform other setup tasks. The result is part of the image layers.	`RUN apt-get update && apt-get install -y nginx`	Installing software packages; configuring files or repositories; setting up environment variables or system settings
`CMD`	Specifies the default command to run when a container starts from the image. It can be overridden by providing a command in `docker run`. It is generally used to provide default arguments to `ENTRYPOINT` or to set a default command.	`CMD ["nginx", "-g", "daemon off;"]`	Setting a default command for the container; providing default arguments to `ENTRYPOINT`; running a service or application by default
`ENTRYPOINT`	Defines the main command that is always executed when a container starts. It is not overridden by arguments provided in `docker run` unless you use the `--entrypoint` flag. Useful for ensuring a specific command is always run.	`ENTRYPOINT ["nginx", "-g", "daemon off;"]`	Ensuring a specific command is always executed; running a primary process or service; combining with `CMD` to provide default arguments

Unix Signaling and the PID 1

In the world of Unix-like systems, including Docker containers, PID 1 is a special process: the very first one that starts up. Every other process inside the system descends from it, creating a family tree of processes with PID 1 at the top.

In Docker, this PID 1 process is really important because it's responsible for managing everything else inside the container. One of its critical roles is handling signals (like SIGTERM) from the host system, which are essentially messages that tell the container to do something (like gracefully shut down).

Now, when you use the shell form for CMD or ENTRYPOINT, a shell process (usually /bin/sh -c) takes over as PID 1 and starts your application as its child. The problem is that this shell usually doesn't pass signals along to your actual application. When docker stop sends SIGTERM, the application may never see it, and Docker ends up killing the container with SIGKILL once the grace period (10 seconds by default) runs out.

On the other hand, the exec form lets you run your command directly as PID 1, without any shell in between. This way, the command itself receives and handles signals directly, making the whole thing much more reliable.

So, if your container needs to react to signals promptly and gracefully, the exec form is the way to go. It's particularly crucial for applications that need to respond to events or interruptions, ensuring that everything shuts down properly and your data stays safe.
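
The difference comes down to whether a shell stays in front of your program or is replaced by it. The sketch below shows that mechanism on an ordinary Linux shell, outside Docker, by printing process IDs. It is only an illustration of the idea and does not start a container.

```shell
#!/bin/sh
# Without exec: the outer shell stays alive and starts the inner
# command as a child, so the two PIDs differ. The trailing `true`
# keeps the inner command from being the last one, because some
# shells exec the final command of a `-c` string on their own.
without_exec=$(sh -c 'echo "$$"; sh -c "echo \$\$"; true')

# With exec: the inner command replaces the outer shell and keeps
# its PID, which is what lets an application sit at PID 1 itself.
with_exec=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')

echo "without exec (shell PID, command PID):" $without_exec
echo "with exec    (shell PID, command PID):" $with_exec
```

The first line shows two different PIDs and the second shows the same PID twice. If you have to use the shell form, starting the command with exec (for example, CMD exec node server.js) gives the application the shell's place in the same way.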

Shell and exec forms

When you're working with Dockerfiles and setting commands for RUN, CMD, and ENTRYPOINT, you have two ways to do it: the shell form and the exec form. Each has its own strengths and weaknesses, and understanding the difference is key to effectively managing your Docker containers.

Shell Form

The shell form is like writing commands the old-school way, similar to what you'd type in a terminal. When Docker sees a command in shell form, it essentially runs it through a shell (usually /bin/sh -c for Unix-based images). This means the command gets interpreted by the shell, which unlocks some handy features like using environment variables and chaining commands together.

CMD echo "Hello, World!"

This way of writing commands, using the shell form, means that Docker executes the command echo "Hello, World!" through the default shell of your container. This can be handy for simple commands or when you need to use special shell features. However, it has a drawback: it doesn't handle signals (like those used for managing processes) the same way as the exec form.

In other words, if your command needs to respond properly to Unix signals for things like stopping or restarting processes gracefully, the shell form might not be the best choice. It's a bit like having a middleman who might not deliver your messages as reliably as you'd like.

Exec Form

Now, let's talk about the exec form. It's a more specific and reliable way to define commands in your Dockerfile. Instead of writing the whole command as a single line, you break it up into a JSON array, where each part of the command (the command itself and its arguments) is a separate element in the array.

CMD ["echo", "Hello, World!"]

Here, instead of using a shell as a middleman, Docker directly executes the command echo with the argument "Hello, World!". This direct execution method is more reliable because it doesn't depend on the shell's interpretation.

This has a few key advantages:

Better Signal Handling: The command itself can receive and respond to Unix signals directly. This is crucial for ensuring your application can gracefully shut down or restart when needed.
Precise Argument Passing: You can be confident that the arguments you provide are passed directly to your command without any potential modifications or interference from the shell.
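
To see what "interpreted by the shell" means for arguments, here is a rough analogue that runs in any POSIX shell without Docker. The sh -c call stands in for the shell form. The single-quoted argument stands in for one element of an exec-form JSON array, which reaches the program exactly as written.

```shell
#!/bin/sh
# Shell form analogue: the string is handed to `sh -c`, which expands
# $HOME before echo ever runs.
shell_form=$(sh -c 'echo "home: $HOME"')

# Exec form analogue: the argument reaches echo untouched, so the
# text $HOME stays literal. (The single quotes play the role of the
# JSON string in CMD ["echo", "home: $HOME"].)
exec_form=$(echo 'home: $HOME')

echo "shell form analogue: $shell_form"
echo "exec form analogue:  $exec_form"
```

If you need variable expansion together with the exec form, one option is to invoke the shell explicitly, as in CMD ["sh", "-c", "echo $HOME"].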

Key Differences between Shell and Exec

	Shell Form	Exec Form
Form	Commands without square brackets (`[]`). Run by the container's shell	Commands with square brackets (`[]`). Run directly, not through a shell
Variable Substitution	The shell expands environment variables such as `$HOME` and `$PATH`	No shell runs, so references such as `$HOME` in the array are passed as literal text; variables set with `ENV` are still present in the process environment
Shell Features	Supports sub-commands, piping output, chaining commands, I/O redirection, etc.	Does not support shell features
Signal Trapping & Forwarding	Most shells do not forward process signals to child processes.	The command receives signals like `SIGINT` directly
Usage with ENTRYPOINT	Can cause issues with signal forwarding	Recommended due to better signal handling
Usage as ENTRYPOINT Params	Not possible with the shell form	If the first item in the array is not a command, all items are used as parameters for the `ENTRYPOINT`

The diagram below will help you decide if you need RUN, CMD or ENTRYPOINT in your Dockerfile.

The diagram below will help you decide if you need shell or exec form for your commands.

You can find high resolution diagrams here.

Running CMD and ENTRYPOINT

The following example walks you through the high-level differences between CMD and ENTRYPOINT.

First, let's create our Dockerfile:

# Use the Ubuntu 20.04 image as the base image
FROM ubuntu:20.04

# Update the image and install traceroute in a single layer
RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y traceroute \
    && rm -rf /var/lib/apt/lists/*

# Set the default command
CMD traceroute

Then build your image using the docker build -t tracert . command (the trailing dot is the build context).

Run the container image with `CMD traceroute`

Without passing any arguments, we get the following output.

docker run tracert

Usage:
  traceroute [ -46dFITnreAUDV ] [ -f first_ttl ] [ -g gate,... ] [ -i device ] [ -m max_ttl ] [ -N squeries ] [ -p port ] [ -t tos ] [ -l flow_label ] [ -w MAX,HERE,NEAR ] [ -q nqueries ] [ -s src_addr ] [ -z sendwait ] [ --fwmark=num ] host [ packetlen ]
Options:
  -4                          Use IPv4
  -6                          Use IPv6
  -d  --debug                 Enable socket level debugging
  -F  --dont-fragment         Do not fragment packets
  -f first_ttl  --first=first_ttl
                              Start from the first_ttl hop (instead from 1)
  -g gate,...  --gateway=gate,...
                              Route packets through the specified gateway
                              (maximum 8 for IPv4 and 127 for IPv6)
  -I  --icmp                  Use ICMP ECHO for tracerouting
  -T  --tcp                   Use TCP SYN for tracerouting (default port is 80)
  -i device  --interface=device
                              Specify a network interface to operate with
  -m max_ttl  --max-hops=max_ttl
                              Set the max number of hops (max TTL to be
                              reached). Default is 30
[...output truncated...]

However, if we provide a hostname to trace, we will get an error:

docker run tracert www.dev.to
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "www.dev.to": executable file not found in $PATH: unknown.

The problem is that the string you're providing on the command line, www.dev.to, is completely replacing the CMD instruction in your Dockerfile. Since that hostname isn't a valid command, it's causing an error.

To fix this, you need to specify the actual command you want to run alongside the hostname.

docker run tracert traceroute www.dev.to
traceroute to www.dev.to (104.18.26.242), 30 hops max, 60 byte packets
[...output truncated...]
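
The "executable file not found in $PATH" error comes from treating the first word after the image name as a program to start. The snippet below is a rough model of that check using the shell's own command -v lookup. It is not the container runtime's code, and sh is used as the "good" example only because it exists on practically every system.

```shell
#!/bin/sh
# Rough model only: `command -v` asks the shell whether a name
# resolves to something it could run, similar in spirit to the
# lookup that failed for www.dev.to above.
for first_word in www.dev.to sh; do
    if command -v "$first_word" >/dev/null 2>&1; then
        echo "$first_word: found, the container could start it"
    else
        echo "$first_word: executable file not found in PATH"
    fi
done
```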

Run the container image with `ENTRYPOINT traceroute`

In this updated version, we'll make a change to the Dockerfile. We will remove the original CMD instruction and replace it with ENTRYPOINT ["traceroute"]. This means the traceroute command is now the main command that this container will run when it starts.

The ENTRYPOINT instruction works a bit differently than CMD. With ENTRYPOINT, you can't simply override the command by typing something different after docker run. Instead, any extra arguments you add after docker run are treated as arguments for the traceroute command itself.

# Use the Ubuntu 20.04 image as the base image
FROM ubuntu:20.04

# Update the image and install traceroute in a single layer
RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y traceroute \
    && rm -rf /var/lib/apt/lists/*

# Set the main executable
ENTRYPOINT ["traceroute"]

After rebuilding the image with the same docker build command, let's see what happens when we pass only the hostname:

docker run tracert www.dev.to
traceroute to www.dev.to (104.18.26.242), 30 hops max, 60 byte packets
[...output truncated...]

The key point here is to use ENTRYPOINT when you want to ensure that a specific executable is always run when your container starts, regardless of any additional commands or arguments the user might provide. It gives you a way to create containers that behave like self-contained tools, with a clearly defined main purpose.

Summary

Deciding when to use RUN, CMD, or ENTRYPOINT, and whether to choose the shell or exec form, demonstrates the level of detail and flexibility Docker offers. Each of these instructions plays a distinct role: RUN shapes the image at build time, while CMD and ENTRYPOINT decide what runs when a container starts.

By selecting the right instruction and form for each situation, developers can create Docker images that are more dependable and predictable. In practice, that usually means the exec form for CMD and ENTRYPOINT so the application receives signals directly, ENTRYPOINT for the executable that should always run, and CMD for defaults that users can override.

Top comments (2)

Ahmed Atwa • Aug 3 '24

Great work diving into the PID and the core differences between the different modes instead of just feature comparison. Thanks!

Luigitto Prosciutto • Nov 4 '24

I think you missed the disclaimer that it was heavily inspired on the Docker Blog post. Instead of heavily inspired should have been copy/pasted and change some words here and there to make it look different 🤣
