Follow me on Twitter, happy to take your suggestions on topics or improvements /Chris
A Deeper look at the Dockerfile
Ok, so you know your way around Docker. You might have picked it up in my 5-part Docker series or somewhere else. Regardless, you are at a point where you want to go from understanding the basics to doing it better. That's what this article shows you: how to improve your existing knowledge of the fundamentals, Dockerfiles in particular.
Resources
- Best practices on Dockerfiles. There is a long list of tips in here. Sooner or later you will want to have a look and improve your setup.
- Push your Docker images to a container registry in the Cloud. Your Docker images need to be stored somewhere: on Docker Hub, in a private registry that only you and your colleagues can access, or why not in a private registry in the Cloud.
What we know about Dockerfile
We know that the Dockerfile is like a recipe file where we can specify things like the OS image to base it on, what libraries should be installed, environment variables, commands we want to run and much more. Everything is there, specified in the file, so it's super clear what you are getting. It's a really great advancement from the days when things only worked on our machine or when we spent hours or days installing things. It's progress.
Our Dockerfile sample
We've created a Dockerfile to give you an idea of what it can look like. Let's discuss the various parts of the file to better understand it. Here goes:
# Dockerfile
FROM node:latest
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
ENTRYPOINT ["node", "app.js"]
This is a pretty typical looking file. We select an OS image, set a working directory, copy the files we need, install some libraries, open up a port and finally run the application. So what's wrong with that?
OS image size
At first glance, everything looks the way we expect, but on a closer look we can see that we are using node:latest as an image. Let's try to build this into a Docker image with the command:
docker build -t optimize/node .
Ok, let's now run docker images to see our image and get some more stats on it:
It weighs in at 899 MB
Ok, we have nothing to compare with, but let's change the image to one called node:alpine and rebuild our image:
77.7 MB, WOW!!! That's a huge difference, our Docker image is more than ten times smaller. Why is that?
This image is based on the Alpine Linux Project. In general, the Alpine Linux images are much smaller than normal distributions. They come with some limitations, have a read here. In general it's a safe choice though.
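For reference, here is a sketch of the sample file after the swap. It assumes nothing else in the project changes; only the tag on the FROM line is different.

```dockerfile
# Dockerfile
# Same file as before; only the base image tag has changed.
FROM node:alpine
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
ENTRYPOINT ["node", "app.js"]
```

Rebuilding with the same docker build command as before picks up the new base image.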
The cache
Every instruction you specify in the Dockerfile is a build step, and instructions that change the filesystem, such as RUN, COPY and ADD, create another image layer. What Docker does, however, is to first check the cache to see whether an existing layer can be reused before trying to create one.
When we come to instructions like ADD and COPY we should know how they operate in the context of the cache. For both of these instructions, Docker calculates a checksum for each file and stores that in the cache. Upon a new build of the Docker image, each checksum is compared and if it differs, due to a change in the file, the cache is invalidated and the instruction is carried out again, creating a new image layer. Every instruction after that point is carried out again as well, because the layers beneath it have changed.
Order matters
The way Docker operates is to try to reuse as much as possible. The best thing we can do is to place the instructions, in the Dockerfile, from the least likely to change to the most likely to change.
What does that mean?
Let's look at the top of our Dockerfile:
FROM node:alpine
WORKDIR /app
Here we can see that the FROM command happens first, followed by WORKDIR. Neither of these commands is likely to change, so they are correctly placed at the top.
What is likely to change though?
Well, you are building an application, so the source files of your app, or libraries you suddenly realize you need and pull in with npm install, are what change most. It makes sense to place those instructions further down in the file.
What do we gain by doing this?
Speed. Builds are faster when we've placed the commands as efficiently as possible, because more layers can be reused from the cache. So in summary, ADD, COPY and RUN are commands that should happen later in the Dockerfile.
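One common way to apply this ordering to the sample file is sketched below. It assumes the project has a package.json (and possibly a package-lock.json) in its root; that layout is an assumption, not something the sample above spells out.

```dockerfile
FROM node:alpine
WORKDIR /app

# Dependency manifests change rarely, so copy them on their own first.
COPY package*.json ./
# This layer is reused from the cache as long as the manifests are unchanged.
RUN npm install

# Application source changes most often, so it goes last.
COPY . .

EXPOSE 3000
ENTRYPOINT ["node", "app.js"]
```

With this layout, editing a source file should only invalidate the final COPY, so npm install does not need to run again on every build.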
Minimize the layers
Every RUN, COPY and ADD command you enter creates a new image layer. Keep the number of these commands to a minimum and group them if you can. Instead of writing:
RUN command
RUN command2
Organize them like so:
RUN command && \
command2
Include only what you need
When you build an app, it easily consists of a ton of files, but what you actually need to create your Docker image ends up being a smaller number of files. If you create a .dockerignore file you can define patterns that ensure that when we include files, we only get the ones we need for our container.
Define a start script
Whether you use the command CMD or ENTRYPOINT, you should NOT call the application directly like so: node app.js. Instead, try to define a starter script like this: npm start.
Why you ask?
We want to make sure we are flexible and unlikely to have to change this instruction. We might actually end up changing how we start our app by gradually adding flags to it, like so: node app.js --env=dev --seed=true. You get the idea, it's potentially a moving target. However, by relying on npm start, a startup script, we get something more flexible.
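As a sketch, the last line of the sample file could look like this. It assumes package.json defines a "start" script (for example "start": "node app.js"), which the sample project is not shown to have.

```dockerfile
FROM node:alpine
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
# Assumes package.json defines a "start" script, e.g. "start": "node app.js".
# Changing the flags later means editing package.json, not this file.
ENTRYPOINT ["npm", "start"]
```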
Use LABEL
Using the command LABEL is a great way to describe your Dockerfile better. You could use it to organize your images or to help with automation. You know best what information makes sense to put there, but it exists to support you in bringing order to all your images, so leverage it to your advantage. A label is a key-value pair, like so: LABEL [key]=[value]. Every LABEL command can have multiple labels. In fact, it's considered good practice to collect all your labels under one LABEL command. You can do so by separating each key-value pair with a space character or like so:
LABEL key=value \
key2=value2
Rely on default ports with EXPOSE
EXPOSE is what you use to declare which ports the container listens on. On its own it does not publish the port; to ensure we can talk to the container on that port we use the -p flag in conjunction with docker run: docker run -p [external]:[exposed docker port]. It's considered best practice to set the exposed port to the default port of what you are using, like port 80 for an Apache server and 27017 if you have a MongoDB database, etc.
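A small sketch of how the two fit together, using the sample app on port 3000. The host port 8000 in the comment is an arbitrary choice for illustration.

```dockerfile
FROM node:alpine
WORKDIR /app
COPY . .
RUN npm install
# Documents that the app listens on port 3000 inside the container.
EXPOSE 3000
ENTRYPOINT ["node", "app.js"]

# Publishing happens at run time, for example:
#   docker run -p 8000:3000 optimize/node
# maps port 8000 on the host (an arbitrary choice) to port 3000 in the container.
```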
Be explicit, use COPY over ADD
At first glance it looks like COPY and ADD do the same thing, but there is a difference. ADD is able to extract local TAR files as well, which COPY can't do. So be explicit: use COPY when you mean to copy files, and only use ADD when you need one of its specific features, like the mentioned TAR extraction.
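To make the difference concrete, here is a minimal sketch. The archive vendor.tar.gz is a hypothetical file assumed to sit in the build context next to the Dockerfile.

```dockerfile
FROM node:alpine
WORKDIR /app

# Plain copy: the file arrives in the image exactly as it is on disk.
COPY package.json ./

# ADD unpacks a local tar archive into the target directory.
# vendor.tar.gz is a hypothetical archive in the build context.
ADD vendor.tar.gz /app/vendor/
```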
Summary
There are many more best practices to follow when it comes to Dockerfiles, but the biggest gain I've mentioned throughout this post is the one from using the smallest image possible, like alpine. It can do wonders for your image size, especially if storage is something you pay for.
Have a read in the Dockerfile best practices docs for more great tips.
Top comments (8)
Cache misses are more likely to happen when grouping RUN commands. If it inflates the end image too much, multi-stage images can be used to have a more bare final image without all the build stages.
Edit: otherwise, I'd say these are all sensible guidelines.
Specifically in the context of node images, if you use distroless it will add a few MB (~5-10) in size compared to alpine, but you get more security 🔒, such as no package manager tools, let alone a shell, etc.
You should also remove package manager caches. Not sure about npm, but if you run yarn install && yarn cache clean you reduce that layer size by 50%.
Hi Tim. Appreciate you making the article better. I'll make sure to update it and give you credit :)
The main reason to use a .dockerignore file is because the docker command has to send your complete build context to the docker daemon, which will then build the image.
If your context is the root of the repo, for instance, it will send your whole project to the daemon before evaluating the COPY and ADD commands. This can become expensive when you have a lot of dependencies or store build artifacts in your project.
.dockerignore stops the docker CLI from sending specified files and directories to the daemon.
Thanks Ryan, appreciate the comment
Awesome, this was really informative, pertinent, and timely for me. Thanks!