Follow me on Twitter, happy to take your suggestions on topics or improvements /Chris
Ok, so you know your way around Docker. You might have picked it up in my 5-part Docker series or somewhere else. Regardless, you are at the point where you go from understanding the basics to doing things better. That's what this article shows you: how to improve your existing fundamental knowledge of Dockerfiles in particular.
- Best practices for Dockerfiles. There is a long list of tips in here. Sooner or later you will want to have a look and improve your setup.
- Push your Docker images to a container registry in the Cloud. Your Docker images will need to be stored somewhere: at Docker Hub, a private registry that only you and your colleagues can access, or a private registry in the Cloud.
We know that the Dockerfile is like a recipe file where we can specify things like the OS image to base it on, what libraries should be installed, environment variables, commands we want to run and much more. Everything is there, specified in the file, so it's super clear what you are getting. It's a really great advancement from the days when things only worked on our machine, or when we spent hours or days installing things. It's progress.
We've created a Dockerfile to give you an idea of what it can look like. Let's discuss the various parts of the file to better understand it. Here goes:
```dockerfile
# Dockerfile
FROM node:latest
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
ENTRYPOINT ["node", "app.js"]
```
This is a pretty typical-looking file. We select an OS image, set a working directory, copy the files we need, install some libraries, open up a port and finally run the application. So what's wrong with that?
At first glance, everything looks the way we expect, but on closer inspection we can see that we are using `node:latest` as the base image. Let's try to build this into a Docker image with the command:

```
docker build -t optimize/node .
```
Ok, let's now run `docker images` to see our image and get some more stats on it. It weighs in at 899 MB.
Ok, we have nothing to compare with yet, but let's change the base image to `node:alpine` and rebuild our image.
This image is based on the Alpine Linux Project. In general, Alpine Linux images are much smaller than those based on normal distributions. They come with some limitations, so have a read here, but overall Alpine is a safe choice.
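Putting it together, the only change needed in our earlier Dockerfile is the base image; everything else stays the same:

```dockerfile
# Dockerfile — identical to before, just based on the smaller Alpine image
FROM node:alpine
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
ENTRYPOINT ["node", "app.js"]
```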
For every instruction you specify in the Dockerfile, Docker creates another image layer. What Docker does, however, is first check the cache to see whether an existing layer can be reused before creating a new one.
When we come to instructions like ADD and COPY, we should know how they operate in the context of the cache. For both of these instructions, Docker calculates a checksum for each file and stores it in the cache. On a new build of the Docker image, each checksum is compared against the cached one; if it differs, due to a change in the file, the cache is invalidated, the instruction is carried out again, and a new image layer is created.
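The cache check boils down to a content checksum comparison. Here is a tiny Python sketch of the idea (Docker's actual implementation hashes more than just file contents, so treat this purely as an illustration):

```python
import hashlib

def checksum(data: bytes) -> str:
    # Docker hashes file contents (among other metadata) to decide
    # whether a cached COPY/ADD layer can be reused.
    return hashlib.sha256(data).hexdigest()

old = checksum(b"console.log('hi')")
new = checksum(b"console.log('hi!')")

# A changed file means a changed checksum => cache miss => new layer.
print(old == new)  # False
```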
The way Docker operates is to reuse as much as possible. The best thing we can do is to place the instructions in the Dockerfile from the least likely to change to the most likely to change.
What does that mean?
Let's look at the top of our Dockerfile:
```dockerfile
FROM node:alpine
WORKDIR /app
```

Here we can see that the FROM instruction comes first, followed by WORKDIR. Neither of these is likely to change, so they are correctly placed at the top.
What is likely to change though?
Well, you are building an application, so the source files of your app, or libraries you suddenly realize you need via `npm install`, make sense to place further down in the file.
What do we gain by doing this?
Speed. We gain speed when we build our Docker image because we've placed the instructions as efficiently as possible. So, in summary, ADD, COPY and RUN are instructions that should happen later in the Dockerfile.
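As a sketch of this ordering for the Node app above: copying the dependency manifest before the rest of the source lets Docker reuse the cached `npm install` layer as long as `package.json` is unchanged (the file names assume a standard npm project):

```dockerfile
FROM node:alpine
WORKDIR /app
# Changes rarely: the dependency manifest only
COPY package.json .
RUN npm install
# Changes often: the application source, so it goes last
COPY . .
EXPOSE 3000
ENTRYPOINT ["node", "app.js"]
```

With this order, editing a source file only invalidates the layers from `COPY . .` onward, so the expensive install step is served from the cache.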
Every command you enter creates a new image layer. Ensure you keep the number of commands to a minimum. Group them if you can. Instead of writing:
```dockerfile
RUN command
RUN command2
```
Organize them like so:
```dockerfile
RUN command && \
    command2
```
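For example, on an Alpine-based image two setup steps could be grouped into a single layer like this (the package name is an illustrative assumption):

```dockerfile
# One RUN instruction => one image layer
RUN apk add --no-cache curl && \
    npm install
```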
When you build an app, it easily consists of a ton of files, but what you actually need to create your Docker image ends up being a much smaller number of them. If you create a `.dockerignore` file, you can define patterns that ensure that when we include files, we only get the ones we need for our container.
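A minimal `.dockerignore` for a Node project might look like this (the patterns are assumptions; adjust them to your project):

```
# .dockerignore
node_modules
npm-debug.log
.git
*.md
```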
Whether you use the command CMD or ENTRYPOINT, you should NOT call the application directly, like so: `node app.js`. Instead, try to define a starter script.
Why, you ask?
We want to make sure we are flexible and unlikely to change this instruction. We might actually end up changing how we start our app by gradually adding flags to it, like so: `node app.js --env=dev --seed=true`. You get the idea, it's potentially a moving target. However, by relying on `npm start`, a startup script, we get something more flexible.
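As a sketch, assuming your `package.json` defines a start script, the flags live in one place and the Dockerfile never has to change (the flags shown are hypothetical):

```json
{
  "scripts": {
    "start": "node app.js --env=dev --seed=true"
  }
}
```

The Dockerfile then simply ends with `ENTRYPOINT ["npm", "start"]`, and any future change to the startup flags happens in `package.json` without rebuilding intent into the Dockerfile itself.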
Using the LABEL instruction is a great way to describe your Dockerfile better. You could use it to organize your images, help with automation and other use cases; you know best what information makes sense to put there, but it exists to support you in bringing order to all your images, so leverage it to your advantage. A label's value is a key-value pair, like so: `LABEL [key]=[value]`. Every LABEL instruction can have multiple labels. In fact, it's considered good practice to collect all your labels under one LABEL instruction. You can do so by separating each key-value pair with a space character, or like so:
```dockerfile
LABEL key=value \
      key2=value2
```
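For instance, a single LABEL instruction with a few illustrative keys (the keys and values here are just examples, not a required convention):

```dockerfile
LABEL maintainer="you@example.com" \
      version="1.0" \
      description="Demo Node app image"
```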
EXPOSE is what you use to open up ports on the container. To ensure we can talk to the container on that port, we can use the `-p` flag in conjunction with `docker run`:

```
docker run -p [external]:[exposed docker port]
```

It's considered best practice to set the exposed port to the default port of the software you are running, like port 80 for an Apache server or 27017 for a MongoDB database.
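Using our earlier image as an example, this maps port 8080 on the host to port 3000 inside the container (the host port here is an arbitrary choice; 3000 is what our Dockerfile exposes):

```
docker run -p 8080:3000 optimize/node
```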
At first glance it looks like COPY and ADD do the same thing, but there is a difference. ADD is able to extract TAR files as well, which COPY can't do. So be explicit: use COPY when you mean to copy files, and only use ADD when you need something feature-specific, like the mentioned TAR extraction.
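A short sketch of when each fits (the archive name is an assumption for illustration):

```dockerfile
# COPY: plain file copying — be explicit about your intent
COPY src/ /app/src/
# ADD: auto-extracts a local tar archive into the target directory
ADD vendor.tar.gz /app/vendor/
```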
There are many more best practices to follow when it comes to Dockerfiles, but the biggest gain I've mentioned throughout this post is using the smallest image possible, like Alpine. It can work wonders for your image size, especially if storage is something you pay for.
Have a read of the Dockerfile best practices docs for more great tips.