DEV Community

Kyle Galbraith

Posted on • Originally published at depot.dev

Build Docker images faster using build cache

When working with Docker, the faster we can build an image, the quicker our development workflows and deployment pipelines can be. Docker's build cache, also known as the layer cache, is a powerful tool that can significantly speed up an image build when it can be tapped into across builds. In this post, we'll explore how Docker's build cache works and share strategies for using it effectively to optimize your Dockerfiles & image builds.

Understanding Docker Build Cache

Before we dive into optimizations, let's understand how Docker's build cache works. Each instruction in a Dockerfile creates a layer in the final image. Think of these layers as building blocks, each adding new content on top of the previous layers.

[Figure: Docker build cache layers]

When a layer changes, Docker invalidates the cache for that layer and all subsequent layers, requiring them to be rebuilt. For instance, if you modify a source file in your project, the COPY command will have to run again to reflect those changes in the image, leading to cache invalidation.

[Figure: How Docker build cache gets invalidated]
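
As a rough sketch of the invalidation rule described above, here is a small annotated Dockerfile. The alpine:3.19 base image and the config.txt file are assumptions chosen purely for illustration; they are not part of the examples later in this post.

```dockerfile
FROM alpine:3.19

# Reused from cache as long as this instruction and the base image are unchanged.
RUN apk add --no-cache curl

# Docker checksums the copied file, so editing config.txt invalidates this layer.
COPY config.txt /etc/app/config.txt

# Rebuilt whenever the COPY above is invalidated, even though the command is identical.
RUN cat /etc/app/config.txt
```

With this layout, editing config.txt should rerun only the last two instructions, while the apk layer still comes from cache.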

Tips for efficiently using the Docker build cache

The more we can avoid cache invalidation, or the later we can have our cache invalidate, the faster our Docker image builds can be. Let's explore some strategies for doing just that.

Order your layers wisely

Ordering our commands in a Dockerfile can play a huge role in leveraging the Docker layer cache and how often we invalidate it. Let's take a look at an example:

FROM node:20

WORKDIR /app
COPY . .
RUN npm install
RUN npm run build

This is an inefficient Dockerfile. Because COPY . . comes before everything else, any file change in our project invalidates the cache for every subsequent layer, forcing our npm install and build commands to execute even if none of our dependencies changed. We can improve this by being more thoughtful about when we copy in our source code and install our dependencies:

FROM node:20

WORKDIR /app
COPY package.json package-lock.json /app/
RUN npm install
COPY . .
RUN npm run build

We've moved the source code copy to after the npm install command. We first copy in only package.json and package-lock.json and install our dependencies. We then copy in the rest of our source code and run the build.

This is a small change that can have a significant impact on build time. Now, instead of every source code change forcing us to reinstall our dependencies, we only have to do so when our package.json or package-lock.json files change.

Keep your layers small and focused

The less stuff in our build, the faster our Docker image build can be. By keeping our layers small and focused, we can keep our image smaller, cache smaller, and reduce the number of things that can invalidate the cache.

We've written other posts about keeping Docker images small that are worth reading in conjunction with this post.

Here are a few tips and tricks that are relevant to efficiently using the Docker build cache.

Avoid copying files that are not needed

A common mistake we see is copying in files not needed in the final image. For instance, if we are building a Node.js application, we may inadvertently copy in our node_modules directory when, in fact, we are running npm install again in our Dockerfile.

This is a waste of time and space. A good guiding principle is to only copy in the files and directories we know are needed in our final image. So, if we take our earlier example:

FROM node:20

WORKDIR /app
COPY package.json package-lock.json /app/
RUN npm install
COPY . .
RUN npm run build

Suppose our docker build is invoked with our full project directory as the build context:

.git/
node_modules/
app/
  index.js
  package.json
  package-lock.json
README.md
Dockerfile

Our COPY . . command copies in the entire build context, which we can visualize with Depot's debug build context feature. In this example, that means copying in many unnecessary files and directories, like .git, node_modules, and our README.

It is far better to be more specific with our COPY command:

COPY ./app /app

Now, we are only copying the app folder into our build and final image.
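
Combining this with the layer ordering from earlier, a Dockerfile for the build context listed above might look like the following sketch. It assumes that package.json defines a build script and that everything the application needs lives under app/; neither is confirmed by the example project.

```dockerfile
FROM node:20

WORKDIR /app

# The dependency manifests live under app/ in the build context shown above.
COPY app/package.json app/package-lock.json ./
RUN npm install

# Copy only the application directory, leaving out .git, README.md,
# and the node_modules directory at the root of the context.
COPY app/ ./
RUN npm run build
```

Because nothing outside app/ is ever copied, changes to the README or to Git metadata should no longer invalidate any layer.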

Use .dockerignore to exclude files and directories

Sometimes, knowing exactly what files and directories are needed in our final image can be tricky. So we can use a .dockerignore file to explicitly define the files we know should be excluded. For our example above, we could create a .dockerignore file with the following contents:

.git
node_modules
README.md

Avoid unnecessary dependencies from package managers

We commonly install dependencies into our images from package managers like npm, pip, apt, and apk for Node, Python, Debian, and Alpine images. It's important to be mindful of which dependencies we are installing and whether they are needed in our final image. Some package managers have flags that help, such as apt-get's --no-install-recommends, which skips recommended packages that are not strictly required.

Sometimes dependencies are only needed for building our application, but not for running it; in those cases, we can leverage multi-stage builds to avoid having them in our final image.
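
As a minimal sketch of the --no-install-recommends flag mentioned above, the following installs two packages on a Debian base image. The base image tag and the package names (ca-certificates and curl) are assumptions picked only for illustration.

```dockerfile
FROM debian:bookworm-slim

# Install only the named packages and their hard dependencies,
# then delete the package index so it does not stay in the layer.
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates curl && \
    rm -rf /var/lib/apt/lists/*
```

Removing /var/lib/apt/lists in the same RUN instruction is a common companion step, since deleting it in a later instruction would not shrink this layer.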

Leverage the RUN cache for finer-grained control

The RUN cache, also known as BuildKit cache mounts, is a specialized cache that allows us to do more fine-grained caching across builds. Here is an example of the RUN cache in action with an Ubuntu image:

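FROM ubuntu
# Remove the apt config that deletes downloaded packages, so the cache mount can keep them
RUN rm -f /etc/apt/apt.conf.d/docker-clean
# Mount a cache at /var/cache/apt that persists across builds for this instruction
RUN \
    --mount=type=cache,target=/var/cache/apt \
    apt-get update && apt-get --no-install-recommends install -y gcc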
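
The same idea can be applied to language package managers. The sketch below is an assumption-laden variant of the earlier Node.js example: it assumes the build runs as root, so that npm's download cache lives at its default location of /root/.npm.

```dockerfile
FROM node:20

WORKDIR /app
COPY package.json package-lock.json ./

# Keep npm's download cache between builds. Even when this layer is
# invalidated, previously downloaded packages can be reused from the mount.
RUN --mount=type=cache,target=/root/.npm \
    npm install
```

The cache mount is not part of the image, so it speeds up rebuilds without adding to the final image size.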

Reduce the total number of layers

The more layers we have in our image, the more layers we have to rebuild when the cache is invalidated, and the more opportunities for the cache to be invalidated. Below are some handy tips for reducing the number of layers in our image.

Combine multiple RUN commands where possible

The number one Dockerfile lint issue we've detected in Depot is multiple consecutive RUN instructions. The more we combine RUN commands, the fewer layers we will have in our image. For example, if we had a Dockerfile like this:

RUN some-command-that-creates-big-file
RUN some-command-that-removes-big-file

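This creates an unnecessary layer in our image: the layer in which the big file was created. Removing the file in a later RUN instruction does not shrink that earlier layer, so the file still counts toward the image size. We can combine these two commands into a single RUN command: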

RUN some-command-that-creates-big-file && \
    some-command-that-removes-big-file

This creates the big file and removes it within a single layer, so the image never contains an intermediate layer with the large file present.
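
For a concrete version of the placeholder commands above, here is one possible sketch. The alpine:3.19 base image, the file paths, and the 100 MB size are all assumptions made for illustration.

```dockerfile
FROM alpine:3.19

# Create a 100 MB scratch file, use it, and delete it in the same layer.
# Only the small checksum file remains in the resulting layer.
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=100 && \
    sha256sum /tmp/big.bin > /tmp/big.sha256 && \
    rm /tmp/big.bin
```

Splitting the rm into its own RUN instruction would leave the 100 MB file in the earlier layer, which is the situation described above.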

Be thoughtful about base images

The base image we use can significantly impact the number of layers in our image. Choosing an image closely related to the application or service we are containerizing can avoid recreating unnecessary layers. It can also help us stay updated with security patches and other updates that a particular framework or tool may have.

It's also worth considering smaller base images to improve build performance and reduce final image size. For instance, if we are building a Node.js application, we may be able to use the node:alpine image instead of the node image. This can reduce both the number of layers and the final image size.

Take advantage of multi-stage builds

A multi-stage build allows us to have multiple FROM instructions in our Dockerfile. This can be useful for reducing the number of layers in our final image. For instance, if we are building a Node.js application, we may have a Dockerfile like this:

FROM node:20-alpine AS build

WORKDIR /app
COPY package.json yarn.lock tsconfig.json ./
RUN yarn install --immutable
COPY src/ ./src/
RUN yarn build

FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/node_modules /app/node_modules
COPY --from=build /app/dist /app/dist
CMD ["node", "./dist/index.js"]

Conclusion

The Docker build cache, when leveraged correctly, can significantly speed up our Docker image builds. By being mindful of how we order our layers, what we copy into our image, and how we structure our Dockerfiles, we can make our builds faster and more efficient.

Using the Docker build cache efficiently can speed up your internal development, CI/CD pipelines, and deployments. With Depot, we add another speed boost by persisting your cache automatically across a distributed storage cluster that your entire team and CI workflows can share. With even faster caching and native Intel & Arm CPUs for zero-emulation builds, we've seen Depot folks getting 30x faster Docker image builds.

If you want to learn more about how Depot can help you optimize your Docker image builds, sign up for our free trial.

Top comments (6)

Stanley Sathler • Edited

Thanks, Kyle! Re. having 2 consecutive layers, one downloading the file, one removing it - will the final image have that intermediary layer in it (with the file), or will that have no file at all as a result of the last layer only? In other words, will the final image have intermediary layer contents or does it only care about the final layer results?

Kyle Galbraith

Great question! A Docker image is really a series of layers all stacked on top of one another. There is an exception there with multi-stage builds, but I'll come back to that.

In the example you're referring to, yes there will be an intermediate layer in the image that contains that file. Now if you SSH into the image, will you find the file? No, as the last layer is your entry layer. But that intermediate layer does exist in the image as a whole and thus contributed to the overall size.

Multi-stage builds, i.e. using multiple FROM statements, is like building multiple images concurrently and then in your last stage you copy in files from the other images and that becomes your entire final image. The earlier stages, the other FROM steps, are thrown away and don't contribute to the final image or its size.

Athreya aka Maneshwar

Multi-stage is the best I reduced soo much junk and got a concise image in the end xD

Harlin Seritt

Hi Kyle, thanks for the write-up. We've gone to doing something like this at work because our images for our angular app has gotten way too slow to build.

Kyle Galbraith

I'd love to learn more about that!

nxquan

good