Kyle Galbraith

Posted on Sep 15, 2023 • Originally published at depot.dev

Top 10 common Dockerfile linting issues

#docker #devops

We recently announced the ability to lint Dockerfiles on build in our lint & build blog post.

Running a linter against a Dockerfile before we build it helps us follow best practices for writing efficient Docker images, where efficient could mean faster builds or smaller image sizes.

This post covers the ten most common Dockerfile linting issues we've seen flowing through Depot to date. We expect these to change over time, but hopefully they can give everyone a good starting point for improving their Dockerfiles. We'll cover each issue, why it's a problem, and how to fix it.

How to lint a Dockerfile

Depot uses two Dockerfile linters: hadolint and a set of Dockerfile rules written by Semgrep. Together they make for a somewhat smarter Dockerfile linter.

To lint a Dockerfile on demand with Depot, we can pass the --lint flag during a build; linting then runs before the build itself.

Of course, we can also run hadolint ourselves locally without Depot with our own specific rules and config file. Or even use the hadolint Dockerfile linter UI. To run hadolint locally you can either install it via brew or use the Docker image and pipe your Dockerfile into it:

hadolint Dockerfile
# or use the Docker image
docker run --rm -i ghcr.io/hadolint/hadolint < Dockerfile
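
For scripts or CI, the two invocations above could be wrapped in a small helper. This is only a sketch: the function name lint_dockerfile is made up for illustration, and it assumes either a local hadolint binary or a working Docker installation.

```shell
# Hypothetical helper (not part of hadolint or Depot): lint a Dockerfile with
# a local hadolint binary if present, otherwise fall back to the Docker image.
lint_dockerfile() {
  file="${1:-Dockerfile}"
  if command -v hadolint >/dev/null 2>&1; then
    hadolint "$file"
  elif command -v docker >/dev/null 2>&1; then
    docker run --rm -i ghcr.io/hadolint/hadolint < "$file"
  else
    echo "neither hadolint nor docker found on PATH" >&2
    return 127
  fi
}
```

Calling lint_dockerfile path/to/Dockerfile prints any findings. The helper passes the linter's exit status through, so it can be used to gate a CI step.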

1. Multiple consecutive `RUN` instructions

Also known as lint error DL3059 from hadolint.

This is the most common issue we see with Dockerfiles flowing through Depot. It's present in nearly 30% of all Dockerfiles we've seen. The problem is a run of consecutive RUN instructions that could be condensed into one. For example:

RUN download_a_really_big_file
RUN remove_the_really_big_file

It's helpful to know how Docker layer caching works to understand why this might be problematic. In short, each new RUN statement in a Dockerfile results in a new layer in the final image.

In this example, we create a new layer when we download the big file and another layer when we remove it. Both layers will be present in the final image. So, the final image will contain the big file in the first layer, making the final image larger than it needs to be.

However, DL3059 can also be problematic if we use two different RUN statements to install packages. For example:

RUN fetch_package_registry_list
RUN install_some_package

In this example, the first RUN statement fetches the package registry list and the second installs the package. Because each RUN statement is cached as its own layer, the first layer can be reused from the cache long after the registry has changed. If the second RUN statement is then rebuilt, the package registry list will be out of date when we install the package.

Solution to `DL3059`

When working with large files that we add and remove during a docker build, combining those operations into one atomic RUN statement is helpful.

RUN download_a_really_big_file && \
    remove_the_really_big_file

This reduces the final image size by removing the intermediate layer that contains the big file as we download and remove it in the same RUN statement. Note that this can have cache implications if you combine RUN statements with things that can be cached with things that frequently invalidate the cache. In those situations, you likely want to keep the portion that can be cached in its own RUN statement.
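
To make the size argument concrete, here is a toy model in plain shell. It is not how Docker stores layers internally: ordinary directories stand in for layers, an empty .wh.big.bin file stands in for the deletion record, and the 1 MiB file and all names are made up for illustration.

```shell
# Toy model only: plain directories stand in for image layers.
work=$(mktemp -d)
mkdir "$work/two_runs_layer1" "$work/two_runs_layer2" "$work/one_run_layer"

# Two RUN instructions: layer 1 is saved with the 1 MiB file still in it ...
dd if=/dev/zero of="$work/two_runs_layer1/big.bin" bs=1024 count=1024 2>/dev/null
# ... and layer 2 only records the deletion (an empty marker file here).
: > "$work/two_runs_layer2/.wh.big.bin"

# One combined RUN: the file is created and removed before the layer is saved.
dd if=/dev/zero of="$work/one_run_layer/big.bin" bs=1024 count=1024 2>/dev/null
rm "$work/one_run_layer/big.bin"

# Sum the bytes of every file stored under the given directories.
stored_bytes() { find "$@" -type f -exec cat {} + | wc -c | tr -d ' '; }

two_runs_total=$(stored_bytes "$work/two_runs_layer1" "$work/two_runs_layer2")
one_run_total=$(stored_bytes "$work/one_run_layer")
echo "two RUN statements: $two_runs_total bytes stored"
echo "one combined RUN:   $one_run_total bytes stored"
rm -rf "$work"
```

In this model the two-step variant still stores the full file in its first layer, while the combined step stores nothing, which mirrors the reasoning above.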

For the package registry example, we want to combine the fetch registry list with the install package into one RUN statement.

RUN fetch_package_registry_list && \
    install_some_package

This ensures that the package registry list is updated when we install the package instead of potentially being outdated.
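
To get a feel for what this rule looks for, a rough check can be written in a few lines of shell and awk. This is a naive sketch, not hadolint's implementation: the function name find_consecutive_runs is hypothetical, and it only understands backslash line continuations, comments, and blank lines.

```shell
# Naive sketch, not hadolint's implementation: print the line number of every
# RUN instruction that directly follows another RUN instruction.
find_consecutive_runs() {
  awk '
    /^[ \t]*(#|$)/ { next }                        # skip comments and blank lines
    continued { continued = /\\[ \t]*$/; next }    # still inside a multi-line instruction
    {
      is_run = (toupper($1) == "RUN")
      if (is_run && prev_run) print FILENAME ":" FNR ": consecutive RUN"
      prev_run = is_run
      continued = /\\[ \t]*$/                      # trailing backslash continues the line
    }
  ' "$1"
}
```

Running find_consecutive_runs Dockerfile lists candidate lines to review. Whether merging them is worthwhile still depends on the caching trade-off described above.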

2. Pin versions during `apt-get install`

A more controversial Dockerfile linting issue is DL3008 from hadolint. This issue is also present in 30% of all Dockerfiles. The problem arises when versions aren't pinned during apt-get install. For example:

FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y some-package

When you don't pin a version, the build installs whatever version the package repository currently offers, so two builds of the same Dockerfile can end up with different packages. This can lead to unexpected behavior when you build your Dockerfile or run the resulting image if you inadvertently installed a newer version of a package than you expected.

Solution to `DL3008`

FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y some-package=1.2.*

By pinning the version of some-package, the build is forced to retrieve the particular version. This allows you to build up guarantees about the packages you're installing in your Dockerfile and the dependencies of those packages.

It's controversial because version pinning runs the risk of falling behind on security updates. For example, if you pin a package version that has a security vulnerability, your builds won't pick up the fix until you change the pin to a newer version. This is why it's essential to understand the packages you're installing and the security implications of pinning versions.

3. Use `--no-install-recommends` to avoid installing unnecessary packages

Another widespread linter error is DL3015, installing unnecessary packages with apt-get. This is present in 22% of all Dockerfiles. The issue arises when we're not using the --no-install-recommends flag during apt-get install. For example:

FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y some-package

When you don't use the --no-install-recommends flag, you install all the recommended packages for the package as well as the package itself, potentially increasing the final size of your Docker image with packages you don't need.

Solution to `DL3015`

FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y some-package --no-install-recommends

The solution is to pass the flag --no-install-recommends to apt-get install. This will prevent the installation of recommended packages and reduce the final size of your container image. It's essential to understand the recommended packages for the packages you're installing to ensure you're getting all the dependencies.

4. Avoid using the cache directory when using `pip install`

Docker layer caching comes in again when we're talking about pip install during a Docker build. Hadolint error DL3042 is present in 18% of all Dockerfiles. The issue arises when we run pip install without telling it to skip its cache directory. For example:

FROM python:3.11
RUN pip3 install mysql-connector-python

By default, pip install keeps a cache directory alongside the installed package, which creates an unnecessary cache entry for every package you've installed via pip in that layer. When you have lots of packages, this can increase your final Docker image size.

Solution to `DL3042`

FROM python:3.11
RUN pip3 install --no-cache-dir mysql-connector-python

We don't need a cache directory for our pip packages because we don't need to reinstall packages when building a Docker image. The Docker layer cache can be used instead. Turning off the cache directory makes our final image smaller.

5. Remove the `apt-get` lists after installing packages

As we explored in our post around reducing Docker image sizes, keeping container image sizes down often returns to the actual docker build process. Hadolint error DL3009 is present in 16% of all Dockerfiles. The issue arises when we're not removing the apt-get lists after installing packages. For example:

FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y some-package --no-install-recommends

Our earlier example for DL3015, shown here, can be optimized further to keep the final image size down. Because the apt-get lists are not cleaned up, they are written into the layer for that RUN statement, taking up valuable space in our final image.

Solution to `DL3009`

FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y some-package --no-install-recommends && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Here, we are combining the installation of some-package with the clean-up of the apt-get cache so that installing and clean-up happen in one atomic RUN statement. This keeps the final image size down by removing the apt-get cache from the final image and doesn't introduce another layer into the final image.

6. Make use of `WORKDIR` instead of `RUN cd some-path`

Another common Dockerfile linter issue is DL3003, using RUN cd instead of the WORKDIR statement. This is present in 14% of all Dockerfiles. Here is a typical example:

FROM ubuntu:22.04
RUN cd /usr/src/app && git clone git@github.com:depot/some-repo.git

Each RUN statement executes inside its own shell, so a cd in one RUN statement has no effect on the next one, and most commands can work with absolute paths anyway.

Solution to `DL3003`

FROM ubuntu:22.04
WORKDIR /usr/src/app
RUN git clone git@github.com:depot/some-repo.git

When changing directories, you can use the WORKDIR statement, which sets the working directory for the instructions that follow it. The only exception is when you need to change directories inside a subshell; in that scenario, you still need to use cd.
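
A quick way to see why a bare cd does not carry over from one step to the next: in the sketch below, each sh -c call stands in for one RUN instruction. It is an analogy in plain shell, not Docker itself.

```shell
# Each `sh -c` stands in for one RUN instruction: a fresh shell per step.
first=$(sh -c 'cd / && pwd')   # the directory change applies only inside this shell
second=$(sh -c 'pwd')          # a new shell starts from the caller's directory again
echo "first step ran in:  $first"
echo "second step ran in: $second"
```

The second step never sees the cd from the first, which is why WORKDIR is the tool for a directory change that should last across instructions.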

7. Pin versions when installing packages via `pip`

Like DL3008, the Dockerfile linter issue DL3013 is the same idea but applied to pip install instead of apt-get install. This is present in 13% of all Dockerfiles. Here is a typical example:

FROM python:3.11
RUN pip3 install --no-cache-dir mysql-connector-python

When you don't pin a version, pip installs whatever version is newest at build time. As we saw in DL3008, this can cause unexpected behavior if we install a different version than the one we installed when we first created the Dockerfile.

Solution to `DL3013`

FROM python:3.11
RUN pip3 install --no-cache-dir mysql-connector-python==8.1.0

By pinning the version of mysql-connector-python, every docker build that runs this instruction installs that particular version rather than whatever is newest at build time.

8. Use JSON notation for `CMD` and `ENTRYPOINT` arguments

This Dockerfile lint error, DL3025, comes down to correctness when running the image. It's present in 12% of all Dockerfiles. Here are typical examples for both statements where this comes up:

FROM ubuntu:22.04
ENTRYPOINT foo run-server

FROM ubuntu:22.04
CMD foo run-server

When we don't use JSON notation for CMD and ENTRYPOINT arguments, Docker runs the command through a shell (/bin/sh -c), so the executables referenced won't receive signals from the OS correctly. This is particularly relevant when talking about how to signal to a running container that it is being shut down (i.e., a SIGTERM).

Solution to `DL3025`

FROM ubuntu:22.04
ENTRYPOINT ["foo", "run-server"]

FROM ubuntu:22.04
CMD ["foo", "run-server"]

By using JSON notation, the executable will be the container's PID 1 and, therefore, receive signals from the OS. Two additional things to note about this notation:

CMD doesn't expand environment variables (i.e., $FOO_BAR) in JSON form because no shell is invoked; that expansion is a side effect of the sh -c wrapper used in shell form. So, we must handle environment variables ourselves outside the CMD statement.
The CMD statement is parsed as a JSON array, so we must use double quotes ("") instead of single quotes ('') to correctly pass our arguments.
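
With the JSON form no shell is involved, so a $FOO_BAR argument reaches the program as literal text. The sketch below reproduces that in plain shell as an analogy only: env starts a program directly with its arguments, much like the JSON form, while the second call adds an explicit sh -c. FOO_BAR and its value are placeholders.

```shell
# No shell in between: the argument is passed through as literal text.
direct=$(env FOO_BAR=hello echo '$FOO_BAR')
# An explicit shell expands the variable before running the command.
via_shell=$(env FOO_BAR=hello sh -c 'echo "$FOO_BAR"')
echo "direct:    $direct"
echo "via shell: $via_shell"
```

In a Dockerfile, the second variant corresponds to putting sh and -c as the first elements of the JSON array. That brings the shell back as the main process, so the signal-handling caveat above applies again.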

9. Use `apt-get` or `apt-cache` instead of the user facing `apt`

The apt command is meant to be an end-user tool and not to be used in Dockerfile RUN statements. So, hadolint raises DL3027 when we use apt instead of apt-get or apt-cache. This is present in 9% of all Dockerfiles. Here is a typical example:

FROM ubuntu:22.04
RUN apt install -y some-package=1.2.*

Solution to `DL3027`

FROM ubuntu:22.04
RUN apt-get install -y some-package=1.2.*

The interface of apt is not guaranteed across versions by Linux distributions. So it's better to use apt-get or apt-cache, which are more stable.

10. Pin versions when installing packages via `apk add`

As we've seen in DL3008 and DL3013, pinning versions matters, and it is just as important for apk add in Alpine-based Dockerfiles, where hadolint reports it as DL3018. This is present in 8% of all Dockerfiles. Here is a typical example:

FROM alpine:3.7
RUN apk --no-cache add some-package

Solution to `DL3018`

FROM alpine:3.7
RUN apk --no-cache add some-package=~1.2.3

The rationale is the same: version pinning makes the docker build fetch the pinned version rather than whatever happens to be newest. An important thing to note for Alpine-based images is that we are using partial pinning here via the ~ syntax. We can pin to a specific version via some-package=1.2.3, but the build will then fail if that exact version is removed from the package repository.

Conclusion

In this post, we looked at the top 10 most common Dockerfile linting issues we're seeing as builds are flowing through Depot. As we saw, they can vary in severity and impact. But they all have the potential to improve your Dockerfiles and your builds. Each issue comes with its own set of pros and cons.

For example, pinning versions can guarantee a specific state when building Docker images but has the downside of potentially missing security updates. Using --no-install-recommends keeps dependencies you don't need out of your image, but it can also mean you miss a dependency that you do need.

Hopefully, this post has given you some ideas on improving your Dockerfiles and your builds via linting. If you want to learn more about how Depot can help you improve your Dockerfiles on demand, check out our recent post on linting and building Dockerfiles.

If you're looking to make your Docker image build process faster for either native Intel or Arm images, sign up for an account and give things a try. We make it easy to run your first build with either docker build or depot build via our quickstart guide.