Tomas Fernandez for Semaphore

Originally published at semaphoreci.com

Container-Diff: Change Management for Containers

Touching a working Dockerfile can feel like playing with fire. We know that an innocent-looking change can have branching, hard-to-debug consequences. It's easy to get burned.

But change is inevitable, and while commits on Dockerfiles are easy to control, the impact of those changes on the resulting image is not. Fortunately, where there’s a need, there’s a tool.

Introducing container-diff

Available for macOS, Linux, and Windows, container-diff is, as the name suggests, diff for container images.

The project, developed by many of the same faces behind Container Structure Tests, does a lot more than just diffing: it can analyze container images, show installed packages, and reverse-engineer the commands used to generate them.

Testing containers

Container-diff has the following test modes:

  • Size: shows the total filesystem size.
  • Packages: shows a list of OS-installed packages (only for Debian-based distros), as well as those installed with pip and npm.
  • Filesystem: shows all the files in the image and their size.
  • Layer history: prints the commands that generated each of the layers in the image.

The command to analyze an image looks like this:

container-diff analyze [--type=TEST_TYPE] <IMAGE_NAME>

The tool pulls the image from the registry and unpacks the filesystem into $HOME/.container-diff/cache. Then, the contents are scanned, and a report is printed out.

So, for instance, we can analyze a PostgreSQL image with:

$ container-diff analyze postgres:14

-----Size-----

Analysis for postgres:14:
IMAGE           DIGEST                                                        SIZE
postgres        sha256:3ee027aeb3c8bc4a5870b21 ... 6e27685ac1eab6d4ada        352.9M

The default test is size. Change it to --type=apt to find out which OS-level packages are installed.

$ container-diff analyze --type=apt postgres:14

-----Apt-----

Packages found in postgres:14:
NAME                             VERSION                             SIZE
-adduser                         3.118                               849K
-apt                             2.2.4                               4.2M
-base-files                      11.1 deb11u1                        340K
-base-passwd                     3.5.51                              243K
-bash                            5.1-2 b3                            6.3M
-bsdutils                        1:2.36.1-8                          394K
-coreutils                       8.32-4 b1                           17.1M

...

-util-linux                      2.36.1-8                            4.5M
-xz-utils                        5.2.5-2                             612K
-zlib1g                          1:1.2.11.dfsg-2                     166K
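
The text report is easy to post-process with standard tools. As a minimal sketch, assuming that package rows always begin with a dash as in the output above, a hypothetical helper called list_packages (not part of container-diff) keeps only the package names:

```shell
# list_packages: read a container-diff package report on stdin and print
# only the package names. Assumes package rows start with "-", as in the
# sample report; header and title lines are ignored.
list_packages() {
  awk '/^-/ { print substr($1, 2) }'
}

# Sample rows copied from the report above.
printf '%s\n' \
  'NAME                             VERSION                             SIZE' \
  '-adduser                         3.118                               849K' \
  '-apt                             2.2.4                               4.2M' \
  | list_packages
```

Against a real image, the same filter could be fed from container-diff analyze --type=apt postgres:14 | list_packages.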

Similarly, you can get a list of globally-installed packages for Node and Python with --type=node and --type=pip.

$ container-diff analyze --type=pip python:3.10-bullseye

----------Pip-----

Packages found in python:3.10-bullseye:
NAME               VERSION        SIZE         INSTALLATION
-pip               21.2.4         5.1M         /usr/local/lib/python3.10/site-packages
-setuptools        57.5.0         2.4M         /usr/local/lib/python3.10/site-packages
-wheel             0.37.0         94.4K        /usr/local/lib/python3.10/site-packages

You can see every file in the image with --type=file, along with its size.

$ container-diff analyze --type=file postgres:14

----------File-----

Analysis for postgres:14:
FILE              SIZE
/bin              5.1M
/bin/bash         1.2M
/bin/cat          42.9K

...

/var/spool        7B
/var/spool/mail   7B
/var/tmp          0

💡 Use --order to show files ordered by size instead of alphabetically.

Finally, the history test shows the Docker layers, which roughly reflect the Dockerfile. The output of --type=history is hard to read, so we’ll format it with sed.

$ container-diff analyze --type=history postgres:14 | sed 's/  */ /g;s/;/\n\t/g'

----------History-----

Analysis for postgres:14:
-/bin/sh -c #(nop) ADD file:16dc2c6d1932194edec28d730b004fd6deca3d0f0e1a07bc5b8b6e8a1662f7af in /
-/bin/sh -c #(nop) CMD ["bash"]
-/bin/sh -c set -ex
     if ! command -v gpg > /dev/null
     then apt-get update
     apt-get install -y --no-install-recommends gnupg dirmngr
     rm -rf /var/lib/apt/lists/*
     fi
-/bin/sh -c set -eux
     groupadd -r postgres --gid=999
     useradd -r -g postgres --uid=999 --home-dir=/var/lib/postgresql --shell=/bin/bash postgres
     mkdir -p /var/lib/postgresql
     chown -R postgres:postgres /var/lib/postgresql

...
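
The sed expression above relies on \n being expanded in the replacement text, which some sed implementations (BSD sed on macOS, for one) do not do. As a hedged alternative, here is a sketch of the same formatting with awk; format_history is a name made up for this example:

```shell
# format_history: squeeze runs of spaces into one space and move the text
# after each ";" to a new, tab-indented line, mirroring the sed expression.
format_history() {
  awk '{ gsub(/ +/, " "); gsub(/;/, "\n\t"); print }'
}

# A shortened history line in the style of the raw output.
printf '%s\n' '-/bin/sh -c set -ex;     apt-get update;     apt-get install -y gnupg' \
  | format_history
```

With a real image, the pipeline would read container-diff analyze --type=history postgres:14 | format_history.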

Comparing containers

We’re only scratching the surface so far. Container-diff really shines when comparing images. The command for this is:

container-diff diff [--type=TEST_TYPE] <IMAGE1> <IMAGE2>

Let’s see some use cases for image comparison.

Use case 1: generating a changelog

Container-diff works great for generating changelogs. And, as we'll see later, the output format can be customized using a template.

We can list what changed at the OS level:

$ container-diff diff --type=size --type=apt postgres:13 postgres:14

----------Apt-----

Packages found only in postgres:13:
NAME                         VERSION                 SIZE
-postgresql-13               13.5-1.pgdg110 1        46.9M
-postgresql-client-13        13.5-1.pgdg110 1        6.3M

Packages found only in postgres:14:
NAME                         VERSION                 SIZE
-postgresql-14               14.1-1.pgdg110 1        48.9M
-postgresql-client-14        14.1-1.pgdg110 1        7.1M

Version differences: None

----------Size-----

Image size difference between postgres:13 and postgres:14:
SIZE1         SIZE2
350.2M        352.9M

In the same vein, we can compare globally-installed Node packages:

$ container-diff diff --type=node node:16 node:17

----------Node-----

Packages found only in node:16: None

Packages found only in node:17: None

Version differences:
PACKAGE        IMAGE1 (node:16)        IMAGE2 (node:17)
-npm           8.1.0, 8M               8.1.2, 8M

Or changes in Python packages:

$ container-diff diff --type=pip python:3.6.15-buster python:3.10-bullseye

----------Pip-----

Packages found only in python:3.6.15-buster:
NAME              VERSION        SIZE
-argparse         1.2.1          87.1K
-mercurial        4.8.2          9.5M
-wsgiref          0.1.2          98.7K

Packages found only in python:3.10-bullseye: None

Use case 2: troubleshooting containers

Debugging a failing container is easy when we have a healthy image to use as a reference. To see all the file changes, run container-diff with --type=file:

$ container-diff diff --type=file myapp/myservice:v1 myapp/myservice:v2

----------File-----

These entries have been added to myapp/myservice:v1:

FILE                                            SIZE
/app/node_modules/fsevents                      186.2K
/app/node_modules/fsevents/LICENSE              1.1K
/app/node_modules/fsevents/README.md            2.9K


These entries have been deleted from myapp/myservice:v1:

FILE                                            SIZE
/app/.npm/_cacache/index-v5/ce/9f/58654f1       310B
/app/.npm/_cacache/index-v5/3d/b7/10f6556       309B
/app/.npm/_cacache/index-v5/7e/eb/c1538ff       308B

These entries have been changed between myapp/myservice:v1: and myapp/myservice:v2:
FILE                                           SIZE1         SIZE2
/app/package-lock.json                         554.6K        554.6K
/app/node_modules/.package-lock.json           297.7K        298.1K
/app/node_modules/clean-css/History.md         77.5K         77.8K
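
File reports can get long, and entries such as the npm cache files listed above may bury the change you are looking for. The sketch below filters them out of the text report. Both the helper name drop_cache_noise and the choice of /.npm/_cacache/ as noise are assumptions made for this example, so adjust the pattern to your own image:

```shell
# drop_cache_noise: copy a container-diff file report from stdin to stdout,
# dropping every line that mentions a path under /.npm/_cacache/.
drop_cache_noise() {
  sed '\|/\.npm/_cacache/|d'
}

# Two rows in the style of the report above; only the second one survives.
printf '%s\n' \
  '/app/.npm/_cacache/index-v5/ce/9f/58654f1       310B' \
  '/app/node_modules/fsevents/LICENSE              1.1K' \
  | drop_cache_noise
```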

Once the problematic file is identified, you can compare the files in both containers to see what changed.

$ container-diff diff <IMAGE1> <IMAGE2> --type=file --filename=PATH/TO/FILE

Use case 3: test-driving new containers

You can run container-diff to preview the impact of your changes on a build, for instance, when trying out different base images or experimenting with the Dockerfile. You can iterate until you’re sure you've got it right.

Container-diff is not limited to images in remote repositories. You can analyze any local image by prefixing its name with daemon://.

container-diff diff --type=TEST_TYPE daemon://IMAGE_NAME:TAG daemon://IMAGE_NAME:TAG

Imagine that you’re building a container for a Ruby app and want to try upgrading from Ruby 2.7 to 3.0. As a Ruby developer, you know what to expect from the language side, but can you say the same about the container?

To answer the question, let's compare the respective Ruby images:

$ container-diff diff --type=size --type=apt --type=history ruby:2.7.4-bullseye ruby:3.0.2-bullseye

----------Apt-----

Packages found only in ruby:2.7.4-bullseye: None

Packages found only in ruby:3.0.2-bullseye: None

Version differences: None

----------History-----

Docker history lines found only in ruby:2.7.4-bullseye:
-/bin/sh -c #(nop)  ENV RUBY_MAJOR=2.7
-/bin/sh -c #(nop)  ENV RUBY_VERSION=2.7.4
-/bin/sh -c #(nop)  ENV RUBY_DOWNLOAD_SHA256=2a80824e0ad6100826b69b9890bf55cfc4cf2b61a1e1330fccbcb30c46cef8d7


Docker history lines found only in ruby:3.0.2-bullseye:
-/bin/sh -c #(nop)  ENV RUBY_MAJOR=3.0
-/bin/sh -c #(nop)  ENV RUBY_VERSION=3.0.2
-/bin/sh -c #(nop)  ENV RUBY_DOWNLOAD_SHA256=570e7773100f625599575f363831166d91d49a1ab97d3ab6495af44774155c40

----------Size-----

Image size difference between ruby:2.7.4-bullseye and ruby:3.0.2-bullseye:
SIZE1         SIZE2
819.2M        835.8M

Compare that with changing the OS flavor in the Node image. What happens if you want to swap out Bullseye for Bullseye Slim?

$ container-diff diff --type=size --type=apt --type=node node:17-bullseye node:17-bullseye-slim

----------Apt-----

Packages found only in node:17-bullseye:
NAME                                 VERSION                               SIZE
-autoconf                            2.69-14                               1.8M
-automake                            1:1.16.3-2                            1.8M
-autotools-dev                       20180224.1 nmu1                       157K
...

----------Node-----

Packages found only in node:17-bullseye: None

Packages found only in node:17-bullseye-slim: None

Version differences: None

----------Size-----

Image size difference between node:17-bullseye and node:17-bullseye-slim:
SIZE1         SIZE2
942.9M        230.7M

Comparing regular Bullseye vs. Slim shows that:

  • Node stays the same.

  • The Slim image is about 712 MB smaller (230.7 MB versus 942.9 MB).

  • The smaller image has a long list of missing packages.

This information will help you decide which is the best version for you. It makes sense to pick Slim in order to reduce the attack surface if you don’t need the extra packages.

Extending and customizing container-diff

When the default text output is not enough, we can write an output template. You can see the examples in the built-in template file.

The --format option lets us customize how information is printed out, giving us a way to export the data to other formats, such as CSV:

$ container-diff diff python:3.9-bullseye python:3.10-bullseye --type=pip --format='
package,{{.Image1}},{{.Image2}}
{{range .Diff.InfoDiff}}{{.Package}},{{range .Info1}}{{.Version}}{{end}},{{range .Info2}}{{.Version}}{{"\n"}}{{end}}{{end}}
'

package,python:3.9-bullseye,python:3.10-bullseye
pip,21.2.4,21.2.4
setuptools,57.5.0,57.5.0
wheel,0.37.0,0.37.0
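
Once the data is in CSV form, ordinary text tools can take over. As a minimal sketch, assuming the three-column layout produced by the template above, the hypothetical helper changed_versions prints only the packages whose versions differ between the two images:

```shell
# changed_versions: read "package,version1,version2" rows on stdin and print
# only those whose versions differ. Blank lines and the header row (first
# field "package") are skipped. Appending "" forces a string comparison, so
# versions such as 1.10 and 1.1 are not treated as equal numbers.
changed_versions() {
  awk -F, 'NF == 3 && $1 != "package" && ($2 "") != ($3 "")'
}

# The first four rows come from the output above; the last row is made up
# for illustration and is the only one printed.
printf '%s\n' \
  'package,python:3.9-bullseye,python:3.10-bullseye' \
  'pip,21.2.4,21.2.4' \
  'setuptools,57.5.0,57.5.0' \
  'wheel,0.37.0,0.37.0' \
  'examplepkg,1.0,1.1' \
  | changed_versions
```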

When custom formats are not enough, container-diff can be extended by writing your own differ. You'll need solid knowledge of Go for that, though.

Automated container testing with CI/CD

How does container-diff help us deploy safely? Well, if you’re doing continuous integration, you’re probably deploying several times a day, which means each new container is only a little bit different from the previous one.

Following that logic, we can assume that if too many things change at once, it may be a signal that further analysis is needed before deployment. Maybe some unexpected file snuck into the build and the image size doubled, or the base image was updated in the registry and unexpectedly shipped with different libraries.

We have to strike the right balance between stability and mutability. Every team will have different thresholds but, as a starting point, let's say that we’ll reject images that:

  • Grow more than 10% in size.
  • Have different OS libraries.
  • Have different globally-installed Node packages.
  • Were built from a different Dockerfile.

Gauging change rate between images

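We can evaluate the changes by running container-diff with --json and processing the output. Each test produces an object in the following format; when several tests run at once, the report is an array of these objects: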

{
    "Image1": "foo",
    "Image2": "bar",
    "DiffType": "Test_Type",
    "Diff": {
       // Differences Object
    }
}

We can process the report with a combination of shell scripts and jq, the JSON Query CLI tool. First, run all the tests at once and save the output in a file:

$ container-diff diff --type=size --type=apt --type=node --type=history --json <IMAGE1> <IMAGE2> > diff.json

Then, query the file with jq. You can filter the results per test by selecting DiffType. Use the following command to see the APT changes:

$ jq '.[] | select(.DiffType=="Apt")' diff.json

You can get the total number of changed packages by appending .Diff.Packages1 + .Diff.Packages2 | length to the query.

$ jq '.[] | select(.DiffType=="Apt") | .Diff.Packages1 + .Diff.Packages2 | length' diff.json

💡 You can try jq online at jq play.

Once we have all the jq queries ready, we can write a script that runs the differ, filters the results, and fails if the changes exceed certain thresholds.

#!/bin/bash
# Compare two container images and stop the pipeline when changes exceed the control parameters
# Arguments: $1 and $2 - the two images to compare
# Parameters expected:
#   $ALLOWED_APT_CHANGES - max number of APT packages changed
#   $ALLOWED_HISTORY_CHANGES - max number of Dockerfile commands changed
#   $ALLOWED_NPM_CHANGES - max number of NPM packages changed
#   $MAX_GROWTH_RATIO - max size growth allowed, in percent (0 is no growth, 100 is double size)

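set -ex

# Stop early if a threshold is missing: an unset variable makes the
# [ ... -gt ... ] tests below error out, which the if treats as false,
# so the script would print OK by accident.
: "${ALLOWED_APT_CHANGES:?}" "${ALLOWED_HISTORY_CHANGES:?}"
: "${ALLOWED_NPM_CHANGES:?}" "${MAX_GROWTH_RATIO:?}"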

image1=$1
image2=$2

diffile=$(mktemp XXXXXX.json)

container-diff diff \
    --type=history --type=node --type=size --type=apt --json \
    "$image1" \
    "$image2" \
    > ${diffile}

changes_apt=$(jq '.[] | select(.DiffType=="Apt") | .Diff.Packages1 + .Diff.Packages2 | length' ${diffile})

changes_history=$(jq '.[] | select(.DiffType=="History") |  .Diff.Adds + .Diff.Dels | length' ${diffile})

changes_npm=$(jq '.[] | select(.DiffType=="Node") | .Diff.Packages1 + .Diff.Packages2 | length' ${diffile})

# When sizes are equal jq returns a string "null"
size1=$(jq '.[] | select(.DiffType=="Size") | .Diff[0].Size1 ' ${diffile})
if [ "$size1" = "null" ]
then
    size_ratio=0
else
    size_ratio=$(jq '.[] | select(.DiffType=="Size") | 100 * .Diff[0].Size2 / .Diff[0].Size1 - 100 | floor' ${diffile})
fi

# Evaluate thresholds
if [ $changes_apt -gt $ALLOWED_APT_CHANGES ] \
    || [ $changes_history -gt $ALLOWED_HISTORY_CHANGES ] \
    || [ $changes_npm -gt $ALLOWED_NPM_CHANGES ] \
    || [ $size_ratio -gt $MAX_GROWTH_RATIO ]
then
    exit 1
else
    echo OK
fi
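
To get a feel for the size_ratio value the script compares against $MAX_GROWTH_RATIO, here is a small worked sketch of the same arithmetic (100 * Size2 / Size1 - 100, rounded down) done with awk instead of jq. The helper name growth_ratio is hypothetical. The sizes are the megabyte figures from the earlier reports; the ratio does not depend on the unit as long as both sizes use the same one.

```shell
# growth_ratio: percentage growth from size1 ($1) to size2 ($2), rounded
# down, mirroring the jq expression "100 * Size2 / Size1 - 100 | floor".
growth_ratio() {
  awk -v s1="$1" -v s2="$2" 'BEGIN {
    r = 100 * s2 / s1 - 100
    f = int(r)            # int() truncates toward zero
    if (f > r) f--        # so step down once more for negative fractions
    print f
  }'
}

growth_ratio 350.2 352.9   # postgres:13 -> postgres:14, prints 0
growth_ratio 942.9 230.7   # node:17-bullseye -> node:17-bullseye-slim, prints -76
```

A shrinking image yields a negative ratio, so it always passes the size check; only growth above the threshold stops the pipeline.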

Adding a change-control job to CI/CD

Where were we? Let's see, we have two images and a script to compare them. What we need now is a CI/CD pipeline that builds the image. Semaphore has the capabilities that we want for this task. If you’ve never used Semaphore before, I recommend checking out the getting started guide.

Open the workflow editor and add a block after the container image build step. Then, add the following commands in the job:

curl -LO https://storage.googleapis.com/container-diff/latest/container-diff-linux-amd64
sudo install container-diff-linux-amd64 /usr/local/bin/container-diff
echo "${DOCKER_PASSWORD}" | docker login -u "${DOCKER_USERNAME}" --password-stdin
checkout
chmod a+x container-diff-test.sh && ./container-diff-test.sh "${DOCKER_USERNAME}"/mycontainer:latest "${DOCKER_USERNAME}"/mycontainer:$SEMAPHORE_WORKFLOW_ID

This job installs container-diff on the CI machine, logs in to the Docker Hub registry (you'll need to activate a secret), clones the repository, and runs the comparison script. Change the parameters in container-diff-test.sh as needed. In this case, we're comparing the latest image against the one tagged with the unique ID $SEMAPHORE_WORKFLOW_ID.

That’s it! You can complete the pipeline with the deployment method of your choice.

Wrapping up

Container-diff is yet another quality tool to keep containers in check. Remember, when using containers, you’re responsible for the whole mini OS that comes with them, not just the code.

Thank you for reading!
