DEV Community

Philipp Strube

Speed up multi-stage Docker builds in CI/CD with Buildkit’s registry cache

Working on a GitOps framework around Kubernetes, I naturally run everything in containers. The two Dockerfiles that matter most for me unfortunately both have to download a lot of dependencies not included in the repository at build time, which makes the layer cache crucial. Unfortunately, ephemeral CI/CD runners like GitHub Actions start each run with an empty cache.

The first of the two Dockerfiles builds the image for the framework itself. This image is used for bootstrapping, automation runs and also disaster recovery. As such, it’s not your run-of-the-mill Dockerfile. It includes a number of dependencies installed from Debian packages, various Go binaries and, last but not least, the Python-based CLIs of AWS, Azure and Google Cloud. It makes heavy use of multi-stage builds and has different build stages for common dependencies and each cloud provider’s specific dependencies. The layers of the final image also mirror the build stage logic.

Dockerfile number two is for the Kubestack website itself. The site is built using Gatsby and has to download a lot of node modules during the build. The Dockerfile is optimized for cacheability and uses multi-stage builds to have a build environment based on Node.js and a final image based on Nginx to serve the static build.

Build times for both the framework image and the website image benefit heavily from having a layer cache.

Docker has had the ability to use an image as the build cache using the --cache-from parameter for some time. This was my preferred option because I need the ability to build and push images anyway. Storing the cache alongside the image is a no-brainer in my opinion.

For the website image the first step of my CI/CD pipeline is to pull the cache image. Note the || true at the end to ensure a missing cache doesn’t prevent my build from running.

docker pull gcr.io/$PROJECT_ID/$REPO_NAME:latest-build-cache || true

Step two runs a build targeting the dev stage of my multi-stage Dockerfile and tags the result as the new build-cache.

docker build \
  --cache-from gcr.io/$PROJECT_ID/$REPO_NAME:latest-build-cache \
  --target dev \
  -t gcr.io/$PROJECT_ID/$REPO_NAME:latest-build-cache \
  .

The next step runs the actual build that produces the final image and tags it as well.

docker build \
  --cache-from gcr.io/$PROJECT_ID/$REPO_NAME:latest-build-cache \
  -t gcr.io/$PROJECT_ID/$REPO_NAME:$COMMIT_SHA \
  .

Finally, the pipeline pushes both images.

docker push gcr.io/$PROJECT_ID/$REPO_NAME:latest-build-cache
docker push gcr.io/$PROJECT_ID/$REPO_NAME:$COMMIT_SHA

For a simple multi-stage build with only two stages, like my Gatsby website’s Dockerfile, this works pretty well.

But when I tried this for a project with multiple build stages, one for Python and one for JS, specifying two images under --cache-from never seemed to work reliably. That’s doubly unfortunate, because having a layer cache here would save the time spent downloading Python and JS dependencies on every run.

Having cache pull and cache build steps for every stage also makes the pipeline file increasingly verbose the more stages you have.
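For illustration, the per-stage variant looks roughly like this; it is a pipeline sketch, and the registry variable, stage names and tags here are hypothetical:

```shell
# One cache image per build stage (stage/tag names are illustrative).
docker pull "$REG/app:python-deps-cache" || true
docker pull "$REG/app:js-deps-cache" || true

docker build \
  --cache-from "$REG/app:python-deps-cache" \
  --target python-deps \
  -t "$REG/app:python-deps-cache" \
  .

docker build \
  --cache-from "$REG/app:js-deps-cache" \
  --target js-deps \
  -t "$REG/app:js-deps-cache" \
  .

# The final build has to list every cache image, which is where
# multiple --cache-from values proved unreliable for me.
docker build \
  --cache-from "$REG/app:python-deps-cache" \
  --cache-from "$REG/app:js-deps-cache" \
  -t "$REG/app:$COMMIT_SHA" \
  .
```

Every additional stage adds another pull and another build step on top of this.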

So for the framework Dockerfile, I need something better.

Enter BuildKit. BuildKit brings a number of improvements to container image building. The ones that won me over are:

  • Running build stages concurrently.
  • Increasing cache-efficiency.
  • Handling secrets during builds.

Apart from generally increasing cache efficiency, BuildKit also allows more control over caches when building with buildctl. This is what I needed. BuildKit has three options for exporting the cache: inline, registry and local. Local is not particularly interesting in my case, but would allow writing the cache to a directory. Inline includes the cache in the final image, so cache and image layers are pushed to the registry together. But it only includes the cache for the final stage in multi-stage builds. The registry option, finally, allows pushing all cached layers of all stages as a separate image. This is what I needed for my framework Dockerfile.
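For comparison, the inline option needs nothing beyond plain docker build. A minimal sketch, assuming a BuildKit-enabled Docker and an illustrative image name:

```shell
# Inline cache: cache metadata is embedded in the pushed image itself,
# but only the final stage's layers are covered.
DOCKER_BUILDKIT=1 docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from registry.example.com/app:latest \
  -t registry.example.com/app:latest \
  .
docker push registry.example.com/app:latest
```

That simplicity is why inline is often the default suggestion, but it doesn’t help with intermediate stages.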

Let’s take a look at how I’m using this in my pipeline. Having cache export and import built into BuildKit means I can reduce the separate pull, build and push steps into one. And it stays one step, no matter how many stages my Dockerfile has.

docker run \
  --rm \
  --privileged \
  -v `pwd`/oci:/tmp/work \
  -v $HOME/.docker:/root/.docker \
  --entrypoint buildctl-daemonless.sh \
  moby/buildkit:master \
    build \
    --frontend dockerfile.v0 \
    --local context=/tmp/work \
    --local dockerfile=/tmp/work \
    --output type=image,name=kubestack/framework-dev:test-${{ github.sha }},push=true \
    --export-cache type=registry,ref=kubestack/framework-dev:buildcache,push=true \
    --import-cache type=registry,ref=kubestack/framework-dev:buildcache

This one command handles pulling and importing the cache, building the image, exporting the cache and pushing the image and the cache. By running the build inside a container, I also don’t have to worry about installing the BuildKit daemon and CLI. The only thing I needed to do was provide the .docker/config to the build inside the container so it can push the image and the cache to the registry.

For a working example, take a look at the Kubestack release automation pipeline on GitHub.

Using the cache, the framework image builds in less than one minute, down from about three minutes before using BuildKit with cache export and import.

Discussion (4)

Alex Ianus

Great article, really helped.

Just wanted to point out that the mode=max parameter is necessary on the --export-cache line in order for the intermediate stages to be cached correctly:

--export-cache mode=max,type=registry,ref=kubestack/framework-dev:buildcache,push=true \

Brent O'Connor

Philipp,

My build runs for about ten minutes and then I get the following error during the export.

2021-01-15T15:39:05.4205121Z #32 exporting cache
2021-01-15T15:39:05.4206496Z #32 sha256:2700d4ef94dee473593c5c614b55b2dedcca7893909811a8f2b48291a1f581e4
2021-01-15T15:39:05.4207322Z #32 preparing build cache for export
2021-01-15T15:40:37.6531312Z #32 preparing build cache for export 92.2s done
2021-01-15T15:40:37.6532282Z #32 writing layer sha256:15ad4c058791b2553c6ac51a06d58da729216367debede67e23bfb1887a6afe9
2021-01-15T15:40:39.4543043Z #32 writing layer sha256:15ad4c058791b2553c6ac51a06d58da729216367debede67e23bfb1887a6afe9 1.8s done
...
2021-01-15T15:41:59.4811151Z #32 writing config sha256:b0fe7df00fcf5dd735567d810bf193e80433cb00659ffc1e8b09aa97f18ccd14
2021-01-15T15:42:00.2316716Z #32 writing config sha256:b0fe7df00fcf5dd735567d810bf193e80433cb00659ffc1e8b09aa97f18ccd14 0.8s done
2021-01-15T15:42:00.2318570Z #32 writing manifest sha256:d2b65e39c62a5ac4acc922292dd268a186e8286b74456287b7c0fbe457402efa
2021-01-15T15:42:00.6394612Z #32 writing manifest sha256:d2b65e39c62a5ac4acc922292dd268a186e8286b74456287b7c0fbe457402efa 0.5s done
2021-01-15T15:42:00.6397257Z #32 ERROR: error writing manifest blob: failed commit on ref "sha256:d2b65e39c62a5ac4acc922292dd268a186e8286b74456287b7c0fbe457402efa": unexpected status: 400 Bad Request
2021-01-15T15:42:00.6399515Z ------
2021-01-15T15:42:00.6401048Z  > importing cache manifest from ############.dkr.ecr.us-east-1.amazonaws.com/foo:task-cache-github-docker-build-buildcache:
2021-01-15T15:42:00.6402102Z ------
2021-01-15T15:42:00.6402665Z ------
2021-01-15T15:42:00.6403216Z  > exporting cache:
2021-01-15T15:42:00.6403855Z ------
2021-01-15T15:42:00.6405992Z error: failed to solve: rpc error: code = Unknown desc = error writing manifest blob: failed commit on ref "sha256:d2b65e39c62a5ac4acc922292dd268a186e8286b74456287b7c0fbe457402efa": unexpected status: 400 Bad Request
2021-01-15T15:42:09.9011793Z ##[error]Process completed with exit code 1.

I'm using the following for the build:

build:
  name: moby/buildkit Build w/ registry cache
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v2
    - name: Docker Login
      run: aws ecr get-login --no-include-email
    - name: Build
      run: |
      export CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)
      export BRANCH_SLUG=$(echo $CURRENT_BRANCH | sed -r 's/\//-/g')
      export DOCKER_COMMIT_HASH_TAG=${DOCKER_REGISTRY}/foo:${{ github.sha }}
      docker run \
      --rm \
      --privileged \
      -v `pwd`:/tmp/work \
      -v $HOME/.docker:/root/.docker \
      --entrypoint buildctl-daemonless.sh \
      moby/buildkit:master \
      build \
      --frontend dockerfile.v0 \
      --local context=/tmp/work \
      --local dockerfile=/tmp/work \
      --output type=image,name=${DOCKER_COMMIT_HASH_TAG},push=true \
      --export-cache mode=max,type=registry,ref=${DOCKER_REGISTRY}/foo:${BRANCH_SLUG}-buildcache,push=true \
      --import-cache type=registry,ref=${DOCKER_REGISTRY}/foo:${BRANCH_SLUG}-buildcache

Any ideas on what is wrong? I should mention that the registry I'm using is AWS ECR. "############" means it was redacted.

Brent O'Connor

It's looking like I'm running into AWS ECR not supporting cache manifest lists.

Brent O'Connor

Switching to using GitHub's registry for the cache manifest and then pushing the final build images to AWS ECR worked.
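A hedged sketch of that workaround, with illustrative registry and image names: keep --export-cache and --import-cache pointed at a GitHub Container Registry ref, while --output still pushes the final image to ECR:

```shell
# Cache lives on ghcr.io (which accepts the cache manifest),
# the image itself is still pushed to ECR.
buildctl build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output type=image,name=${ECR_REGISTRY}/foo:${GITHUB_SHA},push=true \
  --export-cache mode=max,type=registry,ref=ghcr.io/my-org/foo:buildcache,push=true \
  --import-cache type=registry,ref=ghcr.io/my-org/foo:buildcache
```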