At Kelda we're building Blimp, a version of Docker Compose that runs in the cloud. Our goal is to improve development productivity by giving developers an alternative to bogging down their local systems with loads of resource-hungry Docker containers.
We've put a lot of engineering effort into supporting all of the Docker Compose fields commonly used during local development, such as `build`. In this post I'll talk a bit about what we've gleaned from the experience as it relates to Docker Compose's `build` field:

    services:
      service:
        build: .
When a service has a `build` field, Blimp builds your images locally and pushes them to the cloud so that they can be pulled by the development environment. This push can be frustratingly slow, especially on home networks. Waiting 30 minutes for the image to upload before being able to start developing was just unacceptable to us.
To be fair, Docker already has some image optimizations built in, but they didn't do exactly what we wanted out of the box. So, we set out to optimize the push process. To achieve this, we had to dive deep into Docker's image push API.
In this post, I'll cover:
- What exactly happens when you do a `docker push`.
- How we used this to build our pre push feature and decrease image push times by 90%.
Before diving into the image push API, you first need to understand what a Docker image actually is.
It's common for developers to think of Docker images like operating system images or ISOs -- a static snapshot of a filesystem that represents the container. Really though, Docker images are quite a bit more sophisticated than that.
A Docker image is made up of layers of filesystems. Put simply, each line in a Dockerfile can be thought of as a layer, and the sum of all the layers the Dockerfile defines is the resulting image.
For example, in the following Dockerfile, `FROM python` tells Docker to lay the foundation of our image with the existing Python layers. Likewise, `COPY . .` creates a new layer which contains all the files in `.` (i.e. the current working directory, which is referred to as the build context), and overlays them on top of any existing layers.

    FROM python
    COPY . .
    CMD python app.py
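To make the idea of overlaying concrete, here is a toy model in Python. It is only a sketch: the layer contents are made up, each layer is reduced to a dictionary of paths, and real layers also handle things like file deletions, which this ignores.

```python
from collections import ChainMap

# Hypothetical, heavily simplified layers: each maps a file path to its contents.
base_layer = {
    "/usr/local/bin/python": "<python binary>",
    "/app/app.py": "print('placeholder')",
}
copy_layer = {
    "/app/app.py": "print('hello')",
    "/app/util.py": "def helper(): ...",
}


def image_view(*layers):
    """Merge layers into one filesystem view; later layers shadow earlier ones."""
    # ChainMap looks in its first mapping first, so the top layer goes first.
    return ChainMap(*reversed(layers))


fs = image_view(base_layer, copy_layer)
print(fs["/app/app.py"])   # the version from the upper (COPY) layer wins
print(sorted(fs))          # union of the paths from both layers
```

The point of the model is that the lower layer is never modified. The upper layer only adds or shadows files, which is what lets the base layers be shared between images.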
The Python base image is 934MB. Assuming that the user is copying in 2MB of files, the base image would make up 99% of the resulting image!
This provided us with a really interesting opportunity to optimize. Why should we waste a user's precious bandwidth pushing this entire image, when the vast majority of it is often already available from public sources?
Our solution is to have users only push the bits of the image that are unique to their build, and then automatically fetch the rest directly from the base image's registry (e.g. DockerHub), which has plenty of bandwidth.
Bringing it back to the Python example above, we want to make it so that the `python` layers aren't uploaded over the user's network. Instead, our high bandwidth servers "pre push" those layers. Then, the user's `docker push` just needs to push the layer for `COPY . .`.
The good news for us was that out of the box, Docker only pushes the layers that don't already exist in the registry. Each layer has a `digest`, which represents the contents of the layer. These digests are checked before pushing to figure out if the registry already has that layer -- if it does, then the client doesn't bother pushing the layer's contents.
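As a rough illustration of why digests work as a "do you already have this?" key, here is a minimal sketch. It assumes the usual `sha256:<hex>` form of a digest computed over the layer's bytes; the byte strings are placeholders, not real layer tarballs.

```python
import hashlib


def layer_digest(blob: bytes) -> str:
    """Return a content digest for a layer blob in `algorithm:hex` form."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


# Placeholder contents standing in for layer tarballs.
layer_a = b"contents of a layer tarball"
layer_b = b"contents of a layer tarball"
layer_c = b"different contents"

print(layer_digest(layer_a) == layer_digest(layer_b))  # identical bytes, same digest
print(layer_digest(layer_a) == layer_digest(layer_c))  # different bytes, different digest
```

Because the digest depends only on the bytes, a base image layer pushed by one machine has the same digest as the same layer pushed by any other machine.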
But we still had to design a way to prepopulate the base image layers in the registry so that the Docker Push API would reuse them.
Docker pushes images in two parts: first it uploads the layers described above. Then, once all the layers are uploaded, it uploads the signed manifest, which references the layers and has some additional metadata.
Each layer upload starts off with a `HEAD` request that checks whether the layer already exists in the registry.
If the layer already exists in the registry, then the registry responds with a `200 OK` response, and the Docker client doesn't bother pushing it again. In these situations, `docker push` shows the following output:
6b73f8ddd865: Layer already exists
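If the layer doesn't exist, the registry answers the `HEAD` request with `404 Not Found`. The client then opens an upload with a `POST` request, and the registry responds with `202 Accepted`, along with the URL that should be used for uploading the layer. The client then uploads the layer in chunks via `PATCH` requests, or directly via a single `PUT` request.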
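The flow is easier to follow as code. The sketch below is a toy, in-memory stand-in for a registry, not a real client: the class, method names, and upload URL are all hypothetical. In this model the existence check returns 404 for a missing layer and the `202 Accepted` comes from a separate `POST` that opens the upload, which is my reading of the Registry HTTP API V2. It only models the single-`PUT` upload path.

```python
import hashlib


class FakeRegistry:
    """Toy in-memory registry; blobs are stored per repository."""

    def __init__(self):
        self.blobs = {}  # (repository, digest) -> bytes

    def head_blob(self, repo, digest):
        # Stands in for: HEAD /v2/<repo>/blobs/<digest>
        return 200 if (repo, digest) in self.blobs else 404

    def start_upload(self, repo):
        # Stands in for: POST /v2/<repo>/blobs/uploads/
        return 202, f"/v2/{repo}/blobs/uploads/upload-1"  # hypothetical URL

    def put_blob(self, repo, digest, data):
        # Stands in for the single PUT that carries the whole layer.
        if "sha256:" + hashlib.sha256(data).hexdigest() != digest:
            return 400
        self.blobs[(repo, digest)] = data
        return 201


def push_layer(registry, repo, data):
    """Push one layer, skipping the upload if the repository already has it."""
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    if registry.head_blob(repo, digest) == 200:
        return "Layer already exists"
    status, _upload_url = registry.start_upload(repo)
    if status != 202:
        raise RuntimeError(f"unexpected status {status} when starting upload")
    if registry.put_blob(repo, digest, data) != 201:
        raise RuntimeError("upload was rejected")
    return "Pushed"


registry = FakeRegistry()
print(push_layer(registry, "blimp/backend", b"base layer"))  # Pushed
print(push_layer(registry, "blimp/backend", b"base layer"))  # Layer already exists
```

Note that the toy registry keys blobs by repository as well as digest, so pushing the same bytes to a different repository uploads them again.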
This layer checking only works when the layers in question exist in the same repository as the image being pushed. So `blimp/backend:1` and `blimp/backend:2` can share layers, but `blimp/backend:1` can't share layers with `blimp/another-image:1` (without taking advantage of another API, which I'll describe now).
You may have seen the following output when running `docker push` before. This output means that the push is making use of cross repository mounts, which is a cool feature to cache layers across multiple images.
e1c75a5e0bfa: Mounted from library/ubuntu
This feature was introduced in Docker Registry v2.3.0. Cross repository mounts allow clients to inform the registry that they know about another image in the registry that may share the same layer, and that the registry should try using the layer from that image rather than going through the full upload process.
When the registry receives this request, it first makes sure that the client has pull access to this other repository. If the client has access, and the layers match up, the registry sends back a `201 Created` response. Otherwise, it sends a `202 Accepted` response, and the client goes through the full upload process described above.
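For reference, a mount request is an upload `POST` with two extra query parameters, `mount` (the layer digest) and `from` (the repository that may already hold it). The sketch below only builds the request and interprets the two status codes described above. The registry host, digest, and function names are placeholders for illustration.

```python
from urllib.parse import urlencode


def mount_request(registry, repo, digest, source_repo):
    """Build the (method, URL) pair for a cross repository mount request."""
    # Keep ':' and '/' readable; they are allowed in a query string.
    query = urlencode({"mount": digest, "from": source_repo}, safe=":/")
    return "POST", f"https://{registry}/v2/{repo}/blobs/uploads/?{query}"


def next_step(status):
    """Decide what the client does after the registry answers a mount request."""
    if status == 201:
        return "mounted"      # the layer was linked from the other repository
    if status == 202:
        return "full upload"  # fall back to the normal upload process
    raise ValueError(f"unexpected status {status}")


method, url = mount_request(
    "registry.example.com",  # placeholder host
    "blimp/backend",
    "sha256:abc",            # placeholder digest
    "library/ubuntu",
)
print(method, url)
```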
If you use a custom Docker image for development, Blimp will automatically build and push the image when you start up your sandbox. The image for each service is pushed to `blimp-registry.kelda.io/<sandboxID>/<service>:<imageID>`, where `<sandboxID>` is a unique identifier for your sandbox, and `<imageID>` is a hash to make sure we always run the latest version of your image.
As a reminder, our goal for looking into all this is to make it so that when you push this image, you only have to push the "unique" layers that can't be pulled from more efficient sources.
At first, we wanted to make use of cross repository mounts. This would let all our users share the same base images, so we would only have to push the base image for the very first user that references it. Plus, it'd set us up to build private image caches for teams so that they could share layers from their Dockerfile other than the base image.
We were hoping to do something like this:
- Analyze the image's Dockerfile to find out what its base image is.
- Send a request to our server to push this base image to the registry with the name `blimp-registry.kelda.io/public/<image>:<tag>`.
- Tag the base image locally with `blimp-registry.kelda.io/public/<image>:<tag>` so that Docker would provide it as a cross repository mount.
- Push the image with `docker push`.
Unfortunately, step 3 didn't actually cause Docker to provide the pre pushed base image as a cross repository mount. Docker only updates its list of images used for cross repository mounts on the first time a layer is pushed or pulled.
We considered giving users push access to the public repo, but we deemed that too insecure. We also considered ditching `docker push` entirely in favor of go-containerregistry, but that would have entailed making a significant change to `go-containerregistry` in order to show image push updates.
So, we went back to the drawing board.
After giving up on cross repository mounts, we asked: why bother with cross repository mounts when we could just push directly to the user's repository?
Although our servers would have to push a copy of the base image for each user, this is still much more efficient than having the user push it directly from their laptop since the bandwidth between our servers and the registry is so much higher.
Ultimately, that's what we settled on. The repository for each service (`blimp-registry.kelda.io/<sandboxID>/<service>`) always has a `base` tag that our servers push the base image to. The registry then automatically reuses those layers during the normal push API flow outlined above whenever the user pushes their image -- no icky manipulation of Docker's state necessary.
Putting it all together, this is what happens when Blimp pushes a locally built image:
- The Blimp CLI parses the reference to the base image from the image's Dockerfile.
- The Blimp CLI tells the Blimp servers to push the base image to `blimp-registry.kelda.io/<sandboxID>/<service>:base`.
- The Blimp CLI builds the image, using the same base image.
- The Blimp CLI pushes the full image to `blimp-registry.kelda.io/<sandboxID>/<service>:<imageID>`.
- Docker goes through the layers one by one, and pushes them. If the layer is from the base image, the registry notices and instructs the CLI to skip the push.
- For the layers not in the base image, Docker does the full upload process.
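The first two steps can be sketched in a few lines of Python. This is not Blimp's actual code: the function names are made up, and the parser is deliberately naive. It takes the last `FROM` instruction, skips flags such as `--platform`, and ignores `ARG` substitution and references to earlier build stages.

```python
def base_image(dockerfile: str) -> str:
    """Return the image named by the last FROM instruction in a Dockerfile."""
    image = None
    for line in dockerfile.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0].upper() == "FROM":
            # Skip flags such as --platform=linux/amd64.
            args = [t for t in tokens[1:] if not t.startswith("--")]
            if args:
                image = args[0]
    if image is None:
        raise ValueError("no FROM instruction found")
    return image


def base_tag(sandbox_id: str, service: str) -> str:
    """Build the reference that the base image is pre pushed to."""
    return f"blimp-registry.kelda.io/{sandbox_id}/{service}:base"


dockerfile = "FROM python\nCOPY . .\nCMD python app.py\n"
print(base_image(dockerfile))          # python
print(base_tag("abc123", "backend"))   # sandbox ID and service name are placeholders
```

Since the pre pushed base image and the user's image live in the same repository, the regular `HEAD` check is enough for the registry to report the base layers as already present.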
At Blimp, we want to make moving your development environment to the cloud as seamless as possible. One of our design principles is that the move should use the exact same config, and not require any changes to your workflow. Although we could have users work around the push slowness by prebuilding and pushing images to a shared public repository, that would violate our design goals. Building this feature was a fun deep dive into Docker internals, and a big step towards making the onboarding process to Blimp seamless.
See how fast it is yourself! Try an example
Read more about Docker internals -- see how registry credentials are stored.
By: Christopher Cooper