Sherine Khoury

Posted on Nov 23, 2023

A gopher’s journey to the center of container images

#containers #go #docker #tutorial

Blissful past...

A couple of years ago, I would never have thought I would become so interested in the underlying structure of containers, let alone set out to build an image in Golang.

I was living the blissful life of an engineer who simply uses podman pull or docker push, creates a Containerfile or Dockerfile, then runs a command to build images from it... sitting back and watching the standard output list the layers being built, then pushed one by one with nice digests to the registry of my choosing, under the tag of my choosing.

All changed when...

All this changed when I started contributing to oc-mirror, around a year ago. oc-mirror is a plugin of OpenShift's CLI, and targets disconnected clusters. It mirrors all images needed by such clusters in order to install and upgrade OpenShift as well as all its Day-2 operators from operator catalogs.
Suddenly, the underground world of containers unraveled.

Most of the logic of oc-mirror is about extracting metadata from images such as release images and operator catalog images, interpreting the contents of these images in order to determine the list of images that constitute a release or an operator, and later copying those images to an archive or to a partially disconnected registry.

Nevertheless, some of the activities also include building multi-arch images. This is the case for the graph image. Without going into the details of what this image is useful for, let's just say that the graph image is simply a UBI9 image, to which we copy some metadata in /var/lib, and whose CMD we modify, so that this image can become an init container for the disconnected cluster to use.

Let's start building

In this article, we are roughly trying to build in Golang the equivalent of this very simple Containerfile:

FROM registry.access.redhat.com/ubi9/ubi:latest

RUN curl -L -o cincinnati-graph-data.tar.gz https://api.openshift.com/api/upgrades_info/graph-data

RUN mkdir -p /var/lib/cincinnati-graph-data && tar xvzf cincinnati-graph-data.tar.gz -C /var/lib/cincinnati-graph-data/ --no-overwrite-dir --no-same-owner

CMD ["/bin/bash", "-c" ,"exec cp -rp /var/lib/cincinnati-graph-data/* /var/lib/cincinnati/graph-data"]

I'll try to describe the three paths I explored to achieve this task. I'm aware these are probably not the only possibilities, and they may not always suit your context:

Context A - having root capabilities: using containers/buildah
Context B - no root capabilities:
- Using containers/buildah in a secure namespace
- Using google/go-containerregistry

Context A - having root capabilities: Using `containers/buildah`

For the task of building the graph image, my first idea was to rely on buildah.
In fact, our design was already heavily relying on containers/image for all things regarding copying images from one registry to the other, or from one registry to an archive. The obvious choice was to use the same suite of modules in order to keep dependencies to a minimum.

My implementation effort was greatly guided by Buildah's tutorial 4-Include in your build tool.

I'm assuming here that the golang binary that I'm building can have root privileges. If this is not your context, and you'd like to run this binary as non-root, you will need a special setup of the builder (which you can find in the next section).

With the assumption that root privileges are available, the implementation is fairly simple. As you'll see below, each instruction of the Containerfile has an equivalent method in the builder interface.

I encountered one small gotcha: any files or folders that you want to copy/add to the image need to be in the current working directory.

For our development, this was a small inconvenience: why would someone using the tool in their home directory suddenly end up with OpenShift's upgrade graph metadata polluting it?! But this could easily be worked around by cleaning up in a defer statement when the builder was done (regardless of the build outcome: success or failure).
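
The cleanup pattern can be sketched with the standard library alone. This is only an illustration of the defer idea: `stageAndBuild`, the `graph-data-` folder prefix and the placeholder file are names made up for this example, and the `build` callback stands in for the actual buildah calls.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// stageAndBuild creates a scratch folder inside the current working directory,
// hands it to build, and removes it when build returns, whether it succeeded or not.
// The build callback is a placeholder for the real builder calls.
func stageAndBuild(build func(stagingDir string) error) (string, error) {
	stagingDir, err := os.MkdirTemp(".", "graph-data-")
	if err != nil {
		return "", err
	}
	// Runs on every return path, so nothing is left behind in the user's directory.
	defer os.RemoveAll(stagingDir)

	// Stand-in for the real metadata: a single placeholder file.
	placeholder := filepath.Join(stagingDir, "placeholder.yaml")
	if err := os.WriteFile(placeholder, []byte("demo\n"), 0o644); err != nil {
		return stagingDir, err
	}
	return stagingDir, build(stagingDir)
}

// exists reports whether path is present on disk.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	dir, err := stageAndBuild(func(d string) error {
		fmt.Println("staging folder present during build:", exists(d))
		return fmt.Errorf("simulated build failure")
	})
	fmt.Println("build error:", err)
	fmt.Println("staging folder present after build:", exists(dir))
}
```

Even though the simulated build fails, the deferred `os.RemoveAll` still runs, so the working directory ends up as it started.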

All the code is available here.

Now let's break down what needs to be done:

Initializing the builder - FROM instruction

I want to initialize the builder from the ubi9 image. The image reference is passed in the BuilderOptions like this:

const (
    graphBaseImage string = "registry.access.redhat.com/ubi9/ubi:latest"
)
// ... truncated code
builderOpts := buildah.BuilderOptions{
  FromImage:    graphBaseImage,
  Capabilities: capabilitiesForRoot,
  Logger:       logger,
}
builder, err := buildah.NewBuilder(context.TODO(), buildStore, builderOpts)

Adding a layer - ADD instruction

Given that I have prepared the files that need to be copied to the image in graphDataUntarFolder, I can add the content of the whole folder using builder.Add. The AddAndCopyOptions can help set the userID and groupID owning these files and folders inside the container.

    addOptions := buildah.AddAndCopyOptions{Chown: "0:0", PreserveOwnership: false}
    addErr := builder.Add(graphDataDir, false, addOptions, graphDataUntarFolder)
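
Preparing the folder itself is the Go counterpart of the two RUN instructions in the Containerfile. Here is a minimal sketch of the extraction half using only the standard library. `untarGz` and `extractDemo` are hypothetical helpers written for this example, and the archive is built in memory rather than downloaded; with a real download, the response body of an HTTP GET could presumably be passed to `untarGz` in the same way.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// untarGz extracts a gzip-compressed tar stream into destDir.
// Only directories and regular files are handled; other entry types are skipped.
func untarGz(r io.Reader, destDir string) error {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return err
	}
	defer gz.Close()
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		target := filepath.Join(destDir, hdr.Name)
		// Refuse entries that would land outside destDir, such as "../evil".
		rel, err := filepath.Rel(destDir, target)
		if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(os.PathSeparator)) {
			return fmt.Errorf("illegal path in archive: %q", hdr.Name)
		}
		switch hdr.Typeflag {
		case tar.TypeDir:
			if err := os.MkdirAll(target, 0o755); err != nil {
				return err
			}
		case tar.TypeReg:
			if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
				return err
			}
			f, err := os.Create(target)
			if err != nil {
				return err
			}
			_, err = io.Copy(f, tr)
			if cerr := f.Close(); err == nil {
				err = cerr
			}
			if err != nil {
				return err
			}
		}
	}
}

// extractDemo builds a one-file archive in memory (a stand-in for the real
// download), extracts it to a temporary folder and returns the file's content.
func extractDemo(entryName string) (string, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	body := []byte("name: demo\n")
	hdr := &tar.Header{Name: entryName, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))}
	if err := tw.WriteHeader(hdr); err != nil {
		return "", err
	}
	if _, err := tw.Write(body); err != nil {
		return "", err
	}
	tw.Close()
	gz.Close()

	dest, err := os.MkdirTemp("", "graph-data-untar-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dest)
	if err := untarGz(&buf, dest); err != nil {
		return "", err
	}
	content, err := os.ReadFile(filepath.Join(dest, entryName))
	return string(content), err
}

func main() {
	content, err := extractDemo("channels/demo.yaml")
	fmt.Printf("content=%q err=%v\n", content, err)
	_, err = extractDemo("../escape.yaml")
	fmt.Println("traversal attempt:", err)
}
```

The path check is there because an archive fetched over the network should not be trusted to stay inside the destination folder.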

Updating the command - CMD instruction

Next, we want to set up the command of the container image. This is rather straightforward:

    builder.SetCmd([]string{"/bin/bash", "-c", fmt.Sprintf("exec cp -rp %s/* %s", graphDataDir, graphDataMountPath)})

Building and pushing

It's now time to build the image and push it. By default, you can push to the store by first preparing the image reference like so:

imageRef, err := is.Transport.ParseStoreReference(buildStore, "docker.io/myusername/my-image")

But in my case, I opted for pushing it directly to the destination registry, like so:

    imageRef, err := alltransports.ParseImageName("docker://localhost:7000/" + graphImageName)
    // ... truncated code
    imageId, _, _, err := builder.Commit(context.TODO(), imageRef, buildah.CommitOptions{})

Context B - using `buildah` as non root

Since oc-mirror is a CLI plugin, it should not require any extra root permissions in order to build images.

Buildah provides a way to run as non-root. But before we delve into that, a small parenthesis on the configuration of the store that Buildah uses:

Store defaults

Buildah relies on a build store for keeping track of layers, images pulled, built, etc. For setting up the build store, I simply used all the default setups available in the buildah module, like so:

    logger := logrus.New()
    logger.Level = logrus.DebugLevel
    buildStoreOptions, err := storage.DefaultStoreOptionsAutoDetectUID()
    // ... truncated code
    conf, err := config.Default()
    // ... truncated code
    capabilitiesForRoot, err := conf.Capabilities("root", nil, nil)
    // ... truncated code
    buildStore, err := storage.GetStore(buildStoreOptions)
    // ... truncated code
    defer buildStore.Shutdown(false)
    builderOpts := buildah.BuilderOptions{
        FromImage:    graphBaseImage,
        Capabilities: capabilitiesForRoot,
        Logger:       logger,
    }
    builder, err := buildah.NewBuilder(context.TODO(), buildStore, builderOpts)
    // ... truncated code

Setup for non-root execution

In order to integrate the buildah module into your golang product without root privileges, buildah's recommendation is to pause the execution of the go binary, create a user namespace where it can be root, and re-execute the binary in that user namespace.

This is achieved by adding the following lines in main.go, as early as you can in the main function:

if buildah.InitReexec() {
  return
}
unshare.MaybeReexecUsingUserNamespace(false)

These lines have to go in the main function. Keep in mind that execution restarts from the beginning inside the user namespace, so any initialization placed before them is done a second time.
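
To make the "runs twice" effect concrete, here is a standard-library sketch of the re-execution control flow. It is not buildah's implementation and it does not create a user namespace: it only re-runs the same binary once, using `SKETCH_REEXECED`, an environment variable invented for this example, to tell the second run apart from the first.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// reexecMarker is a made-up environment variable used by this sketch only.
const reexecMarker = "SKETCH_REEXECED"

// isChild reports whether the current process is the re-executed copy.
func isChild(getenv func(string) string) bool {
	return getenv(reexecMarker) == "1"
}

// childCommand prepares a second run of the same binary with the same
// arguments, plus the marker so that the copy knows not to re-execute again.
func childCommand(self string, args, environ []string) *exec.Cmd {
	cmd := exec.Command(self, args...)
	cmd.Env = append(append([]string{}, environ...), reexecMarker+"=1")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd
}

func main() {
	// Anything placed here runs in BOTH the first run and the re-executed copy.
	fmt.Printf("pid %d: initializing\n", os.Getpid())

	if !isChild(os.Getenv) {
		self, err := os.Executable()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		if err := childCommand(self, os.Args[1:], os.Environ()).Run(); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		return // the first run only waits for the copy to finish
	}

	// Only the re-executed copy reaches this point.
	fmt.Printf("pid %d: doing the real work\n", os.Getpid())
}
```

Running it prints the "initializing" line twice, with two different process IDs, which is the reason to keep expensive or non-repeatable setup after the re-execution point.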

Impacts on debugging

Re-executing also changes the debugging process. In order to debug, I had to launch the dlv debugger in a user namespace:

podman unshare dlv debug --headless --listen=:43987 main.go

PS: if you need to pass arguments to main, you can add -- to the command above, then append any arguments you have.

Once the command above is triggered, it is possible to use delve to debug (either using dlv directly or attaching to it with a client).

If you use VSCode, it is possible to attach it to the dlv process running in the background. This is achieved by adding the following code to the configurations[] inside of the launch.json:

{
    "name": "Attach Package",
    "type": "go",
    "debugAdapter": "dlv-dap",
    "request": "attach",
    "mode": "remote",
    "host": "localhost",
    "port": 43987,
},
{
    "name": "Attach Tests",
    "type": "go",
    "debugAdapter": "dlv-dap",
    "request": "attach",
    "mode": "remote",
    "host": "localhost",
    "port": 43987,
}

Impacts on users

Finally, for the use cases where our binary must run in a container, or in a pod on a Kubernetes cluster, it is important to set up the securityContext and to list all the capabilities necessary to run the binary inside the container. Among these capabilities, you need to include CAP_SETGID and CAP_SETUID. Other capabilities may be needed as well.

Full code

graph-data-image-builder

Context B - Using `go-containerregistry` as non-root

I also explored another module, go-containerregistry, in order to build images without root privileges. The approach is completely different: each component of the container image can be manipulated separately. This can be an advantage if you're looking for a way to fine-tune things.

Preparing for use of `go-containerregistry`

In order to start using the remote package of go-containerregistry to pull/push images, you need to set:

nameOptions: StrictValidation vs WeakValidation, and the possibility for default registries to be used while referring to container images
remoteOptions: which groups all configurations related to pulling and pushing images, such as:
- connection proxies, timeouts, keepAlives, use of http2 or http1.1
- configuration files containing credentials for registries
- TLS verification explicit disabling (if needed)

    nameOptions := []name.Option{
        name.StrictValidation,
    }
    remoteOptions := []remote.Option{
        remote.WithAuthFromKeychain(authn.DefaultKeychain), // this will try to find .docker/config first, $XDG_RUNTIME_DIR/containers/auth.json second
        remote.WithContext(context.TODO()),
        // doesn't seem possible to use registries.conf here.
    }
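
The connection-level settings (proxy, timeouts, keep-alives, HTTP/2, TLS verification) live in a standard `http.Transport`. The sketch below builds one with the standard library only; `newRegistryTransport` is a name made up for this example and the timeout values are arbitrary, not recommendations.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"net/http"
	"time"
)

// newRegistryTransport returns an http.Transport carrying the connection
// settings listed above. The durations are arbitrary example values.
func newRegistryTransport(skipTLSVerify bool) *http.Transport {
	return &http.Transport{
		// Honors HTTP_PROXY, HTTPS_PROXY and NO_PROXY.
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second,
			KeepAlive: 30 * time.Second,
		}).DialContext,
		ForceAttemptHTTP2:     true,
		TLSHandshakeTimeout:   10 * time.Second,
		ResponseHeaderTimeout: 10 * time.Second,
		IdleConnTimeout:       90 * time.Second,
		TLSClientConfig: &tls.Config{
			// Only disable verification for registries you trust,
			// for example a local test registry.
			InsecureSkipVerify: skipTLSVerify,
		},
	}
}

func main() {
	t := newRegistryTransport(false)
	fmt.Println("TLS verification disabled:", t.TLSClientConfig.InsecureSkipVerify)
	fmt.Println("HTTP/2 attempted:", t.ForceAttemptHTTP2)
}
```

Such a transport can then be handed to go-containerregistry by appending `remote.WithTransport(newRegistryTransport(false))` to the remoteOptions slice.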

Pulling the origin image

Each image we want to build on needs to be copied to a folder of your choosing on the local disk. That folder (layoutDir) will contain the image layout, with any manifest list, OCI index, manifest, config, and layers...

This is achieved by using remote and layout like so:

    imgRef := "registry.access.redhat.com/ubi9/ubi:latest"
    ref, err := name.ParseReference(imgRef, b.NameOpts...)
    if err != nil {
        return "", err
    }
    idx, err := remote.Index(ref, b.RemoteOpts...)
    if err != nil {
        return "", err
    }
    // layout.Write returns the layout.Path of the folder, and an error
    layoutPath, err := layout.Write(layoutDir, idx)
    if err != nil {
        return "", err
    }
    return layoutPath, nil
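
To get a feel for what ends up in that folder, the sketch below reads the index.json at the root of an OCI image layout and maps each digest to its file under blobs/. It uses only the standard library and a trimmed-down view of the descriptor fields. The sample index and its digests are placeholders written by `demoLayout`, a helper made up for this example; they are not real ubi9 digests.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// descriptor mirrors the few fields of an OCI descriptor used in this sketch.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
	Platform  *struct {
		Architecture string `json:"architecture"`
		OS           string `json:"os"`
	} `json:"platform,omitempty"`
}

// imageIndex mirrors the top-level index.json of an OCI image layout.
type imageIndex struct {
	SchemaVersion int          `json:"schemaVersion"`
	Manifests     []descriptor `json:"manifests"`
}

// readIndex parses the index.json found at the root of layoutDir.
func readIndex(layoutDir string) (*imageIndex, error) {
	raw, err := os.ReadFile(filepath.Join(layoutDir, "index.json"))
	if err != nil {
		return nil, err
	}
	var idx imageIndex
	if err := json.Unmarshal(raw, &idx); err != nil {
		return nil, err
	}
	return &idx, nil
}

// blobPath maps a digest such as "sha256:<hex>" to its file under blobs/.
func blobPath(layoutDir, digest string) (string, error) {
	parts := strings.SplitN(digest, ":", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("malformed digest %q", digest)
	}
	return filepath.Join(layoutDir, "blobs", parts[0], parts[1]), nil
}

// demoLayout writes a made-up index.json to a temporary folder.
// The digests are placeholders, not real image digests.
func demoLayout() (dir string, cleanup func(), err error) {
	dir, err = os.MkdirTemp("", "layout-")
	if err != nil {
		return "", nil, err
	}
	sample := `{"schemaVersion":2,"manifests":[
 {"mediaType":"application/vnd.oci.image.manifest.v1+json","digest":"sha256:aaaa","size":1,"platform":{"architecture":"amd64","os":"linux"}},
 {"mediaType":"application/vnd.oci.image.manifest.v1+json","digest":"sha256:bbbb","size":1,"platform":{"architecture":"arm64","os":"linux"}}]}`
	err = os.WriteFile(filepath.Join(dir, "index.json"), []byte(sample), 0o644)
	return dir, func() { os.RemoveAll(dir) }, err
}

func main() {
	dir, cleanup, err := demoLayout()
	if err != nil {
		fmt.Println(err)
		return
	}
	defer cleanup()
	idx, err := readIndex(dir)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, m := range idx.Manifests {
		p, _ := blobPath(dir, m.Digest)
		fmt.Printf("%s/%s -> %s\n", m.Platform.OS, m.Platform.Architecture, p)
	}
}
```

Pointing `readIndex` at a real layoutDir is a quick way to check which architectures were pulled before modifying anything.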

Creating a layer

Creating a layer from a tar file can be achieved very easily using tarball.

Given that outputFile is a string containing the path to a tar file, LayerFromFile constructs a layer from it: the files inside the archive become the contents of the layer.

outputFile can be anywhere on the filesystem. Unlike with buildah, it does not have to be in the current working directory.

  layerOptions := []tarball.LayerOption{}
  layer, err := tarball.LayerFromFile(outputFile, layerOptions...)
  if err != nil {
    return nil, err
  }
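
The tar file itself has to come from somewhere. As a minimal sketch, assuming the files are in a local folder and should land under a given path inside the image, the standard library is enough to produce it. `tarDir` and `demoEntries` are hypothetical helpers for this example; the demo writes the archive to memory, while a real caller would write it to the file later passed to LayerFromFile.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path"
	"path/filepath"
)

// tarDir writes the directories and regular files found under srcDir to w as
// a tar stream. Each entry name is prefixed with imagePath, the location the
// files should have inside the image.
func tarDir(w io.Writer, srcDir, imagePath string) error {
	tw := tar.NewWriter(w)
	err := filepath.WalkDir(srcDir, func(p string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		rel, err := filepath.Rel(srcDir, p)
		if err != nil || rel == "." {
			return err // nil for the root folder itself, which gets no entry
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		if !info.IsDir() && !info.Mode().IsRegular() {
			return nil // symlinks, devices, etc. are out of scope for this sketch
		}
		hdr, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return err
		}
		hdr.Name = path.Join(imagePath, filepath.ToSlash(rel))
		if info.IsDir() {
			hdr.Name += "/"
		}
		// Record root as the owner instead of whoever runs the tool.
		hdr.Uid, hdr.Gid, hdr.Uname, hdr.Gname = 0, 0, "", ""
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if info.IsDir() {
			return nil
		}
		f, err := os.Open(p)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = io.Copy(tw, f)
		return err
	})
	if err != nil {
		return err
	}
	return tw.Close()
}

// demoEntries tars a small temporary folder and returns the entry names
// found in the resulting archive.
func demoEntries() ([]string, error) {
	src, err := os.MkdirTemp("", "graph-data-src-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(src)
	if err := os.MkdirAll(filepath.Join(src, "channels"), 0o755); err != nil {
		return nil, err
	}
	demoFile := filepath.Join(src, "channels", "demo.yaml")
	if err := os.WriteFile(demoFile, []byte("name: demo\n"), 0o644); err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if err := tarDir(&buf, src, "var/lib/cincinnati-graph-data"); err != nil {
		return nil, err
	}
	var names []string
	tr := tar.NewReader(&buf)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	names, err := demoEntries()
	fmt.Println(names, err)
}
```

Setting the entry names at tar creation time is what decides where the files appear in the final image, since the layer is applied as-is on top of the base image's filesystem.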

Updating the command

For changing anything inside an image, mutate is needed.

This is slightly more complicated than what this snippet shows, due to the fact that an image might be a dockerv2-2 manifest list or oci index, itself containing several manifests (image for a specific architecture and OS).

In order to modify the command for the multi-arch image, we'd need to update the config of each of the underlying manifests.

But let's keep that out for now, and focus on how to modify the command for a single manifest. The full code is here.

// layoutPath is the result of layout.Write from the previous snippet
idx, err := layoutPath.ImageIndex()
if err != nil {
    return nil, err
}
idxManifest, err := idx.IndexManifest()
if err != nil {
    return nil, err
}
// For simplicity, only the first manifest of the index is handled here
manifest := idxManifest.Manifests[0]
currentHash := *manifest.Digest.DeepCopy()
img, err := idx.Image(currentHash)
if err != nil {
    return nil, err
}
cfg, err := img.ConfigFile()
if err != nil {
    return nil, err
}
// cmd is the []string holding the new command
cfg.Config.Cmd = cmd
img, err = mutate.Config(img, cfg.Config)
if err != nil {
    return nil, err
}

Building and pushing

Adding the layer

Same as for the modification of the command, adding a layer is achieved with mutate.

// `img` is the single-arch image from the index. We get it by calling `idx.Image(currentHash)` like in the previous snippet
// `layers` holds the layers to add (such as the one created with tarball.LayerFromFile), and `mt` is their media type
additions := make([]mutate.Addendum, 0, len(layers))
for _, layer := range layers {
  additions = append(additions, mutate.Addendum{Layer: layer, MediaType: mt})
}
img, err = mutate.Append(img, additions...)
if err != nil {
  return nil, err
}

Building new manifests and index

Once a layer is added, or a Config modified, the manifest of the image should be updated. To be more exact, we need to remove the old manifest from the index, and add a new one.

This is done by creating a new descriptor for the img that was updated in the previous snippets:

desc, err := partial.Descriptor(img)
if err != nil {
    return nil, err
}

Next, we need to update the image index, by replacing the descriptor:

add := mutate.IndexAddendum{
    Add:        img,
    Descriptor: *desc,
}
modifiedIndex := mutate.AppendManifests(mutate.RemoveManifests(idx, match.Digests(currentHash)), add)
resultIdx = modifiedIndex
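
The reason the old manifest has to be replaced rather than edited in place is content addressing: the index refers to a manifest by its digest, the manifest refers to the config by its digest, and a digest is just the sha256 of the bytes. The sketch below shows the effect on a config, using only the standard library. `imageConfig` is a deliberately tiny stand-in for the real config format (re-serializing a real config through it would drop every other field), and the sample JSON is made up for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// imageConfig keeps only the part of an image config that this sketch touches.
// A real config has many more fields, which this struct would silently drop.
type imageConfig struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
	Config       struct {
		Cmd []string `json:"Cmd"`
	} `json:"config"`
}

// digest returns the content address of a blob, in the "sha256:<hex>" form.
func digest(blob []byte) string {
	sum := sha256.Sum256(blob)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// withCmd returns a re-serialized copy of the config with a new command.
func withCmd(rawConfig []byte, cmd []string) ([]byte, error) {
	var cfg imageConfig
	if err := json.Unmarshal(rawConfig, &cfg); err != nil {
		return nil, err
	}
	cfg.Config.Cmd = cmd
	return json.Marshal(cfg)
}

func main() {
	// Made-up sample config, not the actual ubi9 one.
	original := []byte(`{"architecture":"amd64","os":"linux","config":{"Cmd":["/bin/bash"]}}`)
	newCmd := []string{"/bin/bash", "-c", "exec cp -rp /var/lib/cincinnati-graph-data/* /var/lib/cincinnati/graph-data"}
	updated, err := withCmd(original, newCmd)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("config digest before:", digest(original))
	fmt.Println("config digest after: ", digest(updated))
}
```

Since the config digest changes, the manifest that embeds it changes too, and so does the manifest's own digest: the descriptor held by the index no longer matches and must be swapped for a new one.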

Full code

Conclusion

Using buildah is much simpler: out of the box, it has support for multi-arch image building, as well as support for registries.conf, which was a requirement for our product.

Furthermore, as shown in this blog entry, each Containerfile instruction maps to a builder method. This makes the builder very easy to use.

go-containerregistry has all the necessary interfaces and methods to manipulate all the building blocks of container images, regardless of their format (dockerv2-1, dockerv2-2 or oci). It is probably worth investigating whether another golang module builds on top of go-containerregistry and provides an experience closer to that of a builder, abstracting away all the lower-level changes, and allowing for building multi-arch images easily. But that's a subject for a future blog post...
