Running Spacelift CI/CD workers in Kubernetes using DinD

#cicd #devops #docker #kubernetes

For a while now at Spacelift, we’ve been hearing from users that they’d like to run Spacelift worker pools in their Kubernetes clusters. As a first step towards that, we decided to investigate whether we could run workers in Kubernetes using a Docker-in-Docker sidecar container. The rest of this post gives a rough overview of the architecture of Spacelift workers and explains how the sidecar container strategy works.

I’d just like to say thanks to Vadym Martsynovskyy for his great article about running Jenkins agents in Kubernetes. In that article he outlines the general approach taken here, and also describes some of the pros and cons of using Docker-in-Docker. It’s definitely well worth a read.

Background

Before going into more detail, it’s worth explaining what a Spacelift worker is, and giving a quick overview of our architecture. At Spacelift, we provide a hybrid-SaaS model, like many other CI/CD systems. What this means in practice is that the control plane that schedules runs is managed by us, but the execution of runs happens in a separate process called a worker. We provide a public worker pool, but also allow you to self-host your own workers, giving you complete control over the infrastructure your runs execute on.

Perhaps confusingly, if you follow the instructions to create a private worker pool, you’ll notice that it involves downloading something called the “launcher”. The launcher is responsible for communicating with the Spacelift mothership (control plane), and for creating and destroying a worker container using Docker for each Spacelift run.

Because of this, the launcher process needs access to a Docker daemon in order to create a container for each run.

Running in Kubernetes

We found that the launcher architecture maps very nicely to Kubernetes using a sidecar container to run the Docker daemon. For each worker in the pool, we need to create a Kubernetes Pod with two containers: one for the launcher and another for Docker. The following shows a stripped-down Deployment definition for creating a worker pool with 4 workers:

# Stripped down: spec.selector and the Pod template labels that
# apps/v1 Deployments require are omitted for brevity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-pool-1-spacelift-worker
spec:
  replicas: 4
  template:
    spec:
      containers:
        - name: launcher
          image: "public.ecr.aws/spacelift/launcher:latest"
          imagePullPolicy: Always
          env:
            # Point the launcher at the Docker daemon in the sidecar.
            - name: DOCKER_HOST
              value: tcp://localhost:2375
            - name: SPACELIFT_TOKEN
              value: "..."
            - name: SPACELIFT_POOL_PRIVATE_KEY
              value: "..."
          volumeMounts:
            - name: launcher-storage
              mountPath: /opt/spacelift
              subPath: spacelift
        - name: dind
          image: "docker:dind"
          imagePullPolicy: Always
          # Listen on the Pod's loopback interface only.
          command: ["dockerd", "--host", "tcp://127.0.0.1:2375"]
          securityContext:
            privileged: true
          volumeMounts:
            # Docker's own data, including the image cache.
            - name: launcher-storage
              mountPath: /var/lib/docker
              subPath: docker
            # Same path as in the launcher container, so run workspaces resolve identically.
            - name: launcher-storage
              mountPath: /opt/spacelift
              subPath: spacelift
      volumes:
        - name: launcher-storage
          emptyDir: {}

The launcher communicates with the Docker-in-Docker sidecar container via TCP, which is configured via the DOCKER_HOST environment variable for the launcher container. This works because the containers in a Kubernetes Pod share the same IP address and port space, and can communicate with each other via localhost.
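One way to check this wiring is to call the Docker Engine API's /_ping endpoint on the same address the launcher uses. The helper below is a minimal sketch and not part of the launcher: docker_ping_url is a hypothetical name, and it assumes DOCKER_HOST uses the tcp:// scheme shown in the Deployment above.

```shell
# Turn a tcp:// DOCKER_HOST value into the URL of the Docker Engine API ping endpoint.
# docker_ping_url is a hypothetical helper, not something shipped with the launcher.
docker_ping_url() {
  # Strip the tcp:// scheme and any trailing slash, then append /_ping.
  hostport="${1#tcp://}"
  hostport="${hostport%/}"
  printf 'http://%s/_ping\n' "$hostport"
}

# Usage, from a shell inside the launcher container
# (assumes the BusyBox wget that ships with Alpine is available):
#   wget -qO- "$(docker_ping_url "$DOCKER_HOST")"
```

If the sidecar's daemon is up, the ping endpoint should answer with a short OK response. A connection error usually means dockerd has not finished starting yet.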

In addition, a shared volume called launcher-storage is mounted into each container. This is used by the launcher to store the workspaces for runs, along with other things like cached tool binaries (for example Terraform). The Docker sidecar needs access to the run workspaces in order to mount them into the worker containers, and it also uses that volume to store its image cache.
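The reason both containers mount the volume at the same /opt/spacelift path is that Docker resolves a bind mount's source path on the daemon's filesystem, not the client's. The sketch below only illustrates that point. The launcher's real invocation isn't shown in this post, so the workspaces/ subdirectory, the /workspace target and the workspace_mount_flag helper are all hypothetical.

```shell
# Build the bind-mount flag a Docker client might pass for one run's workspace.
# The directory layout and the /workspace target are hypothetical.
workspace_mount_flag() {
  printf '%s\n' "-v /opt/spacelift/workspaces/$1:/workspace"
}

# The source path on the left of the colon is looked up inside the dind container,
# so it only works because dind mounts the same volume at /opt/spacelift:
#   docker run --rm $(workspace_mount_flag run-123) alpine:3.14 ls /workspace
```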

The last thing worth pointing out is that the dind container sets securityContext.privileged to true. This is required for Docker-in-Docker to function correctly.

Launcher Image

The Spacelift launcher is distributed as a statically linked binary, making it simple to build an image. The Dockerfile below copies the launcher into an Alpine image, makes it executable, and sets the startup command:

FROM alpine:3.14

COPY spacelift-launcher /usr/bin/spacelift-launcher
RUN chmod 755 /usr/bin/spacelift-launcher
CMD [ "/usr/bin/spacelift-launcher" ]

This image is rebuilt and published to our public ECR repository any time the launcher binary is updated.

Helm Chart

We provide Terraform modules to make it really easy for users to deploy worker pools to AWS, Azure and GCP, and we wanted to provide a similar experience for Kubernetes. To achieve this, we created a Helm chart that makes it simple to deploy workers to Kubernetes clusters: https://github.com/spacelift-io/spacelift-workerpool-k8s.

Assuming you’ve already configured your worker pool in Spacelift and have access to the credentials for the pool, deploying this chart to your cluster simply requires two steps.

First, add the Spacelift Helm chart repository and update your local chart cache:

helm repo add spacelift https://downloads.spacelift.io/helm
helm repo update

Next, install the chart:

helm upgrade worker-pool-1 spacelift/spacelift-worker --install --set "credentials.token=<worker-pool-token>,credentials.privateKey=<worker-pool-private-key>"

Replace <worker-pool-token> and <worker-pool-private-key> with your own credentials, and make sure to base64-encode the private key.
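As a rough sketch of the encoding step, the helper below base64-encodes a key file onto a single line. encode_key and the worker.key filename are hypothetical names used only for illustration. Piping through tr avoids relying on line-wrapping flags that differ between base64 implementations.

```shell
# Base64-encode a file onto a single line.
# encode_key is a hypothetical helper name.
encode_key() {
  base64 < "$1" | tr -d '\n'
}

# Usage (worker.key is a hypothetical filename for the pool's private key):
#   helm upgrade worker-pool-1 spacelift/spacelift-worker --install \
#     --set "credentials.token=<worker-pool-token>,credentials.privateKey=$(encode_key worker.key)"
```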

If all goes well, you should be able to view the pods for your worker pool using kubectl get pods:

kubectl get pods
NAME                                              READY   STATUS    RESTARTS   AGE
worker-pool-1-spacelift-worker-7fcfc9f594-f94tj   2/2     Running   0          22m

You should also be able to view the workers in your pool in the Spacelift UI.

When implementing in a production environment, you may want to use an alternative approach to providing credentials, and may want to configure a custom storage volume for the worker Pods. For more information about configuring the chart, check out the README in the chart GitHub repo.
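One simple alternative is to keep the credentials out of your shell history by passing them in a values file instead of --set. This is a sketch, not the chart's documented workflow: write_values is a hypothetical helper, and it assumes the chart reads the same credentials.token and credentials.privateKey keys used in the install command above. The chart README remains the authoritative reference.

```shell
# Write the worker pool credentials to a Helm values file readable only by its owner.
# $1: output path, $2: worker pool token, $3: base64-encoded private key
write_values() {
  printf 'credentials:\n  token: "%s"\n  privateKey: "%s"\n' "$2" "$3" > "$1"
  chmod 600 "$1"
}

# Usage:
#   write_values values.yaml "<worker-pool-token>" "<base64-encoded-private-key>"
#   helm upgrade worker-pool-1 spacelift/spacelift-worker --install -f values.yaml
```

The file still holds the secrets in plain text, so treat it like any other credential file and keep it out of version control.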

Closing Thoughts

If you’re interested in running Spacelift workers in Kubernetes, we’d welcome any feedback about the approach, contributions to the Helm chart, and reports of any issues you encounter so that we can make improvements. Also, if you aren’t already using Spacelift but are interested in trying it, you can sign up and start a free evaluation.

(The original post was published at Spacelift)