Mark Barzali

How werf Streamlined the Transition to Kubernetes and Accelerated Our CI/CD

I'm going to share a case study on how we used werf at Seznam.cz to ease our migration to Kubernetes. My team and I were responsible for handling user files in persistent storage as well as tracking their versions, overwrites, distribution, etc.

Our old approach to orchestrating legacy applications caused a bunch of problems, so we decided to switch to Kubernetes. However, during the migration, it became evident that we needed a tool to streamline application building, publishing, and deploying. What we ended up with was werf, a tool designed for efficient and reliable application delivery to Kubernetes environments.

Below, I will elaborate on how werf helped us solve caching problems, speed up image building, and set up smooth integration with Argo CD. I'll take you through our entire werf journey — the first steps, the bumps along the road, and all the awesome stuff this tool brought our team. So, let's dive in!

Moving on from legacy

When I first joined, our team was using Salt (for the unfamiliar, it's a lot like Ansible) as our main orchestrator of legacy apps, and it was just painfully slow. We had to build Python code into Debian packages; application deployments would occasionally fail. Secrets were kept on the local file system, and deployments had to be triggered manually from the Salt master. You even had to notify everyone in Mattermost first, just to make sure no one would overwrite your stage release.

At that point, the company was already slowly transitioning its services to Kubernetes (about a third of our services were already there). This is what a Kubernetes-based process looked like back then:

  • A .env directory was used for deploying to different environments, which meant a ton of copy-pasting.
  • We used Jinja instead of Helm. Our in-house DevOps engineers wrote a program to render Jinja templates. You could only see the final manifest during deployment because the render script was inside the DevOps container images (it also injected its own variables into the environment).
  • We used docker pull as a cache; pre-built images were pulled this way with the hope that some layers could be reused.
  • Image versions were set either by a commit-hash or from a manually edited VERSION file. The merge train monitored the file for changes.
  • Harbor often got "messy", and we had to clean up old images every couple of days.

Basically, our GitLab pipeline was: run tests → docker pull, build, and push → jinja-render and kubectl apply. Obviously, this is just a simplified workflow, as there were tons of other problems going on with Debian, linting, and so on (I will omit them as they are beyond the scope of this article).
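For context, here is a heavily simplified sketch of what such a pipeline looked like; the job names, the registry path, and the render-jinja-templates script are placeholders rather than our actual setup:

stages:
  - test
  - build
  - deploy

test:
  stage: test
  script:
    - pytest

build:
  stage: build
  script:
    # Pull a previous image purely as a layer cache and hope some layers get reused
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build --cache-from $CI_REGISTRY_IMAGE:latest -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA

deploy:
  stage: deploy
  script:
    # The render script lived inside the DevOps images, so the final manifests
    # were only visible at this point
    - render-jinja-templates > manifests.yaml
    - kubectl apply -f manifests.yaml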

Integrating werf into the project

Every week or two, we have a tech meeting where the team presents their projects, does demos, and discusses emerging technologies. I got actively involved in those events: typically browsing the CNCF Landscape, picking some interesting tool or technology, and trying it out while learning something new and putting together a demo in the process. On one such occasion, I showed them what werf can do.

First try

The demo went well, though unfortunately, I didn't get approval to deploy to production with werf. On the other hand, they did let me test it out in staging. So, I had to figure out how to fit werf into our workflow, which also meant tweaking it for our project structure. In the end, both staging environments for the new service were deployed using werf converge since it was easy and meant we could deploy right from our local PCs or laptops.

The build process itself was a breeze. I especially liked the fact that rebuilds were skipped when the changes didn't actually affect the image.

The configuration was super simple — just one or two environment variables listed in werf-giterminism.yaml. Everything else was basic (the KISS principle works like a charm). We didn't switch to Stapel because we were already used to Dockerfile and were happy with it.
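To give an idea of how little configuration that was, here is a minimal werf-giterminism.yaml sketch; the whitelisted variables below are just examples, not our actual ones:

giterminismConfigVersion: 1
config:
  goTemplateRendering:
    # Explicitly allow the environment variables that templates may read
    allowEnvVariables:
      - CI_PIPELINE_ID
      - /^APP_/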

Later, during a retrospective, I learned that the team had fallen in love with werf, and we decided to continue using it and integrate it into our other projects, speeding up builds as we went. For instance, for our monolith (a container image consisting of 2–3 layers), build times dropped from 10 minutes to 3–5 minutes, depending on which layer was modified.

Sure, we hit a few walls while working on getting werf up and running, but we managed to sort them out using the documentation. The problems mostly had to do with our certificates, so we didn't even need to ask the werf developers for help.

Second try

In our search for excellence, we did not limit ourselves to werf alone but adopted Argo CD as well. We decided to combine these tools, and our current setup is now as follows:

  • werf builds a bundle and pushes it to Harbor using a tag that corresponds to the pipeline ID (CI_PIPELINE_ID); see the sketch after this list.

  • Argo CD references Harbor and tracks bundle tags (rather than Git, although we experimented with that approach as well).

  • A dedicated Job is responsible for synchronizing the AppProject with the CI_PIPELINE_ID.

  • During the merge train phase, the job submits a merge request with manifest updates, referencing the final CI_PIPELINE_ID, to the repository that Argo CD monitors.
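To illustrate the first step, the publish job boils down to something like this; the job name and registry path are placeholders, and the exact flags may differ from our real pipeline:

publish-bundle:
  stage: publish
  script:
    # Let werf pick up GitLab-provided settings (registry access, image naming, etc.)
    - source $(werf ci-env gitlab --as-file)
    # Build the images and publish the bundle (an OCI Helm chart) to Harbor,
    # tagged with the pipeline ID that Argo CD will track later
    - werf bundle publish --repo harbor.example.com/our-project/app --tag $CI_PIPELINE_ID

Argo CD then just needs the manifests it tracks to be pointed at that tag, which is what the merge-train job automates.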

Third try

In our third iteration, we moved all CI jobs over to GitLab components and set up werf cleanup to run periodically (this worked wonders in freeing up space in our Docker registry). Yes, we had cleanup policies running in Harbor before, but they just weren't as effective as werf's cleanup.

The werf cleanup command considers images currently deployed in Kubernetes and applies Git-based policies. Basically, werf knows which image tags belong to which commits, so it can safely delete only the versions that are no longer relevant (i.e., no longer in use).
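The Git-based policies live in werf.yaml; a rough sketch of what they look like (the limits below are illustrative, not our production values):

cleanup:
  keepPolicies:
    # Look at the 10 most recently active branches and keep the last 2 images built for each
    - references:
        branch: /.*/
        limit:
          last: 10
      imagesPerReference:
        last: 2
    # Keep one image for each of the 10 most recent Git tags
    - references:
        tag: /.*/
        limit:
          last: 10
      imagesPerReference:
        last: 1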

Instead of a conclusion

We opted not to use trdl, the default installer and version manager for werf, due to security considerations; we were concerned about its tendency to pull and update components in the background. On top of that, convenient alternatives such as go install or brew are readily available. We also weren't keen on an extra dependency for a binary that's fairly self-sufficient and doesn't demand a complicated setup. So we just built everything from source.

I personally find uv's approach to this highly effective: the binary is incorporated into an image or a CI/CD pipeline simply by copying it:

COPY --from=ghcr.io/astral-sh/uv:0.8.12 /uv /bin/

Following the same pattern, we created our own CI images, complete with the required certificates, Python versions, scripts from the DevOps team for Kubernetes authorization, and so on.

We were also fond of the converge command, which allows you to apply changes to the Kubernetes cluster instantly. Furthermore, werf can deploy resources in order of their weight (priority) and features significantly improved build caching, which is a huge plus. This feature is absolutely killer! Overall, werf does exactly what we need and does it well.
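Weight-based ordering is driven by a werf annotation on the resources in the chart. A minimal sketch, with made-up resource names and specs omitted:

# Run database migrations before rolling out the application
apiVersion: batch/v1
kind: Job
metadata:
  name: migrations
  annotations:
    werf.io/weight: "10"   # lower weights are deployed and awaited first
# ... spec omitted
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  annotations:
    werf.io/weight: "20"   # deployed only after the weight-10 group is ready
# ... spec omitted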
