Originally published at getambassador.io

Using Mutating Admission Controllers to Ease Kubernetes Migrations

The old staging environment of Ambassador was just a set of namespaces in the same cluster as our production environment. Using namespaces to separate environments is not ideal since changes in staging can affect production.

For example, updating cluster-wide resources like the Edge Stack CRDs could impact the whole cluster. The risk of breaking production also forced us to deploy some staging changes manually instead of through our GitOps-style continuous delivery pipeline, losing the key benefits GitOps provides. Migrating applications to a new environment can be challenging due to differences in external dependencies, hardware constraints, and so on. In addition, applications change constantly, which makes it difficult to keep the environments up to date.

In this article, I’ll explain how mutating admission controllers enabled us to quickly update the manifests for the new environment while keeping the old environment running.

Our approach to GitOps

Before I explain how we tackled the migration, let’s talk about how we do GitOps.

At Ambassador, all the manifests deployed to a Kubernetes cluster are stored in a Git repository. This repository is monitored by ArgoCD, and when there is a new version of a manifest, ArgoCD will deploy it.

There are typically two types of applications in our cluster:

Applications that we use but don’t own, like Grafana and Prometheus. The manifests for these applications are kept in the manifest repository.

Applications owned by us, like Edge Stack, Telepresence and Blackbird. The source code and manifests for these applications are kept in their own repository. When a change is made, a job will push the latest version of their manifest to ArgoCD’s repository.

Kubernetes Admission Controllers to the Rescue!

One of the options we had to migrate to the new staging environment was to create a copy of all the staging manifests. This could be implemented in a couple of different ways:

Create a new Git repository to be used as the source for the staging environment's manifests. All Ambassador-owned applications would push the staging manifests here. This approach would require changing several repositories and keeping the old and new staging manifests up to date until the old environment is torn down.

Create branches on each Ambassador-owned application repo, which would be used as the source for staging. Downsides of this approach include: the new staging environment would slowly diverge from the old one unless the branches were kept up to date, and ArgoCD would have to be given permissions to multiple source repositories, which is undesirable from a security perspective.

From a technical point of view, both options would work, each with its own limitations. However, there were a couple of other requirements that had to be met. First, the old staging environment had to be kept up to date (as a backup) in case the new one ran into issues.

Second, different development teams use the staging environment, each with its own priorities. This means that the migration process should be transparent to them, and we should be able to onboard teams gradually as their applications are migrated to the new cluster.


Empathize with Users

The last requirement is not a technical one but a ‘people’ one. We always aim to empathize with users, so we wanted a solution that would not just meet the technical goals but also solve the problem sympathetically for our developers.

Keeping in mind the previous constraints, we decided to use the same Git repository in the old and new environments and use a mutating admission controller to patch resources on the fly in the new cluster.

This gave us the following advantages:

All changes necessary to bootstrap the new cluster were defined in one place. Later, these changes could be ported to each of the application repositories.

The old and new staging environments could now work side by side, and their differences were minimal.

If the new environment had issues, it was possible to revert to the old one with minimal effort.

How does it work?

Kubernetes has MutatingAdmissionWebhook and ValidatingAdmissionWebhook admission controllers that provide a mechanism for configuring webhooks that can modify (mutating webhooks) or reject (validating webhooks) requests to the Kubernetes API server. Typically, these webhooks are used for enforcing security practices, ensuring resources follow specific policies, or configuration management (e.g., configuring resource limits).
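
To make the flow concrete, here is a minimal sketch of a mutating webhook server in Go. This is not Ambassador's patcher; the /mutate path, port, and certificate locations are illustrative assumptions. The API server POSTs an AdmissionReview to the webhook, and the webhook replies with an AdmissionReview whose response admits the object (optionally with a patch) or rejects it.

```go
// Minimal mutating-webhook server sketch (illustrative, not Ambassador's patcher).
package main

import (
	"encoding/json"
	"log"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
)

func mutateHandler(w http.ResponseWriter, r *http.Request) {
	// The API server sends an AdmissionReview describing the object being
	// created or updated.
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	if review.Request == nil {
		http.Error(w, "empty admission request", http.StatusBadRequest)
		return
	}

	// Default response: admit the object unchanged. The UID must echo the
	// request's UID so the API server can correlate the answer.
	review.Response = &admissionv1.AdmissionResponse{
		UID:     review.Request.UID,
		Allowed: true,
	}
	// A real webhook would inspect review.Request here and attach a JSON
	// patch or reject the object; a sketch of that logic appears later on.

	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(review); err != nil {
		log.Printf("failed to write admission response: %v", err)
	}
}

func main() {
	http.HandleFunc("/mutate", mutateHandler)
	// Admission webhooks must be served over TLS; the certificate paths here
	// are assumptions for the sketch.
	log.Fatal(http.ListenAndServeTLS(":8443", "/certs/tls.crt", "/certs/tls.key", nil))
}
```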

We created a mutating webhook called patcher that intercepts every request to create or modify certain resources (e.g., ConfigMaps and Deployments), and transforms the manifest if necessary. Here are some of the updates that patcher handles:

Initializing Docker registry credentials: The patcher sets the imagePullSecrets field on the ServiceAccount used by a pod, enabling the pod to pull images from a private Docker registry.

Replacing values referencing the old staging environment or its dependencies: To run both environments simultaneously, we need to replace any settings that cause conflicts or are incorrect, like hostnames or single-instance integrations.

To do this, the patcher inspects the webhook request. If it contains an object that should be modified, it returns a response with a JSON patch representing the changes to apply (a sketch of this logic appears after this list).

Blocking the creation of objects that are deprecated or unwanted in the new environment: This is done by setting the allowed key of the webhook response to false.
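
As an illustration of the patching and blocking behaviors above, here is a hedged sketch of the decision logic that could sit inside the handler shown earlier; it is not patcher's actual code. It appends an imagePullSecrets entry to incoming ServiceAccounts via a JSON patch and rejects a deprecated object. The secret name staging-registry-creds and the ConfigMap name legacy-staging-config are made-up examples.

```go
// Illustrative decision logic for the handler sketched earlier (same package);
// the resource and secret names are hypothetical.
package main

import (
	"encoding/json"

	admissionv1 "k8s.io/api/admission/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func decide(req *admissionv1.AdmissionRequest) *admissionv1.AdmissionResponse {
	resp := &admissionv1.AdmissionResponse{UID: req.UID, Allowed: true}

	switch req.Kind.Kind {
	case "ServiceAccount":
		// JSON patch that sets imagePullSecrets so pods using this
		// ServiceAccount can pull from a private registry. A real
		// implementation would first check whether the field already
		// exists and append to it instead of overwriting.
		patch := []map[string]interface{}{{
			"op":   "add",
			"path": "/imagePullSecrets",
			"value": []map[string]string{
				{"name": "staging-registry-creds"}, // hypothetical secret name
			},
		}}
		raw, _ := json.Marshal(patch)
		pt := admissionv1.PatchTypeJSONPatch
		resp.Patch = raw
		resp.PatchType = &pt

	case "ConfigMap":
		// Block objects that are deprecated or unwanted in the new
		// environment by returning allowed: false.
		if req.Name == "legacy-staging-config" { // hypothetical object name
			resp.Allowed = false
			resp.Result = &metav1.Status{
				Message: "this ConfigMap is not allowed in the new staging cluster",
			}
		}
	}
	return resp
}
```

In the handler shown earlier, this function would be called with review.Request and its result assigned to review.Response.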

Now that I’ve explained how patcher works, let’s look at the process to set up a new cluster:

Install patcher and configure it to block production-only applications.

Configure ArgoCD for the new cluster and deploy all applications so that they start in an ‘inactive’ state (e.g., Secrets for connecting to a database will be missing).

For each application:

Find all the settings that have to be updated and create a patch.
Update the application so it becomes active.

Once the new cluster is stable and the old one has been retired, we will port each patch back to the source repository and remove it from patcher.

Other things we considered!

While the use of admission controllers made the staging migration easier, there were a few areas that required special consideration to avoid issues. I explain them below:

Cluster availability

Admission controllers can be considered part of the control plane, and as such, they should be carefully designed and implemented. For instance:

Admission webhooks should be highly available and quick to respond. An unavailable webhook can make the cluster inaccessible or break it in unexpected ways. If the webhook is deployed in the Kubernetes cluster, multiple instances can be run behind a service to improve availability.

Limit the scope of the objects handled by the webhook. While you can configure a webhook to receive requests for all objects, this is not recommended, since any issue with the webhook can then severely impact the cluster. A better approach is to restrict the webhook to the specific resources it actually needs to modify, as in the sketch below.
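
To show what limiting the scope can look like in practice, here is a sketch that builds a MutatingWebhookConfiguration using the Go types from k8s.io/api/admissionregistration/v1. The webhook name, service reference, namespace label, and resource list are assumptions, not Ambassador's actual configuration; the point is that the webhook only matches a few resource kinds in opted-in namespaces, fails open, and times out quickly.

```go
// Illustrative webhook registration that keeps the blast radius small.
package webhookconfig

import (
	admissionregv1 "k8s.io/api/admissionregistration/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func patcherWebhookConfig(caBundle []byte) *admissionregv1.MutatingWebhookConfiguration {
	// Fail open so an outage of the webhook does not block the cluster; the
	// right choice depends on how critical the patches are.
	failOpen := admissionregv1.Ignore
	sideEffects := admissionregv1.SideEffectClassNone
	timeout := int32(5) // seconds; keep it short so the API server is never stalled for long
	path := "/mutate"

	return &admissionregv1.MutatingWebhookConfiguration{
		ObjectMeta: metav1.ObjectMeta{Name: "patcher"},
		Webhooks: []admissionregv1.MutatingWebhook{{
			Name: "patcher.example.com", // hypothetical webhook name
			ClientConfig: admissionregv1.WebhookClientConfig{
				Service: &admissionregv1.ServiceReference{
					Namespace: "patcher",
					Name:      "patcher",
					Path:      &path,
				},
				CABundle: caBundle,
			},
			// Only intercept the kinds the webhook actually rewrites.
			Rules: []admissionregv1.RuleWithOperations{{
				Operations: []admissionregv1.OperationType{
					admissionregv1.Create,
					admissionregv1.Update,
				},
				Rule: admissionregv1.Rule{
					APIGroups:   []string{"", "apps"},
					APIVersions: []string{"v1"},
					Resources:   []string{"serviceaccounts", "configmaps", "deployments"},
				},
			}},
			// Only namespaces that have opted into the new staging environment.
			NamespaceSelector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"env": "staging-v2"}, // hypothetical label
			},
			FailurePolicy:           &failOpen,
			SideEffects:             &sideEffects,
			TimeoutSeconds:          &timeout,
			AdmissionReviewVersions: []string{"v1"},
		}},
	}
}
```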

Persisted vs. live state

The use of mutating admission controllers can introduce differences between the source manifests and the live manifests, which can make it difficult to troubleshoot issues or update applications. We used mutating webhooks as a tool to achieve a goal quickly, but once we finished the migration, additional work was done to port the changes introduced by patcher back to the source repositories.

Admission controllers are a very powerful tool, and migrating applications is just one use case for them. However, care must be taken to ensure that webhooks are reliable and that they don’t introduce unexpected behaviors or change the cluster state in unanticipated ways. This was just one example of how we used this common mechanism, provided by Kubernetes itself, to carry out our migration.
