DEV Community

Mahra Rahimi

How to enable reconciliation windows using Flux and K8s native components

How to enable reconciliation windows for a GitOps setup using the suspension feature of the Flux Kustomization resource and K8s CronJobs.

When you use Flux to manage a K8s cluster, every new change in your repository is applied to the cluster’s state on the next reconciliation. In some use cases, the newest changes to a GitOps repository should only be applied to the cluster within a designated time window. For example, the cluster should reconcile to the newest changes of the GitOps repository only between Monday 8 am and Thursday 5 pm. Any change arriving in the GitOps repository after that, for example on Friday or over the weekend, has to wait until Monday 8 am to be applied.

What are the scenarios this could be used for in real life?

  • Sometimes the cluster is connected to external systems, which need to be in maintenance mode before updates can be applied.
  • You want to be able to determine a designated time window when the next changes go into production, so that in case of issues you are able to react quickly.

So our problem in short:
We want to be able to predefine time windows to deploy all new changes to a cluster that is managed by Flux.

To make things easier, let's call these time windows "reconciliation windows" and dig right into how to solve the problem.

Prerequisites:

Core principles

Now how do we create such reconciliation windows using Flux and K8s native resources?
To get there, we first need to understand how the Flux Kustomization and Flux Source resources work, and how we can leverage them to solve our problem.

When setting up a cluster with Flux there will always be a Source resource that reconciles the changes from the GitOps repository into the cluster.
After that, the Kustomization resource will poll the newest changes from the Source resource and apply them to the cluster.
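
As a rough sketch, a Source of kind `GitRepository` could look like the manifest below. The repository URL, branch, and interval are placeholders and not values from this setup. The API version shown matches the Flux v0.36 era used in this post; newer Flux releases serve `v1`.

```yaml
# Hypothetical Source: pulls the GitOps repository into the cluster.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m                                   # how often the repository is polled (assumed value)
  url: https://example.com/org/gitops-repo.git   # placeholder URL
  ref:
    branch: main                                 # assumed branch name
```

A Kustomization then points at this Source through its `spec.sourceRef`.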

How Flux controls the cluster using the `Source` and `Kustomization` resources

Interestingly enough, the reconciliation of both of these resources can be suspended.

Suspend Source/Kustomization resource from reconciling

# "git" is the Source type; other types include helm, bucket, and oci
flux suspend source git <name>
flux suspend kustomization <name>

Resume reconciling of Source/Kustomization resource

# "git" is the Source type; other types include helm, bucket, and oci
flux resume source git <name>
flux resume kustomization <name>

Suspending the Kustomization resource means no changes are applied to the cluster:

Suspending a `Kustomization` resource

Since our goal is to suspend the reconciliation of the cluster state, just suspending the Kustomization resource is enough. The Source resource can continue syncing content at the predefined interval.
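
Under the hood, `flux suspend kustomization` sets `spec.suspend` on the resource. A suspended Kustomization might therefore look roughly like the sketch below. The path, interval, and Source name are assumptions for illustration; only the `apps` name and the `flux-system` namespace match the commands used in this post.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2   # v1 on newer Flux releases
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  suspend: true          # reconciliation stays paused until this is false again
  interval: 10m          # assumed value
  path: ./apps           # assumed path inside the repository
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system    # assumed Source name
```

`flux resume kustomization apps -n flux-system` flips the field back to `false`, after which changes are applied again.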

Schedule opening and closing of reconciliation windows

So far so good. But how do we automate this?
Well, K8s already has a native way to schedule jobs: CronJob resources. So why not use them?

With CronJobs we can create an open-reconciliation-window-job and a close-reconciliation-window-job, which use the Flux CLI and a ServiceAccount to resume/suspend the Kustomizations.
Let's use the “No-deployment Friday” example. For the reconciliation window from every Monday 8:00 am to Thursday 5:00 pm, this is how the jobs would look.

Note: The ServiceAccount and the corresponding Role and RoleBinding are needed to give the job the right access to perform operations on the cluster resources. For more information on this, see the K8s docs on configuring service accounts.
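
A minimal sketch of such RBAC could look like the following. The `sa-job-runner` name and the `jobs` and `flux-system` namespaces come from the manifests in this post; the Role and RoleBinding names and the exact list of verbs are assumptions, and the Flux CLI may need different permissions depending on its version.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-job-runner
  namespace: jobs
---
# Role in flux-system, because that is where the Kustomizations live.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kustomization-suspender        # hypothetical name
  namespace: flux-system
rules:
  - apiGroups: ["kustomize.toolkit.fluxcd.io"]
    resources: ["kustomizations"]
    verbs: ["get", "list", "watch", "patch", "update"]   # assumed set of verbs
---
# Binds the Role to the ServiceAccount from the jobs namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kustomization-suspender        # hypothetical name
  namespace: flux-system
subjects:
  - kind: ServiceAccount
    name: sa-job-runner
    namespace: jobs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kustomization-suspender
```

Scoping the Role to the `kustomizations` resource in a single namespace keeps the job's permissions narrow.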

# open-reconciliation-window-job.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: open-reconciliation-window
  namespace: jobs
spec:
  schedule: "0 8 * * MON"
  suspend: true  # NOTE: while this is true the CronJob itself never fires; set it to false to activate the job
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-job-runner
          containers:
            - name: hello
              image: ghcr.io/fluxcd/flux-cli:v0.36.0
              imagePullPolicy: IfNotPresent
              command: ["/bin/sh", "-c"]
              args:
                - flux resume kustomization infra -n flux-system;
                  flux resume kustomization apps -n flux-system;
          restartPolicy: Never
---
# close-reconciliation-window-job.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: close-reconciliation-window
  namespace: jobs
spec:
  schedule: "0 17 * * THU"
  suspend: true  # NOTE: while this is true the CronJob itself never fires; set it to false to activate the job
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-job-runner
          containers:
            - name: hello
              image: ghcr.io/fluxcd/flux-cli:v0.36.0
              imagePullPolicy: IfNotPresent
              command: ["/bin/sh", "-c"]
              args:
                - flux suspend kustomization infra -n flux-system;
                  flux suspend kustomization apps -n flux-system;
          restartPolicy: Never

Note: You can customize the window times by adjusting the schedule string set in spec.schedule. There are a few online tools to help you understand how these cron strings work, e.g. crontab guru.
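
One detail worth keeping in mind: unless a time zone is set, the schedule is interpreted in the local time zone of the kube-controller-manager, which is often UTC. On Kubernetes versions that support the `spec.timeZone` field (stable since v1.27), the window could be pinned to a specific time zone as sketched below. The zone shown is just an example.

```yaml
# Excerpt of a CronJob spec, not a complete manifest.
spec:
  # Fields: minute hour day-of-month month day-of-week
  schedule: "0 8 * * MON"        # every Monday at 08:00
  timeZone: "Europe/Berlin"      # example only; use any valid IANA time zone name
```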

Scale by using GitOps to manage reconciliation windows in GitOps

At this point, we have the capabilities to resume and suspend, but we still need to create the CronJobs manually for each cluster.

Imagine we have a GitOps repository that manages 10+ clusters. Probably not all of these clusters will have their reconciliation window at the same time. Also, you don't want to create these jobs manually, let alone maintain them when, for example, more Kustomization resources get added to a cluster.

Not to worry, there is also a solution for that ;)

After all, we are already using GitOps, so why not put the definition of the jobs into the repository as part of our infrastructure?
And why not use kustomize's patch functionality to overwrite each CronJob's cron string, so that the reconciliation window times can be customized per cluster?

If that sounds interesting check out the full sample here.
Now instead of having to manually create the ClusterRole, RoleBinding, ServiceAccount, and CronJobs, Flux will take care of that for us.
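
To illustrate the patch idea, a per-cluster overlay could look something like this. It is a sketch and not taken from the linked sample: the directory layout and the replacement schedules are made up, and only the CronJob names match the manifests shown earlier.

```yaml
# clusters/cluster-a/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/reconciliation-windows   # hypothetical base containing the CronJobs and RBAC
patches:
  # This cluster opens its window on Tuesday at 06:00 (made-up schedule).
  - target:
      kind: CronJob
      name: open-reconciliation-window
    patch: |-
      - op: replace
        path: /spec/schedule
        value: "0 6 * * TUE"
  # ...and closes it on Wednesday at 20:00 (made-up schedule).
  - target:
      kind: CronJob
      name: close-reconciliation-window
    patch: |-
      - op: replace
        path: /spec/schedule
        value: "0 20 * * WED"
```

With one such overlay per cluster, the base job definitions stay shared while each cluster gets its own window.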

Reconciliation windows

Conclusion

Now this is how we can leverage Flux and K8s native approaches to restrict the application of changes to a cluster to happen only in a reconciliation window.
There are a few advantages to this approach:

  • For clusters running on the edge, if the connectivity goes down during a reconciliation window, simple changes will still reconcile normally. This is because the Source resource already pulled the newest changes.

Note: Careful, this only works for image tag changes if there is a local ACR. Otherwise, the new images need to be pre-downloaded to the device.

  • The GitOps repository reflects the desired state of the cluster after a reconciliation window.
  • No need to maintain a custom gateway or such. All the used components are open-source and there is no need for custom logic.
  • During a reconciliation window, changes are applied just as we are used to from Flux.

What we are not solving with this, however, is the scheduling of fine-grained changes. As you might have noticed, the granularity ends at the Kustomization level: the CronJobs suspend and resume everything a given Kustomization resource manages. So individual configuration changes cannot be scheduled with this approach.

This did not solve your problem yet, because your cluster needs real-time changes as well as changes within a reconciliation window? Not to worry, got you ;) Check out the next part.

Top comments (3)

Assaf Schwartz

what did you use for creating the diagrams in those GIFs?

thanks in advance :)

Mahra Rahimi

Hi Assaf, just simple powerpoint. You can create each step in an individual slide and export it as a GIF :)

Waqas Ahmed

It is simple but very clear. Sometimes we don't want to suspend a Kustomization, though. For example, if some DaemonSets disappear from your nodes, what would you do while the Kustomization is suspended? So suspension is not a good solution. We could think of introducing a webhook between the kustomize-controller and the source-controller, or between the kustomize-controller and the K8s cluster, which would give us a gated approval solution.