The second post in our “thank you” series, just in time for the end of the year.
In the first one, we said thanks to Grafana for donating Beyla and making it easier for teams to get to usable telemetry quickly. This time we want to zoom out to something that quietly runs under the hood at Causely every day: GitOps with FluxCD.
Causely is a member of the Cloud Native Computing Foundation (CNCF). That’s not just a logo on the website for us: our entire product and our own operations lean heavily on CNCF projects. We build on OpenTelemetry, we run on Kubernetes, and we dogfood our own reliability engine against that stack.
Another key piece in that puzzle is FluxCD. It is what takes “the desired state in git” and makes it true in our clusters, repeatedly. It’s the heartbeat behind our weekly releases.
Why Flux Helps
If you’re operating modern Kubernetes environments, you’ve probably felt this tension. On one hand, you want velocity: teams push changes constantly, services multiply, and configurations evolve all the time. On the other hand, every manual kubectl apply is a potential one-off change that no one can fully reconstruct later. Over time, clusters drift away from whatever was last written down as “how things should be,” and you are left relying on muscle memory and shell history.
Flux solves exactly that problem by turning git into the control surface and reacting to changes within seconds of a merge.
For us, that has very practical consequences.
- Releases are commits, not hand-crafted ceremonies.
- A new Causely version is rolled out by changing a tag or a value in git; Flux notices the change within seconds, reconciles the cluster, and either converges to the new state or loudly tells us why it couldn’t.
- Environments stay in sync because the same manifests back our test clusters, staging, and production. The differences between them are intentional and visible in overlays, not hidden in one-off fixes on a live production cluster.
- And drift becomes a signal, not a mystery: when the cluster does not match git, Flux shows it, which turns the familiar “what changed?” question into a quick investigation rather than full-blown incident archaeology.
The net effect is that we can move quickly without giving up control. Our reliability engine depends on a stable substrate; Flux helps us keep it that way.
Best Practices We’ve Learned Using Flux
We didn’t get there on day one. It took a set of habits to turn FluxCD from a cool project into a core platform primitive.
- Treat manifests like product code
- Keep Kustomize overlays boring
- Watch Flux like any other production controller
- Make promotion a path, not an event
Treat manifests like product code
All of our Kubernetes manifests live in git. Not most of them, not “the important ones” – all of them. That sounds obvious, but it changes behavior in subtle ways.
- Reviews happen before things break, because a change to a HelmRelease or a Kustomize overlay goes through the same review process as a feature change.
- The commit history becomes an operational log: when we see a strange spike in errors, we can line it up against recent git changes, including configuration tweaks that would otherwise live only in someone’s bash history.
- Our issue tracker also stays connected to reality, because we reference issue numbers in commit messages, so the question “why did we change this setting?” always has a direct link back to the discussion that justified it.
In the early days, we still made the occasional manual change in production, usually in the name of speed. Those changes always came back to haunt us as confusing states that no one could fully explain. Once we committed to git as the only source of truth and forced ourselves to route every change through it, the platform became much more predictable.
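One way to make the issue-reference habit stick is a commit-msg hook. The sketch below is a minimal illustration, not our actual tooling: it assumes issue references look like `#123`, and the function name `has_issue_ref` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical commit-msg check: accept a message only if it references an
# issue such as "#123". The "#<number>" convention is an assumption.
has_issue_ref() {
  # $1 is the path to the file holding the commit message.
  # Drop git's template comment lines ("# ..."), then look for "#<digits>".
  grep -Ev '^#( |$)' "$1" | grep -Eq '#[0-9]+'
}

# When installed as .git/hooks/commit-msg, git passes the message file as $1.
if [ -n "${1:-}" ]; then
  if ! has_issue_ref "$1"; then
    echo "commit message must reference an issue, e.g. 'Raise limits (#123)'" >&2
    exit 1
  fi
fi
```

Saved as an executable .git/hooks/commit-msg, this rejects commits that carry no issue reference. A server-side check in CI would be a stricter variant of the same idea.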
Keep Kustomize overlays boring
We use Kustomize to manage environment-specific differences across test clusters, staging, production, and chaos environments. The rule we eventually settled on is simple: overlays describe differences, not alternative universes.
In practice, that means we maintain a clean base with shared resources such as namespaces, common HelmReleases, and shared configuration. On top of that base, we keep the environment overlays as thin as possible. They patch what truly needs to change, such as cluster names, resource limits, or a particular feature flag, rather than redefining whole stacks.
Whenever we tried to be clever with external references or overlays that diverged heavily from one another, troubleshooting became harder. Keeping overlays compact and predictable means we can scan a diff and understand at a glance what will change in a given cluster. Before committing, we render Kustomize configs locally as a quick sanity check that catches typos and misaligned paths before Flux has to complain about them.
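To make the base-plus-thin-overlay idea concrete, here is a small sketch that writes a throwaway layout to a temporary directory and renders it locally. Every name in it (demo-app, the staging overlay, the memory values) is made up for illustration and is not taken from our repositories.

```shell
#!/usr/bin/env bash
# Sketch of a "boring" layout: one shared base and a thin staging overlay.
# All names (demo-app, staging, the memory values) are hypothetical.
root="${KUSTOMIZE_SKETCH_DIR:-$(mktemp -d)}"
mkdir -p "$root/base" "$root/overlays/staging"

# Base: resources every environment shares.
cat > "$root/base/deployment.yaml" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: nginx:1.27
          resources:
            limits:
              memory: 128Mi
EOF

cat > "$root/base/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
EOF

# Overlay: only the difference (a higher memory limit), nothing else.
cat > "$root/overlays/staging/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: demo-app
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 256Mi
EOF

# Render locally before committing, if kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  kubectl kustomize "$root/overlays/staging"
else
  echo "kubectl not found; overlay written to $root/overlays/staging"
fi
```

The overlay is a single patch, so a reviewer can see the whole difference between staging and the base in a few lines.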
Watch Flux like any other production controller
GitOps is not “set and forget.” Flux is a control loop running in production and, when it is unhappy, your platform will slowly drift.
We treat Flux like a critical controller. We watch reconciliation health and consider a stuck HelmRelease or Kustomization as important as any failing deployment. When Flux cannot talk to git, or when an apply keeps failing, that is something we alert on rather than something we notice days later in a dashboard. And when “nothing seems to be changing” in a cluster despite recent commits, Flux logs are one of the first places we look.
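As a starting point for that kind of check, the Flux CLI can list objects that are not Ready and show recent controller errors. The wrapper below is a rough sketch: the function name `flux_health` is hypothetical, and in production you would more likely feed the same signals into your alerting than run them by hand.

```shell
#!/usr/bin/env bash
# Sketch: list Flux objects that are not Ready and show recent controller
# errors. The function name is hypothetical; it only wraps Flux CLI commands.
flux_health() {
  if ! command -v flux >/dev/null 2>&1; then
    echo "flux CLI not found" >&2
    return 1
  fi
  # Kustomizations and HelmReleases whose Ready condition is not True.
  flux get kustomizations --all-namespaces --status-selector ready=false
  flux get helmreleases --all-namespaces --status-selector ready=false
  # Recent error-level log lines from the Flux controllers.
  flux logs --all-namespaces --level=error
}
```

Calling flux_health against a healthy cluster should show little or nothing; anything that does appear is a reconciliation that needs attention.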
This mindset becomes even more important when GitOps extends beyond just core applications and starts to manage your observability stack, gateways, and even Causely itself.
Make promotion a path, not an event
Flux really shines when you treat deployments as a series of git-based promotions instead of isolated production pushes. A typical Causely release starts with a change landing in a test environment: we use clusters like test1 and test2 for this. We verify that the change behaves as expected there, including how it interacts with telemetry and Causely’s own reasoning about incidents. Once we are happy, we promote the same change to staging by updating the relevant overlay or values. Only after staging behaves as expected do we roll the change into production.
Alongside this path, we maintain dedicated chaos clusters, chaos1 and chaos2, where we deliberately break things to see how the system responds. Because everything flows through git, we can rehearse failure modes without fear of leaving behind strange manual fixes that only exist on one cluster. Keeping cluster-specific configuration isolated and well documented is what allows us to run realistic experiments in those chaos clusters without letting that complexity bleed into production.
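A promotion along this path can be as small as copying one pinned version from one overlay to the next. The sketch below assumes a hypothetical layout in which each environment pins its version in overlays/<env>/version.env as APP_VERSION=<tag>; both the layout and the `promote` function are illustrative, not a description of our repository.

```shell
#!/usr/bin/env bash
# Sketch of a git-based promotion. Assumes a hypothetical layout where each
# environment pins its version in overlays/<env>/version.env as
# APP_VERSION=<tag>.
promote() {
  local from="$1" to="$2" version
  # Read the version currently pinned in the source environment.
  version="$(sed -n 's|^APP_VERSION=||p' "overlays/$from/version.env")"
  if [ -z "$version" ]; then
    echo "no APP_VERSION found for $from" >&2
    return 1
  fi
  # Rewrite only the version line in the target overlay.
  sed -i.bak "s|^APP_VERSION=.*|APP_VERSION=$version|" "overlays/$to/version.env"
  rm -f "overlays/$to/version.env.bak"
  echo "promoted $version from $from to $to"
}
```

Running promote test1 staging from the repository root, then committing and pushing the result, leaves a one-line diff that documents exactly what moved forward and when.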
Try Flux On Your Own
To really understand Flux, it helps to feel git driving your cluster. The smallest useful experiment is a git repository, a local Kubernetes cluster, and Flux bootstrapped from that repository. The nice part is that flux bootstrap already does most of the heavy lifting: it creates the repository, installs the controllers, and wires everything together for you.
You can run the following guide on your laptop with kind.
Start by installing the Flux CLI. The easiest way is via the official install script; if you prefer Homebrew, apt, or other package managers, the Flux installation guide lists those options as well.
curl -s https://fluxcd.io/install.sh | sudo bash
Next, export your GitHub credentials so Flux can authenticate and create the repository for you. If you are logged in with the GitHub CLI (gh auth login), you can derive both the user name and token directly from it.
export GITHUB_USER="$(gh api user --jq '.login')"
export GITHUB_TOKEN="$(gh auth token)"
Now, create a local Kubernetes cluster and verify that it meets the Flux prerequisites.
kind create cluster --name flux-playground
flux check --pre
When your cluster is ready, bootstrap Flux into it. This command will create a repository called flux-playground-gitops under your GitHub account, install Flux into the flux-system namespace, and configure it to track the ./clusters/flux-playground path in that repo.
flux bootstrap github \
  --owner="$GITHUB_USER" \
  --repository=flux-playground-gitops \
  --branch=main \
  --path=./clusters/flux-playground \
  --personal
Clone the newly created repository to your machine and change into it.
gh repo clone flux-playground-gitops
cd flux-playground-gitops
You are now ready to define the OpenTelemetry demo as a git-managed workload by adding a manifest for the demo under clusters/flux-playground.
cat > clusters/flux-playground/oteldemo.yaml <<'EOF'
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: open-telemetry
  namespace: flux-system
spec:
  interval: 1m
  url: https://open-telemetry.github.io/opentelemetry-helm-charts
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: otel-demo
  namespace: flux-system
spec:
  interval: 1m
  chart:
    spec:
      chart: opentelemetry-demo
      sourceRef:
        kind: HelmRepository
        name: open-telemetry
        namespace: flux-system
  targetNamespace: otel-demo
  install:
    createNamespace: true
EOF
At this point your git repository fully describes both Flux itself and the OpenTelemetry demo that Flux will deploy.
Commit and push these changes so Flux can reconcile the new state.
git add .
git commit -m "Add OpenTelemetry demo via Flux"
git push origin main
Flux polls the repository on an interval (one minute by default after bootstrap), so within a minute or so of the push it will see the new revision, apply the changes, and start rolling out the OpenTelemetry demo. You can watch the pods come up.
watch kubectl get pods -n otel-demo
After a few minutes you will see the OpenTelemetry demo microservices starting in the otel-demo namespace.
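If you would rather not wait for the next sync, or the pods do not appear, you can ask Flux to fetch the commit right away and then check its own view of the release. The sketch below wraps those steps in a function; the name `wait_for_demo` and the ten-minute timeout are arbitrary choices, while the resource names come from the manifest above.

```shell
#!/usr/bin/env bash
# Sketch: ask Flux to sync now, then block until the HelmRelease reports Ready.
# Resource names match the manifest above; the function name and the 10m
# timeout are arbitrary choices.
wait_for_demo() {
  # Fetch the latest commit immediately rather than waiting for the next poll.
  flux reconcile source git flux-system
  # Show what Flux thinks of the chart source and the release.
  flux get sources helm --all-namespaces
  flux get helmreleases --all-namespaces
  # Wait on the HelmRelease's Ready condition.
  kubectl wait helmrelease/otel-demo -n flux-system \
    --for=condition=Ready --timeout=10m
}
```

If the wait times out, the status message in the flux get helmreleases output is usually the quickest pointer to what went wrong.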
You now have a real application being managed by Flux from git: the desired state lives in a repository, Flux reconciles it into the cluster shortly after your push, and you never had to use kubectl apply for the actual deployment.
Flux + Causely: GitOps All the Way Down
If you followed the small lab above, you already have a local cluster, Flux installed, and the OpenTelemetry demo running under GitOps control. From there, adding Causely is just one more git-driven change.
Conceptually, the flow is simple. You obtain a Causely access token, store it as a Kubernetes Secret, and add the Causely FluxCD manifests to the same git repository that Flux already manages. Git drives the rollout, Flux reconciles it into the cluster, the OpenTelemetry demo generates realistic behavior, and Causely explains what is going on.
Our documentation has a full FluxCD installation guide with more background and variations. The example below is meant to be something you can copy and adapt directly from your existing flux-playground-gitops setup.
First, retrieve your Causely access token and keep it handy. Then, create a namespace for Causely and a Kubernetes Secret with your token. The Causely FluxCD manifests expect a Secret named causely-secrets and use Flux's native post-build substitution to inject CAUSELY_TOKEN into the HelmRelease, so the token never has to be committed to git:
kubectl create namespace causely
kubectl create secret generic causely-secrets \
  --from-literal=CAUSELY_TOKEN=your-actual-gateway-token-here \
  -n causely
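For readers new to post-build substitution, the generic shape looks roughly like the sketch below: a Flux Kustomization lists the Secret under postBuild.substituteFrom, and any ${CAUSELY_TOKEN} placeholder in the manifests it applies is replaced at reconcile time. This is an illustration of the Flux feature, not a copy of the Causely manifests. The Kustomization name and path are hypothetical, and the file is written to a scratch directory so it never touches your repository.

```shell
#!/usr/bin/env bash
# Illustrative only: the general shape of Flux post-build substitution.
# The Kustomization name and path are hypothetical, not Causely's manifests.
sketch_dir="$(mktemp -d)"
sketch="$sketch_dir/postbuild-sketch.yaml"

cat > "$sketch" <<'EOF'
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: causely-example        # hypothetical name
  namespace: causely           # same namespace as the Secret it reads
spec:
  interval: 10m
  path: ./apps/causely         # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  postBuild:
    substituteFrom:
      - kind: Secret
        name: causely-secrets  # supplies CAUSELY_TOKEN
EOF

echo "wrote $sketch"
```

Flux resolves the Secret in the Kustomization's own namespace, which is why the sketch places both in causely.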
Next, change into your GitOps repository if you are not already there, clone the public causely-deploy repository, and copy the FluxCD manifests into your own cluster configuration.
cd flux-playground-gitops
git clone https://github.com/causely-oss/causely-deploy.git
mkdir -p clusters/flux-playground/causely
cp causely-deploy/kubernetes/fluxcd/causely/*.yaml clusters/flux-playground/causely/
Your git repository now contains everything Flux needs to deploy Causely. Commit and push these changes so Flux can reconcile the new state.
git add clusters/flux-playground
git commit -m "Add Causely via Flux"
git push origin main
After the push, Flux notices the change, applies the new manifests, and starts deploying Causely into your cluster. You can watch it the same way you watched the OpenTelemetry demo roll out.
flux get kustomizations -A
kubectl get pods -n causely
Once the Causely pods are healthy, you can return to the Causely portal, where the cluster you just configured will appear. Over time, topology fills in and, as issues arise, you will see root cause views associated with the services in your demo.
At that point, you have a complete loop on a single laptop: Git drives change, Flux applies it, the OpenTelemetry demo generates behavior, and Causely explains what happens when things go wrong. If you want more variations, production-grade knobs, or to run this across multiple clusters, the FluxCD installation guide in our docs walks through additional options in detail.
Closing: Thank You, Flux
FluxCD is a great example of the kind of infrastructure we love in the CNCF ecosystem. It nudges teams toward good habits, turns the question “how did this get here?” into something with a clear, auditable answer, and helps keep complex Kubernetes estates boring and predictable.
As a CNCF member building on OpenTelemetry, Kubernetes, and the wider cloud-native stack, we are genuinely grateful for projects like Flux that quietly raise the floor for everyone.
So: thank you to the Flux maintainers and community for building and maintaining an engine that lets us practice what we preach about control, desired state, and autonomous reliability. If you are running Kubernetes and still relying on manual deploys or one-off scripts, Flux is worth a serious look.
And if you want to see what happens when you combine GitOps with causal reasoning, we are always happy to show you how Causely fits into that picture.