Introduction
There’s something very satisfying about watching a system converge. You push a change to Git. A pipeline runs. A few minutes later, the cluster reflects exactly what you defined.
The system was clean, predictable, and repeatable.
For a long time, I thought that was the hard part. It turns out, it isn’t.
The Setup
This was a fairly typical setup.
- Kubernetes cluster on GKE
- Applications deployed using Helm
- GitLab CI driving deployments
- Terraform managing underlying infrastructure
The deployment flow was simple:
git push origin main
Which triggered a GitLab pipeline:
deploy:
  stage: deploy
  script:
    - helm upgrade --install payments ./chart -f values.yaml
Nothing fancy. A basic Helm chart defined a service with autoscaling.
# values.yaml
replicaCount: 2

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10

resources:
  requests:
    cpu: "200m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
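For context, a chart like this typically renders a HorizontalPodAutoscaler roughly along these lines (a sketch, not the actual chart; the 80% CPU target is an assumption, chosen to match the HPA output shown later in the post):

# Sketch of the HPA rendered from the values above (80% target is assumed)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80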
Everything was stable.
The Change
We had started experimenting with AI-assisted configuration updates. One suggestion came in to “optimize performance under load.” The diff looked reasonable.
 autoscaling:
   enabled: true
-  maxReplicas: 10
+  maxReplicas: 30

 resources:
   requests:
-    cpu: "200m"
+    cpu: "500m"
Nothing obviously wrong. More replicas. More CPU. Better performance.
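The arithmetic is worth pausing on, though (a back-of-envelope sketch, not figures from the incident): before the change, the worst case was 10 replicas × 200m = 2 cores of requested CPU; after it, 30 replicas × 500m = 15 cores. The ceiling on what the scheduler might need to find grew by 7.5×, without any single line of the diff looking alarming on its own.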
The System Converges
The change was merged. GitLab CI picked it up.
Running with gitlab-runner...
$ helm upgrade --install payments ./chart -f values.yaml
Release "payments" has been upgraded
A quick check:
kubectl get pods -n payments
payments-7d9f6c8c4d-abc12 1/1 Running
payments-7d9f6c8c4d-def34 1/1 Running
Everything looked healthy. No errors. No failed deploys. From a system perspective:
Everything worked.
But Something Was Off…
A few hours later, the signals started showing up.
- cluster CPU usage was higher than usual
- node autoscaling kicked in aggressively
- costs started creeping up
- some unrelated workloads saw intermittent throttling
Nothing had technically failed. But the system behavior had changed.
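None of this came from the pipeline; it came from looking at the cluster. For illustration, these are the kinds of checks that surface it (assuming metrics-server and Prometheus with cAdvisor metrics are available; the query is a sketch, not our actual dashboard):

# Node-level pressure (requires metrics-server)
kubectl top nodes

# Fraction of CPU periods throttled for workloads outside the payments namespace
rate(container_cpu_cfs_throttled_periods_total{namespace!="payments"}[5m])
  / rate(container_cpu_cfs_periods_total{namespace!="payments"}[5m])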
The Problem Wasn’t the Pipeline
The pipeline did exactly what it was designed to do.
- Git was the source of truth
- the change was applied
- the cluster matched the desired state

There was no drift. No inconsistency. The problem wasn’t reconciliation. It was the definition of the desired state.
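The no-drift claim is easy to check (a sketch, assuming the chart and values are checked out locally): render the desired state from Git and diff it against what is actually running.

# Renders the chart from Git and diffs it against live objects; empty output means no drift
helm template payments ./chart -f values.yaml | kubectl diff -f -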
Where Context Was Missing
The AI-generated suggestion didn’t know:
- this service wasn’t latency-critical
- the cluster had shared resource constraints
- aggressive scaling could impact other services
- cost limits were important in this environment

From a configuration standpoint, the change was valid. From a system standpoint, it was incomplete.
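One lightweight way to carry some of that context forward (a sketch of a mitigation, not something the original setup had) is to record the constraints next to the values they protect, so that a reviewer, human or AI, has something to reason against:

# values.yaml (sketch): operational context recorded next to the knobs it constrains
autoscaling:
  enabled: true
  minReplicas: 2
  # Deliberately capped: this service is not latency-critical and shares node
  # pools with other workloads; scaling beyond this has caused throttling and
  # cost spikes elsewhere in the cluster.
  maxReplicas: 10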
The Subtle Danger
This is where AI + GitOps becomes interesting.
GitOps gives us a powerful guarantee:
the system will converge to the declared state
But it does not guarantee:
that the declared state is the right one
And AI, without sufficient context, can generate configurations that are:
- technically correct
- syntactically valid
- operationally deployable

…but not aligned with the system as a whole.
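That is exactly what made this change so easy to merge: every mechanical check passes on it (illustrative commands; they validate syntax and schema, not whether maxReplicas: 30 makes sense in this cluster).

# Lints the chart and server-side dry-runs the rendered manifests; both succeed,
# because nothing about the change is invalid, only misaligned
helm lint ./chart -f values.yaml
helm template payments ./chart -f values.yaml | kubectl apply --dry-run=server -f -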
Everything Looks “Green”
Even the deployment flow looks clean:
helm upgrade --install payments ./chart -f values.yaml
And Kubernetes happily reports:
kubectl get hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS
payments   Deployment/payments   40%/80%   2         30
No alerts. No failures. Just a system doing exactly what it was told to do.
The Realization
Over time, this became clearer. Reconciliation is deterministic. Context is not. GitLab CI will apply whatever is in Git. Kubernetes will enforce whatever is defined. But neither of them understands:
- intent
- trade-offs
- system-wide impact
That layer still depends on context.
Where AI Actually Helps
AI becomes genuinely useful when it helps us understand changes, not blindly generate them.
For example:
- explaining what a Helm diff actually means
- highlighting scaling implications
- surfacing cost impact
- identifying potential blast radius
That’s where AI shines and actually adds value.
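In pipeline terms, that might look like a review step that surfaces the rendered change before it merges (a sketch: assumes the helm-diff plugin is installed, and the consumer of the diff, human or AI, is left open):

# Hypothetical GitLab CI job: make the actual impact of a change visible at review time
review:
  stage: test
  script:
    - helm diff upgrade payments ./chart -f values.yaml > change.diff
    - cat change.diff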
Closing Thought
GitOps is incredibly powerful. It gives us consistency, traceability, and convergence. But it assumes that the desired state is correct. In AI-assisted workflows, that assumption becomes weaker. Because now, the desired state may be influenced by a system that doesn’t fully understand the environment. And that shifts the problem.
From:
Will the system converge?
To:
Are we converging to the right thing?
GitOps guarantees convergence. It does not guarantee correctness. That depends on context — and context still needs humans.
Do you agree?
Note: This post was developed using AI-assisted writing tools. While AI helped with structuring and phrasing, all concepts and examples reflect real-world engineering experience.
Originally published on Medium: