Introduction
There’s something very satisfying about watching a system converge. You push a change to Git. A pipeline runs. A few minutes later, the cluster reflects exactly what you defined.
The system was clean, predictable, and repeatable.
For a long time, I thought that was the hard part. It turns out, it isn’t.
The Setup
This was a fairly typical setup.
- Kubernetes cluster on GKE
- Applications deployed using Helm
- GitLab CI driving deployments
- Terraform managing underlying infrastructure
The deployment flow was simple:
git push origin main
Which triggered a GitLab pipeline:
deploy:
  stage: deploy
  script:
    - helm upgrade --install payments ./chart -f values.yaml
Nothing fancy. A basic Helm chart defined a service with autoscaling.
# values.yaml
replicaCount: 2

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10

resources:
  requests:
    cpu: "200m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
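For context, a chart like this typically renders a HorizontalPodAutoscaler roughly along these lines (a sketch, not the actual chart; the 80% CPU target is an assumption, chosen to match the HPA output shown later in the post):

# Sketch of the HPA rendered from the values above (80% target is assumed)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80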
Everything was stable.
The Change
We had started experimenting with AI-assisted configuration updates. One suggestion came in to “optimize performance under load.” The diff looked reasonable.
 autoscaling:
   enabled: true
-  maxReplicas: 10
+  maxReplicas: 30

 resources:
   requests:
-    cpu: "200m"
+    cpu: "500m"
Nothing obviously wrong. More replicas. More CPU. Better performance.
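The arithmetic is worth pausing on, though (a back-of-envelope sketch, not figures from the incident): before the change, the worst case was 10 replicas × 200m = 2 cores of requested CPU; after it, 30 replicas × 500m = 15 cores. The ceiling on what the scheduler might need to find grew by 7.5×, without any single line of the diff looking alarming on its own.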
The System Converges
The change was merged. GitLab CI picked it up.
Running with gitlab-runner...
$ helm upgrade --install payments ./chart -f values.yaml
Release "payments" has been upgraded
A quick check:
kubectl get pods -n payments
payments-7d9f6c8c4d-abc12 1/1 Running
payments-7d9f6c8c4d-def34 1/1 Running
Everything looked healthy. No errors. No failed deploys. From a system perspective:
Everything worked.
But Something Was Off…
A few hours later, the signals started showing up.
- cluster CPU usage was higher than usual
- node autoscaling kicked in aggressively
- costs started creeping up
- some unrelated workloads saw intermittent throttling
Nothing had technically failed. But the system behavior had changed.
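None of this came from the pipeline; it came from looking at the cluster. For illustration, these are the kinds of checks that surface it (assuming metrics-server and Prometheus with cAdvisor metrics are available; the query is a sketch, not our actual dashboard):

# Node-level pressure (requires metrics-server)
kubectl top nodes

# Fraction of CPU periods throttled for workloads outside the payments namespace
rate(container_cpu_cfs_throttled_periods_total{namespace!="payments"}[5m])
  / rate(container_cpu_cfs_periods_total{namespace!="payments"}[5m])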
The Problem Wasn’t the Pipeline
The pipeline did exactly what it was designed to do.
- Git was the source of truth
- the change was applied
- the cluster matched the desired state

There was no drift. No inconsistency. The problem wasn’t reconciliation. It was the definition of the desired state.
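The no-drift claim is easy to check (a sketch, assuming the chart and values are checked out locally): render the desired state from Git and diff it against what is actually running.

# Renders the chart from Git and diffs it against live objects; empty output means no drift
helm template payments ./chart -f values.yaml | kubectl diff -f -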
Where Context Was Missing
The AI-generated suggestion didn’t know:
- this service wasn’t latency-critical
- the cluster had shared resource constraints
- aggressive scaling could impact other services
- cost limits were important in this environment

From a configuration standpoint, the change was valid. From a system standpoint, it was incomplete.
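One lightweight way to carry some of that context forward (a sketch of a mitigation, not something the original setup had) is to record the constraints next to the values they protect, so that a reviewer, human or AI, has something to reason against:

# values.yaml (sketch): operational context recorded next to the knobs it constrains
autoscaling:
  enabled: true
  minReplicas: 2
  # Deliberately capped: this service is not latency-critical and shares node
  # pools with other workloads; scaling beyond this has caused throttling and
  # cost spikes elsewhere in the cluster.
  maxReplicas: 10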
The Subtle Danger
This is where AI + GitOps becomes interesting.
GitOps gives us a powerful guarantee:
the system will converge to the declared state
But it does not guarantee:
that the declared state is the right one
And AI, without sufficient context, can generate configurations that are:
- technically correct
- syntactically valid
- operationally deployable

…but not aligned with the system as a whole.
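That is exactly what made this change so easy to merge: every mechanical check passes on it (illustrative commands; they validate syntax and schema, not whether maxReplicas: 30 makes sense in this cluster).

# Lints the chart and server-side dry-runs the rendered manifests; both succeed,
# because nothing about the change is invalid, only misaligned
helm lint ./chart -f values.yaml
helm template payments ./chart -f values.yaml | kubectl apply --dry-run=server -f -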
Everything Looks “Green”
Even the deployment flow looks clean:
helm upgrade --install payments ./chart -f values.yaml
And Kubernetes happily reports:
kubectl get hpa
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS
payments   Deployment/payments   40%/80%   2         30
No alerts. No failures. Just a system doing exactly what it was told to do.
The Realization
Over time, this became clearer. Reconciliation is deterministic. Context is not. GitLab CI will apply whatever is in Git. Kubernetes will enforce whatever is defined. But neither of them understands:
- intent
- trade-offs
- system-wide impact
That layer still depends on context.
Where AI Actually Helps
AI becomes genuinely useful when it helps us understand changes, not blindly generate them.
For example:
- explaining what a Helm diff actually means
- highlighting scaling implications
- surfacing cost impact
- identifying potential blast radius
That’s where AI shines and actually adds value.
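In pipeline terms, that might look like a review step that surfaces the rendered change before it merges (a sketch: assumes the helm-diff plugin is installed, and the consumer of the diff, human or AI, is left open):

# Hypothetical GitLab CI job: make the actual impact of a change visible at review time
review:
  stage: test
  script:
    - helm diff upgrade payments ./chart -f values.yaml > change.diff
    - cat change.diff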
Closing Thought
GitOps is incredibly powerful. It gives us consistency, traceability, and convergence. But it assumes that the desired state is correct. In AI-assisted workflows, that assumption becomes weaker. Because now, the desired state may be influenced by a system that doesn’t fully understand the environment. And that shifts the problem.
From:
Will the system converge?
To:
Are we converging to the right thing?
GitOps guarantees convergence. It does not guarantee correctness. That depends on context — and context still needs humans.
Do you agree?
Note: This post was developed using AI-assisted writing tools. While AI helped with structuring and phrasing, all concepts and examples reflect real-world engineering experience.
Originally published on Medium: