LinkedIn Draft — Workflow (2026-03-28)
A hard-earned rule from incident retrospectives:
GitOps drift: the silent accumulation that makes clusters unmanageable
GitOps promises Git as the source of truth. The reality: every manual kubectl during an incident is a lie you told your cluster and forgot to retract.
GitOps truth gap over time:
Week 1: Git ══════════ Cluster (clean)
Week 4: Git ══════╌╌╌╌ Cluster (2 manual patches)
Week 12: Git ════╌╌╌╌╌╌╌╌╌╌╌╌╌ Cluster (drift accumulates, unknown state)
Where it breaks:
▸ Manual patches during incidents create cluster state Git doesn't know about. Argo CD (with self-heal enabled) or Flux can then overwrite it on the next sync, with no warning.
▸ Secret values materialized at runtime (by sealed-secrets or a Vault agent) sit outside what Git tracks, so they can drift independently without showing up in sync status.
▸ Multi-cluster setups multiply drift: each cluster diverges at its own pace once human intervention happens.
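To make the first failure mode concrete, here is a toy model of a reconcile pass, not Argo CD's or Flux's actual code. The `reconcile` function, the resource key, and the memory values are all hypothetical; the sketch assumes a reconciler that forces live state to match Git, as auto-sync with self-heal does.

```python
import copy

def reconcile(desired: dict, live: dict) -> list[str]:
    """Force live state to match desired; return the keys that were changed."""
    changed = [k for k in live if live[k] != desired.get(k)]
    changed += [k for k in desired if k not in live]
    live.clear()
    live.update(copy.deepcopy(desired))
    return sorted(set(changed))

# Desired state as committed in Git (hypothetical values).
git_state = {"deployment/api": {"replicas": 3, "memory": "512Mi"}}
# Live state starts in sync with Git.
cluster = copy.deepcopy(git_state)

# Incident: someone raises the memory limit by hand and never commits it.
cluster["deployment/api"]["memory"] = "1Gi"

# Next sync: the hotfix is reverted, with no record of why it existed.
reverted = reconcile(git_state, cluster)
print(reverted)                              # ['deployment/api']
print(cluster["deployment/api"]["memory"])   # 512Mi
```

The point of the sketch: the reconciler is doing its job correctly. The bug is the missing commit, not the sync.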
The rule I keep coming back to:
→ Treat every manual cluster change as a 5-minute loan. Commit it back to Git before the incident closes, or the next sync can revert it and the fix is gone.
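One way to enforce the loan is a small ledger that blocks incident closure while any manual change is still uncommitted. This is a minimal sketch, not a real tool: `Incident`, `ManualChange`, and the commit hash are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ManualChange:
    command: str                  # what was run against the cluster
    commit: Optional[str] = None  # Git commit that captured it, once one exists

@dataclass
class Incident:
    changes: List[ManualChange] = field(default_factory=list)

    def record(self, command: str) -> ManualChange:
        """Log a manual change the moment it is made."""
        change = ManualChange(command)
        self.changes.append(change)
        return change

    def outstanding(self) -> List[ManualChange]:
        """Changes that have not been committed back to Git yet."""
        return [c for c in self.changes if c.commit is None]

    def can_close(self) -> bool:
        return not self.outstanding()

incident = Incident()
patch = incident.record("kubectl scale deployment/api --replicas=6")
print(incident.can_close())  # False: the loan is still open

patch.commit = "abc1234"     # hypothetical commit hash
print(incident.can_close())  # True
```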
How I sanity-check it:
▸ Argo CD drift detection dashboard — surface out-of-sync resources before they become incident contributors.
▸ Weekly diff job: live cluster state vs Git. Opens a PR for anything untracked. Makes drift visible before it's painful.
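The core of the weekly diff job can be sketched in a few lines. This assumes both sides are already loaded into dicts keyed by `Kind/namespace/name` (for example from rendered manifests and `kubectl get -o json`). The `diff_state` function, the keys, and the sample specs are hypothetical.

```python
def diff_state(git: dict, live: dict) -> dict:
    """Compare desired (Git) and live (cluster) resources by key."""
    return {
        # In the cluster, but Git has never heard of it.
        "untracked": sorted(live.keys() - git.keys()),
        # In Git, but absent from the cluster.
        "missing": sorted(git.keys() - live.keys()),
        # Present on both sides with different specs.
        "changed": sorted(k for k in git.keys() & live.keys() if git[k] != live[k]),
    }

git = {
    "Deployment/prod/api": {"replicas": 3},
    "Service/prod/api": {"port": 80},
}
live = {
    "Deployment/prod/api": {"replicas": 6},            # scaled by hand
    "Service/prod/api": {"port": 80},
    "ConfigMap/prod/hotfix": {"data": {"FLAG": "on"}},  # applied by hand
}

report = diff_state(git, live)
print(report["untracked"])  # ['ConfigMap/prod/hotfix']
print(report["changed"])    # ['Deployment/prod/api']

# Open a PR only when something actually drifted.
needs_pr = any(report.values())
print(needs_pr)             # True
```

On a real cluster, server-populated fields such as `status` and `metadata.resourceVersion` would need to be stripped before comparing, otherwise every resource shows up as changed.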
The best platform teams I've seen measure success by how rarely product teams have to think about infrastructure.
Deep dive: https://neeraja-portfolio-v1.vercel.app/workflows
Curious what guardrails you've built around this. Drop your pattern below.