LinkedIn Draft — Workflow (2026-04-21)
A hard-earned rule from incident retrospectives:
GitOps drift: the silent accumulation that makes clusters unmanageable
GitOps promises Git as the source of truth. The reality: every manual kubectl during an incident is a lie you told your cluster and forgot to retract.
GitOps truth gap over time:
Week 1: Git ══════════ Cluster (clean)
Week 4: Git ══════╌╌╌╌ Cluster (2 manual patches)
Week 12: Git ════╌╌╌╌╌╌╌╌╌╌╌╌╌ (drift accumulates)
Cluster (unknown state)
Where it breaks:
▸ Manual patches during incidents create cluster state Git doesn't know about — Argo/Flux will overwrite it silently.
▸ Secrets managed outside GitOps (sealed-secrets, Vault agent) drift independently — invisible in sync status.
▸ Multi-cluster setups multiply drift: each cluster diverges at its own pace once human intervention happens.
The rule I keep coming back to:
→ Treat every manual cluster change as a 5-minute loan. Commit it back to Git before the incident closes — or it's gone.
How I sanity-check it:
▸ Argo CD drift detection dashboard — surface out-of-sync resources before they become incident contributors.
▸ Weekly diff job: live cluster state vs Git. Opens a PR for anything untracked. Makes drift visible before it's painful.
The best platform teams I've seen measure success by how rarely product teams have to think about infrastructure.
Curious what guardrails you've built around this. Drop your pattern below.
Top comments (0)