The Last Time I Used SSH
I haven't SSH'd into a production server in 14 months. Not because I'm lazy, but because our infrastructure doesn't require it.
GitOps changed everything.
What GitOps Actually Means
GitOps = Git is the single source of truth for your infrastructure. Every change goes through a PR. No manual kubectl, no SSH, no ClickOps.
Traditional: Developer → kubectl apply → Cluster
GitOps: Developer → Git PR → CI Review → ArgoCD → Cluster
Our Setup
Repositories:
├── app-service-a/          # Application code + Dockerfile
├── app-service-b/          # Application code + Dockerfile
└── infrastructure/         # All K8s manifests
    ├── base/               # Shared configurations
    │   ├── namespaces/
    │   ├── network-policies/
    │   └── rbac/
    ├── services/
    │   ├── api-service/
    │   │   ├── deployment.yaml
    │   │   ├── service.yaml
    │   │   ├── hpa.yaml
    │   │   └── kustomization.yaml
    │   └── payment-service/
    └── environments/
        ├── staging/
        │   └── kustomization.yaml
        └── production/
            └── kustomization.yaml
ArgoCD Configuration
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-services
spec:
  project: default
  source:
    repoURL: https://github.com/org/infrastructure
    targetRevision: main
    path: environments/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # Delete resources removed from Git
      selfHeal: true   # Revert manual changes
    syncOptions:
      - CreateNamespace=true
The key setting is selfHeal: true. If someone manually changes a resource in the cluster, ArgoCD reverts it to match Git within 3 minutes. Git is the truth.
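If you want to see what "drift" means before ArgoCD corrects it, a check along these lines can be run by hand. This is a rough sketch, not something ArgoCD ships: check_drift is a hypothetical helper name, and it assumes kustomize and kubectl are installed and that your kubeconfig points at the cluster in question.

```shell
#!/usr/bin/env bash
# Sketch only: check_drift is a hypothetical helper, not part of ArgoCD.
set -eu

# Prints "in-sync", "drift", or "error" for one Kustomize overlay directory.
check_drift() {
  local overlay="$1"
  local manifests
  local rc=0
  # Render the desired state from Git first, so a build failure is not mistaken for drift.
  manifests="$(kustomize build "$overlay")" || { echo "error"; return 2; }
  # kubectl diff exits 0 when live state matches, 1 when it differs, >1 on failure.
  printf '%s\n' "$manifests" | kubectl diff -f - >/dev/null || rc=$?
  case "$rc" in
    0) echo "in-sync" ;;
    1) echo "drift" ;;
    *) echo "error"; return 2 ;;
  esac
}
```

Calling check_drift environments/production right after a manual kubectl edit should print drift, and in-sync again once ArgoCD has healed the resource.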
The Deployment Flow
# 1. Developer pushes code to app repo
# 2. CI builds image: api-service:sha-abc123
# 3. CI updates infrastructure repo:

# Automated PR to infrastructure repo
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  template:
    spec:
      containers:
        - name: api
          image: registry/api-service:sha-abc123   # Updated by CI

# GitHub Actions: Update image tag
- name: Update manifest
  run: |
    # kustomize edit must run in the directory that holds the target kustomization.yaml
    cd infrastructure
    kustomize edit set image api-service=registry/api-service:sha-${{ github.sha }}
    git add .
    git commit -m "deploy: api-service sha-${{ github.sha }}"
    git push
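One gotcha with the step above: git commit exits non-zero when nothing is staged, so re-running a pipeline for an image tag that is already in Git fails the job. A small guard like the following avoids that. It is a sketch under assumptions: commit_if_changed is a hypothetical helper name, and it expects to run inside a checkout of the infrastructure repo with a Git identity already configured.

```shell
#!/usr/bin/env bash
# Sketch only: commit_if_changed is a hypothetical helper name.
set -eu

# Stages everything and commits only when the staged tree differs from HEAD.
# Prints "committed" or "no changes" so the caller can decide whether to push.
commit_if_changed() {
  local message="$1"
  git add -A
  # git diff --cached --quiet exits 0 when nothing is staged.
  if git diff --cached --quiet; then
    echo "no changes"
    return 0
  fi
  git commit --quiet -m "$message"
  echo "committed"
}
```

In the workflow, the last three lines of the run block would become a call to commit_if_changed followed by git push only when it printed committed.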
Benefits We've Seen
1. Complete Audit Trail
# Who changed what and when? (add %an to the format to show the author)
git log --format='%h %s (%as)' -- environments/production/
# a1b2c3d deploy: api-service sha-abc123 (2024-03-15)
# d4e5f6g feat: add rate limiting to api (2024-03-14)
# g7h8i9j fix: increase memory limit for payment (2024-03-13)
2. Easy Rollbacks
# Rollback = revert a commit
git revert HEAD
git push
# ArgoCD detects change, reverts cluster to previous state
# Total time: ~90 seconds
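git revert HEAD only works when the bad deploy is the most recent commit. If something unrelated landed afterwards, you want the last commit that touched the production overlay instead. Here is a minimal sketch of that; revert_last_change is a hypothetical helper name, and it assumes a clean working tree in the infrastructure repo.

```shell
#!/usr/bin/env bash
# Sketch only: revert_last_change is a hypothetical helper name.
set -eu

# Reverts the most recent commit that touched the given path and prints its SHA.
revert_last_change() {
  local path="$1"
  local sha
  sha="$(git log -1 --format=%H -- "$path")"
  if [ -z "$sha" ]; then
    echo "no commits touch $path" >&2
    return 1
  fi
  # --no-edit keeps the default "Revert ..." message so no editor opens.
  git revert --no-edit "$sha" >/dev/null
  echo "$sha"
}
```

Usage would be revert_last_change environments/production/ followed by git push, after which ArgoCD syncs the cluster back as before.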
3. Environment Parity
# environments/staging/kustomization.yaml
bases:
  - ../../services/api-service
patchesStrategicMerge:
  - replicas-patch.yaml    # 1 replica instead of 3

# environments/production/kustomization.yaml
bases:
  - ../../services/api-service
patchesStrategicMerge:
  - replicas-patch.yaml    # 3 replicas
  - resources-patch.yaml   # More CPU/memory
Same base, different overlays. Every difference between environments has to be written down in an overlay, so unintended drift between them has nowhere to hide.
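To check that claim for yourself, you can render both overlays and diff the results: the only lines that show up should be the ones your patches introduce. A rough sketch, assuming kustomize is installed; overlay_diff is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: overlay_diff is a hypothetical helper name.
set -eu

# Renders two Kustomize overlays and prints a unified diff of the output.
# Returns 0 when they render identically, 1 when they differ.
overlay_diff() {
  local left
  local right
  local rc=0
  left="$(mktemp)"
  right="$(mktemp)"
  kustomize build "$1" > "$left"
  kustomize build "$2" > "$right"
  diff -u "$left" "$right" || rc=$?
  rm -f "$left" "$right"
  return "$rc"
}
```

Running overlay_diff environments/staging environments/production should list the replica count and resource limits, and nothing you did not expect.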
4. Disaster Recovery
# Cluster dies? No problem.
# 1. Provision new cluster
# 2. Install ArgoCD
# 3. Point ArgoCD at Git repo
# 4. Everything reconverges automatically
# Recovery time: ~15 minutes (cluster provisioning)
# Configuration loss: zero (every manifest is in Git; persistent data still needs its own backups)
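Steps 2 and 3 above come down to a handful of kubectl commands. The sketch below follows the standard ArgoCD install manifest; bootstrap_argocd is a hypothetical helper name, and the Application manifest path you pass in is whatever file holds the Application shown earlier (we assume it lives somewhere you can reach without the old cluster).

```shell
#!/usr/bin/env bash
# Sketch only: bootstrap_argocd is a hypothetical helper name.
set -eu

# Installs ArgoCD into a fresh cluster and registers one Application manifest.
bootstrap_argocd() {
  local app_manifest="$1"
  kubectl create namespace argocd
  kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
  # Wait for the API server component before handing it an Application.
  kubectl -n argocd rollout status deployment/argocd-server --timeout=300s
  kubectl apply -n argocd -f "$app_manifest"
}
```

With the kubeconfig pointing at the new cluster, bootstrap_argocd production-services.yaml is enough to start the reconvergence; the file name here is only an example.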
The Cultural Shift
The hardest part wasn't technical. It was convincing engineers to stop using kubectl directly.
Old way: "I'll just quickly fix this in production" (kubectl edit)
New way: "I'll open a PR to fix this" (5 minutes longer, 100% safer)
After 3 months, nobody missed kubectl. The safety and audit trail are worth the extra 5 minutes.
If you want GitOps with AI-powered drift detection and automated remediation, check out what we're building at Nova AI Ops.
Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com