Introduction
Imagine if every infrastructure change were peer-reviewed, version-controlled, and automatically applied, just like application code. This is the promise of GitOps, a paradigm that treats Git as the single source of truth for your entire infrastructure and application stack.
GitOps has transformed how teams manage Kubernetes clusters and cloud infrastructure, enabling faster deployments, better collaboration, and automatic drift detection. In this comprehensive guide, we'll explore GitOps principles, tools, and best practices for implementing GitOps in your organization.
What is GitOps?
GitOps is a set of practices where Git repositories contain the entire desired state of your system, and automated processes ensure that the actual state matches the desired state defined in Git.
Core Principles
1. Declarative Configuration
Your infrastructure is defined declaratively using files (YAML, HCL, JSON) rather than imperative scripts. You describe what you want, not how to create it.
2. Git as Single Source of Truth
All configuration, infrastructure code, and application manifests live in Git. What's in Git is what should be running in production.
3. Automated Synchronization
Controllers continuously monitor Git repositories and automatically apply changes to your infrastructure when the repository changes.
4. Continuous Reconciliation
Controllers continuously compare actual state with desired state in Git and automatically correct any drift.
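The reconciliation principle can be illustrated with a small sketch. This is not how Flux or Argo CD are implemented; it assumes a hypothetical in-memory model where both the desired state (from Git) and the actual state (from the cluster) are plain dictionaries mapping a resource name to an image tag.

```python
from typing import Dict

# Hypothetical in-memory stand-ins: resource name -> image tag.
State = Dict[str, str]


def plan(desired: State, actual: State) -> Dict[str, str]:
    """Return the action needed per resource to make actual match desired."""
    actions = {}
    for name, spec in desired.items():
        if name not in actual:
            actions[name] = "create"
        elif actual[name] != spec:
            actions[name] = "update"
    for name in actual:
        if name not in desired:
            actions[name] = "delete"  # what a prune option would remove
    return actions


def reconcile(desired: State, actual: State) -> State:
    """Apply the plan and return the new actual state."""
    result = dict(actual)
    for name, action in plan(desired, actual).items():
        if action == "delete":
            del result[name]
        else:
            result[name] = desired[name]
    return result


desired = {"myapp": "v2.0.1", "redis": "7.2"}
actual = {"myapp": "v2.0.0", "debug-pod": "latest"}  # drifted cluster

print(plan(desired, actual))
actual = reconcile(desired, actual)
print(plan(desired, actual))  # empty once the states converge
```

A real controller runs this comparison on a timer, which is why a manual change to the cluster gets reverted even when nobody has pushed a commit.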
GitOps vs Traditional DevOps
Traditional DevOps (push model):
  Developer → CI/CD Pipeline → kubectl apply → Cluster
GitOps (pull model):
  Developer → Git Repository ← GitOps Controller → Cluster
Why GitOps Matters
Improved Developer Experience
Developers work with familiar Git workflows. Want to deploy? Create a pull request. Want to roll back? Revert a commit.
# Traditional deployment
kubectl apply -f deployment.yaml
# Wait, what version is running now?
# Which cluster did I just deploy to?
# Can I rollback easily?
# GitOps deployment
git commit -m "Update app to v2.0"
git push
# GitOps controller automatically deploys
# Full audit trail in Git history
# Rollback = git revert
Audit Trail and Compliance
Every change is recorded in Git with its author, timestamp, and commit message, which gives auditors a reviewable history of who changed what and why.
Disaster Recovery
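Your entire infrastructure is defined in Git. Lose your cluster? Spin up a new one and point your GitOps controller at your repository, and it recreates everything declared there. Data held in persistent volumes or external databases is not in Git, so it still needs its own backups.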
Security Benefits
- No cluster credentials in CI/CD pipelines
- All changes reviewed via pull requests
- Automatic detection and correction of unauthorized changes
- Principle of least privilege—only GitOps controller needs cluster access
Multi-Cluster Management
Manage dozens or hundreds of clusters from a single Git repository with environment-specific configuration.
GitOps Tools
Flux CD
Flux is a CNCF graduated project that implements GitOps for Kubernetes.
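# Install Flux (expects a GitHub token in the GITHUB_TOKEN environment variable)
flux bootstrap github \
  --owner=myorg \
  --repository=fleet-infra \
  --branch=main \
  --path=clusters/production
# Add --personal only when the owner is a user account rather than an organization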
# GitRepository source
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/myorg/myapp
  ref:
    branch: main
---
# Kustomization - Applies manifests from Git
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  path: "./k8s/production"
  prune: true
  sourceRef:
    kind: GitRepository
    name: myapp
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: myapp
      namespace: production
Argo CD
Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes.
# Install Argo CD
kubectl create namespace argocd
kubectl apply -n argocd -f \
  https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Argo CD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default # required field; "default" is the built-in project
  # Source repository
  source:
    repoURL: https://github.com/myorg/myapp
    targetRevision: HEAD
    path: k8s/production
  # Destination cluster
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  # Sync policy
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
Atlantis (for Terraform)
Atlantis brings GitOps to Terraform via pull request automation.
# atlantis.yaml
version: 3
projects:
  - name: production-infrastructure
    dir: terraform/production
    workspace: production
    terraform_version: v1.5.0
    autoplan:
      when_modified:
        - "*.tf"
        - "*.tfvars"
    apply_requirements:
      - approved
      - mergeable
# Developer workflow
git checkout -b add-s3-bucket
# Modify Terraform files
git commit -m "Add S3 bucket for logs"
git push
# Create PR - Atlantis automatically runs terraform plan
# Team reviews plan in PR comment
# After approval, comment: atlantis apply
# Atlantis applies changes
Repository Structure
Monorepo vs Multi-Repo
Monorepo: All infrastructure in one repository
infra/
├── clusters/
│   ├── production/
│   │   ├── flux-system/
│   │   ├── apps/
│   │   └── infrastructure/
│   ├── staging/
│   └── development/
├── apps/
│   ├── myapp/
│   │   ├── base/
│   │   └── overlays/
│   │       ├── production/
│   │       ├── staging/
│   │       └── development/
│   └── otherapp/
└── infrastructure/
    ├── base/
    └── overlays/
Multi-Repo: Separate repos per app/component
Repo: cluster-config
├── production/
├── staging/
└── development/
Repo: myapp
├── k8s/
│   ├── base/
│   └── overlays/
├── src/
└── Dockerfile
Repo: infrastructure
├── terraform/
└── k8s/
Recommended Structure
fleet-infra/                        # GitOps configuration repo
├── clusters/
│   ├── production/
│   │   ├── flux-system/            # Flux components
│   │   ├── sources/                # Git/Helm sources
│   │   ├── infrastructure/         # Core infrastructure
│   │   │   ├── ingress-nginx/
│   │   │   ├── cert-manager/
│   │   │   └── external-dns/
│   │   └── apps/                   # Applications
│   │       ├── myapp.yaml
│   │       └── otherapp.yaml
│   ├── staging/
│   └── development/
└── base/                           # Shared base configs
    ├── ingress-nginx/
    └── cert-manager/
GitOps Workflows
Application Deployment Workflow
# 1. Developer updates application version in Git
# apps/myapp/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
  - name: myapp
    newTag: v2.0.1 # Updated from v2.0.0
# 2. Create pull request
git checkout -b update-myapp-v2.0.1
git add apps/myapp/production/kustomization.yaml
git commit -m "Update myapp to v2.0.1"
git push origin update-myapp-v2.0.1
# Create PR on GitHub/GitLab
# 3. Team reviews PR
# - Review image tag change
# - Check release notes
# - Verify CI tests passed
# 4. After approval, merge PR
# GitOps controller detects change and deploys automatically
# 5. Monitor deployment
flux logs --follow
kubectl get pods -n production -w
# 6. If issues, rollback via Git
git revert <commit-hash>
git push
# GitOps controller automatically rolls back
Infrastructure Change Workflow
# 1. Add new infrastructure component
# infrastructure/monitoring/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - prometheus.yaml
  - grafana.yaml
# 2. Create PR with infrastructure changes
git checkout -b add-prometheus
git add infrastructure/monitoring/
git commit -m "Add Prometheus monitoring stack"
git push
# 3. Review includes:
# - Security implications
# - Resource requirements
# - Backup/recovery procedures
# - Documentation updates
# 4. After approval, merge
# GitOps controller deploys monitoring stack
# 5. Verify deployment
kubectl get pods -n monitoring
kubectl get svc -n monitoring
Multi-Environment Promotion
# Progressive delivery across environments
# 1. Deploy to development (automatic)
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: myapp-dev
spec:
  interval: 5m # required by the API; the value here is illustrative
  sourceRef:
    kind: GitRepository
    name: myapp
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: FluxBot
        email: flux@example.com
  update:
    path: ./k8s/development
    strategy: Setters
# 2. Promote to staging (manual PR)
# Copy image tag from development to staging
# Create PR for review
# 3. Promote to production (manual PR + approval)
# Copy image tag from staging to production
# Require approvals from multiple team members
# Schedule deployment for low-traffic window
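The promotion steps above amount to copying an image tag from one environment to the next. A minimal sketch of that rule follows, assuming a hypothetical dictionary of per-environment tags; in practice the change would be a commit to the target environment's kustomization.yaml, opened as a pull request.

```python
from typing import Dict

# Hypothetical promotion order and per-environment image tags.
PROMOTION_ORDER = ["development", "staging", "production"]


def promote(tags: Dict[str, str], source: str) -> Dict[str, str]:
    """Copy the image tag from `source` to the next environment in the order."""
    index = PROMOTION_ORDER.index(source)
    if index == len(PROMOTION_ORDER) - 1:
        raise ValueError(f"{source} is the last environment; nothing to promote to")
    target = PROMOTION_ORDER[index + 1]
    updated = dict(tags)  # leave the input untouched
    updated[target] = tags[source]
    return updated


tags = {"development": "v2.0.1", "staging": "v2.0.0", "production": "v1.9.4"}
after_staging = promote(tags, "development")
print(after_staging)
```

Because each promotion copies only from the previous stage, production can never receive a tag that skipped staging.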
Advanced GitOps Patterns
Progressive Delivery with Flagger
Automate canary deployments with GitOps:
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
    webhooks:
      - name: load-test
        url: http://flagger-loadtester/
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary/"
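To see how the analysis settings interact, here is a simplified simulation. It is a sketch of the general idea and not Flagger's actual algorithm: it assumes one success-rate sample per interval, shifts traffic by stepWeight on each passing check, and rolls back once the number of failed checks reaches threshold. The function name and its parameters are made up for illustration, with defaults taken from the manifest above.

```python
from typing import Iterable, List, Tuple


def run_canary(
    success_rates: Iterable[float],
    step_weight: int = 10,
    max_weight: int = 50,
    threshold: int = 5,
    min_success_rate: float = 99.0,
) -> Tuple[str, List[int]]:
    """Simulate a canary analysis over one success-rate sample per interval.

    Returns the outcome and the canary traffic weight after each interval.
    """
    weight = 0
    failed_checks = 0
    history: List[int] = []
    for rate in success_rates:
        if rate < min_success_rate:
            failed_checks += 1
            if failed_checks >= threshold:
                history.append(0)
                return "rolled back", history  # all traffic back to primary
        else:
            weight = min(weight + step_weight, max_weight)
        history.append(weight)
        if weight >= max_weight:
            return "promoted", history
    return "in progress", history


print(run_canary([99.5, 99.8, 99.9, 99.7, 99.6]))
print(run_canary([99.5, 98.0, 99.5]))
```

With these defaults a healthy release needs five passing intervals to reach the 50% ceiling, so a 1m interval implies at least five minutes of analysis before promotion.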
Image Automation
Automatic updates when new images are pushed:
# ImageRepository - Scan for new images
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageRepository
metadata:
  name: myapp
  namespace: flux-system
spec:
  image: myregistry.com/myapp
  interval: 1m
---
# ImagePolicy - Define version policy
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: myapp
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: myapp
  policy:
    semver:
      range: 1.x.x
---
# ImageUpdateAutomation - Auto-commit updates
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m # required by the API; the value here is illustrative
  sourceRef:
    kind: GitRepository
    name: myapp
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: FluxBot
        email: flux@example.com
      messageTemplate: |
        Update myapp to {{range .Updated.Images}}{{println .}}{{end}}
    push:
      branch: main
  update:
    path: ./k8s/production
    strategy: Setters
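The semver policy above picks the newest tag within major version 1. A rough sketch of that selection logic is shown below. It is an approximation and not Flux's implementation: it assumes plain MAJOR.MINOR.PATCH tags (with an optional leading "v") and simply ignores anything else, including pre-release tags. The helper names are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse(tag: str) -> Optional[Tuple[int, ...]]:
    """Parse 'MAJOR.MINOR.PATCH' (optional leading 'v'); None for anything else."""
    match = SEMVER.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def latest_in_major(tags: Iterable[str], major: int) -> Optional[str]:
    """Return the highest tag whose major version equals `major`."""
    candidates = [(parse(tag), tag) for tag in tags]
    matching = [(version, tag) for version, tag in candidates
                if version is not None and version[0] == major]
    return max(matching)[1] if matching else None


tags = ["1.2.0", "1.10.1", "1.9.3", "2.0.0", "latest", "1.10.1-rc1"]
print(latest_in_major(tags, 1))
```

Versions are compared as integer tuples, so 1.10.1 correctly sorts above 1.9.3, which a plain string comparison would get wrong.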
Multi-Cluster Management
Manage multiple clusters from a single repo:
# Cluster definition (Cluster API)
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-us-east
  namespace: fleet-system
---
# Fleet GitRepo
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: shared-apps
  namespace: fleet-default
spec:
  repo: https://github.com/myorg/apps
  paths:
    - apps/
  targets:
    - name: production
      clusterSelector:
        matchLabels:
          env: production
          region: us-east
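The matchLabels selector targets only clusters that carry every listed label with the same value. The sketch below shows that matching rule on a hypothetical cluster inventory; the cluster names other than production-us-east are invented for illustration.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    """True when every selector key/value pair is present in the cluster labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_clusters(selector: Labels, clusters: Dict[str, Labels]) -> List[str]:
    """Return the names of clusters whose labels satisfy the selector, sorted."""
    return sorted(name for name, labels in clusters.items() if matches(selector, labels))


# Hypothetical cluster inventory.
clusters = {
    "production-us-east": {"env": "production", "region": "us-east"},
    "production-eu-west": {"env": "production", "region": "eu-west"},
    "staging-us-east": {"env": "staging", "region": "us-east"},
}
selector = {"env": "production", "region": "us-east"}
print(select_clusters(selector, clusters))
```

Dropping the region label from the selector would widen the target to every production cluster, which is how one GitRepo can fan out to a whole fleet.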
Secrets Management with GitOps
Sealed Secrets:
# Encrypt secret for safe Git storage
echo -n 'supersecret' | kubectl create secret generic db-password \
  --dry-run=client \
  --from-file=password=/dev/stdin \
  -o yaml | \
  kubeseal -o yaml > sealed-secret.yaml
# Commit encrypted secret
git add sealed-secret.yaml
git commit -m "Add database password"
git push
# Sealed Secrets controller decrypts in cluster
External Secrets Operator:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: myapp-secrets
    creationPolicy: Owner
  data:
    - secretKey: database-password
      remoteRef:
        key: prod/myapp/db-password
SOPS (Secrets OPerationS):
# .sops.yaml
creation_rules:
  - path_regex: .*\.yaml$
    encrypted_regex: ^(data|stringData)$
    kms: arn:aws:kms:us-east-1:123456789:key/12345-abcd
# Encrypt secret
sops --encrypt secret.yaml > secret.enc.yaml
# Commit encrypted version
git add secret.enc.yaml
# Flux decrypts during reconciliation when the Kustomization sets spec.decryption.provider: sops
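The two regular expressions in a SOPS creation rule decide which files a rule applies to and which keys inside them get encrypted. The sketch below uses Python's re module to show why the dot before "yaml" should be escaped. It assumes unanchored matching (similar to how Go regular expressions behave by default), and the file paths are made up.

```python
import re

UNESCAPED = re.compile(r".*.yaml")    # '.' matches any character
ESCAPED = re.compile(r".*\.yaml$")    # literal dot, anchored at the end
ENCRYPTED_KEYS = re.compile(r"^(data|stringData)$")


def matched(pattern, items):
    """Return the items in which the pattern is found."""
    return [item for item in items if pattern.search(item)]


paths = ["secret.yaml", "k8s/db.yaml", "notes-yaml", "secret.yaml.bak"]
print(matched(UNESCAPED, paths))  # also catches 'notes-yaml' and the .bak file
print(matched(ESCAPED, paths))    # only real .yaml files

keys = ["apiVersion", "kind", "metadata", "data", "stringData"]
print(matched(ENCRYPTED_KEYS, keys))
```

Limiting encryption to data and stringData keeps apiVersion, kind, and metadata readable, so the encrypted file still diffs cleanly in pull requests.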
Monitoring and Observability
Flux Monitoring
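# Prometheus PodMonitor for Flux (controller pods are labeled app: <controller-name>)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: flux-system
  namespace: flux-system
spec:
  selector:
    matchExpressions:
      - key: app
        operator: In
        values:
          - source-controller
          - kustomize-controller
          - helm-controller
          - notification-controller
  podMetricsEndpoints:
    - port: http-prom
      interval: 30s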
Alerts for GitOps Issues
# Alert when sync fails
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Alert
metadata:
  name: sync-failures
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: error
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
---
# Slack provider
# The webhook URL is sensitive; it can also be supplied from a Secret via spec.secretRef
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  channel: gitops-alerts
  address: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
Drift Detection
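# Preview what differs between the cluster and the local manifests
flux diff kustomization myapp --path ./k8s/production
# Flux reverts drift on each reconciliation of the Kustomization (every spec.interval)
# In Argo CD, the equivalent behavior is enabled with selfHeal on the Application:
syncPolicy:
  automated:
    selfHeal: true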
Best Practices
1. Repository Organization
✓ Separate infrastructure and application repos
✓ Use environment-specific directories
✓ Leverage Kustomize overlays for DRY configs
✓ Keep secrets out of Git (use sealed secrets/SOPS)
✓ Document repository structure in README
2. Pull Request Workflow
# CODEOWNERS file for mandatory reviews
# Production changes require DevOps team approval
/clusters/production/ @devops-team
/infrastructure/ @devops-team @security-team
# Application teams own their apps
/apps/myapp/ @myapp-team
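CODEOWNERS files are evaluated so that the last matching pattern wins. The sketch below models that rule for the entries above. It is a simplification that assumes directory-prefix patterns only; real CODEOWNERS files also support globs, which this hypothetical helper does not handle.

```python
from typing import List, Tuple

# Rules in file order; directory-prefix patterns only (a simplification).
RULES: List[Tuple[str, List[str]]] = [
    ("/clusters/production/", ["@devops-team"]),
    ("/infrastructure/", ["@devops-team", "@security-team"]),
    ("/apps/myapp/", ["@myapp-team"]),
]


def owners_for(path: str, rules: List[Tuple[str, List[str]]] = RULES) -> List[str]:
    """Return the owners of the last rule whose prefix matches the path."""
    owners: List[str] = []
    normalized = "/" + path.lstrip("/")
    for prefix, rule_owners in rules:
        if normalized.startswith(prefix):
            owners = rule_owners  # later rules override earlier ones
    return owners


print(owners_for("clusters/production/apps/myapp.yaml"))
print(owners_for("README.md"))  # no rule matches, so no mandatory reviewer
```

Paths with no matching rule have no required reviewer, so it is worth checking that every production-affecting directory is covered.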
3. Testing GitOps Changes
# Test manifests locally before pushing
kustomize build k8s/production | kubectl apply --dry-run=client -f -
# Validate against Kubernetes schemas (kubeval is no longer maintained; kubeconform is its suggested successor)
kustomize build k8s/production | kubeval
# Security scanning
kustomize build k8s/production | kubesec scan -
# Policy validation with OPA
kustomize build k8s/production | conftest test -
4. Rollback Strategy
# Git-based rollback
git log --oneline # Find the commit that introduced the problem
git revert <commit-hash>
git push
# Or reset to the previous commit (rewrites history; avoid on shared branches)
git reset --hard HEAD~1
git push --force # Use with caution!
# Flux will automatically sync to the previous state
5. Progressive Delivery
Development → Staging   → Production-Canary → Production-Full
     ↓           ↓               ↓                  ↓
Automatic     Manual PR     Manual PR +         Manual PR +
              Review        Approval +          Multiple Approvals
                            Canary Analysis
Common Pitfalls
1. Not Using Environment Branches/Directories
❌ Bad: Single main branch with manual changes
main/
└── k8s/
    └── manifests.yaml # Which environment?
✓ Good: Clear environment separation
main/
├── development/
├── staging/
└── production/
2. Committing Secrets to Git
❌ Never:
apiVersion: v1
kind: Secret
metadata:
  name: db-password
stringData:
  password: supersecret123
✓ Use Sealed Secrets or External Secrets
3. Missing RBAC for GitOps Controller
# Grant minimal necessary permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flux-reconciler
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "patch"]
# Don't grant cluster-admin!
4. No Monitoring of GitOps Controller
✓ Monitor:
- Controller health and uptime
- Sync success/failure rate
- Time since last successful sync
- Git repository accessibility
- Kubernetes API responsiveness
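One of the metrics above, time since the last successful sync, can be turned into a simple staleness check. The sketch below is illustrative only: the function name and the 15-minute default are assumptions, and in practice this would usually be an alerting rule on controller metrics, not application code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def sync_is_stale(
    last_success: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(minutes=15),
) -> bool:
    """True when no successful sync has been recorded within `max_age`."""
    if last_success is None:
        return True  # never synced counts as stale
    return now - last_success > max_age


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(sync_is_stale(now - timedelta(minutes=5), now))   # recent sync
print(sync_is_stale(now - timedelta(hours=2), now))     # overdue sync
```

Alerting on staleness catches a controller that has silently stopped reconciling, which an alert on sync errors alone would miss.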
Conclusion
GitOps represents a fundamental shift in how we manage infrastructure and applications. By treating Git as the single source of truth, teams gain:
- Improved collaboration through familiar Git workflows
- Enhanced security with audit trails and access control
- Better reliability through automated drift detection
- Faster recovery from complete infrastructure recreation
- Simplified compliance with complete change history
Start your GitOps journey with these steps:
- Choose a GitOps tool (Flux or Argo CD for Kubernetes)
- Organize your repository structure
- Start with non-production environments
- Implement proper RBAC and secrets management
- Set up monitoring and alerting
- Gradually expand to production workloads
GitOps isn't just a tool—it's a cultural shift toward treating infrastructure like code, with all the benefits that brings.
Need help implementing GitOps? InstaDevOps provides expert consulting and implementation for GitOps, Kubernetes, and infrastructure automation. Contact us for a free consultation.
Originally published at instadevops.com