
Guatu

Posted on • Originally published at guatulabs.dev

GitOps for Homelabs: How ArgoCD App-of-Apps Scales Your Cluster

Managing a homelab Kubernetes cluster by hand eventually breaks you. Not in a dramatic way — in a slow, grinding way. You tweak a deployment to fix something urgent, forget to commit the change, and three months later you can't reproduce your own setup. A node dies and you're reverse-engineering what was running on it from kubectl get output and fuzzy memory.

I hit that wall. The fix was GitOps, specifically ArgoCD's app-of-apps pattern. This post is about how I structured it, what I got wrong the first time, and what actually works in a homelab context where you're the only engineer and iteration speed matters.

Why App-of-Apps and Not Just Helm

ArgoCD has several ways to manage applications. The simplest is pointing ArgoCD at a single app manifest — done. That works fine for three apps. It doesn't scale when you have fifteen namespaces and forty-plus workloads, each with their own sync policies, health checks, and override values.

The app-of-apps pattern solves this with a hierarchy: one parent Application that ArgoCD watches. That parent's source is a directory of child Application manifests. When ArgoCD syncs the parent, it discovers and creates all the children. Each child then manages its own workload independently.

The result: your entire cluster state is defined in Git. Adding a new app is a git push. Removing one is a git rm and a sync. Disaster recovery becomes argocd app sync root-app instead of a weekend of YAML archaeology.
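To make that concrete, here's a throwaway-repo sketch of the Git side of that workflow. The manifest content and paths are illustrative, not a working Application:

```shell
#!/bin/sh
# Simulate the day-to-day GitOps workflow in a scratch repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "you"
mkdir -p kubernetes/apps

# "Adding an app" is just committing a new child Application manifest
cat > kubernetes/apps/demo.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo
EOF
git add kubernetes/apps/demo.yaml
git commit -qm "Add demo app"

# "Removing an app" is git rm + commit; with prune: true, ArgoCD
# deletes the app and its resources from the cluster on the next sync
git rm -q kubernetes/apps/demo.yaml
git commit -qm "Remove demo app"
git log --oneline
```

In a real setup you'd `git push` after each commit and let ArgoCD pick the change up on its next poll; the cluster-side effects are ArgoCD's job, not yours.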

Repo Structure That Actually Works

I tried a few layouts before settling on one I could actually navigate. The key insight is separating "ArgoCD app definitions" from "workload manifests":

kubernetes/
├── apps/              # ArgoCD Application CRDs (the app-of-apps layer)
│   ├── root-app.yaml  # Parent: points to kubernetes/apps/
│   ├── monitoring.yaml
│   ├── ingress.yaml
│   ├── ai-llm.yaml
│   └── media.yaml
└── workloads/         # Actual manifests (what ArgoCD deploys)
    ├── monitoring/
    │   ├── prometheus/
    │   └── grafana/
    ├── ingress/
    │   └── traefik/
    └── ai-llm/
        ├── ollama/
        └── qdrant/

apps/ is what ArgoCD watches to discover applications. workloads/ is what those applications deploy. They live in the same repo, which keeps everything together and simplifies access control.

The Root App

The parent application is the entry point. You create this one manually (or via CLI), and from then on ArgoCD takes over:

# kubernetes/apps/root-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/homelab
    targetRevision: main
    path: kubernetes/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

The prune: true flag is important — it means if you delete a child Application manifest from Git, ArgoCD removes the app (and its resources) from the cluster. Without it, stale apps accumulate. selfHeal: true means ArgoCD will revert manual cluster changes back to Git state, which is the whole point.

Child App Manifests

Each child Application points to a workload directory. Here's what a typical one looks like:

# kubernetes/apps/monitoring.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/homelab
    targetRevision: main
    path: kubernetes/workloads/monitoring
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

CreateNamespace=true means ArgoCD creates the namespace if it doesn't exist. This matters for bootstrapping — you shouldn't need to pre-create namespaces manually.

The workload directory can contain raw YAML manifests, a kustomization.yaml, or a Helm chart (via the helm source type). I use raw YAML for most things and Helm where upstream charts make it the obvious choice.
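For the Helm case, the child Application points at a chart repository instead of a path in your Git repo. A hedged sketch, using the community kube-prometheus-stack chart as an example (the chart version and values are illustrative; pin whatever version you've actually tested):

```yaml
# kubernetes/apps/kube-prometheus-stack.yaml -- example Helm-source child app
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-prometheus-stack
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 65.0.0   # pin chart versions; a floating tag defeats GitOps
    helm:
      values: |
        grafana:
          enabled: true
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```

The inline `values` block keeps your overrides in Git alongside everything else, so chart upgrades and value changes are both ordinary commits.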

Workload Layout Inside Each App

For a typical workload like a deployment + service + ingress, I keep files flat within the app directory:

kubernetes/workloads/ai-llm/ollama/
├── namespace.yaml
├── deployment.yaml
├── service.yaml
├── ingress.yaml
└── pvc.yaml

ArgoCD will discover and apply all YAML files in the path recursively. You don't need a kustomization file unless you want explicit ordering or overlays. For simple workloads, flat files are easier to navigate and less to maintain.
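If you do want an explicit resource list or overlays later, the kustomization file is small. A minimal sketch matching the layout above:

```yaml
# kubernetes/workloads/ai-llm/ollama/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - deployment.yaml
  - service.yaml
  - ingress.yaml
  - pvc.yaml
```

ArgoCD detects the kustomization automatically and renders it instead of globbing the directory, so adding this file later is a non-event.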

One practical note: if you need a specific apply order (like a Namespace before a Deployment), ArgoCD respects resource sync waves via annotation:

metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "0"  # Lower = applies first

The default wave is 0, so resources that must apply early need a lower (negative) wave and dependents a higher one: Namespaces at "-1", CRDs at "0", resources that use those CRDs at "1". I only bother with this when ArgoCD actually complains about ordering — within a single wave it already sorts by kind, so Namespaces and CRDs usually land first anyway.

Ingress Health: The "Progressing" Trap

One gotcha I hit: ArgoCD was stuck showing Progressing for any app with an Ingress, even when the workload was fully running. The health check for Ingress resources checks whether the status.loadBalancer.ingress field is populated — and if your ingress controller doesn't write that field, ArgoCD waits forever.

The fix for Traefik is to set the publishedService argument so Traefik copies its LoadBalancer status into Ingress objects:

# In your Traefik deployment's static configuration (CLI args shown;
# the exact flag path can differ between Traefik major versions)
args:
  - --providers.kubernetesingress.ingressendpoint.publishedservice=traefik/traefik

Once that's set, Ingress objects get a populated status.loadBalancer and ArgoCD's health check passes. This is a common gotcha with any ingress controller that doesn't do this automatically.
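If you deploy Traefik through its official Helm chart, the same behavior is exposed as a values toggle rather than a raw CLI arg. A sketch, assuming a reasonably recent chart version (double-check the values schema against the chart you're on):

```yaml
# Traefik Helm values: have Traefik publish its LoadBalancer status
# onto the Ingress objects it serves
providers:
  kubernetesIngress:
    publishedService:
      enabled: true
```

Either form gets you the same result: a populated `status.loadBalancer` on each Ingress, which is what ArgoCD's built-in health check looks for.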

Bootstrapping a New Cluster

The real test of GitOps is: "Can I rebuild this from scratch?" With app-of-apps, the bootstrap sequence is short:

# 1. Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# 2. Apply the root app
kubectl apply -f kubernetes/apps/root-app.yaml

# 3. Trigger initial sync
argocd app sync root-app

ArgoCD discovers all child apps from the apps/ directory, creates them, and syncs each one. Within a few minutes the cluster converges to Git state. There's no manual "now deploy this, then that" orchestration.

The parts that aren't in Git — cluster-level things like StorageClasses, TLS issuers, sealed secrets — I handle as prerequisites documented separately. But application workloads: fully reproducible.

Dealing With Secrets

The one thing you cannot just commit is secrets. I use SealedSecrets for this. The workflow:

  1. Create a regular Kubernetes Secret manifest
  2. Encrypt it with kubeseal against the cluster's public key
  3. Commit the SealedSecret manifest — it's safe to store in Git
  4. ArgoCD deploys the SealedSecret; the controller decrypts it into a real Secret in-cluster

# Seal a secret for a specific namespace
kubectl create secret generic my-app-secret \
  --from-literal=password=supersecret \
  --dry-run=client -o yaml | \
  kubeseal --namespace my-app --format yaml > sealed-secret.yaml

The sealed output looks like garbage — it's just an encrypted blob. Commit that. The actual credentials never touch your repo.

Back up the SealedSecrets controller's encryption key somewhere safe (I use a password manager). If you lose it and need to rebuild, you'll have to re-seal everything.
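A sketch of that backup step, assuming the controller was installed in kube-system with the default Bitnami labels (adjust the namespace and label selector to match your install):

```shell
# Export the controller's active sealing keypair. Store the output
# somewhere safe and offline -- anyone holding this key can decrypt
# every SealedSecret in your repo.
kubectl get secret -n kube-system \
  -l sealedsecrets.bitnami.com/sealed-secrets-key \
  -o yaml > sealed-secrets-key-backup.yaml
```

On a rebuilt cluster you apply that file back before installing the controller, and your existing SealedSecrets decrypt without re-sealing.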

Sync Policies: Auto vs. Manual

Not everything should auto-sync. I use automated sync for stateless workloads and anything where drift is actively bad. For databases and stateful workloads I prefer manual sync, so I have to explicitly approve changes before they apply:

syncPolicy:
  # No 'automated' block = manual sync only
  syncOptions:
    - CreateNamespace=true

This way a bad migration doesn't auto-apply at 2am. ArgoCD will still tell you the app is OutOfSync, but it won't act without you.

The general rule I follow: if the app has a PVC or database, manual sync. If it's stateless and idempotent, auto-sync.
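Put together, a manual-sync child Application for a hypothetical stateful workload (the name and path are illustrative) looks like:

```yaml
# kubernetes/apps/postgres.yaml -- hypothetical stateful app, manual sync only
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: postgres
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/homelab
    targetRevision: main
    path: kubernetes/workloads/databases/postgres
  destination:
    server: https://kubernetes.default.svc
    namespace: databases
  syncPolicy:
    # No 'automated' block: ArgoCD reports OutOfSync but waits for an
    # explicit `argocd app sync postgres` (or a click in the UI)
    syncOptions:
      - CreateNamespace=true
```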

Repo Connection and Auth

ArgoCD needs read access to your Git repo. For a private repo, you add credentials via the CLI or UI:

argocd repo add https://github.com/your-org/homelab \
  --username your-user \
  --password your-pat

Use a Personal Access Token with minimal scope — read-only on the repo contents is sufficient. ArgoCD only reads from Git; it doesn't push anything back.

What I'd Do Differently

Start with the structure, not the apps. I initially just pointed ArgoCD at whatever manifests I had lying around, then retrofitted the directory layout. It would have saved me several frustrating syncs to design the apps/+workloads/ split upfront.

Don't skip health checks. ArgoCD has good default health checks for standard Kubernetes resources, but custom CRDs need custom health check scripts. I spent time wondering why a Kustomization from Flux (leftover from an experiment) showed as Healthy when it wasn't — ArgoCD had no idea how to check it. If you're managing CRDs, define health checks for them.

Prune with caution at first. prune: true is powerful and will delete things. Make sure your Git state is actually authoritative before enabling it. I once accidentally pruned a deployment I thought was in Git but wasn't. It came back up in seconds, but it was a good reminder to verify before trusting.

Watch the sync wave ordering for CRDs. If you're deploying operators and their CRDs in the same sync, the CRDs need to apply before the custom resources that use them. Without sync waves, ArgoCD applies everything in parallel and you get no matches for kind errors. Wave 0 for CRDs, wave 1 for the operator deployment, wave 2 for custom resources.
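As a sketch of that operator scenario, the wave annotations look like this (manifests trimmed to the relevant metadata; all names here are hypothetical):

```yaml
# Wave 0: the CRD must exist before anything references it
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
  annotations:
    argocd.argoproj.io/sync-wave: "0"
---
# Wave 1: the operator Deployment that reconciles the custom resources
apiVersion: apps/v1
kind: Deployment
metadata:
  name: widget-operator
  annotations:
    argocd.argoproj.io/sync-wave: "1"
---
# Wave 2: an instance of the custom resource itself
apiVersion: example.com/v1
kind: Widget
metadata:
  name: my-widget
  annotations:
    argocd.argoproj.io/sync-wave: "2"
```

ArgoCD finishes each wave (including health checks) before starting the next, so the `no matches for kind` errors disappear.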

Use the UI for debugging, Git for everything else. The ArgoCD UI is genuinely good for understanding what's drifted and why. But the source of truth is always Git. I make the change in Git, let ArgoCD sync it, and only use the UI to observe — never to make direct changes.

App-of-apps isn't complicated, but it requires committing to the discipline: everything goes through Git, every change is a commit, every rollback is a revert. Once that clicks, the homelab stops feeling like an untracked collection of running things and starts feeling like a system you actually understand. That's worth the initial setup investment.
