DEV Community

Guatu

Originally published at guatulabs.dev

Stop Merging Broken YAML: Kubernetes Manifest Validation in CI

Pushing a broken manifest to your main branch is a rite of passage, but it becomes significantly more painful when you're running a GitOps workflow with ArgoCD. I've spent far too many late nights staring at a "Sync Failed" status in ArgoCD, only to realize I had a typo in a Traefik IngressRoute, or a resource that Kyverno was rejecting because it lacked resource limits. The problem isn't just the error itself; it's the feedback loop. If the error only surfaces during deployment, your CI pipeline has failed at its primary job.

The goal is to move validation as far left as possible. I started integrating kubeconform into my GitHub Actions workflow to catch structural errors, such as invalid API versions or malformed fields, before a reviewer even opens the pull request. However, structural validation is only half the battle; you also have to deal with policy enforcement. I recently ran into a situation where a Kyverno policy enforcing resource limits on all Jobs was breaking my CloudNativePG (CNPG) deployments. The CNPG operator creates Jobs that don't always set the resource limits the policy expected. Because the policy was too broad, the cluster refused to provision the primary instance.
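To make "structural errors" concrete, here is a hypothetical two-document file containing both kinds of mistakes mentioned above. The names `legacy-ingress` and `whoami` are made up. kubeconform validates each document separately:

- The first document should fail because no current schema exists for the `extensions/v1beta1` Ingress API, which was removed in Kubernetes 1.22.
- The second document only fails when kubeconform runs with `-strict`, which rejects properties the schema doesn't define.

Exact messages depend on the kubeconform version and the flags you pass.

```yaml
# Hypothetical manifests illustrating what kubeconform flags.
# Document 1: Ingress under extensions/v1beta1 (removed in Kubernetes 1.22),
# so kubeconform finds no schema for it and reports an error.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: legacy-ingress
spec:
  rules:
    - host: whoami.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: whoami
              servicePort: 80
---
# Document 2: a valid apps/v1 Deployment except for a typo ("replica" instead
# of "replicas"); rejected only when kubeconform runs with -strict.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
spec:
  replica: 2
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: traefik/whoami
```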

The fix involves two parts: using kubeconform for schema validation in CI and using targeted exclusions in your Kyverno policies. For the CI side, you don't need a complex setup. A simple action step can scan your entire manifests directory.

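# GitHub Action snippet for manifest validation
jobs:
  validate-manifests:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # Run kubeconform from its published container image. It recurses into
      # the directories passed as arguments. -strict rejects unknown or
      # duplicated fields; the second -schema-location looks up schemas for
      # custom resources (e.g. Traefik IngressRoutes) in the community CRDs-catalog.
      - name: Validate Kubernetes manifests
        uses: docker://ghcr.io/yannh/kubeconform:latest
        with:
          entrypoint: /kubeconform
          args: >-
            -strict -summary
            -schema-location default
            -schema-location https://raw.githubusercontent.com/datreeio/CRDs-catalog/main/{{.Group}}/{{.ResourceKind}}_{{.ResourceAPIVersion}}.json
            kubernetes/workloads kubernetes/infrastructure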
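The snippet above only defines the job; the workflow still needs a trigger. The sketch below runs validation on every pull request that touches the manifests. It assumes the manifests live under `kubernetes/`, as in the paths above, and the workflow file name is only a suggestion.

```yaml
# .github/workflows/validate-manifests.yaml (hypothetical file name)
name: Validate Kubernetes manifests

# Run on pull requests (and pushes to main) that change anything under
# kubernetes/, so broken YAML is flagged before merge, not at ArgoCD sync time.
on:
  pull_request:
    paths:
      - "kubernetes/**"
  push:
    branches: [main]
    paths:
      - "kubernetes/**"
```

To actually block the merge, mark the `validate-manifests` job as a required status check on `main`. That interacts with path filters: GitHub leaves a required check pending when path filtering skips the workflow, which blocks pull requests that don't touch `kubernetes/`. Drop the `paths` filter if that matters for your repository.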

On the cluster side, when you have a legitimate reason to bypass a policy—like the CNPG example—don't just disable the policy globally. Use labels to create an exclusion scope. This keeps your GitOps for Homelabs workflow clean without sacrificing security for the rest of your workloads.

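apiVersion: kyverno.io/v1
# ClusterPolicy so the rule covers Jobs in every namespace
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  # Reject non-compliant resources at admission (the default, Audit, only reports).
  # Kyverno 1.13+ also accepts failureAction on each rule's validate block.
  validationFailureAction: Enforce
  rules:
    - name: enforce-limits-on-jobs
      match:
        any:
          - resources:
              kinds:
                - Job
      # Exclude Jobs carrying the CNPG label (cnpg.io/cluster=<cluster-name>)
      # so the operator can manage its own Jobs; matches any label value.
      exclude:
        any:
          - resources:
              selector:
                matchExpressions:
                  - key: cnpg.io/cluster
                    operator: Exists
      validate:
        message: "All containers must have resource limits defined."
        pattern:
          spec:
            template:
              spec:
                containers:
                  - resources:
                      limits:
                        cpu: "?*"
                        memory: "?*"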
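To see what the rule demands in practice, here are two hypothetical Jobs. The Job names, images, and the `pg-main` cluster name are made up.

- The first Job passes because every container declares both CPU and memory limits.
- The second declares no limits at all. It should be skipped by the rule rather than rejected, assuming the exclusion matches the `cnpg.io/cluster` label as intended.

Removing that label from the second Job should make the rule fail for it, and block it if the policy runs in Enforce mode.

```yaml
# Hypothetical Job that satisfies the rule: every container sets CPU and memory limits.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: busybox:1.36
          command: ["sh", "-c", "echo running migration"]
          resources:
            limits:
              cpu: 500m
              memory: 256Mi
---
# Hypothetical CNPG-style Job with no limits: the cnpg.io/cluster label
# places it in the rule's exclude scope, so the limits check does not apply.
apiVersion: batch/v1
kind: Job
metadata:
  name: pg-main-1-initdb
  labels:
    cnpg.io/cluster: pg-main
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: initdb
          image: busybox:1.36
          command: ["sh", "-c", "echo initializing"]
```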

Validating at the PR stage catches the "dumb" mistakes, while smart policy exclusions prevent the "smart" tools from breaking your legitimate infrastructure.
