This article is about a CI system that looked healthy for two weeks while
enforcing absolutely nothing.
The symptom was simple: a PR with a deliberately non-compliant Kubernetes
manifest was passing CI. The Kyverno policy check step showed green.
The manifest was missing a required label. Kyverno should have blocked it.
It did not.
This is the story of why, and the exact fix.
This is part of a series on building a production-hardened AI inference
platform on Kubernetes:
- Part 1: Why Your KServe InferenceService Won't Become Ready
- Part 2: 5 GitOps Failure Modes That Break KServe Deployments
- Part 3: Nine Ways Backstage Breaks Before Your Developer Portal Works
Project repo:
github.com/sodiq-code/neuroscale-platform
The context: what I was enforcing
The NeuroScale platform requires every InferenceService and Deployment
in the default namespace to carry two labels:
- owner — which team owns the workload
- cost-center — which budget the resource consumption is charged to
These labels feed directly into OpenCost for cost attribution. Without them,
workloads appear as uncategorised spend. With Kyverno admission policies
enforcing them at the cluster level, every resource is guaranteed to carry
cost attribution metadata.
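For contrast, here is a sketch of what a compliant manifest looks like. The workload name and the cost-center value are illustrative, not from the repo, and the spec is omitted:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: demo-model          # hypothetical workload name
  namespace: default
  labels:
    owner: platform-team    # owning team
    cost-center: ml-infra   # hypothetical budget code
# spec omitted for brevity
```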
The Kyverno ClusterPolicy looks like this:
# infrastructure/kyverno/policies/
# require-standard-labels-inferenceservice.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-standard-labels-inferenceservice
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-owner-and-cost-center-on-isvc
      match:
        any:
          - resources:
              kinds:
                - InferenceService
      validate:
        message: >-
          InferenceService resources must set
          metadata.labels.owner and metadata.labels.cost-center.
        pattern:
          metadata:
            labels:
              owner: "?*"
              cost-center: "?*"
The "?*" pattern requires each label to be present with a non-empty value. Admission enforcement works: apply an InferenceService without those labels and Kyverno blocks it at the API server:
$ kubectl apply -f bad-model.yaml
Error from server: admission webhook
"clusterpolice.kyverno.svc" denied the request:
resource InferenceService/default/bad-model was blocked
due to the following policies
require-standard-labels-inferenceservice:
check-owner-and-cost-center-on-isvc:
'validation error: InferenceService resources must set
metadata.labels.owner and metadata.labels.cost-center.'
That part worked correctly. The CI part did not.
The false-green: what it looked like
The CI workflow ran kyverno-cli against every changed manifest in apps/
on every pull request. The intent was to catch non-compliant manifests before
merge — shift-left enforcement before the manifest ever reached the cluster.
The original CI command:
docker run --rm -v "$PWD:/work" -w /work \
  ghcr.io/kyverno/kyverno-cli:v1.12.5 \
  apply infrastructure/kyverno/policies/*.yaml \
  --resource "${app_files[@]}"
A test PR was created with a manifest that deliberately lacked cost-center:
# apps/test-bad-model/inference-service.yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: test-bad-model
  namespace: default
  labels:
    owner: platform-team
    # cost-center intentionally missing
Expected result: CI fails.
Actual result: CI passed.
$ git push origin feature/test-bad-policy
# ... CI runs ...
# Result: validate-policies-against-app-manifests: ✅ PASSED
The violation was printed to stdout. The job still showed green.
Root Cause Part 1: kyverno-cli apply exits 0 on violations
This is the core issue. kyverno-cli apply in the version used (v1.12.x)
exits with code 0 when it finds policy violations. It prints them to
stdout, but returns a success exit code.
You can verify this directly:
docker run --rm -v "$PWD:/work" -w /work \
ghcr.io/kyverno/kyverno-cli:v1.12.5 \
apply infrastructure/kyverno/policies/*.yaml \
--resource apps/test-bad-model/inference-service.yaml
# Output:
# PASS: 0, FAIL: 1, WARN: 0, ERROR: 0, SKIP: 0
#
# policy require-standard-labels-inferenceservice ->
# resource default/InferenceService/test-bad-model
# FAIL: check-owner-and-cost-center-on-isvc
#
echo $?
0 # <-- exits 0 despite the FAIL
The violation is visible in stdout. The exit code is 0. Any CI step that
only checks the exit code will report success.
Root Cause Part 2: $? captures tee, not kyverno
The CI command piped output through tee to capture it for logging:
docker run ... kyverno-cli apply ... \
2>&1 | tee /tmp/kyverno-output.txt
Even if kyverno-cli had exited non-zero, $? in bash captures the exit
code of the last command in the pipe — which is tee. tee always
exits 0 if it can write to the file.
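The tee-swallows-the-exit-code behavior is easy to reproduce with a toy pipeline; nothing here is Kyverno-specific, just bash:

```shell
#!/usr/bin/env bash
# A failing command piped through tee, which succeeds.
false | tee /dev/null
echo "\$? is $?"                           # 0 -- tee's exit code, not false's

# Re-run the pipeline, then read the first stage's status instead.
false | tee /dev/null
echo "PIPESTATUS[0] is ${PIPESTATUS[0]}"   # 1 -- the real failure
```

Note that PIPESTATUS is reset by every command, so it must be read (or saved into a variable) immediately after the pipeline.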
This means two separate problems were stacked:
- kyverno-cli apply exits 0 on violations (Kyverno behavior)
- $? captures tee's exit code, not kyverno's (bash pipe behavior)
Either problem alone would have caused the false-green.
Together they made the enforcement completely invisible.
The fix: dual check with ${PIPESTATUS[0]}
${PIPESTATUS[0]} captures the exit code of the first command in a
pipe, regardless of what the subsequent commands return. Combined with
stdout parsing for violation markers, this creates a reliable enforcement
check.
set +e
docker run --rm -v "$PWD:/work" -w /work \
  ghcr.io/kyverno/kyverno-cli:v1.12.5 \
  apply infrastructure/kyverno/policies/*.yaml \
  --resource "${app_files[@]}" \
  2>&1 | tee /tmp/kyverno-output.txt
kyverno_exit="${PIPESTATUS[0]}"
set -e

if [ "${kyverno_exit}" -ne 0 ] \
  || grep -qE "^FAIL" /tmp/kyverno-output.txt \
  || grep -qE "fail: [1-9][0-9]*" /tmp/kyverno-output.txt; then
  echo "Kyverno policy violations detected. Failing CI."
  exit 1
fi
Why two checks instead of one:
The ${PIPESTATUS[0]} check handles cases where kyverno-cli itself
exits non-zero — which may happen in future versions or on error conditions.
The stdout grep checks handle the current v1.12.x behavior where violations
print FAIL to stdout but exit 0. Together they cover both current behavior
and future behavior changes.
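The detection logic itself can be exercised without running kyverno-cli at all, by feeding it a saved output file. The sample text below mimics the v1.12.x output format shown earlier; the exit code is hard-wired to 0 to simulate the current CLI behavior:

```shell
#!/usr/bin/env bash
set -u

# Simulated kyverno-cli output containing one violation.
cat > /tmp/kyverno-output.txt <<'EOF'
PASS: 0, FAIL: 1, WARN: 0, ERROR: 0, SKIP: 0
policy require-standard-labels-inferenceservice ->
  resource default/InferenceService/test-bad-model
FAIL: check-owner-and-cost-center-on-isvc
EOF

kyverno_exit=0   # what v1.12.x actually returns on a violation

# Same dual check as the CI step: exit code OR violation markers in stdout.
if [ "${kyverno_exit}" -ne 0 ] \
  || grep -qE "^FAIL" /tmp/kyverno-output.txt \
  || grep -qE "fail: [1-9][0-9]*" /tmp/kyverno-output.txt; then
  echo "violation detected"
fi
```

The "^FAIL" pattern fires on the per-rule FAIL line even though the exit code is 0, which is exactly the case the original CI step missed.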
Why set +e before the command:
Without set +e, a non-zero exit from kyverno would immediately abort the
script before ${PIPESTATUS[0]} could be captured. set +e disables
errexit temporarily so we can capture and evaluate the exit code explicitly.
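An alternative to the set +e / ${PIPESTATUS[0]} dance is bash's pipefail option, which makes a pipeline return the last non-zero status of any stage, so tee can no longer mask an upstream failure. This is a design note rather than the workflow's actual code, and it would not help on v1.12.x by itself, since the CLI exits 0 on violations, so the stdout grep is still required either way:

```shell
#!/usr/bin/env bash
# With pipefail, the pipeline's status reflects the failing stage,
# not the final (successful) tee.
set -o pipefail

false | tee /dev/null
echo "pipeline status: $?"   # 1 (from false), not tee's 0
```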
Verification: proving the fix works
After the fix, the same test PR with the missing cost-center label now
fails CI correctly:
$ git push origin feature/test-non-compliant
# CI output:
validate-policies-against-app-manifests
Running kyverno policy check...
PASS: 0, FAIL: 1, WARN: 0, ERROR: 0, SKIP: 0
policy require-standard-labels-inferenceservice ->
resource default/InferenceService/test-bad-model
FAIL: check-owner-and-cost-center-on-isvc
Kyverno policy violations detected. Failing CI.
# Result: validate-policies-against-app-manifests: ❌ FAILED
A compliant manifest with both labels passes both the CI gate and cluster admission:
$ kubectl apply -f apps/demo-iris-2/inference-service.yaml
# CI result: validate-policies-against-app-manifests: ✅ PASSED
The complete GitHub Actions workflow step
Here is the full implementation used in the NeuroScale platform:
# .github/workflows/guardrails-checks.yaml
- name: Validate policies against app manifests
  run: |
    app_files=()
    while IFS= read -r -d '' f; do
      app_files+=("--resource" "$f")
    done < <(find apps/ -name "*.yaml" -print0)

    if [ ${#app_files[@]} -eq 0 ]; then
      echo "No app manifests found. Skipping policy check."
      exit 0
    fi

    set +e
    docker run --rm -v "$PWD:/work" -w /work \
      ghcr.io/kyverno/kyverno-cli:v1.12.5 \
      apply infrastructure/kyverno/policies/*.yaml \
      "${app_files[@]}" \
      2>&1 | tee /tmp/kyverno-output.txt
    kyverno_exit="${PIPESTATUS[0]}"
    set -e

    echo "--- Kyverno output ---"
    cat /tmp/kyverno-output.txt
    echo "--- Exit code: ${kyverno_exit} ---"

    if [ "${kyverno_exit}" -ne 0 ] \
      || grep -qE "^FAIL" /tmp/kyverno-output.txt \
      || grep -qE "fail: [1-9][0-9]*" /tmp/kyverno-output.txt; then
      echo "Kyverno policy violations detected. Failing CI."
      exit 1
    fi
    echo "All manifests passed policy checks."
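The find/while loop that builds app_files can be exercised locally against a scratch directory. The paths below are throwaway, not repo paths; the point is the NUL-delimited read, which survives filenames containing spaces or newlines:

```shell
#!/usr/bin/env bash
set -eu

# Scratch directory standing in for apps/.
tmpdir=$(mktemp -d)
touch "${tmpdir}/a.yaml" "${tmpdir}/b.yaml"

# Same pattern as the workflow: each file contributes a "--resource" flag
# plus its path, collected NUL-delimited from find.
app_files=()
while IFS= read -r -d '' f; do
  app_files+=("--resource" "$f")
done < <(find "${tmpdir}" -name "*.yaml" -print0)

echo "collected ${#app_files[@]} args"   # 2 files -> 4 args
rm -rf "${tmpdir}"
```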
Why this matters beyond a single platform
The kyverno-cli apply exit code behavior is not a bug — it is documented
behavior for the apply subcommand. But it is not prominently surfaced in
the getting-started documentation, and most CI examples in the wild use
the exit code check alone.
If your team is using Kyverno for compliance or security enforcement in CI,
and your CI step looks like this:
kyverno apply policies/ --resource manifests/
if [ $? -ne 0 ]; then
echo "Policy violation detected"
exit 1
fi
then your enforcement is silently not enforcing. The admission webhook at the cluster level still blocks violations, but your shift-left CI gate does not. Developers will only discover policy violations after merging, when the ArgoCD sync fails, not before.
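To see concretely why that naive gate never fires, substitute a stand-in for the CLI. fake_kyverno below is not the real tool; it just mimics the v1.12.x behavior of reporting a violation on stdout while exiting 0:

```shell
#!/usr/bin/env bash
# Stand-in for kyverno-cli v1.12.x: prints a FAIL summary, exits 0.
fake_kyverno() {
  echo "PASS: 0, FAIL: 1, WARN: 0, ERROR: 0, SKIP: 0"
  return 0
}

# The naive exit-code-only gate from the snippet above.
fake_kyverno
if [ $? -ne 0 ]; then
  echo "Policy violation detected"
  exit 1
fi
echo "gate passed despite the violation"   # this is what CI sees
```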
The distinction between "guardrails exist" and "guardrails enforce" is
exactly what separates platform engineering from platform theater.
The two-layer enforcement model
The fix is not just about the CI step. The NeuroScale platform uses two
enforcement layers that work together:
Layer 1 — PR time (shift-left): kyverno-cli in CI catches violations
before merge. Developers get fast feedback without needing a running cluster.
Layer 2 — Admission time (shift-down): Kyverno admission webhook blocks
non-compliant resources at the Kubernetes API server. Even if CI is bypassed
or misconfigured, nothing non-compliant reaches the cluster.
PR opened
↓
CI: kyverno-cli apply + ${PIPESTATUS[0]} check
↓ (blocks here if non-compliant)
PR merged
↓
ArgoCD sync
↓
Kyverno admission webhook
↓ (blocks here as second layer)
Resource created in cluster
Layer 1 gives fast developer feedback. Layer 2 is the safety net.
Both are required. Neither alone is sufficient.
Debugging Commands Reference
# Test a policy manually against a manifest
docker run --rm -v "$PWD:/work" -w /work \
ghcr.io/kyverno/kyverno-cli:v1.12.5 \
apply infrastructure/kyverno/policies/require-standard-labels-inferenceservice.yaml \
--resource apps/demo-iris-2/inference-service.yaml
# Check Kyverno admission webhook registrations
kubectl get validatingwebhookconfigurations | grep kyverno
kubectl get mutatingwebhookconfigurations | grep kyverno
# Verify Kyverno pods are healthy
kubectl -n kyverno get pods
kubectl -n kyverno get endpoints kyverno-svc
# List all installed ClusterPolicies and their enforcement mode
kubectl get clusterpolicies -o wide
# Review Kyverno admission decisions in controller logs
kubectl -n kyverno logs deploy/kyverno --tail=50 | \
grep -i "admit\|deny\|block"
What I Would Add Next
- A kyverno test subcommand integration in CI, for cases that require explicit pass/fail test fixtures rather than live policy simulation
- Background scan results surfaced as PR comments, using the Kyverno policy report API
- Separate policy validation for Deployment and InferenceService resources, to give more specific failure messages per resource type
See Also
- infrastructure/kyverno/policies/ — all ClusterPolicy definitions
- .github/workflows/guardrails-checks.yaml — complete CI workflow
- docs/REALITY_CHECK_MILESTONE_4_GUARDRAILS.md — full failure documentation
- docs/REALITY_CHECK_MILESTONE_5_COST_PROXY.md — where the fix was implemented
Jimoh Sodiq Bolaji | Platform Engineer | Technical Content Engineer
| Abuja, Nigeria
NeuroScale Platform