A practitioner's account of the errors the KServe getting-started documentation doesn't tell you about — with exact terminal output, root causes, and working Kustomize patches.
This article documents four production failures I encountered while deploying KServe on a local k3d cluster as part of building NeuroScale — a self-service AI inference platform. None of these failures appear in the official KServe getting-started documentation. If you are deploying KServe without Istio, this will save you several hours of debugging.
What I Was Building
NeuroScale is a self-service AI inference platform on Kubernetes. The goal was simple: one InferenceService named sklearn-iris reaches Ready=True and responds to a prediction request.
The install had to be GitOps-managed via ArgoCD — not "I ran some scripts." Getting there took two days and four distinct failures. Here is every one of them.
Stack: k3d (local Kubernetes) · KServe 0.12.1 · ArgoCD · Kourier (no Istio) · Knative Serving
📝 Author's Note: This article was originally documented in the NeuroScale platform repository.
File: docs/REALITY_CHECK_MILESTONE_2_KSERVE_SERVING.md
Repo: github.com/sodiq-code/neuroscale-platform
Failure 1: KServe InferenceService Stuck Not Ready — Istio vs Kourier Ingress Mismatch Causes ReconcileError Loop
Time lost: ~3 hours
Symptom
After applying the KServe installation via ArgoCD (serving-stack app), the InferenceService was created but never became Ready:
$ kubectl -n default get inferenceservice sklearn-iris
NAME           URL   READY   PREV   LATEST   AGE
sklearn-iris         False          100      8m
# READY=False with no URL = KServe controller did not complete ingress setup.
# No Knative Route was created. No external URL was assigned.
Digging In
$ kubectl -n default describe inferenceservice sklearn-iris
...
Status:
Conditions:
Message: Failed to reconcile ingress
Reason: ReconcileError
Status: False
Type: IngressReady
$ kubectl -n kserve logs deploy/kserve-controller-manager --tail=50
...
ERROR controller.inferenceservice Failed to reconcile ingress
{"error": "virtual service not found: sklearn-iris.default.svc.cluster.local"}
The error referenced a virtual service — that is an Istio concept. But we were running Kourier. The KServe controller was attempting to create an Istio VirtualService in a cluster that had no Istio control plane.
Root Cause: Default KServe Ingress Mode Assumes Istio
KServe's default inferenceservice-config ConfigMap expects Istio as the ingress provider. It sets ingressClassName: istio and the key disableIstioVirtualHost defaults to false. When Istio is absent, the controller enters an error loop trying to create resources that will never exist.
Setting disableIstioVirtualHost: true tells KServe to skip Istio and fall back to Knative route objects that Kourier can handle.
Why Kourier instead of Istio: Istio adds ~1 GB of memory overhead. On a local k3d cluster shared with Docker Desktop, Backstage, and the KServe controller, that exhausts available RAM. Kourier's entire footprint is under 200 MB.
The Fix: ConfigMap Patch in serving-stack
# infrastructure/serving-stack/patches/inferenceservice-config-ingress.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kserve
data:
  ingress: |-
    {
      "ingressGateway": "knative-serving/knative-ingress-gateway",
      "ingressDomain": "example.com",
      "ingressClassName": "istio",
      "urlScheme": "http",
      "disableIstioVirtualHost": true,
      "disableIngressCreation": false
    }
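For context, here is a sketch of how such a patch might be wired into the serving-stack Kustomization. The resource list and paths are assumptions based on the file header above, not the repository's actual kustomization.yaml:

```yaml
# Hypothetical sketch of infrastructure/serving-stack/kustomization.yaml
# wiring. The resource entries are assumptions, not the repo's real file.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - kserve.yaml   # upstream KServe install bundle
patches:
  # Strategic merge patch: overrides only the "ingress" key in the
  # inferenceservice-config ConfigMap, leaving the rest of the bundle intact.
  - path: patches/inferenceservice-config-ingress.yaml
```

Because the patch file carries its own apiVersion/kind/metadata, Kustomize can match it to the target ConfigMap without an explicit target selector.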
After this patch was applied and the KServe controller restarted:
$ kubectl -n default get inferenceservice sklearn-iris
NAME           URL                                       READY   AGE
sklearn-iris   http://sklearn-iris.default.example.com   True    2m
Business impact: This failure cost approximately 3 hours. The KServe documentation does not prominently state that the default configuration requires Istio. The error message "virtual service not found" is Istio-specific vocabulary that only makes sense if you already know Istio is the default — a classic undocumented assumption in infrastructure tooling.
Failure 2: ArgoCD Serving-Stack Sync Fails — Duplicate Knative CRD Exceeds 256 KB Annotation Size Limit
Time lost: ~30 minutes
Symptom
$ kubectl -n argocd get application serving-stack
NAME            SYNC STATUS   HEALTH STATUS
serving-stack   OutOfSync     Degraded
$ kubectl -n argocd describe application serving-stack
...
Message: CustomResourceDefinition "services.serving.knative.dev"
is invalid: metadata.annotations:
Too long: may not be more than 262144 bytes
Root Cause
With client-side apply, the full manifest is stored in the kubectl.kubernetes.io/last-applied-configuration annotation on the object. For a large CRD, that annotation alone exceeds Kubernetes' 262144-byte (256 KiB) cap on total annotation size. The Knative Serving Service CRD is approximately 400 KB as a YAML object.
A rendering overlap compounded the issue: the kserve.yaml bundle already includes its own copy of the Knative Serving CRDs, and we were also referencing serving-core.yaml directly. That meant two sources of truth managing the same CRDs, causing comparison instability in ArgoCD's diff.
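The limit itself is easy to check ahead of time. A minimal Python sketch of the pre-flight check (using a stand-in string rather than the real CRD; in practice you would read the output of `kustomize build`):

```python
# Sketch: would this manifest fit in the 256 KiB annotation budget that
# client-side apply consumes via last-applied-configuration?
# `manifest` is a stand-in string here, not the real Knative CRD.

ANNOTATION_LIMIT = 262144  # bytes; Kubernetes' cap on total annotation size


def fits_client_side_apply(manifest: str) -> bool:
    """True if the manifest could be stored in the
    kubectl.kubernetes.io/last-applied-configuration annotation."""
    return len(manifest.encode("utf-8")) <= ANNOTATION_LIMIT


small_crd = "apiVersion: apiextensions.k8s.io/v1\nkind: CustomResourceDefinition\n"
huge_crd = "x" * 400_000  # ~400 KB, roughly the Knative Serving CRD's size

print(fits_client_side_apply(small_crd))  # True  -> client-side apply is fine
print(fits_client_side_apply(huge_crd))   # False -> needs ServerSideApply=true
```

Any manifest that fails this check needs server-side apply, which skips the annotation entirely.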
Fix
# infrastructure/serving-stack/kustomization.yaml
# 1. Use server-side apply to bypass the annotation size limit
commonAnnotations:
  argocd.argoproj.io/sync-options: ServerSideApply=true

# 2. Ignore runtime-mutated fields on Knative CRDs
#    (in the ArgoCD Application spec)
ignoreDifferences:
  - group: apiextensions.k8s.io
    kind: CustomResourceDefinition
    name: services.serving.knative.dev
    jsonPointers:
      - /spec/preserveUnknownFields
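If you prefer to set this once at the Application level rather than via commonAnnotations, ArgoCD also accepts ServerSideApply as a sync option in the Application spec. A sketch, where the app name and namespace are assumptions matching this article's setup:

```yaml
# Hypothetical ArgoCD Application fragment; name/namespace are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: serving-stack
  namespace: argocd
spec:
  syncPolicy:
    syncOptions:
      # Apply every resource server-side, so no last-applied-configuration
      # annotation is written and the 256 KiB limit never triggers.
      - ServerSideApply=true
  ignoreDifferences:
    - group: apiextensions.k8s.io
      kind: CustomResourceDefinition
      name: services.serving.knative.dev
      jsonPointers:
        - /spec/preserveUnknownFields
```

The per-resource annotation form shown above achieves the same thing but lets you scope server-side apply to only the resources that need it.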
Business impact: ArgoCD's error says "Too long" but does not tell you which annotation or why it got too long. Debugging requires knowing ArgoCD's internal server-side apply mechanism.
Failure 3: kube-rbac-proxy ImagePullBackOff Blocks KServe Admission Webhook — gcr.io Access Restriction
Time lost: ~1 hour | Cluster-wide impact
Symptom
$ kubectl -n argocd describe application ai-model-alpha
...
Message: admission webhook
"inferenceservice.kserve-webhook-server.validator.webhook"
denied the request: no endpoints available for
service "kserve-webhook-server-service"
$ kubectl -n kserve get pods
NAME                            READY   STATUS
kserve-controller-manager-xxx   1/2     Running   # only 1 of 2 ready
$ kubectl -n kserve describe pod kserve-controller-manager-xxx
  kube-rbac-proxy:
    State:    Waiting
      Reason: ImagePullBackOff
    Image:    gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1
Events:
  Warning  Failed  kubelet
  Failed to pull image: unexpected status code 403 Forbidden
Root Cause
KServe 0.12.1's kserve-controller-manager Deployment includes a kube-rbac-proxy sidecar pulled from gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1. Google Container Registry restricted access to the kubebuilder images in late 2025.
The manager container itself was healthy, but with the sidecar stuck in ImagePullBackOff the pod never reached full readiness (1/2), so the webhook Service had no ready endpoints and every admission request was denied. The same image tag at registry.k8s.io/kube-rbac-proxy:v0.13.1 did not exist either.
Fix: Remove the Sidecar via Kustomize Strategic Merge Patch
# infrastructure/serving-stack/patches/
#   kserve-controller-kube-rbac-proxy-image.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kserve-controller-manager
  namespace: kserve
spec:
  template:
    spec:
      containers:
        - name: kube-rbac-proxy
          $patch: delete
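A sketch of how this patch might be referenced from the Kustomization. The explicit target selector is an assumption; since the patch file carries its own metadata, a bare `path` entry also works:

```yaml
# Hypothetical kustomization.yaml fragment; wiring only, paths assumed.
patches:
  - path: patches/kserve-controller-kube-rbac-proxy-image.yaml
    target:
      group: apps
      version: v1
      kind: Deployment
      name: kserve-controller-manager
```

The `$patch: delete` directive inside the patch file tells the strategic merge logic to remove the matched container from the list rather than merge fields into it.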
After this patch and a re-sync:
$ kubectl -n kserve get pods
NAME                            READY   STATUS
kserve-controller-manager-yyy   1/1     Running   # fixed
$ kubectl -n kserve get endpoints kserve-webhook-server-service
NAME                            ENDPOINTS         AGE
kserve-webhook-server-service   10.42.0.23:9443   45s
Known tradeoff: Removing kube-rbac-proxy disables the Prometheus metrics proxy endpoint for the KServe controller. In production, source a verified replacement image from an accessible registry before deploying.
Business impact: An external registry access change cascaded into a complete admission webhook outage. Any InferenceService creation or update was blocked cluster-wide while the sidecar was failing. This class of failure has no good solution without upstream monitoring of your image dependencies.
Failure 4: Inference Request Returns HTTP 405 — IngressDomain Placeholder Resolves to Public Internet
Time lost: ~1 hour
Symptom
$ kubectl -n default get inferenceservice sklearn-iris \
-o jsonpath='{.status.url}'
http://sklearn-iris.default.example.com
$ curl -sS \
-H "Content-Type: application/json" \
-d '{"instances":[[5.1,3.5,1.4,0.2]]}' \
http://sklearn-iris.default.example.com/v1/models/sklearn-iris:predict
<html><head><title>405 Not Allowed</title></head>...
# The request hit the public example.com server, not our Kourier gateway.
Root Cause
The ingressDomain in the KServe ConfigMap was set to example.com — a literal placeholder. The generated URL resolves publicly to Cloudflare/IANA servers, not the local cluster.
Additionally, Kourier routes by Host header, not by IP. Just port-forwarding Kourier and hitting 127.0.0.1 does not work without the correct Host header.
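To see why the Host header matters, here is a toy Python sketch of host-based routing. This is not Kourier's implementation, just the routing model it shares with Knative: every request arrives at the same gateway address, and only the Host header decides which backend receives it.

```python
# Toy sketch of the host-based routing model used by Kourier/Knative.
# The destination IP alone is not enough: the gateway selects a backend
# purely from the Host header of the incoming request.
from typing import Optional

routes = {
    # host header -> backend revision service
    "sklearn-iris-predictor.default.127.0.0.1.sslip.io": "sklearn-iris-predictor-00001",
}


def route(host_header: Optional[str]) -> str:
    """Return the backend a gateway would pick, or a 404-style fallback."""
    if host_header is None:
        return "404: no Host header, gateway cannot pick a backend"
    return routes.get(host_header, f"404: no route for {host_header}")


# curl http://127.0.0.1:18080/... with no Host header set:
print(route(None))
# curl with -H 'Host: sklearn-iris-predictor.default.127.0.0.1.sslip.io':
print(route("sklearn-iris-predictor.default.127.0.0.1.sslip.io"))
```

This is why port-forwarding Kourier and curling 127.0.0.1 returns nothing useful unless the Host header names a registered route.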
Fix: Direct Predictor Pod Port-Forward
Bypass Knative routing and Kourier entirely for local verification:
# Step 1: Get the predictor pod name
kubectl -n default get pods \
-l serving.knative.dev/revision=sklearn-iris-predictor-00001
# Step 2: Port-forward directly to the predictor container
kubectl -n default port-forward \
pod/sklearn-iris-predictor-00001-deployment-<hash> 18080:8080
# Step 3: Predict (no Host header, no Kourier, no DNS needed)
curl -sS \
-H "Content-Type: application/json" \
-d '{"instances":[[5.1,3.5,1.4,0.2],[6.2,3.4,5.4,2.3]]}' \
http://127.0.0.1:18080/v1/models/sklearn-iris:predict
{"predictions":[0,2]}
For the full Kourier routing path, always pass the Host header:
kubectl -n kourier-system port-forward svc/kourier 18080:80
curl -sS \
-H 'Host: sklearn-iris-predictor.default.127.0.0.1.sslip.io' \
-H "Content-Type: application/json" \
-d '{"instances":[[5.1,3.5,1.4,0.2]]}' \
http://127.0.0.1:18080/v1/models/sklearn-iris:predict
Business impact: False-negative inference verification. A healthy endpoint looked broken because the test URL resolved to the wrong server. Always verify the complete network path — DNS resolution, ingress routing, pod health — as separate steps rather than assuming a single curl test is conclusive.
What This Proves After the Failures
After working through the above failures, the inference baseline worked:
$ kubectl -n default get inferenceservice sklearn-iris
NAME           URL                                       READY   AGE
sklearn-iris   http://sklearn-iris.default.example.com   True    45m
$ curl -sS \
-H "Content-Type: application/json" \
-d '{"instances":[[5.1,3.5,1.4,0.2],[6.2,3.4,5.4,2.3]]}' \
http://127.0.0.1:18080/v1/models/sklearn-iris:predict
{"predictions":[0,2]}
The Istio/Kourier mismatch is the canonical example of why "default configuration" is dangerous in complex systems. KServe's default assumes a specific network topology that is not disclosed in the getting-started docs. Recognizing this class of failure — configuration that works in the tool author's environment but not yours — is a senior platform engineering competency.
What This Setup Does NOT Solve (Known Tradeoffs)
- No Istio service mesh: No mTLS between services, no advanced traffic management. Acceptable for local dev; requires a replacement security layer in production.
- kube-rbac-proxy removed: Prometheus metrics from the KServe controller are unavailable. Re-add this sidecar from a working registry before any production deployment.
- Port-forward for inference: The Host-header workaround is local only. Cloud deployment requires a real ingress with DNS and TLS. On EKS, swap Kourier for an ALB and set ingressDomain to your real domain. See the Cloud Promotion Guide in the repository.
Debugging Commands Reference
Run these in order when an InferenceService will not become Ready.
1 — InferenceService Conditions
kubectl -n default describe inferenceservice sklearn-iris
kubectl -n kserve logs deploy/kserve-controller-manager --tail=50
kubectl -n kserve logs deploy/kserve-controller-manager -c manager --tail=50
2 — Webhook Endpoint Availability
kubectl -n kserve get endpoints kserve-webhook-server-service
kubectl -n kserve describe endpoints kserve-webhook-server-service
kubectl -n default get ksvc
kubectl -n default get route
3 — ConfigMap and Pod Status
kubectl -n kserve get configmap inferenceservice-config -o yaml
kubectl -n kserve get pods -o wide
kubectl -n kserve describe pod <pod-name>
The One Thing to Remember
KServe's default configuration assumes Istio is installed. This assumption is not prominently stated in the getting-started documentation. Every engineer running KServe on k3d, k3s, GKE Autopilot, or any non-Istio cluster will hit ReconcileError and see error messages referencing "virtual services" — an Istio concept — with no obvious resolution path.
The fix is one ConfigMap patch. It takes 30 seconds to apply. Finding it took three hours.
The kube-rbac-proxy 403 from gcr.io is an external dependency failure that silently kills your admission webhook cluster-wide. The $patch: delete Kustomize strategy is the fastest recovery path when no alternative registry image is available.
Full platform source — all six Reality Check documents, Backstage Golden Path, Kyverno policy enforcement, cost attribution, and a Cloud Promotion Guide to EKS/GKE: Check out the full NeuroScale repo here.
See Also
- infrastructure/serving-stack/patches/inferenceservice-config-ingress.yaml — Kourier config patch
- infrastructure/serving-stack/patches/kserve-controller-kube-rbac-proxy-image.yaml — sidecar removal patch
- infrastructure/kserve/sklearn-runtime.yaml — ClusterServingRuntime definition
- docs/CLOUD_PROMOTION_GUIDE.md — how to replace Kourier with ALB/NGINX on EKS/GKE
- docs/REALITY_CHECK_MILESTONE_3_GOLDEN_PATH.md — nine Backstage failures documented at the same depth
- docs/REALITY_CHECK_MILESTONE_4_GUARDRAILS.md — how kyverno-cli exits 0 on violations and why $PIPESTATUS[0] matters
Jimoh Sodiq Bolaji | Platform Engineer | Technical Content Engineer | Abuja, Nigeria | NeuroScale Platform