A practitioner's account of the errors the KServe getting-started documentation doesn't tell you about — with exact terminal output, root causes, and working Kustomize patches.
This article documents four production failures I encountered while deploying KServe on a local k3d cluster as part of building NeuroScale — a self-service AI inference platform. None of these failures appear in the official KServe getting-started documentation. If you are deploying KServe without Istio, this will save you several hours of debugging.
What I Was Building
NeuroScale is a self-service AI inference platform on Kubernetes. The goal was simple: one InferenceService named sklearn-iris reaches Ready=True and responds to a prediction request.
The install had to be GitOps-managed via ArgoCD — not "I ran some scripts." Getting there took two days and four distinct failures. Here is every one of them.
Stack: k3d (local Kubernetes) · KServe 0.12.1 · ArgoCD · Kourier (no Istio) · Knative Serving
📝 Author's Note: This article was originally documented in the NeuroScale platform repository.
File: docs/REALITY_CHECK_MILESTONE_2_KSERVE_SERVING.md
Repo: github.com/sodiq-code/neuroscale-platform
Failure 1: KServe InferenceService Stuck Not Ready — Istio vs Kourier Ingress Mismatch Causes ReconcileError Loop
Time lost: ~3 hours
Symptom
After applying the KServe installation via ArgoCD (serving-stack app), the InferenceService was created but never became Ready:
$ kubectl -n default get inferenceservice sklearn-iris
NAME           URL   READY   PREV   LATEST   AGE
sklearn-iris         False          100      8m
# READY=False with no URL = KServe controller did not complete ingress setup.
# No Knative Route was created. No external URL was assigned.
Digging In
$ kubectl -n default describe inferenceservice sklearn-iris
...
Status:
Conditions:
Message: Failed to reconcile ingress
Reason: ReconcileError
Status: False
Type: IngressReady
$ kubectl -n kserve logs deploy/kserve-controller-manager --tail=50
...
ERROR controller.inferenceservice Failed to reconcile ingress
{"error": "virtual service not found: sklearn-iris.default.svc.cluster.local"}
The error referenced a virtual service — that is an Istio concept. But we were running Kourier. The KServe controller was attempting to create an Istio VirtualService in a cluster that had no Istio control plane.
Root Cause: Default KServe Ingress Mode Assumes Istio
KServe's default inferenceservice-config ConfigMap expects Istio as the ingress provider. It sets ingressClassName: istio and the key disableIstioVirtualHost defaults to false. When Istio is absent, the controller enters an error loop trying to create resources that will never exist.
Setting disableIstioVirtualHost: true tells KServe to skip Istio and fall back to Knative route objects that Kourier can handle.
Why Kourier instead of Istio: Istio adds ~1 GB of memory overhead. On a local k3d cluster shared with Docker Desktop, Backstage, and the KServe controller, that exhausts available RAM. Kourier's entire footprint is under 200 MB.
The Fix: ConfigMap Patch in serving-stack
# infrastructure/serving-stack/patches/inferenceservice-config-ingress.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kserve
data:
  ingress: |-
    {
      "ingressGateway": "knative-serving/knative-ingress-gateway",
      "ingressDomain": "example.com",
      "ingressClassName": "istio",
      "urlScheme": "http",
      "disableIstioVirtualHost": true,
      "disableIngressCreation": false
    }
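For context, here is a sketch of how such a patch might be wired into the serving-stack Kustomization. The resource list and paths are assumptions based on the file header above, not the repository's actual kustomization.yaml:

```yaml
# Hypothetical sketch of infrastructure/serving-stack/kustomization.yaml
# wiring. The resource entries are assumptions, not the repo's real file.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - kserve.yaml   # upstream KServe install bundle
patches:
  # Strategic merge patch: overrides only the "ingress" key in the
  # inferenceservice-config ConfigMap, leaving the rest of the bundle intact.
  - path: patches/inferenceservice-config-ingress.yaml
```

Because the patch file carries its own apiVersion/kind/metadata, Kustomize can match it to the target ConfigMap without an explicit target selector.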
After this patch was applied and the KServe controller restarted:
$ kubectl -n default get inferenceservice sklearn-iris
NAME           URL                                       READY   AGE
sklearn-iris   http://sklearn-iris.default.example.com   True    2m
Business impact: This failure cost approximately 3 hours. The KServe documentation does not prominently state that the default configuration requires Istio. The error message "virtual service not found" is Istio-specific vocabulary that only makes sense if you already know Istio is the default — a classic undocumented assumption in infrastructure tooling.
Failure 2: ArgoCD Serving-Stack Sync Fails — Duplicate Knative CRD Exceeds 256 KB Annotation Size Limit
Time lost: ~30 minutes
Symptom
$ kubectl -n argocd get application serving-stack
NAME            SYNC STATUS   HEALTH STATUS
serving-stack   OutOfSync     Degraded
$ kubectl -n argocd describe application serving-stack
...
Message: CustomResourceDefinition "services.serving.knative.dev"
is invalid: metadata.annotations:
Too long: may not be more than 262144 bytes
Root Cause
With client-side apply, the full manifest is stored in the kubectl.kubernetes.io/last-applied-configuration annotation on the object. For a large CRD, that annotation alone exceeds Kubernetes' 262144-byte (256 KiB) cap on total annotation size. The Knative Serving Service CRD is approximately 400 KB as a YAML object.
A rendering overlap compounded the issue: the kserve.yaml bundle already includes its own copy of the Knative Serving CRDs, and we were also referencing serving-core.yaml directly. That meant two sources of truth managing the same CRDs, causing comparison instability in ArgoCD's diff.
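The limit itself is easy to check ahead of time. A minimal Python sketch of the pre-flight check (using a stand-in string rather than the real CRD; in practice you would read the output of `kustomize build`):

```python
# Sketch: would this manifest fit in the 256 KiB annotation budget that
# client-side apply consumes via last-applied-configuration?
# `manifest` is a stand-in string here, not the real Knative CRD.

ANNOTATION_LIMIT = 262144  # bytes; Kubernetes' cap on total annotation size


def fits_client_side_apply(manifest: str) -> bool:
    """True if the manifest could be stored in the
    kubectl.kubernetes.io/last-applied-configuration annotation."""
    return len(manifest.encode("utf-8")) <= ANNOTATION_LIMIT


small_crd = "apiVersion: apiextensions.k8s.io/v1\nkind: CustomResourceDefinition\n"
huge_crd = "x" * 400_000  # ~400 KB, roughly the Knative Serving CRD's size

print(fits_client_side_apply(small_crd))  # True  -> client-side apply is fine
print(fits_client_side_apply(huge_crd))   # False -> needs ServerSideApply=true
```

Any manifest that fails this check needs server-side apply, which skips the annotation entirely.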
Fix
# infrastructure/serving-stack/kustomization.yaml
# 1. Use server-side apply to bypass the annotation size limit
commonAnnotations:
  argocd.argoproj.io/sync-options: ServerSideApply=true

# 2. Ignore runtime-mutated fields on Knative CRDs
#    (in the ArgoCD Application spec)
ignoreDifferences:
  - group: apiextensions.k8s.io
    kind: CustomResourceDefinition
    name: services.serving.knative.dev
    jsonPointers:
      - /spec/preserveUnknownFields
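If you prefer to set this once at the Application level rather than via commonAnnotations, ArgoCD also accepts ServerSideApply as a sync option in the Application spec. A sketch, where the app name and namespace are assumptions matching this article's setup:

```yaml
# Hypothetical ArgoCD Application fragment; name/namespace are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: serving-stack
  namespace: argocd
spec:
  syncPolicy:
    syncOptions:
      # Apply every resource server-side, so no last-applied-configuration
      # annotation is written and the 256 KiB limit never triggers.
      - ServerSideApply=true
  ignoreDifferences:
    - group: apiextensions.k8s.io
      kind: CustomResourceDefinition
      name: services.serving.knative.dev
      jsonPointers:
        - /spec/preserveUnknownFields
```

The per-resource annotation form shown above achieves the same thing but lets you scope server-side apply to only the resources that need it.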
Business impact: ArgoCD's error says "Too long" but does not tell you which annotation or why it got too long. Debugging requires knowing ArgoCD's internal server-side apply mechanism.
Failure 3: kube-rbac-proxy ImagePullBackOff Blocks KServe Admission Webhook — gcr.io Access Restriction
Time lost: ~1 hour | Cluster-wide impact
Symptom
$ kubectl -n argocd describe application ai-model-alpha
...
Message: admission webhook
"inferenceservice.kserve-webhook-server.validator.webhook"
denied the request: no endpoints available for
service "kserve-webhook-server-service"
$ kubectl -n kserve get pods
NAME                            READY   STATUS
kserve-controller-manager-xxx   1/2     Running   # only 1 of 2 ready
$ kubectl -n kserve describe pod kserve-controller-manager-xxx
  kube-rbac-proxy:
    State:    Waiting
      Reason: ImagePullBackOff
    Image:    gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1
Events:
  Warning  Failed  kubelet
  Failed to pull image: unexpected status code 403 Forbidden
Root Cause
KServe 0.12.1's kserve-controller-manager Deployment includes a kube-rbac-proxy sidecar pulled from gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1. Google Container Registry restricted access to the kubebuilder images in late 2025.
The manager container itself was healthy, but with the sidecar stuck in ImagePullBackOff the pod never reached full readiness (1/2), so the webhook Service had no ready endpoints and every admission request was denied. The same image tag at registry.k8s.io/kube-rbac-proxy:v0.13.1 did not exist either.
Fix: Remove the Sidecar via Kustomize Strategic Merge Patch
# infrastructure/serving-stack/patches/
#   kserve-controller-kube-rbac-proxy-image.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kserve-controller-manager
  namespace: kserve
spec:
  template:
    spec:
      containers:
        - name: kube-rbac-proxy
          $patch: delete
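A sketch of how this patch might be referenced from the Kustomization. The explicit target selector is an assumption; since the patch file carries its own metadata, a bare `path` entry also works:

```yaml
# Hypothetical kustomization.yaml fragment; wiring only, paths assumed.
patches:
  - path: patches/kserve-controller-kube-rbac-proxy-image.yaml
    target:
      group: apps
      version: v1
      kind: Deployment
      name: kserve-controller-manager
```

The `$patch: delete` directive inside the patch file tells the strategic merge logic to remove the matched container from the list rather than merge fields into it.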
After this patch and a re-sync:
$ kubectl -n kserve get pods
NAME                            READY   STATUS
kserve-controller-manager-yyy   1/1     Running   # fixed
$ kubectl -n kserve get endpoints kserve-webhook-server-service
NAME                            ENDPOINTS         AGE
kserve-webhook-server-service   10.42.0.23:9443   45s
Known tradeoff: Removing kube-rbac-proxy disables the Prometheus metrics proxy endpoint for the KServe controller. In production, source a verified replacement image from an accessible registry before deploying.
Business impact: An external registry access change cascaded into a complete admission webhook outage. Any InferenceService creation or update was blocked cluster-wide while the sidecar was failing. This class of failure has no good solution without upstream monitoring of your image dependencies.
Failure 4: Inference Request Returns HTTP 405 — IngressDomain Placeholder Resolves to Public Internet
Time lost: ~1 hour
Symptom
$ kubectl -n default get inferenceservice sklearn-iris \
-o jsonpath='{.status.url}'
http://sklearn-iris.default.example.com
$ curl -sS \
-H "Content-Type: application/json" \
-d '{"instances":[[5.1,3.5,1.4,0.2]]}' \
http://sklearn-iris.default.example.com/v1/models/sklearn-iris:predict
<html><head><title>405 Not Allowed</title></head>...
# The request hit the public example.com server, not our Kourier gateway.
Root Cause
The ingressDomain in the KServe ConfigMap was set to example.com — a literal placeholder. The generated URL resolves publicly to Cloudflare/IANA servers, not the local cluster.
Additionally, Kourier routes by Host header, not by IP. Just port-forwarding Kourier and hitting 127.0.0.1 does not work without the correct Host header.
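To see why the Host header matters, here is a toy Python sketch of host-based routing. This is not Kourier's implementation, just the routing model it shares with Knative: every request arrives at the same gateway address, and only the Host header decides which backend receives it.

```python
# Toy sketch of the host-based routing model used by Kourier/Knative.
# The destination IP alone is not enough: the gateway selects a backend
# purely from the Host header of the incoming request.
from typing import Optional

routes = {
    # host header -> backend revision service
    "sklearn-iris-predictor.default.127.0.0.1.sslip.io": "sklearn-iris-predictor-00001",
}


def route(host_header: Optional[str]) -> str:
    """Return the backend a gateway would pick, or a 404-style fallback."""
    if host_header is None:
        return "404: no Host header, gateway cannot pick a backend"
    return routes.get(host_header, f"404: no route for {host_header}")


# curl http://127.0.0.1:18080/... with no Host header set:
print(route(None))
# curl with -H 'Host: sklearn-iris-predictor.default.127.0.0.1.sslip.io':
print(route("sklearn-iris-predictor.default.127.0.0.1.sslip.io"))
```

This is why port-forwarding Kourier and curling 127.0.0.1 returns nothing useful unless the Host header names a registered route.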
Fix: Direct Predictor Pod Port-Forward
Bypass Knative routing and Kourier entirely for local verification:
# Step 1: Get the predictor pod name
kubectl -n default get pods \
-l serving.knative.dev/revision=sklearn-iris-predictor-00001
# Step 2: Port-forward directly to the predictor container
kubectl -n default port-forward \
pod/sklearn-iris-predictor-00001-deployment-<hash> 18080:8080
# Step 3: Predict (no Host header, no Kourier, no DNS needed)
curl -sS \
-H "Content-Type: application/json" \
-d '{"instances":[[5.1,3.5,1.4,0.2],[6.2,3.4,5.4,2.3]]}' \
http://127.0.0.1:18080/v1/models/sklearn-iris:predict
{"predictions":[0,2]}
For the full Kourier routing path, always pass the Host header:
kubectl -n kourier-system port-forward svc/kourier 18080:80
curl -sS \
-H 'Host: sklearn-iris-predictor.default.127.0.0.1.sslip.io' \
-H "Content-Type: application/json" \
-d '{"instances":[[5.1,3.5,1.4,0.2]]}' \
http://127.0.0.1:18080/v1/models/sklearn-iris:predict
Business impact: False-negative inference verification. A healthy endpoint looked broken because the test URL resolved to the wrong server. Always verify the complete network path — DNS resolution, ingress routing, pod health — as separate steps rather than assuming a single curl test is conclusive.
What This Proves After the Failures
After working through the above failures, the inference baseline worked:
$ kubectl -n default get inferenceservice sklearn-iris
NAME           URL                                       READY   AGE
sklearn-iris   http://sklearn-iris.default.example.com   True    45m
$ curl -sS \
-H "Content-Type: application/json" \
-d '{"instances":[[5.1,3.5,1.4,0.2],[6.2,3.4,5.4,2.3]]}' \
http://127.0.0.1:18080/v1/models/sklearn-iris:predict
{"predictions":[0,2]}
The Istio/Kourier mismatch is the canonical example of why "default configuration" is dangerous in complex systems. KServe's default assumes a specific network topology that is not disclosed in the getting-started docs. Recognizing this class of failure — configuration that works in the tool author's environment but not yours — is a senior platform engineering competency.
What This Setup Does NOT Solve (Known Tradeoffs)
- No Istio service mesh: No mTLS between services, no advanced traffic management. Acceptable for local dev; requires a replacement security layer in production.
- kube-rbac-proxy removed: Prometheus metrics from the KServe controller are unavailable. Re-add this sidecar from a working registry before any production deployment.
- Port-forward for inference: The Host-header workaround is local only. Cloud deployment requires a real ingress with DNS and TLS. On EKS, swap Kourier for an ALB and set ingressDomain to your real domain. See the Cloud Promotion Guide in the repository.
Debugging Commands Reference
Run these in order when an InferenceService will not become Ready.
1 — InferenceService Conditions
kubectl -n default describe inferenceservice sklearn-iris
kubectl -n kserve logs deploy/kserve-controller-manager --tail=50
kubectl -n kserve logs deploy/kserve-controller-manager -c manager --tail=50
2 — Webhook Endpoint Availability
kubectl -n kserve get endpoints kserve-webhook-server-service
kubectl -n kserve describe endpoints kserve-webhook-server-service
kubectl -n default get ksvc
kubectl -n default get route
3 — ConfigMap and Pod Status
kubectl -n kserve get configmap inferenceservice-config -o yaml
kubectl -n kserve get pods -o wide
kubectl -n kserve describe pod <pod-name>
The One Thing to Remember
KServe's default configuration assumes Istio is installed. This assumption is not prominently stated in the getting-started documentation. Every engineer running KServe on k3d, k3s, GKE Autopilot, or any non-Istio cluster will hit ReconcileError and see error messages referencing "virtual services" — an Istio concept — with no obvious resolution path.
The fix is one ConfigMap patch. It takes 30 seconds to apply. Finding it took three hours.
The kube-rbac-proxy 403 from gcr.io is an external dependency failure that silently kills your admission webhook cluster-wide. The $patch: delete Kustomize strategy is the fastest recovery path when no alternative registry image is available.
Full platform source — all six Reality Check documents, Backstage Golden Path, Kyverno policy enforcement, cost attribution, and a Cloud Promotion Guide to EKS/GKE: Check out the full NeuroScale repo here.
See Also
- infrastructure/serving-stack/patches/inferenceservice-config-ingress.yaml — Kourier config patch
- infrastructure/serving-stack/patches/kserve-controller-kube-rbac-proxy-image.yaml — sidecar removal patch
- infrastructure/kserve/sklearn-runtime.yaml — ClusterServingRuntime definition
- docs/CLOUD_PROMOTION_GUIDE.md — how to replace Kourier with ALB/NGINX on EKS/GKE
- docs/REALITY_CHECK_MILESTONE_3_GOLDEN_PATH.md — nine Backstage failures documented at the same depth
- docs/REALITY_CHECK_MILESTONE_4_GUARDRAILS.md — how kyverno-cli exits 0 on violations and why $PIPESTATUS[0] matters
Jimoh Sodiq Bolaji | Platform Engineer | Technical Content Engineer | Abuja, Nigeria | NeuroScale Platform