AI Workloads Have a New Front Door
For years, AI and machine learning workloads lived in the lab. They ran as internal experiments, batch jobs in isolated clusters, or offline data pipelines. Security focused on internal access controls and protecting the data perimeter.
That model no longer holds.
Today, AI models are increasingly part of production traffic, which is driving new challenges around securing AI workloads in Kubernetes. Whether serving a large language model for a customer-facing chatbot or a computer vision model for real-time analysis, these models are exposed through APIs, typically REST or gRPC, running as microservices in Kubernetes.
From a platform engineering perspective, these AI inference endpoints are now Tier 1 services. They sit alongside login APIs and payment gateways in terms of criticality, but they introduce a different and more expensive risk profile than traditional web applications. For AI inference endpoints, ingress security increasingly means Layer 7 inspection and WAF (Web Application Firewall) level controls at the cluster edge. By analyzing the full request payload, a WAF can detect and block abusive or malicious traffic before it ever reaches expensive GPU resources or sensitive data. This sets the stage for protecting AI workloads from both operational and security risks.
Why the Stakes Have Changed for Platform Teams
Exposing AI workloads to the internet creates a direct path to some of your most sensitive and costly infrastructure. For platform teams, this shift introduces several new challenges:
- Resource risk: GPU resource exhaustion. Inference requests drive sustained GPU usage, and without edge protection the result is cluster instability and massive budget overruns.
- Security risk: data integrity and exposure. AI models often pull from internal knowledge bases, so unprotected endpoints expose proprietary RAG pipelines to the public internet.
- Operational risk: system fragility. Because compute loads are heavy, ingress latency can trigger cascading failures and timeouts across the entire microservices architecture.
If you are running AI workloads in Kubernetes, ingress is no longer just a traffic router. It is a critical control point for securing AI services before costly computation begins. To protect the model, you have to protect the “front door.”
Why AI Inference Changes the Ingress Security Model
Traditional web architectures assume HTTP requests are cheap. A typical request consumes minimal CPU and memory, and the cost of serving it is predictable.
AI inference breaks this assumption.
Every request to an AI endpoint carries a measurable and often significant cost. Attackers no longer need to crash your service to cause damage. They simply need to use it.
The High Cost of the “Successful” Request
In a standard web app, an attacker usually has to find a vulnerability or overwhelm your bandwidth to cause damage. With AI workloads, every request the model serves consumes real GPU time, so the damage comes from ordinary use at scale.
This shift introduces three primary threat categories.
- Cost-based abuse: Attackers flood inference endpoints with complex or high-token requests that keep GPUs saturated. The result is not downtime, but an unexpected and often massive cloud bill.
- Prompt-based manipulation: AI endpoints are vulnerable to prompt injection attacks designed to bypass safeguards, extract sensitive information, or force unintended behavior. If ingress does not inspect request payloads, enforcement is left entirely to the model itself.
- Behavioral data exposure: Through probing techniques such as model inversion, attackers can infer details about training data or model behavior using requests that appear valid. Traditional firewalls are not designed to detect this class of abuse.
To protect GPU resources and sensitive data, platform teams need ingress security capable of understanding and controlling Layer 7 traffic before inference begins.
AI-Specific Ingress Threats Platform Teams Are Seeing
As AI moves into production, many platform teams are discovering gaps in existing security playbooks. The threats targeting AI workloads often bypass traditional Layer 3 and Layer 4 controls. Here are some AI-specific threats currently landing on the desks of platform teams:
Resource Exhaustion and LLM Jacking
In a traditional environment, a DDoS attack is meant to take you offline. In the AI world, we are seeing a shift toward LLM Jacking.
The Tactic: Attackers send large, complex prompts that keep models active and GPUs fully utilized.
The Challenge: Requests often appear legitimate and fall below traditional rate-limiting thresholds.
The Risk: Costs accumulate silently. Billing alerts may arrive long after budgets have been exceeded.
Prompt Injection & Input Abuse
Prompt injection has become the AI equivalent of SQL injection, targeting the logic of the model itself.
The Challenge: This is a Layer 7 problem. Ingress controllers that only inspect headers cannot detect malicious prompt content.
The Risk: Enforcement is left to model safeguards that are often easy to bypass once the request hits the engine.
Data Exposure and the AI Pipeline
Modern AI systems are pipelines, not standalone models, frequently connecting to vector databases and RAG systems.
The Threat: Classic Layer 7 vulnerabilities (SQLi or cache poisoning) can corrupt the data feeding the model.
The Risk: Compromised pipelines lead to untrustworthy output, creating massive security and reputational risk.
The Ingress Blind Spot in Kubernetes Today
The shift to AI workloads has exposed a massive blind spot in the "route and pass through" design of most ingress controllers.
Most Kubernetes ingress controllers were designed for traditional web traffic. Their primary responsibilities are TLS termination, basic authentication, and routing based on hostnames or paths.
This design prioritizes speed and simplicity, not deep inspection. As AI workloads stress this model, three limitations become clear.
Limited visibility into abnormal usage
Ingress can see traffic volume but cannot distinguish between legitimate demand and cost-based abuse.
Coarse rate limiting
IP-based or request-count limits are insufficient for AI workloads that require controls based on request type or complexity.
All-or-nothing enforcement
Without Layer 7 inspection, teams are forced to either allow traffic through or block endpoints entirely.
AI workloads have exceeded the limits of traditional ingress security. Protecting AI inference requires a gateway that can inspect, understand, and act on application-level traffic before compute resources are consumed.
Securing AI Ingress with Calico Ingress Gateway
To solve the unique challenges of AI production traffic, platform teams need more than a simple proxy; they need a context-aware gateway that can filter and route traffic based on the contents of a request in addition to IPs and host headers.
Calico Ingress Gateway is built on the industry-standard Envoy proxy and the modern Kubernetes Gateway API. By moving security logic to the cluster edge, it provides a centralized, Kubernetes-native control point to protect your AI services.
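As a sketch of what that edge control point looks like, the standard Gateway API resources below expose a hypothetical LLM serving Service through a single gateway. The gateway class, certificate secret, path, and service names are illustrative assumptions, not Calico-specific values:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ai-gateway
spec:
  gatewayClassName: tigera-gateway-class   # illustrative class name
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: ai-gateway-cert          # assumed TLS secret
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-inference
spec:
  parentRefs:
    - name: ai-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/completions
      backendRefs:
        - name: llm-serving                # hypothetical inference Service
          port: 8080
```

Because these are portable Gateway API kinds rather than controller-specific annotations, the same route definitions work across conformant gateway implementations.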
Integrated Web Application Firewall
The most powerful tool for AI workloads is the integrated WAF, which operates at Layer 7 to provide deep packet inspection:
- Block known attack patterns targeting RAG pipelines and backend services
- Drop abusive traffic before it consumes GPU resources
- Enforce security policies consistently across inference endpoints
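The exact mechanism for loading WAF rules varies by product, but as an illustration of Layer 7 payload inspection, a ModSecurity-style rule that rejects an obvious prompt-injection pattern in the request body could be packaged in a ConfigMap like the one below. The ConfigMap name, rule ID, and regex are all assumptions:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: waf-custom-rules        # hypothetical; wiring to the WAF varies by product
data:
  prompt-injection.conf: |
    SecRuleEngine On
    SecRequestBodyAccess On
    # Deny request bodies matching a crude prompt-injection signature
    SecRule REQUEST_BODY "@rx (?i)ignore (all|previous) instructions" \
        "id:1000001,phase:2,deny,status:403,log,msg:'Possible prompt injection'"
```

Pattern-based rules like this catch only the crudest injections; they are a first line of defense at the edge, not a replacement for model-side safeguards.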
Identity-Aware Access Control
AI endpoints are often exposed with minimal authentication. This creates an easy target for abuse.
Calico enables authentication and authorization at the ingress layer so only trusted users and services can trigger inference.
- Enforce identity-based access at the edge
- Apply per-user and per-service policies
- Prevent anonymous traffic from consuming expensive compute resources
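With an Envoy-based gateway, edge authentication is commonly expressed as a policy attached to a route. The sketch below uses an Envoy Gateway-style SecurityPolicy to require a valid JWT before a request reaches the model; the API version, route name, and identity-provider URL are assumptions and field names may differ in a given distribution:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: llm-auth
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: llm-inference            # hypothetical route to the model
  jwt:
    providers:
      - name: corp-idp               # assumed identity provider
        remoteJWKS:
          uri: https://idp.example.com/.well-known/jwks.json
```

Requests without a token verifiable against the provider's JWKS are rejected at the edge, so anonymous traffic never reaches GPU-backed pods.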
Fine-Grained Rate Limiting for AI Workloads
AI workloads require more nuanced controls than traditional services. Calico Ingress Gateway supports multi-dimensional rate limiting based on attributes such as:
- API keys or headers
- Request type or endpoint
- Source identity within Kubernetes
This allows platform teams to protect high-cost operations without disrupting legitimate traffic.
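As a sketch of what multi-dimensional limiting can look like on an Envoy-based gateway, the policy below gives each distinct API key its own request budget on the inference route. The API version, limits, and resource names are illustrative assumptions:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: llm-rate-limit
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: llm-inference            # hypothetical route to the model
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            - headers:
                - name: x-api-key
                  type: Distinct     # a separate bucket per API key
          limit:
            requests: 60
            unit: Minute
```

Because the bucket key is the caller's API key rather than a source IP, one abusive tenant is throttled without affecting other users behind the same NAT.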
Real-Time Observability and Security as Code
Securing AI workloads requires visibility. You cannot secure what you cannot see. When an AI service experiences high latency, you need to know immediately: Is the model slow, or are we being probed by a bot?
Calico provides detailed flow logs that go beyond standard HTTP access logs. Platform teams can identify high-cost users, detect abnormal traffic patterns, and correlate ingress behavior with GPU utilization metrics.
Security policies are managed as Kubernetes custom resources, enabling a GitOps workflow. Rate limits, WAF rules, and access controls can be versioned, reviewed, and deployed using existing CI/CD pipelines.
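In practice, this means the gateway, routes, and security policies can live in the same repository as the application manifests. A minimal layout, with hypothetical file names:

```yaml
# kustomization.yaml — security policy versioned alongside the app
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gateway.yaml                # edge Gateway definition
  - httproute-llm.yaml          # route to the inference Service
  - securitypolicy-jwt.yaml     # identity enforcement at the edge
  - ratelimit-policy.yaml       # per-key request budgets
```

A change to a rate limit then goes through the same review and rollout process as a code change.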
Security as an AI Enabler
AI inference endpoints are no longer just another microservice. They are high-value, high-risk entry points into your platform.
Securing them requires a shift in how ingress is designed and operated. For platform teams, this means focusing on:
- Cost control by preventing resource abuse
- Data protection through Layer 7 inspection
- Platform reliability by isolating AI workloads from the rest of the cluster
Secure Your AI Infrastructure Today
Ready to protect your GPU resources and stop LLM-jacking? Learn how to implement enterprise-grade security for your AI inference endpoints:
Watch the Webinar → Request a Demo →
Scale AI Safely with Calico
Calico Ingress Gateway with integrated WAF provides a Kubernetes-native approach to securing AI inference endpoints. By enforcing identity-aware access, deep Layer 7 inspection, and fine-grained rate limiting at the cluster edge, Calico helps teams run AI workloads safely and predictably.
You cannot scale AI if you cannot secure the inference path. Protecting ingress is not just a security requirement. It is a foundation for sustainable AI operations in Kubernetes.
The post Ingress Security for AI Workloads in Kubernetes: Protecting AI Endpoints with WAF appeared first on Tigera - Creator of Calico.