What is Rate Limiting?
Rate limiting is a technique that controls how frequently an action (like an API request) can be performed within a specific time frame to prevent overuse or abuse of a system.
Different Terms, Different Meanings: Rate Limiting vs. Throttling
In practice, the terms “throttling” and “rate limiting” are often used interchangeably, but they focus on slightly different aspects of controlling traffic:
- Rate limiting typically sets a hard cap on how many requests or actions can occur in a given time window. If the limit is exceeded, additional requests are rejected or delayed until the next interval.
- Throttling is generally more dynamic: it slows down or adjusts the flow of requests (rather than outright blocking them) so that the system can stay within safe operating limits and maintain performance. Throttling often happens in real time, deciding how fast or slow requests should be allowed through.
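The distinction can be sketched in a few lines of Python: a fixed-window rate limiter rejects requests beyond a hard cap, while a throttler never rejects but delays callers to a steady pace. (The class names and numbers below are illustrative, not a production implementation.)

```python
import time

class RateLimiter:
    """Hard cap: allow at most `limit` requests per fixed window; reject the rest."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # New window: reset the counter
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # rejected until the next window starts


class Throttler:
    """Throttle: never reject, but sleep as needed to cap the request rate."""
    def __init__(self, rate_per_second):
        self.min_interval = 1.0 / rate_per_second
        self.last = 0.0

    def acquire(self):
        now = time.monotonic()
        wait = self.min_interval - (now - self.last)
        if wait > 0:
            time.sleep(wait)  # slow the caller down instead of rejecting
        self.last = time.monotonic()


limiter = RateLimiter(limit=3, window_seconds=1)
print([limiter.allow() for _ in range(5)])  # [True, True, True, False, False]
```

The rate limiter gives callers an immediate yes/no (mapping naturally to an HTTP 429 or 503), while the throttler smooths traffic by making callers wait.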
Layers of Kubernetes for Rate Limiting
- VirtualService: A VirtualService is an Istio custom resource (CRD) used in Kubernetes environments to define how network traffic is routed to services within a service mesh. It allows you to configure rules for routing, traffic splitting, retries, timeouts, and more, providing fine-grained control over how requests reach and flow between the underlying services.
- Sidecar: A sidecar in Kubernetes is a supplementary container running alongside the main application container within the same Pod. It shares resources like storage and network with the main container and typically provides auxiliary features—such as logging, monitoring, or proxying—that extend or enhance the primary application's functionality without altering the core application code.
What to Choose and When for Rate Limiting?
- VirtualService
  - Limits requests at the Kubernetes cluster level
  - Serves a rate-limiting response before traffic even reaches your application environment
  - Centralized/global control: if you want to enforce traffic policies across multiple pods or services in a consistent way, configuring rate limits at the VirtualService level gives you higher-level, more global enforcement
- Sidecar
  - Requests are allowed through until they reach your sidecar container, which then decides their fate
  - Throttling of requests (slowing them down rather than rejecting them) can be done here
  - Granular/per-pod control: if you need more nuanced control (for instance, different rate limits per pod, or specialized logic that depends on local metrics), implementing rate limiting at the sidecar (e.g., using Envoy filters) gives you that fine-grained approach
  - Local enforcement: rate limiting at the sidecar level ensures that local spikes in traffic are handled directly at each pod, rather than routing every decision through a centralized policy. This can reduce latency and give more immediate control over local resources
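As a preview of the sidecar approach, Istio typically enables Envoy's local rate limit HTTP filter on a workload's inbound listener via an EnvoyFilter. Below is a minimal sketch assuming a workload labeled `app: my-service`; the name, label, and token-bucket numbers are illustrative placeholders, and the exact filter configuration should be checked against the Istio/Envoy versions you run:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: local-ratelimit-sidecar   # illustrative name
spec:
  workloadSelector:
    labels:
      app: my-service             # assumed workload label
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          "@type": type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          value:
            stat_prefix: http_local_rate_limiter
            token_bucket:
              max_tokens: 100       # burst capacity
              tokens_per_fill: 100  # refill amount
              fill_interval: 60s    # i.e., ~100 requests per minute per pod
```

Because the token bucket lives in each sidecar, the limit is enforced per pod, which is exactly the granular/local behavior described above.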
How?
VirtualService
- Return an HTTP error code using fault injection
- You can configure a VirtualService rule that responds with an error (for example, HTTP 503) for 100% of requests, without forwarding them to a real destination. This approach "blackholes" the traffic by immediately aborting the request.
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: blackhole-example
spec:
  hosts:
  - my-service.default.svc.cluster.local
  http:
  - fault:
      abort:
        httpStatus: 503   # The HTTP status code to return
        percentage:
          value: 100      # 100% of traffic is aborted
    route:
    - destination:
        host: my-service  # Typically references a real service, but traffic won't reach it
```
- Omit the route destination altogether
- Another way is to define a rule that simply does not specify a valid route. Envoy (the underlying Istio data plane) will have no valid upstream cluster to forward requests to, resulting in blackhole behavior.
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: blackhole-example
spec:
  hosts:
  - my-service.default.svc.cluster.local
  http:
  - match:
    - uri:
        prefix: "/"
    route: []   # No routes => effectively blackholed
```
- Blackhole X% of Traffic
- Below is a simple example showing how you can “blackhole” (abort) a percentage of traffic at the VirtualService level in Istio, while allowing the remaining requests to be routed normally. This uses fault injection (with abort) to return an HTTP error for a certain percentage of requests, and routes the rest to the service.
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: partial-blackhole
spec:
  hosts:
  - my-service.default.svc.cluster.local
  http:
  - name: "partial-blackhole-rule"
    fault:
      abort:
        httpStatus: 503   # Return HTTP 503 for "blackholed" requests
        percentage:
          value: 10       # Blackhole 10% of requests
    route:
    - destination:
        host: my-service  # The remaining 90% of traffic goes here
```
- Path-based blackholing of X% of traffic
- Below is an example VirtualService that “blackholes” a certain percentage of requests only for a specific path (e.g., "/blackhole") and routes the rest of the traffic normally. In this example, half of the requests (50%) to "/blackhole" get an immediate 503 response, while all other requests go through without fault injection.
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: partial-blackhole-by-path
spec:
  hosts:
  - my-service.default.svc.cluster.local
  http:
  - name: blackhole-specific-path
    match:
    - uri:
        prefix: "/blackhole"
    fault:
      abort:
        httpStatus: 503
        percentage:
          value: 50       # 50% of requests to /blackhole are "blackholed"
    route:
    - destination:
        host: my-service
  - name: other-traffic
    route:
    - destination:
        host: my-service
```
Sidecar-based throttling and rate limiting will be discussed in a separate post.