<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: suresh peddinti</title>
    <description>The latest articles on DEV Community by suresh peddinti (@suresh_peddinti_5272d04b0).</description>
    <link>https://dev.to/suresh_peddinti_5272d04b0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3826088%2Fcc6c88ed-44de-41d6-9465-7dfcd909312f.png</url>
      <title>DEV Community: suresh peddinti</title>
      <link>https://dev.to/suresh_peddinti_5272d04b0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/suresh_peddinti_5272d04b0"/>
    <language>en</language>
    <item>
      <title>RLAAS: Rate Limiting as a Service Across Modern Systems</title>
      <dc:creator>suresh peddinti</dc:creator>
      <pubDate>Mon, 16 Mar 2026 01:16:43 +0000</pubDate>
      <link>https://dev.to/suresh_peddinti_5272d04b0/rlaas-rate-limiting-as-a-service-rate-limiting-across-modern-systems-32kd</link>
      <guid>https://dev.to/suresh_peddinti_5272d04b0/rlaas-rate-limiting-as-a-service-rate-limiting-across-modern-systems-32kd</guid>
      <description>&lt;p&gt;RLAAS (Rate Limiting As A Service) is an open-source, policy-driven platform for controlling HTTP traffic, logs, spans, metrics, and events with one reusable engine.&lt;/p&gt;

&lt;h2&gt;The Problem Nobody Talks About&lt;/h2&gt;

&lt;p&gt;Every engineering team knows they need rate limiting. But most solutions only protect one layer — the API gateway.&lt;/p&gt;

&lt;p&gt;What happens to everything else?&lt;/p&gt;

&lt;p&gt;Here are the real pain points I kept running into:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Log floods&lt;/strong&gt; — A bug sends millions of error logs to your observability stack. Costs spike. Dashboards break. On-call engineers drown in noise.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Metric storms&lt;/strong&gt; — A chatty service emits 50x normal Datadog metrics during a deployment. Your bill triples overnight.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Kafka cascades&lt;/strong&gt; — A slow consumer falls behind. Retries pile up. One service takes down the entire event pipeline.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Sidecar blindspots&lt;/strong&gt; — Traffic between services inside a mesh never hits your gateway. There’s nothing enforcing limits there.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Copy-paste rate limiting&lt;/strong&gt; — Every team reimplements throttling logic in their own service, with their own bugs, their own edge cases, and no shared policy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The root cause is the same in every case: &lt;strong&gt;rate limiting is treated as a gateway feature, not a platform capability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s what I set out to fix.&lt;/p&gt;
&lt;h2&gt;Introducing RLAAS&lt;/h2&gt;

&lt;p&gt;RLAAS (Rate Limiting As A Service) is an open-source, policy-driven platform written in Go. It applies consistent rate limiting decisions across multiple domains — HTTP, gRPC, telemetry, events, and sidecars — using one unified engine.&lt;/p&gt;

&lt;p&gt;Instead of scattered, per-service throttling code, you define policies once and enforce them everywhere.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr9yqdljmo2zgj71ne1c3.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr9yqdljmo2zgj71ne1c3.gif" alt=" " width="720" height="1152"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core idea is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;One policy engine. Multiple providers. Multiple deployment models.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Algorithms It Supports Today&lt;/h2&gt;

&lt;p&gt;RLAAS doesn’t lock you into a single algorithm. Each policy independently chooses the algorithm that fits its traffic pattern:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token Bucket&lt;/strong&gt; — A bucket refills tokens at a fixed rate. Requests consume tokens. When empty, requests are throttled. Great for bursty traffic you want to smooth out without hard-blocking. Example: allow up to 100 API calls per minute with short bursts permitted.&lt;/p&gt;
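
&lt;p&gt;For intuition, here is a minimal single-process Python sketch of the token bucket idea (an illustration of the algorithm only, not RLAAS's actual implementation):&lt;/p&gt;

```python
import time

class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request spends one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so a burst is allowed immediately
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Roughly 100 calls/minute on average, with bursts of up to 10 permitted.
bucket = TokenBucket(rate=100 / 60, capacity=10)
grants = [bucket.allow() for _ in range(12)]
print(grants.count(True))   # 10: the burst passes, the overflow is throttled
```

&lt;p&gt;The burst size is simply the bucket capacity, which is what makes this algorithm forgiving of short spikes.&lt;/p&gt;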

&lt;p&gt;&lt;strong&gt;Sliding Window&lt;/strong&gt; — Tracks requests across a continuously rolling time window. Eliminates the “boundary spike” problem where clients fire double the limit by straddling two fixed-window edges. Best for accurate per-user and per-tenant quota enforcement.&lt;/p&gt;
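
&lt;p&gt;A rough Python sketch of the idea (a single-process illustration, not the distributed implementation) keeps a deque of recent timestamps:&lt;/p&gt;

```python
import collections

class SlidingWindowLimiter:
    """Allows at most `limit` events inside any rolling `window`-second span."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.events = collections.deque()   # timestamps of accepted events

    def allow(self, now: float) -> bool:
        # Evict timestamps that have slid out of the window.
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=3, window=60)
print([limiter.allow(t) for t in (0, 10, 20, 30)])   # [True, True, True, False]
print(limiter.allow(61))   # True: the event from t=0 has slid out of the window
```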

&lt;p&gt;&lt;strong&gt;Fixed Window&lt;/strong&gt; — Counts requests in a hard time slot (e.g., 0–60 s). Simple, cheap, and predictable. Best when coarse-grained limits matter more than precision.&lt;/p&gt;
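
&lt;p&gt;Sketched in Python (illustrative only), a fixed window is just a counter keyed by the current time slot; the last lines demonstrate the boundary spike that sliding windows avoid:&lt;/p&gt;

```python
class FixedWindowLimiter:
    """Counts requests per hard time slot, e.g. one counter per 60-second bucket."""

    def __init__(self, limit: int, window: int):
        self.limit = limit
        self.window = window
        self.slot = None
        self.count = 0

    def allow(self, now: float) -> bool:
        slot = int(now // self.window)
        if slot != self.slot:       # entered a new window: reset the counter
            self.slot, self.count = slot, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

limiter = FixedWindowLimiter(limit=2, window=60)
print([limiter.allow(t) for t in (1, 2, 3, 61)])    # [True, True, False, True]

# Boundary spike: 2 requests at the end of one slot plus 2 at the start of the
# next means 4 requests within a few seconds, double the nominal limit.
spike = FixedWindowLimiter(limit=2, window=60)
print([spike.allow(t) for t in (58, 59, 60, 61)])   # [True, True, True, True]
```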

&lt;p&gt;&lt;strong&gt;Leaky Bucket&lt;/strong&gt; — Enforces a strict, steady output rate regardless of how bursty the input is. Useful for protecting downstream services that can’t handle spikes even if the total volume is within limits.&lt;/p&gt;
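
&lt;p&gt;A minimal Python sketch (again just the algorithm, not the production code): track queued work as a level that drains at a fixed rate, and shed anything that would overflow:&lt;/p&gt;

```python
class LeakyBucket:
    """Work leaks out at `rate` per second; requests beyond `capacity` are rejected."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # fixed drain rate
        self.capacity = capacity    # maximum queued work
        self.level = 0.0
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Drain for the elapsed time, then try to add this request's unit of work.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0
            return True
        return False

bucket = LeakyBucket(rate=1.0, capacity=3)
print([bucket.allow(0) for _ in range(5)])   # [True, True, True, False, False]
print(bucket.allow(2))   # True: two units leaked out during the two seconds
```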

&lt;p&gt;&lt;strong&gt;Concurrent Request Limiter&lt;/strong&gt; — Caps the number of in-flight requests at any moment. Essential for protecting slow upstream dependencies from being overwhelmed by parallel callers.&lt;/p&gt;
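
&lt;p&gt;A concurrency cap is naturally a semaphore. Here is a small Python sketch of the load-shedding variant (reject rather than queue when the cap is hit):&lt;/p&gt;

```python
import threading

class ConcurrencyLimiter:
    """Caps in-flight requests; acquire before calling upstream, release when done."""

    def __init__(self, max_in_flight: int):
        self.sem = threading.BoundedSemaphore(max_in_flight)

    def try_acquire(self) -> bool:
        # Non-blocking: a caller over the cap is shed immediately, not queued.
        return self.sem.acquire(blocking=False)

    def release(self) -> None:
        self.sem.release()

limiter = ConcurrencyLimiter(max_in_flight=2)
print([limiter.try_acquire() for _ in range(3)])   # [True, True, False]
limiter.release()                                  # one request finished
print(limiter.try_acquire())                       # True: a slot freed up
```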

&lt;p&gt;A single RLAAS deployment can run all algorithms simultaneously across different policies and resources.&lt;/p&gt;
&lt;h2&gt;What RLAAS Integrates With&lt;/h2&gt;

&lt;p&gt;One policy engine. Many integration points:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18m63q6e6kyzpbl60cp5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F18m63q6e6kyzpbl60cp5.png" alt=" " width="800" height="486"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Decisions — More Than Just Allow or Deny&lt;/h2&gt;

&lt;p&gt;Most rate limiters return two answers: pass or reject. RLAAS returns five:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw8ftef9cu3y83h1pegw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhw8ftef9cu3y83h1pegw.png" alt=" " width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Shadow mode is especially powerful during rollouts. You can observe exactly what would have been throttled before flipping enforcement on — no surprises, no incidents.&lt;/p&gt;

&lt;p&gt;Each policy declares its own action, so one policy can DENY abusive callers while another DROPs noisy telemetry and a third runs in SHADOW mode while the team validates the thresholds.&lt;/p&gt;
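
&lt;p&gt;In pseudocode terms (the decision names below are illustrative, not RLAAS's exact wire format), shadow mode means a would-be rejection is logged but never enforced:&lt;/p&gt;

```python
import logging

logging.basicConfig(level=logging.INFO)

def apply_decision(decision: str, shadow: bool) -> bool:
    """Return True if the request should proceed.

    In shadow mode every request proceeds; violations are only logged,
    which is what makes dry-run rollouts safe.
    """
    if decision == "allow":
        return True
    if shadow:
        logging.info("shadow: would have applied %s", decision)
        return True
    return False

print(apply_decision("deny", shadow=True))    # True: observed, not enforced
print(apply_decision("deny", shadow=False))   # False: enforced
```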
&lt;h2&gt;Three Ways to Deploy It&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Embedded SDK&lt;/strong&gt; — Import the library directly into your service. Zero network hop. Full control. Works in Go, Python, Java, and TypeScript.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Centralized Service&lt;/strong&gt; — Deploy &lt;code&gt;rlaas-server&lt;/code&gt; as a shared microservice. All your services call it over gRPC or HTTP to get allow/deny decisions. One place to manage all policies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sidecar / Agent&lt;/strong&gt; — Run &lt;code&gt;rlaas-agent&lt;/code&gt; as a sidecar next to your workload. No code changes needed. Intercepts traffic at the infrastructure level. Works with Kubernetes, service meshes, and bare metal alike.&lt;/p&gt;
&lt;h2&gt;How Policies Work&lt;/h2&gt;

&lt;p&gt;Policies are declarative and version-controlled. You define who the policy applies to, which algorithm to use, the limit, the window, and what to do when the limit is hit:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nw-payments-logs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"org_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"northwind"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"payments.logs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"algorithm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sliding_window"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"limit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"window_seconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"drop"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No code changes. No redeploys. Policy updates take effect immediately.&lt;/p&gt;

&lt;h2&gt;Why Open Source?&lt;/h2&gt;

&lt;p&gt;Rate limiting logic is not your competitive advantage. It’s infrastructure — the same way a load balancer or a message queue is infrastructure. It should be shared, composable, and policy-driven rather than handcrafted inside each microservice.&lt;/p&gt;

&lt;p&gt;RLAAS is MIT-licensed. Every algorithm, every adapter, every SDK is open source and built to be extended.&lt;/p&gt;

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://rlaas-io.github.io/rlaas/" rel="noopener noreferrer"&gt;https://rlaas-io.github.io/rlaas/&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/rlaas-io/rlaas" rel="noopener noreferrer"&gt;https://github.com/rlaas-io/rlaas&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If this solves a problem you’re dealing with, open an issue, contribute an adapter, or share it with your team.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>monitoring</category>
      <category>opensource</category>
      <category>sre</category>
    </item>
  </channel>
</rss>
