Building an API Gateway in Python: Rate Limiting, Auth, and Routing in Under 200 Lines
I've spent the last few years building and consuming APIs. One thing that keeps coming up: every team eventually needs a gateway. But when you're a small team — or a solo developer — pulling in Kong or NGINX Plus feels like bringing a flamethrower to a birthday candle.
So I built a minimal one in Python. Not for production at Netflix scale. For the 90% of cases where you just need to:
- Route requests to different backends
- Add a simple API key check
- Stop one bad client from hammering your service
Here's how it works, and what I learned along the way.
The Problem
Imagine you have three microservices running on different ports:
User Service → localhost:5001
Order Service → localhost:5002
Product Service → localhost:5003
You want a single entry point. Clients hit localhost:8000/api/users/42, and the gateway forwards to localhost:5001/42. Clean URLs, one port to expose, and you can swap backends without clients knowing.
Add to that: you want API keys, and you don't want one client sending 500 requests per second.
The Gateway — Step by Step
1. Route Table
First, define where requests go. Keep it dead simple — a dictionary mapping URL prefixes to backend URLs:
ROUTES = {
    "/api/users": "http://localhost:5001",
    "/api/orders": "http://localhost:5002",
    "/api/products": "http://localhost:5003",
}
The gateway strips the /api/users prefix and forwards the rest. So GET /api/users/42 becomes GET http://localhost:5001/42.
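One gotcha with plain prefix matching: /api/users also matches /api/users-export. Here's a minimal sketch of a stricter lookup, assuming you want matches only on path-segment boundaries, with longer prefixes tried first. resolve_target is a name I made up for illustration; it isn't part of the gateway code in this post.

```python
from typing import Optional

ROUTES = {
    "/api/users": "http://localhost:5001",
    "/api/orders": "http://localhost:5002",
    "/api/products": "http://localhost:5003",
}


def resolve_target(path: str, routes: dict) -> Optional[str]:
    """Return the backend URL for path, or None if no prefix matches."""
    # Try longer prefixes first so a more specific route wins over a shorter one.
    for prefix in sorted(routes, key=len, reverse=True):
        # Match only on a segment boundary: "/api/users" must not
        # capture "/api/users-export".
        if path == prefix or path.startswith(prefix + "/"):
            return routes[prefix] + path[len(prefix):]
    return None


print(resolve_target("/api/users/42", ROUTES))      # http://localhost:5001/42
print(resolve_target("/api/users-export", ROUTES))  # None
```

Returning None instead of raising keeps the 404 decision in the caller, which is where the gateway already makes it.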
2. Forwarding Requests with Flask
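from flask import Flask, request, Response
import requests

app = Flask(__name__)

# Response headers we must not copy back to the client: requests has already
# decoded the body, so these no longer describe what we send.
EXCLUDED_RESPONSE_HEADERS = {
    "content-encoding", "content-length", "transfer-encoding", "connection",
}

def forward_request(backend_url=None):
    """Forward incoming request to backend and return response.

    backend_url is unused (the backend is looked up in ROUTES); it stays
    as an optional argument so existing call sites keep working.
    """
    # Build target URL from the first matching prefix
    path = request.path
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            target = backend + path[len(prefix):]
            break
    else:
        return Response("Not Found", status=404)

    # Forward with same method, headers, and body
    try:
        resp = requests.request(
            method=request.method,
            url=target,
            headers={k: v for k, v in request.headers.items()
                     if k.lower() not in ("host", "content-length")},
            data=request.get_data(),
            params=request.args,
            timeout=10,
        )
    except requests.RequestException:
        # Backend is down, unreachable, or timed out
        return Response("Bad Gateway", status=502)

    headers = {k: v for k, v in resp.headers.items()
               if k.lower() not in EXCLUDED_RESPONSE_HEADERS}
    return Response(resp.content, status=resp.status_code, headers=headers)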
⚠️ Lesson learned: Strip Host and Content-Length headers before forwarding. The Host header will confuse your backend if it does virtual hosting. And requests sets Content-Length automatically; sending it twice causes weird bugs.
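The same care applies in the other direction, and to the standard hop-by-hop headers (Connection, Transfer-Encoding, and friends), which describe a single connection and shouldn't be relayed. A rough sketch of both filters follows. The function names are mine, and the response-side rule assumes you return resp.content from requests, which has already been decompressed.

```python
# Hop-by-hop headers apply to one connection only and should not be relayed.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def filter_request_headers(headers):
    """Headers that are safe to send on to the backend."""
    skip = HOP_BY_HOP | {"host", "content-length"}
    return {k: v for k, v in headers.items() if k.lower() not in skip}


def filter_response_headers(headers):
    """Headers that are safe to send back to the client."""
    # ASSUMPTION: the body was read via resp.content, so it is already
    # decompressed and the backend's Content-Encoding / Content-Length
    # no longer describe it.
    skip = HOP_BY_HOP | {"content-encoding", "content-length"}
    return {k: v for k, v in headers.items() if k.lower() not in skip}


incoming = {"Host": "localhost:8000", "X-API-Key": "sk-dev-app-1234",
            "Connection": "keep-alive"}
print(filter_request_headers(incoming))  # {'X-API-Key': 'sk-dev-app-1234'}
```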
3. API Key Authentication
For internal tools and early-stage products, a shared secret per client is often enough. Not OAuth. Not JWTs. Just a header.
API_KEYS = {
    "sk-dev-app-1234": {"name": "Mobile App", "rate_limit": 100},
    "sk-dev-web-5678": {"name": "Web Dashboard", "rate_limit": 200},
}

def check_api_key():
    """Validate API key from X-API-Key header."""
    key = request.headers.get("X-API-Key", "")
    client = API_KEYS.get(key)
    if not client:
        return Response(
            '{"error": "Invalid or missing API key"}',
            status=401,
            content_type="application/json",
        )
    return client
Call check_api_key() before forwarding. If it returns a Response, short-circuit and return it.
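If you want to be a bit more careful with the comparison itself, the standard library has hmac.compare_digest, which compares in constant time. Below is a hedged sketch of a lookup built on it. find_client is a hypothetical helper, not something from the gateway above, and for a handful of internal keys the plain dictionary lookup is probably fine.

```python
import hmac
from typing import Optional

API_KEYS = {
    "sk-dev-app-1234": {"name": "Mobile App", "rate_limit": 100},
    "sk-dev-web-5678": {"name": "Web Dashboard", "rate_limit": 200},
}


def find_client(presented_key: str, api_keys: dict) -> Optional[dict]:
    """Return the client record for presented_key, or None if unknown."""
    presented = presented_key.encode("utf-8")
    match = None
    # Check every known key without exiting early, so the time taken does
    # not depend on which key (if any) matched.
    for known_key, client in api_keys.items():
        if hmac.compare_digest(presented, known_key.encode("utf-8")):
            match = client
    return match


print(find_client("sk-dev-web-5678", API_KEYS))  # the Web Dashboard record
print(find_client("wrong-key", API_KEYS))        # None
```

The loop is linear in the number of keys, which is a fair trade for a dozen clients and a poor one for thousands.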
🧠 Why not JWTs? For internal service-to-service calls, JWTs add complexity without real benefit. You're not federating identity across untrusted domains — you're routing between your own services. A static key is fast, debuggable, and trivially revocable. Start here, migrate to JWTs when you actually need delegation.
4. Rate Limiting with a Token Bucket
The simplest rate limiter that actually works: the token bucket algorithm.
- Each client has a bucket with max_tokens capacity
- Tokens refill at a steady rate (e.g., 10 tokens/second)
- Each request consumes 1 token
- If the bucket is empty, reject with 429
import threading
import time
from collections import defaultdict

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate              # tokens per second
        self.capacity = capacity      # max tokens
        self.tokens = capacity
        # Monotonic clock: unaffected by system clock adjustments
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()  # Flask may serve requests on several threads

    def consume(self, n=1):
        """Try to consume n tokens. Returns True if successful."""
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.rate)
            self.last_refill = now
            if self.tokens >= n:
                self.tokens -= n
                return True
            return False

# Per-client buckets (in production, use Redis)
buckets = defaultdict(lambda: TokenBucket(rate=10, capacity=100))
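Note that every client gets the same bucket here, even though API_KEYS carries a rate_limit per client. Here's one way the two could be wired together. Big assumption: I'm treating rate_limit as both requests per second and burst size, which the key table doesn't actually specify. bucket_for and the injectable clock are additions for this sketch, so the example can run without sleeping.

```python
import time

API_KEYS = {
    "sk-dev-app-1234": {"name": "Mobile App", "rate_limit": 100},
    "sk-dev-web-5678": {"name": "Web Dashboard", "rate_limit": 200},
}


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock            # injectable so tests can control time
        self.last_refill = clock()

    def consume(self, n=1):
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


buckets = {}


def bucket_for(api_key, clock=time.monotonic):
    """Return the bucket for api_key, sized from that client's rate_limit."""
    if api_key not in buckets:
        limit = API_KEYS[api_key]["rate_limit"]
        # ASSUMPTION: rate_limit = requests per second, burst = same number.
        buckets[api_key] = TokenBucket(rate=limit, capacity=limit, clock=clock)
    return buckets[api_key]
```

In the gateway you would call bucket_for(key).consume() after the key has been validated, so the API_KEYS lookup cannot fail.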
🔥 Real talk: In-memory rate limiting works fine... until you have multiple gateway instances. Each instance keeps its own buckets, so two instances that each allow 100 requests let 200 through to your backend. Move the counters to Redis (INCR + EXPIRE) when you go multi-instance. I learned this the hard way at 3 AM during a load test.
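For reference, the INCR + EXPIRE pattern looks roughly like this. It is a fixed-window counter, not a token bucket, so bursts at window edges behave differently. allow_request is a hypothetical helper that works with any client exposing incr(key) and expire(key, seconds), which redis-py's client does. The InMemoryCounter class is only a stand-in so the sketch runs without a Redis server.

```python
import time


def allow_request(client, api_key, limit, window_seconds=1):
    """Fixed window: at most `limit` requests per window per API key."""
    key = f"ratelimit:{api_key}"
    count = client.incr(key)  # atomic in Redis; a missing key starts at 1
    if count == 1:
        # First hit in this window: start the countdown that resets it.
        client.expire(key, window_seconds)
    return count <= limit


class InMemoryCounter:
    """Demo-only stand-in for a Redis client (incr and expire only)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.values = {}
        self.deadlines = {}

    def _purge(self, key):
        deadline = self.deadlines.get(key)
        if deadline is not None and self.clock() >= deadline:
            self.values.pop(key, None)
            self.deadlines.pop(key, None)

    def incr(self, key):
        self._purge(key)
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key, seconds):
        self.deadlines[key] = self.clock() + seconds
        return True


counter = InMemoryCounter()
print([allow_request(counter, "sk-dev-app-1234", limit=3) for _ in range(4)])
# [True, True, True, False]
```

One caveat with the two-step version: if the process dies between INCR and EXPIRE, the key never expires. Running both in a Lua script or a transaction closes that gap.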
5. Putting It All Together
@app.route("/<path:subpath>", methods=["GET", "POST", "PUT", "DELETE", "PATCH"])
def gateway(subpath):
    # Step 1: Authenticate
    client = check_api_key()
    if isinstance(client, Response):
        return client

    # Step 2: Rate limit
    bucket = buckets[request.headers["X-API-Key"]]
    if not bucket.consume():
        return Response(
            '{"error": "Rate limit exceeded. Try again later."}',
            status=429,
            content_type="application/json",
        )

    # Step 3: Forward (the backend is resolved from request.path)
    return forward_request(None)

if __name__ == "__main__":
    app.run(port=8000)
That's it. ~180 lines of Python. A working API gateway with routing, authentication, and rate limiting.
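One small refinement to the 429 path, if you want it: tell the client how long to wait via the standard Retry-After header. With a token bucket the wait is just the token deficit divided by the refill rate. retry_after_seconds is a name I'm inventing here, and rounding up to whole seconds is my choice.

```python
import math


def retry_after_seconds(tokens, rate, needed=1):
    """Whole seconds until `needed` tokens are available (0 if already there)."""
    deficit = needed - tokens
    if deficit <= 0:
        return 0
    # Round up: telling a client to retry too early just earns another 429.
    return math.ceil(deficit / rate)


print(retry_after_seconds(tokens=0, rate=10))       # 1
print(retry_after_seconds(tokens=0.5, rate=0.25))   # 2
```

The result would go into the 429 response as a Retry-After header, converted to a string.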
Testing It
Start your backend services (even dummy Flask apps on 5001-5003), then fire up the gateway:
# Start the gateway
python gateway.py
# Test in another terminal
curl -H "X-API-Key: sk-dev-app-1234" http://localhost:8000/api/users/1
# → {"id": 1, "name": "Alice"} ← forwarded from backend
curl http://localhost:8000/api/users/1
# → {"error": "Invalid or missing API key"} ← 401
# Test rate limiting (run this 150 times quickly)
for i in $(seq 1 150); do
  curl -s -H "X-API-Key: sk-dev-app-1234" \
    http://localhost:8000/api/users/1 > /dev/null
done
# First ~100 succeed, then 429s start rolling in
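Why "~100" and not exactly 100? The bucket keeps refilling while the loop runs. Here's a small simulation of the same arithmetic with no HTTP involved. It assumes evenly spaced requests and the rate=10, capacity=100 settings from above; simulate_burst is purely illustrative.

```python
RATE = 10        # tokens per second, as in the gateway
CAPACITY = 100   # bucket size, as in the gateway


def simulate_burst(total_requests, duration_seconds):
    """Count accepted requests when they arrive evenly over the duration."""
    tokens = CAPACITY
    last = 0.0
    accepted = 0
    for i in range(total_requests):
        now = i * duration_seconds / total_requests  # evenly spaced arrivals
        tokens = min(CAPACITY, tokens + (now - last) * RATE)
        last = now
        if tokens >= 1:
            tokens -= 1
            accepted += 1
    return accepted


print(simulate_burst(150, 0.0))  # 100: an instant burst gets exactly the bucket
print(simulate_burst(150, 1.0))  # a few more, thanks to refill during the burst
```

The slower your curl loop, the more requests get through, since each second of wall time adds another 10 tokens.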
What I'd Add for Production
This is a minimal gateway. Here's what I add when a project graduates from "prototype" to "people actually use this":
| Feature | Why | Library |
|---|---|---|
| Redis-backed rate limits | Works across instances | redis-py |
| Structured logging | Debugging distributed calls | structlog |
| Circuit breaker | Stop forwarding to dead backends | pybreaker |
| Request ID propagation | Trace requests end-to-end | X-Request-Id header |
| Metrics (Prometheus) | Know what's happening | prometheus_client |
| Config from env vars | Docker-friendly | os.environ |
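To give a feel for one of these, here's the idea behind a circuit breaker in a few lines. This is a hand-rolled sketch to show the state machine, not pybreaker's API. The class, its method names, and the default thresholds are all made up for illustration.

```python
import time


class CircuitBreaker:
    """Open after max_failures failures in a row; retry after reset_timeout."""

    def __init__(self, max_failures=5, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        """Return True if a request may be sent to the backend."""
        if self.opened_at is None:
            return True
        # Open: let trial requests through once the timeout has passed.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            # (Re)open the circuit and restart the timeout.
            self.opened_at = self.clock()
```

In the gateway you'd keep one breaker per backend, check allow() before forwarding (returning 503 when it says no), and call record_success() or record_failure() based on how the forward went.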
Why Build Instead of Buy?
I've used Kong. I've used Traefik. They're excellent. But:
- Learning: Building your own gateway teaches you what the big ones are doing under the hood. After this, Kong's config files make a lot more sense.
- Control: When something breaks at 2 AM, you can read every line of code. No black boxes.
- Simplicity: For < 5 services and < 1000 req/s, the big gateways are overkill. Python is fine.
The key insight: you don't need to solve every problem on day one. Start with routing + keys + rate limiting. Add the rest when you need it.
The Full Code
I put the complete gateway gist here for reference: it's the code blocks above stitched together with a few extras (health check endpoint, graceful shutdown, environment variable config). Copy-paste, run, iterate.
What's your API gateway setup? Are you rolling your own or using an off-the-shelf solution? I'd love to hear what's working (and what isn't) in the comments.