Sir Max

Posted on Jul 1

How to Build an API Gateway in Python — Rate Limiting, Auth, and Routing in Under 200 Lines

#python #webdev #api #tutorial


I've spent the last few years building and consuming APIs. One thing that keeps coming up: every team eventually needs a gateway. But when you're a small team — or a solo developer — pulling in Kong or NGINX Plus feels like bringing a flamethrower to a birthday candle.

So I built a minimal one in Python. Not for production at Netflix scale. For the 90% of cases where you just need to:

Route requests to different backends
Add a simple API key check
Stop one bad client from hammering your service

Here's how it works, and what I learned along the way.

The Problem

Imagine you have three microservices running on different ports:

User Service    → localhost:5001
Order Service   → localhost:5002
Product Service → localhost:5003

You want a single entry point. Clients hit localhost:8000/api/users/42, and the gateway forwards to localhost:5001/42. Clean URLs, one port to expose, and you can swap backends without clients knowing.

Add to that: you want API keys, and you don't want one client sending 500 requests per second.

The Gateway — Step by Step

1. Route Table

First, define where requests go. Keep it dead simple — a dictionary mapping URL prefixes to backend URLs:

ROUTES = {
    "/api/users":    "http://localhost:5001",
    "/api/orders":   "http://localhost:5002",
    "/api/products": "http://localhost:5003",
}

The gateway strips the /api/users prefix and forwards the rest. So GET /api/users/42 becomes GET http://localhost:5001/42.
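One thing to watch with prefix routing: a plain startswith check also matches /api/usersfoo, and it depends on dictionary order if prefixes ever overlap. Here is a minimal sketch of a stricter lookup, assuming you want longest-prefix-first matching on path segment boundaries. The helper name resolve_target is hypothetical, not part of the gateway code in this post.

```python
from typing import Optional

# Same shape as the ROUTES table above.
ROUTES = {
    "/api/users":    "http://localhost:5001",
    "/api/orders":   "http://localhost:5002",
    "/api/products": "http://localhost:5003",
}

def resolve_target(path: str, routes: dict = ROUTES) -> Optional[str]:
    """Return the backend URL for path, or None if no prefix matches."""
    # Try longer prefixes first so a more specific route wins over a shorter one.
    for prefix in sorted(routes, key=len, reverse=True):
        # Match only on a segment boundary: "/api/users" or "/api/users/...".
        if path == prefix or path.startswith(prefix + "/"):
            return routes[prefix] + path[len(prefix):]
    return None
```

With this version, /api/users/42 resolves to http://localhost:5001/42, while /api/usersfoo falls through to None and can be answered with a 404.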

2. Forwarding Requests with Flask

from flask import Flask, request, Response
import requests

app = Flask(__name__)

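def forward_request(backend_url=None):
    """Forward the incoming request to its backend and return the response.

    backend_url is unused: the target is resolved from ROUTES by path prefix.
    """
    # Build the target URL from the first matching prefix
    path = request.path
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            target = backend + path[len(prefix):]
            break
    else:
        return Response("Not Found", status=404)

    try:
        # Forward with the same method, headers, query string, and body
        resp = requests.request(
            method=request.method,
            url=target,
            headers={k: v for k, v in request.headers.items()
                     if k.lower() not in ("host", "content-length")},
            data=request.get_data(),
            params=request.args,
            timeout=10,
        )
    except requests.RequestException:
        # Backend is down or timed out: answer 502 instead of an unhandled 500
        return Response("Bad Gateway", status=502)

    # requests may already have decompressed the body, so drop the headers
    # that describe the original encoding or framing and pass the rest through
    skip = ("content-encoding", "content-length", "transfer-encoding", "connection")
    headers = {k: v for k, v in resp.headers.items() if k.lower() not in skip}
    return Response(resp.content, status=resp.status_code, headers=headers)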

⚠️ Lesson learned: Strip Host and Content-Length headers before forwarding. The Host header will confuse your backend if it does virtual hosting. And requests sets Content-Length automatically — sending it twice causes weird bugs.
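The same care applies to a wider set of headers, and to the response as well as the request. Below is a rough sketch of a reusable filter, assuming you want to drop the hop-by-hop headers listed in RFC 2616 section 13.5.1 in both directions. The function names are hypothetical, and the choice to also drop Content-Encoding on the way back assumes the HTTP client has already decompressed the body, which requests does for gzip and deflate when you read resp.content.

```python
# Hop-by-hop headers apply to a single connection and should not be relayed.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}

def filter_headers(headers, also_drop=()):
    """Copy headers, leaving out hop-by-hop names and anything in also_drop."""
    drop = HOP_BY_HOP | {name.lower() for name in also_drop}
    return {k: v for k, v in headers.items() if k.lower() not in drop}

def request_headers_for_backend(headers):
    # Host belongs to the gateway; Content-Length is recomputed by the client library.
    return filter_headers(headers, also_drop=("host", "content-length"))

def response_headers_for_client(headers):
    # The body was re-read (and possibly decompressed), so the old length
    # and encoding no longer describe it.
    return filter_headers(headers, also_drop=("content-encoding", "content-length"))
```

Passing dict(request.headers) and dict(resp.headers) through these two helpers would replace the inline filtering in forward_request.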

3. API Key Authentication

For internal tools and early-stage products, a shared secret per client is often enough. Not OAuth. Not JWTs. Just a header.

API_KEYS = {
    "sk-dev-app-1234": {"name": "Mobile App", "rate_limit": 100},
    "sk-dev-web-5678": {"name": "Web Dashboard", "rate_limit": 200},
}

def check_api_key():
    """Validate API key from X-API-Key header."""
    key = request.headers.get("X-API-Key", "")
    client = API_KEYS.get(key)
    if not client:
        return Response(
            '{"error": "Invalid or missing API key"}',
            status=401,
            content_type="application/json",
        )
    return client

Call check_api_key() before forwarding. If it returns a Response, short-circuit and return it.
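If keeping raw keys in source or config makes you uneasy, one option is to store only a digest of each key and compare digests in constant time. This is a sketch under that assumption, not what the gateway above does. HASHED_KEYS, hash_key, and lookup_client are hypothetical names, and in a real setup you would paste in precomputed digests rather than hashing the raw key at import time as this example does for readability.

```python
import hashlib
import hmac

def hash_key(raw_key: str) -> str:
    """SHA-256 hex digest of an API key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()

# Illustration only: real config would hold the digests, never the raw keys.
HASHED_KEYS = {
    hash_key("sk-dev-app-1234"): {"name": "Mobile App", "rate_limit": 100},
    hash_key("sk-dev-web-5678"): {"name": "Web Dashboard", "rate_limit": 200},
}

def lookup_client(presented_key: str):
    """Return the client record for a presented key, or None if unknown."""
    digest = hash_key(presented_key)
    for stored_digest, client in HASHED_KEYS.items():
        # compare_digest takes the same time whether or not the strings match
        if hmac.compare_digest(stored_digest, digest):
            return client
    return None
```

check_api_key() could call lookup_client(key) in place of API_KEYS.get(key) and keep the rest of its logic unchanged.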

🧠 Why not JWTs? For internal service-to-service calls, JWTs add complexity without real benefit. You're not federating identity across untrusted domains — you're routing between your own services. A static key is fast, debuggable, and trivially revocable. Start here, migrate to JWTs when you actually need delegation.

4. Rate Limiting with a Token Bucket

The simplest rate limiter that actually works: the token bucket algorithm.

Each client has a bucket with max_tokens capacity
Tokens refill at a steady rate (e.g., 10 tokens/second)
Each request consumes 1 token
If the bucket is empty, reject with 429

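import threading
import time
from collections import defaultdict

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # max tokens (burst size)
        self.tokens = capacity    # start full
        # monotonic() never jumps backwards when the system clock is adjusted
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()  # Flask's dev server handles requests in threads

    def consume(self, n=1):
        """Try to consume n tokens. Returns True if successful."""
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            # Refill for the time that has passed, capped at capacity
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.rate)
            self.last_refill = now

            if self.tokens >= n:
                self.tokens -= n
                return True
            return False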

# Per-client buckets (in production, use Redis)
buckets = defaultdict(lambda: TokenBucket(rate=10, capacity=100))
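Note that the rate_limit values in API_KEYS are not wired in here: every client gets the same 10 tokens/second bucket with a capacity of 100. Here is one way to connect them, as a sketch. It assumes rate_limit means "requests per minute" (the post does not say which unit is intended), and bucket_for plus the injectable clock argument are hypothetical additions that make the behavior easy to test.

```python
import time

API_KEYS = {
    "sk-dev-app-1234": {"name": "Mobile App", "rate_limit": 100},
    "sk-dev-web-5678": {"name": "Web Dashboard", "rate_limit": 200},
}

class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # max tokens (burst size)
        self.tokens = capacity
        self.clock = clock        # injectable so tests can control time
        self.last_refill = clock()

    def consume(self, n=1):
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

buckets = {}

def bucket_for(api_key, clock=time.monotonic):
    """Create (once) and return the bucket sized from the client's rate_limit."""
    if api_key not in buckets:
        limit = API_KEYS[api_key]["rate_limit"]
        # Assumption: rate_limit is requests per minute, and also the burst size.
        buckets[api_key] = TokenBucket(rate=limit / 60, capacity=limit, clock=clock)
    return buckets[api_key]
```

In the gateway handler, bucket_for(request.headers["X-API-Key"]) would stand in for the defaultdict lookup.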

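🔥 Real talk: In-memory rate limiting works fine... until you have multiple gateway instances. Then each instance keeps its own buckets, so two instances together let through twice the limit you configured. Move the counters to Redis when you go multi-instance. A fixed-window counter built on INCR + EXPIRE is the simplest version, though note it is a different algorithm from the token bucket above. I learned this the hard way at 3 AM during a load test.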
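To make the INCR + EXPIRE idea concrete, here is a rough sketch of a fixed-window counter. It is written against any object with incr(key) and expire(key, seconds) methods, which redis-py's client provides. FakeRedis is a hypothetical in-memory stand-in so the example runs without a server, and the key format and allow_request name are assumptions of mine.

```python
import time

class FakeRedis:
    """In-memory stand-in exposing only incr() and expire(), for illustration."""
    def __init__(self):
        self.counts = {}
        self.ttls = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key, seconds):
        self.ttls[key] = seconds  # recorded only; this fake never evicts
        return True

def allow_request(store, api_key, limit, window_seconds=1, now=None):
    """Fixed-window limiter: at most `limit` requests per window per key."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)
    key = f"ratelimit:{api_key}:{window}"
    count = store.incr(key)  # shared counter, so all gateway instances agree
    if count == 1:
        # First hit in this window: let the key clean itself up afterwards
        store.expire(key, window_seconds * 2)
    return count <= limit
```

Swapping FakeRedis() for a redis.Redis(...) connection should be enough to share the counter across instances. Two caveats: INCR followed by EXPIRE is not atomic, and a fixed window allows a short burst of up to twice the limit across a window boundary.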

5. Putting It All Together

@app.route("/<path:subpath>", methods=["GET", "POST", "PUT", "DELETE", "PATCH"])
def gateway(subpath):
    # Step 1: Authenticate
    client = check_api_key()
    if isinstance(client, Response):
        return client

    # Step 2: Rate limit
    bucket = buckets[request.headers["X-API-Key"]]
    if not bucket.consume():
        return Response(
            '{"error": "Rate limit exceeded. Try again later."}',
            status=429,
            content_type="application/json",
        )

    # Step 3: Forward
    return forward_request(None)

if __name__ == "__main__":
    app.run(port=8000)

That's it. ~180 lines of Python. A working API gateway with routing, authentication, and rate limiting.
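One optional refinement to the 429 branch: tell the client how long to wait with a Retry-After header. This is a small sketch of the arithmetic, assuming you read tokens and rate off the client's TokenBucket. The function name is hypothetical.

```python
import math

def retry_after_seconds(tokens: float, rate: float, needed: int = 1) -> int:
    """Whole seconds until `needed` tokens are available at `rate` tokens/second."""
    if tokens >= needed:
        return 0
    # Missing tokens divided by refill rate, rounded up to a whole second
    return math.ceil((needed - tokens) / rate)
```

In the handler, the rejected response could then carry headers={"Retry-After": str(retry_after_seconds(bucket.tokens, bucket.rate))}.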

Testing It

Start your backend services (even dummy Flask apps on 5001-5003), then fire up the gateway:

# Start the gateway
python gateway.py

# Test in another terminal
curl -H "X-API-Key: sk-dev-app-1234" http://localhost:8000/api/users/1
# → {"id": 1, "name": "Alice"}  ← forwarded from backend

curl http://localhost:8000/api/users/1
# → {"error": "Invalid or missing API key"}  ← 401

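# Test rate limiting (150 rapid requests, printing only the status code)
for i in $(seq 1 150); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "X-API-Key: sk-dev-app-1234" \
    http://localhost:8000/api/users/1
done
# Roughly the first 100 print 200 (the burst capacity, plus whatever refills
# at 10 tokens/second while the loop runs), then 429s start rolling in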
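If you don't have backends handy, a throwaway one is enough for the tests above. This sketch uses the standard library's http.server instead of Flask, purely to avoid an extra dependency. The USERS data and the names lookup, DummyUserHandler, and serve are made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Fake data matching the sample response in the curl example
USERS = {"1": {"id": 1, "name": "Alice"}}

def lookup(path):
    """Map a forwarded path like "/1" to a (status, body) pair."""
    user_id = path.split("?")[0].strip("/")
    user = USERS.get(user_id)
    if user is None:
        return 404, {"error": "user not found"}
    return 200, user

class DummyUserHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = lookup(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(port=5001):
    """Start the dummy user service; blocks until interrupted."""
    HTTPServer(("localhost", port), DummyUserHandler).serve_forever()
```

Add a call to serve() at the bottom of the file and run it in its own terminal. Since the gateway strips the /api/users prefix, the backend sees /1 rather than /api/users/1.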

What I'd Add for Production

This is a minimal gateway. Here's what I add when a project graduates from "prototype" to "people actually use this":

| Feature | Why | Library / mechanism |
|---|---|---|
| Redis-backed rate limits | Works across instances | `redis-py` |
| Structured logging | Debugging distributed calls | `structlog` |
| Circuit breaker | Stop forwarding to dead backends | `pybreaker` |
| Request ID propagation | Trace requests end-to-end | `X-Request-Id` header |
| Metrics (Prometheus) | Know what's happening | `prometheus_client` |
| Config from env vars | Docker-friendly | `os.environ` |
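As an example of the last row, here is a minimal sketch of loading the route table and port from the environment. The variable names GATEWAY_ROUTES (a JSON object) and GATEWAY_PORT are assumptions of mine, not something the gateway above defines.

```python
import json
import os

DEFAULT_ROUTES = {
    "/api/users":    "http://localhost:5001",
    "/api/orders":   "http://localhost:5002",
    "/api/products": "http://localhost:5003",
}

def load_routes(environ=os.environ):
    """Read routes from GATEWAY_ROUTES (JSON object), else use the defaults."""
    raw = environ.get("GATEWAY_ROUTES")
    if not raw:
        return dict(DEFAULT_ROUTES)
    routes = json.loads(raw)
    if not isinstance(routes, dict):
        raise ValueError("GATEWAY_ROUTES must be a JSON object of prefix -> URL")
    # Normalize trailing slashes so prefix stripping stays predictable
    return {prefix.rstrip("/"): backend.rstrip("/")
            for prefix, backend in routes.items()}

def load_port(environ=os.environ):
    """Read the listen port from GATEWAY_PORT, defaulting to 8000."""
    return int(environ.get("GATEWAY_PORT", "8000"))
```

With this in place, ROUTES = load_routes() and app.run(port=load_port()) let a Docker Compose file point the gateway at different backends without code changes.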

Why Build Instead of Buy?

I've used Kong. I've used Traefik. They're excellent. But:

Learning: Building your own gateway teaches you what the big ones are doing under the hood. After this, Kong's config files make a lot more sense.
Control: When something breaks at 2 AM, you can read every line of code. No black boxes.
Simplicity: For < 5 services and < 1000 req/s, the big gateways are overkill. Python is fine.

The key insight: you don't need to solve every problem on day one. Start with routing + keys + rate limiting. Add the rest when you need it.

The Full Code

I put the complete gateway gist here for reference: it's the code blocks above stitched together with a few extras (health check endpoint, graceful shutdown, environment variable config). Copy-paste, run, iterate.

What's your API gateway setup? Are you rolling your own or using an off-the-shelf solution? I'd love to hear what's working (and what isn't) in the comments.
