DEV Community

Haji Rufai

Posted on • Originally published at hajirufai.github.io

I Built an API Gateway from Scratch in Python — Zero Dependencies

Every backend developer eventually encounters API gateways — NGINX, Kong, Envoy. They're powerful, but how do they actually work? I decided to find out by building one from scratch.

GateLite is a fully functional API gateway written in pure Python with zero external dependencies. It handles routing, load balancing, rate limiting, authentication, circuit breaking, caching, and more — all in ~7,000 lines using only the standard library.

GitHub Repository | Live Demo

The Request Pipeline

Every request flows through a carefully ordered pipeline:

Client → CORS → Middleware → Auth → Rate Limit → Route Match
  → Cache → Transform → Circuit Breaker → Load Balance
  → Proxy Forward → Transform → Cache Store → Client

Each stage can short-circuit the pipeline with an early response. A rate-limited request never reaches the upstream. A cached response skips the proxy entirely.
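Here is a minimal sketch of the short-circuit idea. It assumes each request-side stage is a plain function that returns either a response (stop here) or None (carry on). The `Request`, `Response`, `Stage`, and `run_pipeline` names are hypothetical stand-ins, not GateLite's actual classes, and the response-side stages (transform, cache store) are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Request:  # hypothetical stand-in for the gateway's request object
    path: str
    headers: dict = field(default_factory=dict)


@dataclass
class Response:  # hypothetical stand-in for the gateway's response object
    status: int
    body: bytes = b""


# A stage either returns a Response (short-circuit) or None (continue to the next stage)
Stage = Callable[[Request], Optional[Response]]


def run_pipeline(request: Request, stages: list[Stage],
                 forward: Callable[[Request], Response]) -> Response:
    for stage in stages:
        early = stage(request)
        if early is not None:
            return early  # e.g. a 401, a 429, or a cache hit: the upstream is never contacted
    return forward(request)  # every stage passed, so proxy to the upstream
```

A rate limiter would return a 429 response in exactly the same way, and a cache lookup would return the stored response.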

Core Architecture

HTTP Parser

Without Flask or FastAPI to lean on, GateLite needs its own HTTP parser. It reads raw bytes from TCP sockets and parses HTTP/1.1 requests:

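class HTTPParser:
    def parse_request(self, data: bytes) -> Request:
        # Headers end at the first blank line (CRLF CRLF); everything after it is the body.
        head, sep, body = data.partition(b"\r\n\r\n")
        # latin-1 maps every byte to a character, so decoding never fails on odd bytes
        lines = head.decode("latin-1").split("\r\n")
        method, path, version = lines[0].split(" ", 2)

        headers = Headers()  # GateLite's own header container
        for line in lines[1:]:
            # "Name: value"; whitespace after the colon is optional in HTTP/1.1
            name, _, value = line.partition(":")
            headers.add(name.strip(), value.strip())

        body = body if sep else None  # no blank line yet: the headers are incomplete
        return Request(method=method, path=path, headers=headers, body=body)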
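The parser above assumes `data` already holds a complete request, but `recv()` can hand a request over in pieces. One way to collect it, sketched under assumptions: read until the blank line that ends the headers, then read exactly `Content-Length` body bytes. `read_request` and `MAX_HEADER_BYTES` are hypothetical names, and chunked transfer encoding and pipelined requests are not handled.

```python
import socket

MAX_HEADER_BYTES = 64 * 1024  # assumed limit for illustration, not GateLite's setting


def read_request(sock: socket.socket) -> bytes:
    """Read one HTTP/1.1 request: headers up to CRLF CRLF, then Content-Length body bytes."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        if len(buf) > MAX_HEADER_BYTES:
            raise ValueError("request headers too large")
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("client closed before the headers were complete")
        buf += chunk

    head, _, body = buf.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())

    # recv() may return fewer bytes than asked for, so loop until the body is complete
    while len(body) < length:
        chunk = sock.recv(min(4096, length - len(body)))
        if not chunk:
            raise ConnectionError("client closed mid-body")
        body += chunk
    return head + b"\r\n\r\n" + body[:length]
```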

Smart Routing

Routes support four match types with priority ordering:

# Exact match (highest priority)
Route(name="health", path="/health", upstream="internal")

# Parameterized paths
Route(name="user", path="/users/{id}", upstream="user-svc")

# Prefix matching
Route(name="api", path="/api/*", upstream="backend")

# Regex (lowest priority)
Route(name="versioned", path="/v[0-9]+/.*", upstream="backend")

The router compiles patterns once and sorts by priority. Parameterized routes extract path params into a dict:

match = router.match("GET", "/users/42")
# match = (Route("user"), {"id": "42"})
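Here is a sketch of how such a router could compile and order its routes. It assumes the priority order follows the examples above (exact, parameterized, prefix, regex) and classifies each pattern with simple syntax checks. `compile_route`, `CompiledRoute`, and this `Router` are illustrative, not GateLite's code, and HTTP methods are ignored for brevity.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Lower number = higher priority, following the order of the examples above
EXACT, PARAM, PREFIX, REGEX = range(4)


@dataclass
class CompiledRoute:
    name: str
    kind: int
    path: str                          # literal path (EXACT) or prefix (PREFIX)
    pattern: re.Pattern | None = None  # compiled regex for PARAM and REGEX routes


def compile_route(name: str, path: str) -> CompiledRoute:
    if "{" in path:
        # "/users/{id}" -> "/users/(?P<id>[^/]+)"; re.split puts captured names at odd indices
        parts = re.split(r"\{(\w+)\}", path)
        regex = "".join(f"(?P<{p}>[^/]+)" if i % 2 else re.escape(p) for i, p in enumerate(parts))
        return CompiledRoute(name, PARAM, path, re.compile(regex))
    if path.endswith("/*"):
        return CompiledRoute(name, PREFIX, path[:-1])  # "/api/*" matches anything under "/api/"
    if any(ch in path for ch in "[]()+*?|^$\\"):
        return CompiledRoute(name, REGEX, path, re.compile(path))
    return CompiledRoute(name, EXACT, path)


class Router:
    def __init__(self, routes: list[tuple[str, str]]):
        # Compile once; sorted() is stable, so routes of the same kind keep their declared order
        self.routes = sorted((compile_route(n, p) for n, p in routes), key=lambda r: r.kind)

    def match(self, path: str) -> tuple[str, dict[str, str]] | None:
        for route in self.routes:
            if route.kind == EXACT and path == route.path:
                return route.name, {}
            if route.kind == PREFIX and path.startswith(route.path):
                return route.name, {}
            if route.pattern is not None:
                m = route.pattern.fullmatch(path)
                if m:
                    return route.name, m.groupdict()
        return None
```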

Load Balancing

GateLite implements five load-balancing algorithms. Here are three of them:

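import zlib


class RoundRobin:
    def __init__(self):
        self._index = -1  # so the first call returns upstreams[0]

    def select(self, upstreams):
        self._index = (self._index + 1) % len(upstreams)
        return upstreams[self._index]

class LeastConnections:
    def select(self, upstreams):
        # Pick the upstream with the fewest in-flight requests
        return min(upstreams, key=lambda u: u.active_connections)

class IPHash:
    def select(self, upstreams, client_ip):
        # Use a stable hash: Python's built-in hash() of a str is randomized per
        # process, so a client could land on a different upstream after a restart
        idx = zlib.crc32(client_ip.encode()) % len(upstreams)
        return upstreams[idx]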

Weighted round-robin distributes traffic proportionally: a server with weight 3 receives three times the traffic of a server with weight 1.
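The post doesn't show GateLite's weighted algorithm. One well-known approach is the "smooth" weighted round-robin that NGINX uses, which interleaves picks instead of sending a burst to the heaviest server. Here is a sketch, with the class name and the dict-of-weights input as assumptions:

```python
class SmoothWeightedRoundRobin:
    """Each pick, every server gains its weight; the leader is chosen and pays back the total."""

    def __init__(self, weights: dict[str, int]):
        self.weights = weights
        self.current = {name: 0 for name in weights}  # running score per server

    def select(self) -> str:
        total = sum(self.weights.values())
        for name, weight in self.weights.items():
            self.current[name] += weight
        best = max(self.current, key=self.current.get)  # ties go to the first server listed
        self.current[best] -= total
        return best
```

With weights 3 and 1, each cycle of four picks comes out as a, a, b, a, so the lighter server is never starved for long.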

Token Bucket Rate Limiting

The token bucket algorithm provides smooth rate limiting with burst handling:

import time


class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens per second
        self.capacity = capacity  # max burst size
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

GateLite also implements fixed window and sliding window log algorithms.
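Here are sketches of both alternatives, each guarded by a `threading.Lock` because several handler threads may share one limiter. The class names and the injectable `clock` parameter (handy for tests) are assumptions, not GateLite's API.

```python
import threading
import time
from collections import deque


class FixedWindow:
    """Allow `limit` requests per `window` seconds, counted in aligned time buckets."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self._bucket = None
        self._count = 0
        self._lock = threading.Lock()

    def allow(self) -> bool:
        with self._lock:
            bucket = int(self.clock() // self.window)
            if bucket != self._bucket:  # a new window started: reset the counter
                self._bucket, self._count = bucket, 0
            if self._count < self.limit:
                self._count += 1
                return True
            return False


class SlidingWindowLog:
    """Keep one timestamp per accepted request; allow if fewer than `limit` are in the window."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self._log = deque()
        self._lock = threading.Lock()

    def allow(self) -> bool:
        with self._lock:
            now = self.clock()
            while self._log and now - self._log[0] >= self.window:
                self._log.popleft()  # drop timestamps that slid out of the window
            if len(self._log) < self.limit:
                self._log.append(now)
                return True
            return False
```

The trade-off: a fixed window stores a single counter but can admit up to twice the limit around a window boundary, while the sliding window log is exact at the cost of one timestamp per accepted request.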

Circuit Breaker

Circuit breakers prevent cascading failures. When an upstream fails repeatedly, the circuit "opens" and rejects requests immediately instead of waiting for timeouts:

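import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # normal: requests flow
    OPEN = "open"            # rejecting immediately
    HALF_OPEN = "half_open"  # testing with a few probe requests


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0,
                 success_threshold=2, max_requests=1):  # illustrative defaults
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds
        self.success_threshold = success_threshold
        self.max_requests = max_requests  # concurrent probes allowed in HALF_OPEN
        self.state = State.CLOSED
        self.last_failure = 0.0
        self._failures = self._successes = self._half_open_requests = 0

    def allow_request(self) -> bool:
        if self.state == State.OPEN:
            if time.monotonic() - self.last_failure < self.recovery_timeout:
                return False
            self.state = State.HALF_OPEN  # timeout elapsed: start testing
            self._successes = self._half_open_requests = 0
        if self.state == State.HALF_OPEN:
            if self._half_open_requests >= self.max_requests:
                return False
            self._half_open_requests += 1  # track in-flight probes
        return True

    def record_success(self):
        if self.state == State.HALF_OPEN:
            self._half_open_requests = max(0, self._half_open_requests - 1)
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = State.CLOSED  # Recovery!
        self._failures = 0  # the threshold counts consecutive failures

    def record_failure(self):
        self._failures += 1
        self.last_failure = time.monotonic()
        # A failed probe re-opens at once; otherwise trip after enough failures
        if self.state == State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN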
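The breaker only helps if the proxy stage reports every outcome back to it. Here is a sketch of that wiring. It assumes the upstream call returns a `(status, body)` pair and that both connection errors and 5xx responses count as failures. `call_through_breaker` and the chosen status codes are hypothetical; any object with `allow_request`, `record_success`, and `record_failure` works as the breaker.

```python
from typing import Callable


def call_through_breaker(breaker, upstream_call: Callable[[], tuple[int, bytes]]) -> tuple[int, bytes]:
    """Forward one request via `upstream_call`, consulting and updating `breaker`."""
    if not breaker.allow_request():
        return 503, b"Service Unavailable (circuit open)"  # fail fast: no upstream call
    try:
        status, body = upstream_call()
    except (ConnectionError, TimeoutError):
        breaker.record_failure()
        return 502, b"Bad Gateway"
    if status >= 500:
        breaker.record_failure()  # assumption: upstream 5xx counts as a failure
    else:
        breaker.record_success()
    return status, body
```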

JWT Authentication (No Libraries!)

Implementing JWT from scratch means working with base64url encoding and HMAC-SHA256 directly:

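import base64
import hashlib
import hmac
import json


class JWTAuth:
    def __init__(self, secret: bytes):
        self.secret = secret  # shared HMAC key

    @staticmethod
    def _b64decode(segment: str) -> bytes:
        # JWTs use unpadded base64url; restore the "=" padding before decoding
        return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

    def decode(self, token: str) -> tuple[dict, dict]:
        parts = token.split(".")
        if len(parts) != 3:
            raise ValueError("Malformed token")
        header = json.loads(self._b64decode(parts[0]))
        if header.get("alg") != "HS256":
            raise ValueError("Unsupported alg")  # only HMAC-SHA256 is verified below

        # Verify the signature over "<header>.<payload>" before trusting the payload
        signing_input = f"{parts[0]}.{parts[1]}".encode()
        expected = hmac.new(self.secret, signing_input, hashlib.sha256).digest()
        actual = self._b64decode(parts[2])

        if not hmac.compare_digest(expected, actual):
            raise ValueError("Invalid signature")

        payload = json.loads(self._b64decode(parts[1]))
        return header, payload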
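The decoder covers verification. To complement it, here is a sketch of the issuing side plus the time-based claims from RFC 7519 (`exp`, `nbf`), which a gateway would normally check after the signature passes. `encode_hs256`, `check_time_claims`, and the `leeway` parameter are illustrative names, not GateLite's API.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    # base64url without "=" padding, as JWTs require
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def encode_hs256(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [b64url(json.dumps(part, separators=(",", ":")).encode())
                for part in (header, payload)]
    signing_input = ".".join(segments).encode()
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return ".".join(segments + [b64url(signature)])


def check_time_claims(payload: dict, now: Optional[float] = None, leeway: float = 0.0) -> None:
    """Reject expired ("exp") or not-yet-valid ("nbf") tokens; both are Unix timestamps."""
    now = time.time() if now is None else now
    if "exp" in payload and now >= payload["exp"] + leeway:
        raise ValueError("Token expired")
    if "nbf" in payload and now < payload["nbf"] - leeway:
        raise ValueError("Token not yet valid")
```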

What I Learned

1. HTTP parsing is deceptively tricky. Edge cases everywhere: missing Content-Length, malformed headers, partial reads. The parser needs to be robust without being slow.

2. Thread safety matters. Rate limiters, circuit breakers, and metrics all need thread-safe access. Python's threading.Lock is your friend, but you need to minimize critical sections.

3. The pipeline pattern is powerful. Each middleware stage is independent and composable. Adding a new feature means adding a new stage, not modifying existing ones.

4. Circuit breakers need careful thresholds. Too sensitive and you get false positives. Too lenient and you don't protect against real failures. The half-open state is crucial for automatic recovery.

5. Caching is a protocol. Cache-Control headers, ETag validation, conditional 304 responses — proper HTTP caching follows a detailed spec that's worth understanding.
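To make lesson 5 concrete, here is a sketch of conditional GET handling: the gateway attaches an ETag and answers a matching `If-None-Match` with a bodiless 304. Deriving the ETag from a SHA-256 of the body, the `max_age` default, and the function names are assumptions, and header names are assumed to arrive in canonical case.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Strong validator derived from the content (assumption: truncated SHA-256 hex)
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_response(request_headers: dict[str, str], body: bytes,
                         max_age: int = 60) -> tuple[int, dict[str, str], bytes]:
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": f"max-age={max_age}"}
    inm = request_headers.get("If-None-Match", "")
    # If-None-Match may list several tags or "*"; weak comparison ignores a W/ prefix
    candidates = [t.strip().removeprefix("W/") for t in inm.split(",") if t.strip()]
    if "*" in candidates or etag in candidates:
        return 304, headers, b""  # the client's copy is still valid: send no body
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```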

The Numbers

| Metric | Value |
|---|---|
| Source modules | 24 |
| Test modules | 19 |
| Total lines | ~7,000 |
| Passing tests | 298 |
| External deps | 0 |

Try It

git clone https://github.com/hajirufai/gatelite.git
cd gatelite
python -m pytest tests/ -v  # Run all 298 tests
python examples/basic_proxy.py  # Start a proxy

The full source is on GitHub. Contributions and feedback welcome!


This is project #16 in my "Building from Scratch" series. Previous projects include a message broker, search engine, compiler, and distributed key-value store.
