Every backend developer eventually encounters API gateways — NGINX, Kong, Envoy. They're powerful, but how do they actually work? I decided to find out by building one from scratch.
GateLite is a fully functional API gateway written in pure Python with zero external dependencies. It handles routing, load balancing, rate limiting, authentication, circuit breaking, caching, and more — all in ~7,000 lines using only the standard library.
The Request Pipeline
Every request flows through a carefully ordered pipeline:
```text
Client → CORS → Middleware → Auth → Rate Limit → Route Match
  → Cache → Transform → Circuit Breaker → Load Balance
  → Proxy Forward → Transform → Cache Store → Client
```
Each stage can short-circuit the pipeline with an early response. A rate-limited request never reaches the upstream. A cached response skips the proxy entirely.
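Here is a minimal sketch of that short-circuit behavior. It assumes each stage is a plain function that returns None to let the request continue, or a response to stop it. The dict-based request, the (status, body) response tuple, and the stage names are simplifications made up for illustration. They are not GateLite's actual interfaces.

```python
from typing import Callable, Optional

Request = dict                    # hypothetical: a request as a plain dict
Response = tuple[int, str]        # hypothetical: (status code, body)
Stage = Callable[[Request], Optional[Response]]


def run_pipeline(stages: list[Stage], request: Request,
                 forward: Callable[[Request], Response]) -> Response:
    """Run stages in order; the first stage that returns a response wins."""
    for stage in stages:
        response = stage(request)
        if response is not None:
            return response       # short-circuit: later stages never run
    return forward(request)       # every stage passed: proxy upstream


def auth_stage(request: Request) -> Optional[Response]:
    # Reject requests that carry no credentials at all.
    if "Authorization" not in request.get("headers", {}):
        return (401, "Unauthorized")
    return None


def make_rate_limit_stage(limit: int) -> Stage:
    count = 0                     # requests seen so far (no time window, for brevity)

    def rate_limit_stage(request: Request) -> Optional[Response]:
        nonlocal count
        count += 1
        if count > limit:
            return (429, "Too Many Requests")
        return None

    return rate_limit_stage
```

Because the rate limiter sits after auth here, unauthenticated requests never consume rate-limit budget. This is one reason the order of stages matters.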
Core Architecture
HTTP Parser
Since the goal is zero dependencies, frameworks like Flask or FastAPI are off the table, so we need our own HTTP parser. GateLite reads raw bytes from TCP sockets and parses HTTP/1.1 requests:
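```python
# Request and Headers are GateLite's own data classes, defined elsewhere in the project.
class HTTPParser:
    def parse_request(self, data: bytes) -> Request:
        # Split the head (request line + headers) from the body at the first blank line.
        head, sep, body = data.partition(b"\r\n\r\n")
        lines = head.split(b"\r\n")
        # Request line, e.g. "GET /users/42 HTTP/1.1"; HTTP heads are ISO-8859-1 text.
        method, path, version = lines[0].decode("latin-1").split(" ", 2)
        headers = Headers()
        for line in lines[1:]:
            # The space after the colon is optional, so split on ":" and strip.
            name, value = line.decode("latin-1").split(":", 1)
            headers.add(name.strip(), value.strip())
        return Request(method=method, path=path, headers=headers,
                       body=body if sep else None)
```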
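The parser assumes `data` already holds the complete request, but TCP delivers bytes in arbitrary chunks. One common approach is to keep reading until the header block ends, then keep reading until the declared body length has arrived. The sketch below assumes requests carry a `Content-Length` header and does not handle chunked transfer encoding. `read_request` and its `recv` callable are hypothetical names; with a real socket you would pass `sock.recv`.

```python
from typing import Callable

MAX_HEAD = 64 * 1024  # refuse absurdly large header blocks


def read_request(recv: Callable[[int], bytes]) -> bytes:
    """Read one complete HTTP/1.1 request (head + Content-Length body)."""
    buf = b""
    # 1. Read until the blank line that ends the header block.
    while b"\r\n\r\n" not in buf:
        chunk = recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before headers ended")
        buf += chunk
        if len(buf) > MAX_HEAD and b"\r\n\r\n" not in buf:
            raise ValueError("header block too large")
    head, _, body = buf.partition(b"\r\n\r\n")
    # 2. Find Content-Length (header names are case-insensitive).
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    # 3. Keep reading until the whole body has arrived.
    while len(body) < length:
        chunk = recv(min(4096, length - len(body)))
        if not chunk:
            raise ConnectionError("connection closed mid-body")
        body += chunk
    # Bytes past Content-Length (a pipelined request) are dropped in this sketch.
    return head + b"\r\n\r\n" + body[:length]
```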
Smart Routing
Routes support four match types with priority ordering:
```python
# Exact match (highest priority)
Route(name="health", path="/health", upstream="internal")
# Parameterized paths
Route(name="user", path="/users/{id}", upstream="user-svc")
# Prefix matching
Route(name="api", path="/api/*", upstream="backend")
# Regex (lowest priority)
Route(name="versioned", path="/v[0-9]+/.*", upstream="backend")
```
The router compiles patterns once and sorts by priority. Parameterized routes extract path params into a dict:
```python
match = router.match("GET", "/users/42")
# match = (Route("user"), {"id": "42"})
```
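Below is a compact sketch of priority-ordered matching. It relies on a deliberately naive classification rule: a path containing `{name}` is parameterized, one ending in `/*` is a prefix route, one containing regex metacharacters is a regex, and everything else is exact. That rule, the `Router`/`Route` shapes, and the optional `methods` filter are assumptions for illustration. GateLite's real router may classify and store routes differently.

```python
import re
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    path: str
    upstream: str
    methods: tuple[str, ...] = ()  # empty tuple = any method (assumption)


EXACT, PARAM, PREFIX, REGEX = range(4)  # lower number = higher priority


def classify(path: str) -> int:
    if "{" in path:
        return PARAM
    if path.endswith("/*"):
        return PREFIX
    if any(ch in path for ch in "[]()+?*^$|\\"):
        return REGEX
    return EXACT


def compile_pattern(path: str, kind: int) -> re.Pattern:
    if kind == PARAM:
        # "/users/{id}" -> r"/users/(?P<id>[^/]+)"
        parts = re.split(r"\{(\w+)\}", path)
        return re.compile("".join(
            re.escape(p) if i % 2 == 0 else f"(?P<{p}>[^/]+)"
            for i, p in enumerate(parts)))
    if kind == PREFIX:
        return re.compile(re.escape(path[:-1]) + ".*")  # "/api/*" -> "/api/.*"
    if kind == REGEX:
        return re.compile(path)
    return re.compile(re.escape(path))


class Router:
    def __init__(self, routes: list[Route]):
        compiled = [(classify(r.path), r) for r in routes]
        # Compile once; the stable sort keeps declaration order within a kind.
        self._table = sorted(
            ((kind, compile_pattern(r.path, kind), r) for kind, r in compiled),
            key=lambda entry: entry[0])

    def match(self, method: str, path: str):
        for _kind, pattern, route in self._table:
            if route.methods and method not in route.methods:
                continue
            m = pattern.fullmatch(path)  # whole path must match
            if m:
                return route, m.groupdict()
        return None
```

Sorting once at construction keeps `match` a simple linear scan. An exact route declared after a prefix route that also covers it still wins.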
Load Balancing
GateLite implements five load balancing algorithms. Three of the simplest look like this:
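```python
import zlib

class RoundRobin:
    def __init__(self):
        self._index = -1  # so the first call returns upstreams[0]

    def select(self, upstreams):
        self._index = (self._index + 1) % len(upstreams)
        return upstreams[self._index]

class LeastConnections:
    def select(self, upstreams):
        # Pick the upstream currently handling the fewest requests.
        return min(upstreams, key=lambda u: u.active_connections)

class IPHash:
    def select(self, upstreams, client_ip):
        # crc32 is stable across processes; built-in hash() of a str is randomized per process.
        idx = zlib.crc32(client_ip.encode()) % len(upstreams)
        return upstreams[idx]
```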
Weighted round-robin distributes traffic proportionally: a server with weight 3 receives three times as much traffic as a server with weight 1.
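One well-known way to get that proportional split without sending bursts to the heaviest server is the "smooth weighted round-robin" scheme used by NGINX. The sketch below implements that technique, which is not necessarily what GateLite does. Upstreams are represented as hypothetical `(name, weight)` pairs.

```python
class SmoothWeightedRoundRobin:
    def __init__(self, upstreams: list[tuple[str, int]]):
        self._weights = dict(upstreams)                  # name -> configured weight
        self._current = {name: 0 for name in self._weights}
        self._total = sum(self._weights.values())

    def select(self) -> str:
        # Every server gains its weight; the leader is picked and pays back the total.
        for name, weight in self._weights.items():
            self._current[name] += weight
        best = max(self._current, key=self._current.get)  # ties: first in order
        self._current[best] -= self._total
        return best
```

For weights 3 and 1, the sequence is a, a, b, a, and then it repeats. The lighter server is interleaved rather than served in a block.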
Token Bucket Rate Limiting
The token bucket algorithm provides smooth rate limiting with burst handling:
```python
import threading
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # max burst size
        self.tokens = capacity
        self.last_refill = time.monotonic()
        self._lock = threading.Lock()  # allow() may be called from many threads

    def allow(self) -> bool:
        with self._lock:  # refill + spend must happen atomically
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
```
GateLite also implements fixed window and sliding window log algorithms.
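The sliding window log is the simplest of these to get exactly right. It stores a timestamp for each accepted request and counts only the timestamps that fall inside the last `window` seconds. The sketch below assumes a single-process limiter. The injectable `clock` parameter is a hypothetical addition for testability.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLog:
    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit      # max requests per window
        self.window = window    # window length in seconds
        self.clock = clock
        self._log = deque()     # timestamps of accepted requests, oldest first

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self._log and now - self._log[0] >= self.window:
            self._log.popleft()
        if len(self._log) < self.limit:
            self._log.append(now)
            return True
        return False
```

The trade-off compared with a fixed window is memory: the log holds up to `limit` timestamps per key. In exchange, it avoids letting a burst of twice the limit straddle a window boundary.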
Circuit Breaker
Circuit breakers prevent cascading failures. When an upstream fails repeatedly, the circuit "opens" and rejects requests immediately instead of waiting for timeouts:
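```python
import time
from enum import Enum

State = Enum("State", "CLOSED OPEN HALF_OPEN")  # normal → rejecting → testing


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0,
                 success_threshold=2, max_requests=1):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.recovery_timeout = recovery_timeout    # seconds to stay OPEN
        self.success_threshold = success_threshold  # probe successes needed to close
        self.max_requests = max_requests            # probes allowed in flight when HALF_OPEN
        self._set_state(State.CLOSED)

    def _set_state(self, state):
        self.state = state
        self._failures = self._successes = self._half_open_requests = 0

    def allow_request(self) -> bool:
        if self.state == State.OPEN:
            if time.monotonic() - self.last_failure < self.recovery_timeout:
                return False
            self._set_state(State.HALF_OPEN)  # timeout elapsed: start probing
        if self.state == State.HALF_OPEN:
            if self._half_open_requests >= self.max_requests:
                return False
            self._half_open_requests += 1
        return True

    def record_success(self):
        if self.state == State.HALF_OPEN:
            self._half_open_requests -= 1  # probe finished
            self._successes += 1
            if self._successes >= self.success_threshold:
                self._set_state(State.CLOSED)  # Recovery!
        else:
            self._failures = 0  # failures must be consecutive to trip

    def record_failure(self):
        self.last_failure = time.monotonic()
        self._failures += 1
        if self.state == State.HALF_OPEN or self._failures >= self.failure_threshold:
            self._set_state(State.OPEN)  # a failed probe reopens immediately
```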
JWT Authentication (No Libraries!)
Implementing JWT from scratch means working directly with Base64url encoding and HMAC-SHA256:
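```python
import base64
import hashlib
import hmac
import json


class JWTAuth:
    def __init__(self, secret: bytes):
        self.secret = secret  # shared HMAC key

    @staticmethod
    def _b64decode(segment: str) -> bytes:
        # JWT segments are base64url without "=" padding; restore it before decoding.
        return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

    def decode(self, token: str) -> tuple[dict, dict]:
        parts = token.split(".")
        if len(parts) != 3:
            raise ValueError("Malformed token")
        header = json.loads(self._b64decode(parts[0]))
        # Pin the algorithm so a forged header (e.g. "alg": "none") is rejected.
        if header.get("alg") != "HS256":
            raise ValueError("Unsupported algorithm")
        # Verify the signature before trusting anything in the payload
        signing_input = f"{parts[0]}.{parts[1]}".encode()
        expected = hmac.new(self.secret, signing_input, hashlib.sha256).digest()
        actual = self._b64decode(parts[2])
        if not hmac.compare_digest(expected, actual):
            raise ValueError("Invalid signature")
        payload = json.loads(self._b64decode(parts[1]))
        return header, payload
```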
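To exercise a verifier like this, you also need the signing side. A real deployment should also check registered claims such as `exp` (RFC 7519) once the signature passes. The sketch below is a minimal HS256 version of both. `sign_hs256`, `check_exp`, and `b64url_encode` are hypothetical helper names, not part of GateLite's documented API.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url_encode(data: bytes) -> str:
    # JWT segments are base64url with the "=" padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        b64url_encode(json.dumps(header, separators=(",", ":")).encode()),
        b64url_encode(json.dumps(payload, separators=(",", ":")).encode()),
    ]
    signing_input = ".".join(segments).encode()
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return ".".join(segments + [b64url_encode(signature)])


def check_exp(payload: dict, now: Optional[float] = None, leeway: float = 0.0) -> None:
    """Raise ValueError once the exp claim (seconds since the epoch) is reached."""
    now = time.time() if now is None else now
    exp = payload.get("exp")
    if exp is not None and now >= exp + leeway:
        raise ValueError("Token expired")
```

The `leeway` parameter allows for small clock differences between the token issuer and the gateway.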
What I Learned
1. HTTP parsing is deceptively tricky. Edge cases everywhere: missing Content-Length, malformed headers, partial reads. The parser needs to be robust without being slow.
2. Thread safety matters. Rate limiters, circuit breakers, and metrics all need thread-safe access. Python's threading.Lock is your friend, but you need to minimize critical sections.
3. The pipeline pattern is powerful. Each middleware stage is independent and composable. Adding a new feature means adding a new stage, not modifying existing ones.
4. Circuit breakers need careful thresholds. Too sensitive and you get false positives. Too lenient and you don't protect against real failures. The half-open state is crucial for automatic recovery.
5. Caching is a protocol. Cache-Control headers, ETag validation, conditional 304 responses — proper HTTP caching follows a detailed spec that's worth understanding.
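In practice, point 5 means that a cache which stores an `ETag` can answer a revalidation request with `304 Not Modified` and no body. The sketch below shows the `If-None-Match` check using RFC 9110's weak comparison. It assumes request headers arrive as a dict with lowercase keys. `make_etag`, `if_none_match_hits`, and `conditional_response` are hypothetical names.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Strong validator derived from the body bytes (any stable value would do).
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _opaque_tag(tag: str) -> str:
    # Weak comparison ignores the W/ prefix on either side.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_hits(header_value: str, etag: str) -> bool:
    if header_value.strip() == "*":
        return True  # "*" matches any current representation
    # Naive split: assumes no commas inside the quoted tags themselves.
    return any(_opaque_tag(t) == _opaque_tag(etag) for t in header_value.split(","))


def conditional_response(request_headers: dict, body: bytes) -> tuple[int, dict, bytes]:
    etag = make_etag(body)
    headers = {"ETag": etag}  # a 304 must repeat the ETag a 200 would carry
    inm = request_headers.get("if-none-match")
    if inm is not None and if_none_match_hits(inm, etag):
        return 304, headers, b""  # client's cached copy is still valid: no body
    return 200, headers, body
```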
The Numbers
| Metric | Value |
|---|---|
| Source modules | 24 |
| Test modules | 19 |
| Total lines | ~7,000 |
| Passing tests | 298 |
| External deps | 0 |
Try It
```bash
git clone https://github.com/hajirufai/gatelite.git
cd gatelite
python -m pytest tests/ -v        # Run all 298 tests
python examples/basic_proxy.py    # Start a proxy
```
The full source is on GitHub. Contributions and feedback welcome!
This is project #16 in my "Building from Scratch" series. Previous projects include a message broker, search engine, compiler, and distributed key-value store.