Modern APIs need rate limiting, authentication, caching, and observability — but running a dedicated gateway server adds cost and complexity. Cloudflare Workers lets you build a full-featured API gateway at the edge, with zero cold starts and global distribution.
In this guide, we'll build a production-ready API gateway using Hono (a lightweight web framework), Durable Objects (for distributed rate limiting), and Workers' built-in Cache API.
## Prerequisites
- Node.js 18+ and npm
- A Cloudflare account (free tier works)
- Basic familiarity with TypeScript and REST APIs
## Project Setup

```bash
npm create cloudflare@latest api-gateway -- --template hono
cd api-gateway
npm install hono jose
```
Your `wrangler.toml` needs a Durable Object binding and a migration:

```toml
name = "api-gateway"
main = "src/index.ts"
compatibility_date = "2024-01-01"

[durable_objects]
bindings = [
  { name = "RATE_LIMITER", class_name = "RateLimiter" }
]

[[migrations]]
tag = "v1"
new_classes = ["RateLimiter"]

[vars]
UPSTREAM_URL = "https://api.example.com"
RATE_LIMIT_RPM = "60"
```
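Notice that `JWT_SECRET` is not in `[vars]`: secrets shouldn't live in version control. Store it with Wrangler's secret command instead:

```shell
# Prompts for the value and stores it encrypted;
# it is exposed to the Worker as env.JWT_SECRET at runtime.
npx wrangler secret put JWT_SECRET
```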
## Gateway Architecture

```
Client -> [Auth] -> [Rate Limit] -> [Cache Check] -> [Proxy] -> [Log] -> Response
```
Each step is a Hono middleware. If any step fails, the request short-circuits with an error.
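The short-circuit pattern is worth seeing in isolation. Here's a minimal sketch of an onion-style middleware chain without any framework (all names here are illustrative, not Hono's internals):

```typescript
// The response a middleware may set; absence means "keep going".
type Ctx = { response?: { status: number; body: string } };
type Middleware = (ctx: Ctx, next: () => Promise<void>) => Promise<void>;

// Run the chain in order; each middleware decides whether to call next().
async function runChain(chain: Middleware[], ctx: Ctx): Promise<Ctx> {
  const dispatch = async (i: number): Promise<void> => {
    const mw = chain[i];
    if (mw) await mw(ctx, () => dispatch(i + 1));
  };
  await dispatch(0);
  return ctx;
}

// Hypothetical auth step that fails: later middlewares never run.
const auth: Middleware = async (ctx, next) => {
  const tokenValid = false; // pretend verification failed
  if (!tokenValid) {
    ctx.response = { status: 401, body: 'unauthorized' };
    return; // short-circuit: next() is never called
  }
  await next();
};

// Terminal handler, reached only if every earlier step calls next().
const proxy: Middleware = async (ctx) => {
  ctx.response = { status: 200, body: 'proxied' };
};
```

`runChain([auth, proxy], {})` resolves to a 401 context without ever touching `proxy`; flip `tokenValid` and the request flows all the way through.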
## Step 1: The Hono Application Shell

```typescript
// src/index.ts
import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { RateLimiter } from './rate-limiter';
import { authMiddleware } from './middleware/auth';
import { rateLimitMiddleware } from './middleware/rate-limit';
import { cacheMiddleware } from './middleware/cache';
import { loggingMiddleware } from './middleware/logging';
import { proxyHandler } from './handlers/proxy';

type Bindings = {
  RATE_LIMITER: DurableObjectNamespace;
  UPSTREAM_URL: string;
  RATE_LIMIT_RPM: string;
  JWT_SECRET: string;
};

// Declare the variables middleware stores via c.set() so c.get() is typed.
type Variables = {
  userId: string;
  scopes: string[];
};

const app = new Hono<{ Bindings: Bindings; Variables: Variables }>();

app.use('*', cors());
app.use('*', loggingMiddleware);

app.get('/health', (c) => c.json({ status: 'ok', edge: c.req.header('cf-ray') }));

app.use('/api/*', authMiddleware);
app.use('/api/*', rateLimitMiddleware);
app.get('/api/*', cacheMiddleware);
app.all('/api/*', proxyHandler);

export default app;
export { RateLimiter };
```
## Step 2: JWT Authentication

```typescript
// src/middleware/auth.ts
import { createMiddleware } from 'hono/factory';
import * as jose from 'jose';

export const authMiddleware = createMiddleware(async (c, next) => {
  const authHeader = c.req.header('Authorization');
  if (!authHeader?.startsWith('Bearer ')) {
    return c.json({ error: 'Missing or invalid Authorization header' }, 401);
  }

  const token = authHeader.slice(7);
  try {
    const secret = new TextEncoder().encode(c.env.JWT_SECRET);
    const { payload } = await jose.jwtVerify(token, secret, {
      algorithms: ['HS256'],
    });
    c.set('userId', payload.sub as string);
    c.set('scopes', (payload.scopes as string[]) || []);
    await next();
  } catch (err) {
    if (err instanceof jose.errors.JWTExpired) {
      return c.json({ error: 'Token expired' }, 401);
    }
    return c.json({ error: 'Invalid token' }, 401);
  }
});
```
**Why `jose` over `jsonwebtoken`?** `jose` uses the Web Crypto API natively -- a natural fit for edge runtimes, which lack Node.js polyfills.
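For local testing you'll need tokens to send. Rather than pull in another dependency, here's a minimal HS256 signer sketch using only Node's `crypto` (the function name and shape are mine; any standard JWT tool works just as well):

```typescript
import { createHmac } from 'node:crypto';

// base64url without padding, as JWS requires.
const b64url = (data: string): string =>
  Buffer.from(data).toString('base64url');

// Build a signed HS256 JWT: header.payload.signature, each base64url-encoded.
function signTestJwt(payload: object, secret: string): string {
  const header = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(payload));
  const sig = createHmac('sha256', secret)
    .update(`${header}.${body}`)
    .digest('base64url');
  return `${header}.${body}.${sig}`;
}
```

Usage: `signTestJwt({ sub: 'user-1', exp: Math.floor(Date.now() / 1000) + 3600 }, 'dev-secret')` produces a token the auth middleware above should verify, provided `JWT_SECRET` matches.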
## Step 3: Distributed Rate Limiting with Durable Objects

```typescript
// src/rate-limiter.ts
export class RateLimiter {
  private state: DurableObjectState;
  private requests: number[] = [];

  constructor(state: DurableObjectState) {
    this.state = state;
  }

  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const limit = parseInt(url.searchParams.get('limit') || '60', 10);
    const windowMs = parseInt(url.searchParams.get('window') || '60000', 10);
    const now = Date.now();

    // Rehydrate from storage in case this instance was evicted.
    const stored = await this.state.storage.get<number[]>('requests');
    if (stored) this.requests = stored;

    // Drop timestamps that have aged out of the sliding window.
    this.requests = this.requests.filter((ts) => now - ts < windowMs);

    if (this.requests.length >= limit) {
      const oldestInWindow = Math.min(...this.requests);
      const retryAfter = Math.ceil((oldestInWindow + windowMs - now) / 1000);
      return new Response(
        JSON.stringify({ error: 'Rate limit exceeded', retryAfter, limit, remaining: 0 }),
        {
          status: 429,
          headers: {
            'Content-Type': 'application/json',
            'Retry-After': retryAfter.toString(),
            'X-RateLimit-Limit': limit.toString(),
            'X-RateLimit-Remaining': '0',
          },
        }
      );
    }

    this.requests.push(now);
    await this.state.storage.put('requests', this.requests);

    const remaining = limit - this.requests.length;
    return new Response(
      JSON.stringify({ allowed: true, remaining, limit }),
      {
        headers: {
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': remaining.toString(),
        },
      }
    );
  }
}
```
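The sliding-window decision is easy to unit test if you isolate it as a pure function. This sketch mirrors the Durable Object's logic above (the function name and return shape are mine):

```typescript
// Given prior request timestamps, decide whether a request at `now`
// is allowed under a sliding window, and return the surviving set.
function slideWindow(
  timestamps: number[],
  now: number,
  windowMs: number,
  limit: number,
): { allowed: boolean; remaining: number; kept: number[] } {
  // Keep only timestamps still inside the window.
  const kept = timestamps.filter((ts) => now - ts < windowMs);
  if (kept.length >= limit) {
    return { allowed: false, remaining: 0, kept };
  }
  kept.push(now);
  return { allowed: true, remaining: limit - kept.length, kept };
}
```

Feeding each call's `kept` array into the next call reproduces the Durable Object's read-filter-append-write cycle without any storage.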
The middleware:

```typescript
// src/middleware/rate-limit.ts
import { createMiddleware } from 'hono/factory';

export const rateLimitMiddleware = createMiddleware(async (c, next) => {
  const userId = c.get('userId') || c.req.header('cf-connecting-ip') || 'anonymous';
  const id = c.env.RATE_LIMITER.idFromName(userId);
  const limiter = c.env.RATE_LIMITER.get(id);
  const limit = parseInt(c.env.RATE_LIMIT_RPM || '60', 10);

  const resp = await limiter.fetch(
    new Request(`https://limiter/?limit=${limit}&window=60000`)
  );
  const result = await resp.json<{ allowed?: boolean; remaining: number; limit: number }>();

  c.header('X-RateLimit-Limit', result.limit.toString());
  c.header('X-RateLimit-Remaining', result.remaining.toString());

  if (!result.allowed) {
    // Surface the Durable Object's Retry-After hint to the client.
    const retryAfter = resp.headers.get('Retry-After');
    if (retryAfter) c.header('Retry-After', retryAfter);
    return c.json({ error: 'Rate limit exceeded' }, 429);
  }
  await next();
});
```
Each user gets their own Durable Object instance -- rate limits are per-user and globally consistent across all edge locations.
## Step 4: Response Caching

```typescript
// src/middleware/cache.ts
import { createMiddleware } from 'hono/factory';

export const cacheMiddleware = createMiddleware(async (c, next) => {
  if (c.req.method !== 'GET') {
    await next();
    return;
  }

  const cache = caches.default;
  const cacheKey = new Request(c.req.url, { method: 'GET' });

  const cached = await cache.match(cacheKey);
  if (cached) {
    c.header('X-Cache', 'HIT');
    const body = await cached.text();
    const headers = Object.fromEntries(cached.headers.entries());
    return c.body(body, 200, headers);
  }

  c.header('X-Cache', 'MISS');
  await next();

  // Cache successful responses without blocking the client.
  if (c.res.status === 200) {
    const response = c.res.clone();
    const cacheResponse = new Response(response.body, {
      headers: {
        ...Object.fromEntries(response.headers.entries()),
        'Cache-Control': 'public, max-age=60',
      },
    });
    c.executionCtx.waitUntil(cache.put(cacheKey, cacheResponse));
  }
});
```

Note that `caches.default` only performs real caching when the Worker is served through a Cloudflare zone (a custom domain); on a `*.workers.dev` subdomain the Cache API calls are effectively no-ops.
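One refinement worth considering: the raw URL makes a poor cache key, because `?a=1&b=2` and `?b=2&a=1` create separate entries. A small normalizer fixes that (a sketch; the function name and the `IGNORED_PARAMS` list are illustrative, adjust for your upstream):

```typescript
// Query parameters that never affect the upstream response.
const IGNORED_PARAMS = new Set(['utm_source', 'utm_medium', 'utm_campaign']);

// Sort query parameters and drop ignored ones so equivalent
// requests map to the same cache entry.
function normalizeCacheUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  const entries = [...url.searchParams.entries()]
    .filter(([key]) => !IGNORED_PARAMS.has(key))
    .sort(([a], [b]) => a.localeCompare(b));
  url.search = new URLSearchParams(entries).toString();
  return url.toString();
}
```

In the middleware you would then build the key as `new Request(normalizeCacheUrl(c.req.url), { method: 'GET' })`.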
## Step 5: Request Proxying

```typescript
// src/handlers/proxy.ts
import { createMiddleware } from 'hono/factory';

export const proxyHandler = createMiddleware(async (c) => {
  // Strip only the leading /api prefix, then resolve against the upstream.
  const upstreamUrl = new URL(c.req.path.replace(/^\/api/, ''), c.env.UPSTREAM_URL);
  const requestUrl = new URL(c.req.url);
  requestUrl.searchParams.forEach((value, key) => {
    upstreamUrl.searchParams.set(key, value);
  });

  const headers = new Headers(c.req.raw.headers);
  headers.delete('Authorization'); // don't leak gateway tokens upstream
  headers.set('X-Forwarded-For', c.req.header('cf-connecting-ip') || '');
  headers.set('X-Request-ID', crypto.randomUUID());

  const upstreamReq = new Request(upstreamUrl.toString(), {
    method: c.req.method,
    headers,
    body: ['GET', 'HEAD'].includes(c.req.method) ? null : c.req.raw.body,
  });

  const startTime = Date.now();
  const response = await fetch(upstreamReq);
  const duration = Date.now() - startTime;

  const responseHeaders = new Headers(response.headers);
  responseHeaders.set('X-Gateway-Duration', `${duration}ms`);
  responseHeaders.set('X-Request-ID', headers.get('X-Request-ID')!);

  return new Response(response.body, {
    status: response.status,
    headers: responseHeaders,
  });
});
```
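A proxy should also avoid forwarding hop-by-hop headers such as `Connection` and `Transfer-Encoding`, which are meaningful only for a single connection. A small helper sketch (the name and list are mine, following the standard hop-by-hop set):

```typescript
// Headers scoped to one hop that a proxy must not forward.
const HOP_BY_HOP = [
  'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
  'te', 'trailer', 'transfer-encoding', 'upgrade',
];

// Return a copy of `headers` safe to send upstream.
function stripHopByHop(headers: Headers): Headers {
  const out = new Headers(headers);
  for (const name of HOP_BY_HOP) out.delete(name);
  return out;
}
```

In the handler above you could wrap the header copy: `const headers = stripHopByHop(c.req.raw.headers);`.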
## Step 6: Structured Logging

```typescript
// src/middleware/logging.ts
import { createMiddleware } from 'hono/factory';

export const loggingMiddleware = createMiddleware(async (c, next) => {
  const requestId = crypto.randomUUID();
  const startTime = Date.now();
  c.header('X-Request-ID', requestId);

  await next();

  const logEntry = {
    timestamp: new Date().toISOString(),
    requestId,
    method: c.req.method,
    path: c.req.path,
    status: c.res.status,
    duration: Date.now() - startTime,
    ip: c.req.header('cf-connecting-ip'),
    userAgent: c.req.header('user-agent'),
    country: c.req.header('cf-ipcountry'),
    userId: c.get('userId') || null,
    cacheStatus: c.res.headers.get('X-Cache') || 'N/A',
  };
  console.log(JSON.stringify(logEntry));
});
```
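At high traffic, logging every request gets expensive. One common refinement is a sampling policy that always keeps errors and slow requests but samples the healthy fast path. A sketch (the function and thresholds are illustrative, not part of the gateway above):

```typescript
// Decide whether to emit a log line. `rand` is injected so the
// policy is deterministic in tests; pass Math.random in production.
function shouldLog(
  status: number,
  durationMs: number,
  sampleRate: number,
  rand: () => number = Math.random,
): boolean {
  if (status >= 400) return true;     // never drop errors
  if (durationMs > 1000) return true; // never drop slow requests
  return rand() < sampleRate;         // sample everything else
}
```

The logging middleware would then guard its `console.log` with `if (shouldLog(c.res.status, duration, 0.1))`.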
## Performance
| Component | Overhead |
|---|---|
| JWT verification | ~1-2ms |
| Rate limit (Durable Object) | ~5-15ms |
| Cache hit | ~1ms |
| Cache miss + proxy | Upstream latency + ~2ms |
| Logging (async) | 0ms |
Note that cache hits still pass through authentication and the rate-limit check, so total gateway overhead is roughly 7-18ms per request; cache misses add the upstream round trip on top of that.
## Production Hardening

```typescript
// Global error handler: log the details, return a generic message.
app.onError((err, c) => {
  console.error(JSON.stringify({
    error: err.message,
    stack: err.stack,
    path: c.req.path,
  }));
  return c.json({ error: 'Internal gateway error' }, 500);
});

// Request size limit -- register this before the /api/* routes
// so it runs ahead of the proxy handler.
app.use('/api/*', async (c, next) => {
  const contentLength = parseInt(c.req.header('content-length') || '0', 10);
  if (contentLength > 10 * 1024 * 1024) {
    return c.json({ error: 'Request too large' }, 413);
  }
  await next();
});
```
## Conclusion
Under 300 lines of TypeScript gives you authentication, distributed rate limiting, caching, and structured logging at the edge. Key advantages:
- Zero cold starts and global distribution across 300+ cities
- Pay-per-request pricing ($0.50 per million requests on the paid plan)
- Strongly consistent rate limiting via Durable Objects
Next steps: API key management (KV), request transformation, A/B routing, WebSocket proxying.
If this was helpful, you can support my work at ko-fi.com/nopkt