DEV Community

Young Gao

Building a Production API Gateway on Cloudflare Workers with Hono

Modern APIs need rate limiting, authentication, caching, and observability — but running a dedicated gateway server adds cost and complexity. Cloudflare Workers lets you build a full-featured API gateway at the edge, with zero cold starts and global distribution.

In this guide, we'll build a production-ready API gateway using Hono (a lightweight web framework), Durable Objects (for distributed rate limiting), and Workers' built-in Cache API.

Prerequisites

  • Node.js 18+ and npm
  • A Cloudflare account (free tier works)
  • Basic familiarity with TypeScript and REST APIs

Project Setup

npm create cloudflare@latest api-gateway -- --template hono
cd api-gateway
npm install hono jose

Your wrangler.toml needs Durable Object bindings:

name = "api-gateway"
main = "src/index.ts"
compatibility_date = "2024-01-01"

[durable_objects]
bindings = [
  { name = "RATE_LIMITER", class_name = "RateLimiter" }
]

[[migrations]]
tag = "v1"
new_classes = ["RateLimiter"]

[vars]
UPSTREAM_URL = "https://api.example.com"
RATE_LIMIT_RPM = "60"

Gateway Architecture

Client -> [Auth] -> [Rate Limit] -> [Cache Check] -> [Proxy] -> [Log] -> Response

Each step is a Hono middleware. If any step fails, the request short-circuits with an error.
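As a mental model, the short-circuit behavior looks like this (an illustrative sketch of the onion pattern, not Hono's actual internals; `Ctx` and `run` are made-up names):

```typescript
// Miniature model of Hono-style middleware chaining: each step either
// calls next() to continue the chain or returns early to short-circuit.
type Ctx = { trace: string[]; response?: string };
type Middleware = (c: Ctx, next: () => Promise<void>) => Promise<void>;

async function run(chain: Middleware[], c: Ctx): Promise<void> {
  const dispatch = async (i: number): Promise<void> => {
    if (i < chain.length) await chain[i](c, () => dispatch(i + 1));
  };
  await dispatch(0);
}

// A failing "auth" step never calls next(), so later steps never execute
const auth: Middleware = async (c) => {
  c.trace.push('auth');
  c.response = '401 Unauthorized'; // short-circuit: no next()
};
const rateLimit: Middleware = async (c, next) => {
  c.trace.push('rateLimit');
  await next();
};

const ctx: Ctx = { trace: [] };
await run([auth, rateLimit], ctx);
console.log(ctx.trace); // rateLimit never ran
```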

Step 1: The Hono Application Shell

// src/index.ts
import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { RateLimiter } from './rate-limiter';
import { authMiddleware } from './middleware/auth';
import { rateLimitMiddleware } from './middleware/rate-limit';
import { cacheMiddleware } from './middleware/cache';
import { loggingMiddleware } from './middleware/logging';
import { proxyHandler } from './handlers/proxy';

type Bindings = {
  RATE_LIMITER: DurableObjectNamespace;
  UPSTREAM_URL: string;
  RATE_LIMIT_RPM: string;
  JWT_SECRET: string;
};

// Declare the variables middleware sets via c.set() so c.get() is typed
type Variables = {
  userId: string;
  scopes: string[];
};

const app = new Hono<{ Bindings: Bindings; Variables: Variables }>();

app.use('*', cors());
app.use('*', loggingMiddleware);

app.get('/health', (c) => c.json({ status: 'ok', edge: c.req.header('cf-ray') }));

app.use('/api/*', authMiddleware);
app.use('/api/*', rateLimitMiddleware);
app.get('/api/*', cacheMiddleware);
app.all('/api/*', proxyHandler);

export default app;
export { RateLimiter };

Step 2: JWT Authentication

// src/middleware/auth.ts
import { createMiddleware } from 'hono/factory';
import * as jose from 'jose';

export const authMiddleware = createMiddleware(async (c, next) => {
  const authHeader = c.req.header('Authorization');
  if (!authHeader?.startsWith('Bearer ')) {
    return c.json({ error: 'Missing or invalid Authorization header' }, 401);
  }

  const token = authHeader.slice(7);
  try {
    const secret = new TextEncoder().encode(c.env.JWT_SECRET);
    const { payload } = await jose.jwtVerify(token, secret, {
      algorithms: ['HS256'],
    });
    c.set('userId', payload.sub as string);
    c.set('scopes', (payload.scopes as string[]) || []);
    await next();
  } catch (err) {
    if (err instanceof jose.errors.JWTExpired) {
      return c.json({ error: 'Token expired' }, 401);
    }
    return c.json({ error: 'Invalid token' }, 401);
  }
});

Why jose over jsonwebtoken? jose uses the Web Crypto API natively -- perfect for edge runtimes without Node.js polyfills.

Step 3: Distributed Rate Limiting with Durable Objects

// src/rate-limiter.ts
export class RateLimiter {
  private state: DurableObjectState;
  private requests: number[] = [];

  constructor(state: DurableObjectState) {
    this.state = state;
  }

  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const limit = parseInt(url.searchParams.get('limit') || '60');
    const windowMs = parseInt(url.searchParams.get('window') || '60000');
    const now = Date.now();

    const stored = await this.state.storage.get<number[]>('requests');
    if (stored) this.requests = stored;

    this.requests = this.requests.filter((ts) => now - ts < windowMs);

    if (this.requests.length >= limit) {
      const oldestInWindow = Math.min(...this.requests);
      const retryAfter = Math.ceil((oldestInWindow + windowMs - now) / 1000);
      return new Response(
        JSON.stringify({ error: 'Rate limit exceeded', retryAfter, limit, remaining: 0 }),
        {
          status: 429,
          headers: {
            'Content-Type': 'application/json',
            'Retry-After': retryAfter.toString(),
            'X-RateLimit-Limit': limit.toString(),
            'X-RateLimit-Remaining': '0',
          },
        }
      );
    }

    this.requests.push(now);
    await this.state.storage.put('requests', this.requests);

    const remaining = limit - this.requests.length;
    return new Response(
      JSON.stringify({ allowed: true, remaining, limit }),
      {
        headers: {
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': remaining.toString(),
        },
      }
    );
  }
}

The middleware:

// src/middleware/rate-limit.ts
import { createMiddleware } from 'hono/factory';

export const rateLimitMiddleware = createMiddleware(async (c, next) => {
  const userId = c.get('userId') || c.req.header('cf-connecting-ip') || 'anonymous';
  const id = c.env.RATE_LIMITER.idFromName(userId);
  const limiter = c.env.RATE_LIMITER.get(id);

  const limit = parseInt(c.env.RATE_LIMIT_RPM || '60');
  const resp = await limiter.fetch(
    new Request(`https://limiter/?limit=${limit}&window=60000`)
  );

  const result = await resp.json<{ allowed?: boolean; remaining: number; limit: number }>();

  c.header('X-RateLimit-Limit', result.limit.toString());
  c.header('X-RateLimit-Remaining', result.remaining.toString());

  if (!result.allowed) {
    // Forward the Retry-After the Durable Object computed
    const retryAfter = resp.headers.get('Retry-After');
    if (retryAfter) c.header('Retry-After', retryAfter);
    return c.json({ error: 'Rate limit exceeded' }, 429);
  }
  await next();
});

Each user gets their own Durable Object instance -- rate limits are per-user and globally consistent across all edge locations.
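The window logic itself is small enough to test outside Workers. Here it is extracted as a pure function (a sketch mirroring the Durable Object code above; `slidingWindow` is not part of the gateway):

```typescript
// Sliding-window check: drop timestamps outside the window, then admit or reject.
interface WindowResult {
  allowed: boolean;
  remaining: number;
  timestamps: number[];
}

function slidingWindow(
  timestamps: number[],
  now: number,
  limit: number,
  windowMs: number,
): WindowResult {
  // Keep only requests still inside the rolling window
  const inWindow = timestamps.filter((ts) => now - ts < windowMs);
  if (inWindow.length >= limit) {
    return { allowed: false, remaining: 0, timestamps: inWindow };
  }
  inWindow.push(now);
  return { allowed: true, remaining: limit - inWindow.length, timestamps: inWindow };
}

// Three requests against a limit of 2 per 60s: the third is rejected
let state: number[] = [];
let r = slidingWindow(state, 1_000, 2, 60_000); state = r.timestamps; // allowed
r = slidingWindow(state, 2_000, 2, 60_000); state = r.timestamps;     // allowed
r = slidingWindow(state, 3_000, 2, 60_000);                           // rejected
console.log(r.allowed); // false
```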

Step 4: Response Caching

// src/middleware/cache.ts
import { createMiddleware } from 'hono/factory';

export const cacheMiddleware = createMiddleware(async (c, next) => {
  if (c.req.method !== 'GET') { await next(); return; }

  const cache = caches.default;
  const cacheKey = new Request(c.req.url, { method: 'GET' });

  const cached = await cache.match(cacheKey);
  if (cached) {
    c.header('X-Cache', 'HIT');
    const body = await cached.text();
    const headers = Object.fromEntries(cached.headers.entries());
    return c.body(body, 200, headers);
  }

  await next();
  // Set after next() so the header also lands on raw Response objects
  // returned downstream (headers staged before next() only reach
  // responses Hono constructs itself)
  c.header('X-Cache', 'MISS');

  if (c.res.status === 200) {
    const response = c.res.clone();
    const cacheResponse = new Response(response.body, {
      headers: {
        ...Object.fromEntries(response.headers.entries()),
        'Cache-Control': 'public, max-age=60',
      },
    });
    c.executionCtx.waitUntil(cache.put(cacheKey, cacheResponse));
  }
});

Step 5: Request Proxying

// src/handlers/proxy.ts
import { createMiddleware } from 'hono/factory';

export const proxyHandler = createMiddleware(async (c) => {
  // Anchor the replace so only the leading /api prefix is stripped
  const upstreamUrl = new URL(c.req.path.replace(/^\/api/, ''), c.env.UPSTREAM_URL);

  const requestUrl = new URL(c.req.url);
  requestUrl.searchParams.forEach((value, key) => {
    upstreamUrl.searchParams.set(key, value);
  });

  const headers = new Headers(c.req.raw.headers);
  headers.delete('Authorization');
  headers.set('X-Forwarded-For', c.req.header('cf-connecting-ip') || '');
  headers.set('X-Request-ID', crypto.randomUUID());

  const upstreamReq = new Request(upstreamUrl.toString(), {
    method: c.req.method,
    headers,
    body: ['GET', 'HEAD'].includes(c.req.method) ? null : c.req.raw.body,
  });

  const startTime = Date.now();
  const response = await fetch(upstreamReq);
  const duration = Date.now() - startTime;

  const responseHeaders = new Headers(response.headers);
  responseHeaders.set('X-Gateway-Duration', `${duration}ms`);
  responseHeaders.set('X-Request-ID', headers.get('X-Request-ID')!);

  return new Response(response.body, {
    status: response.status,
    headers: responseHeaders,
  });
});

Step 6: Structured Logging

// src/middleware/logging.ts
import { createMiddleware } from 'hono/factory';

export const loggingMiddleware = createMiddleware(async (c, next) => {
  const requestId = crypto.randomUUID();
  const startTime = Date.now();
  await next();
  // Set after next() so the header also reaches responses returned
  // directly by downstream handlers
  c.header('X-Request-ID', requestId);

  const logEntry = {
    timestamp: new Date().toISOString(),
    requestId,
    method: c.req.method,
    path: c.req.path,
    status: c.res.status,
    duration: Date.now() - startTime,
    ip: c.req.header('cf-connecting-ip'),
    userAgent: c.req.header('user-agent'),
    country: c.req.header('cf-ipcountry'),
    userId: c.get('userId') || null,
    cacheStatus: c.res.headers.get('X-Cache') || 'N/A',
  };

  console.log(JSON.stringify(logEntry));
});

Performance

Component                      Overhead
JWT verification               ~1-2ms
Rate limit (Durable Object)    ~5-15ms
Cache hit                      ~1ms
Cache miss + proxy             upstream latency + ~2ms
Logging (async)                ~0ms
Every /api request, cached or not, still passes through auth and the rate-limit check, so the gateway adds roughly 10-20ms end to end. What a cache hit saves is the upstream round trip, which is usually far larger than that overhead.

Production Hardening

app.onError((err, c) => {
  console.error(JSON.stringify({
    error: err.message,
    stack: err.stack,
    path: c.req.path,
  }));
  return c.json({ error: 'Internal gateway error' }, 500);
});

// Request size limit
app.use('/api/*', async (c, next) => {
  const contentLength = parseInt(c.req.header('content-length') || '0');
  if (contentLength > 10 * 1024 * 1024) {
    return c.json({ error: 'Request too large' }, 413);
  }
  await next();
});
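One gap worth closing before deploying: JWT_SECRET appears in the Bindings type but is never defined in wrangler.toml. Store it as a Wrangler secret rather than a [vars] entry so it stays out of source control:

```shell
# Prompts for the value and stores it encrypted, exposed as env.JWT_SECRET
npx wrangler secret put JWT_SECRET

# Deploy the gateway
npx wrangler deploy
```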

Conclusion

In under 300 lines of TypeScript, you get authentication, distributed rate limiting, caching, and structured logging at the edge. Key advantages:

  • Zero cold starts and global distribution across 300+ cities
  • Pay per request ($0.50/million on paid plan)
  • Strongly consistent rate limiting via Durable Objects

Next steps: API key management (KV), request transformation, A/B routing, WebSocket proxying.


If this was helpful, you can support my work at ko-fi.com/nopkt

