Tiers, not models: designing an LLM router on Cloudflare Workers

#ai #llm #cloudflare #api

Every LLM app I've shipped had the same shelf life: pick the best model, hardcode it, and watch it become the second-best model within a month. The fix I keep seeing is a config file full of model strings and a quarterly migration chore. I wanted the abstraction one level up: "how smart does this request need to be?" — so I built a router around performance tiers instead of model names.

The tier contract

Four tiers: Speed / Balance / Intelligence / Reasoning. The API is OpenAI-compatible. Once the client points at the router's base URL with its own key, setting model to "tier-2" is the only other change:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.tierup.ai/v1",
    api_key="...",
)

resp = client.chat.completions.create(
    model="tier-2",  # 1=speed, 2=balance, 3=intelligence, 4=reasoning
    messages=[{"role": "user", "content": "..."}],
)

Each tier maps to the current best-value model in its class — that mapping is my problem, versioned server-side, so an upgrade reaches every client with zero code changes on their side.
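To make the mapping concrete, here is a minimal sketch of what a versioned tier table and the rewrite step could look like. It is written in Python to match the client example, not the Worker's actual language. The model IDs, TIER_MAP_VERSION, and the function name resolve_tier are hypothetical placeholders, not the real routing table.

```python
# Hypothetical, versioned tier table. The model IDs are placeholders,
# not the models the service actually routes to.
TIER_MAP_VERSION = 3
TIER_MAP = {
    "tier-1": "example-vendor/fast-model",
    "tier-2": "example-vendor/balanced-model",
    "tier-3": "example-vendor/smart-model",
    "tier-4": "example-vendor/reasoning-model",
}


class UnknownTierError(ValueError):
    """Raised when a request names a model that is not a known tier."""


def resolve_tier(payload: dict) -> dict:
    """Return a copy of the request body with the tier swapped for its mapped model."""
    tier = payload.get("model")
    if tier not in TIER_MAP:
        raise UnknownTierError(f"unknown tier: {tier!r}")
    upstream = dict(payload)  # shallow copy, so the caller's dict is untouched
    upstream["model"] = TIER_MAP[tier]
    return upstream
```

Upgrading a tier then means editing one entry and bumping the version; clients keep sending "tier-2" and never see the change.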

The stack, concretely

One Cloudflare Worker (Hono) fronts everything: auth (API key or Supabase JWT), a D1 database for users/wallets/request logs, KV for rate limits, and OpenRouter as the upstream aggregator. The Worker validates the request, checks the wallet, rewrites tier-N to the mapped model, proxies (streaming or not), then strips provider/model details from the response so the tier abstraction doesn't leak. Usage and cost are logged per request in D1; billing deducts from a prepaid wallet.
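The last step, hiding provider and model details, is easy to get subtly wrong, so here is a rough sketch of the idea in Python. The set of fields to drop (LEAKY_FIELDS) and the function name sanitize_response are assumptions for illustration; the real Worker may strip a different set.

```python
# Assumed set of top-level fields that would reveal the upstream provider.
LEAKY_FIELDS = {"provider", "system_fingerprint"}


def sanitize_response(body: dict, tier: str) -> dict:
    """Return a copy of an upstream completion with provider details hidden.

    The upstream model name is replaced by the tier the client asked for,
    and any field listed in LEAKY_FIELDS is dropped.
    """
    clean = {key: value for key, value in body.items() if key not in LEAKY_FIELDS}
    clean["model"] = tier
    return clean
```

For a streamed response the same function would have to run on every chunk, since each chunk carries its own model field.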

What was genuinely hard

Streaming + billing: you can't know the cost until the last SSE chunk, so billing runs in waitUntil after the stream closes — and you have to trust (and verify) the usage block in the final chunk.
Error compatibility: OpenAI-SDK clients break on nonstandard error bodies; every upstream failure has to be reshaped into the OpenAI error schema.
Health vs function: our /health returned 200 while auth was down (paused upstream DB) and, separately, while completions were broken (a corrupted API-key secret). Reachability lies. We now run a synthetic probe every 6h that signs up a disposable user, logs in, runs a tier-1 completion, and deletes itself — that's the only health check we trust.
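For the streaming + billing point, a minimal sketch of "trust and verify" might look like the following. It assumes an OpenAI-style SSE stream where each event is a "data: {json}" line, the stream ends with "data: [DONE]", and the usage block carries prompt_tokens and completion_tokens. The function names extract_usage and verified_total are hypothetical; in the Worker this logic would run inside waitUntil once the stream has closed.

```python
import json
from typing import Iterable, Optional


def extract_usage(sse_lines: Iterable[str]) -> Optional[dict]:
    """Return the last usage block seen in an OpenAI-style SSE stream, or None."""
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments such as ": keep-alive"
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        try:
            chunk = json.loads(data)
        except json.JSONDecodeError:
            continue  # ignore malformed chunks rather than crash the billing step
        if isinstance(chunk, dict) and chunk.get("usage"):
            usage = chunk["usage"]
    return usage


def verified_total(usage: dict) -> int:
    """Sanity-check token counts before charging; raise if they look wrong."""
    prompt = usage.get("prompt_tokens")
    completion = usage.get("completion_tokens")
    if not isinstance(prompt, int) or not isinstance(completion, int):
        raise ValueError("usage block is missing token counts")
    if prompt < 0 or completion < 0:
        raise ValueError("negative token count")
    return prompt + completion
```

If extract_usage returns None, the stream ended without a usage block, and the request needs a fallback (for example, flagging it for reconciliation) instead of a silent zero charge.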

The economics (disclosure)

This runs on top of OpenRouter and is priced ~50% under retail while we find out whether tier-routing is a thing people want — a subsidized PMF experiment, stated plainly on the site. Tier 1 is currently free. If you want to poke at it: tierup.ai (playground with no signup at tierup.ai/try, $25 credit, no card). I'm more interested in critique of the tier abstraction than in signups — comments very welcome.

DEV Community
