Every LLM app I've shipped had the same shelf life: pick the best model, hardcode it, and watch it become the second-best model within a month. The fix I keep seeing is a config file full of model strings and a quarterly migration chore. I wanted the abstraction one level up: "how smart does this request need to be?" — so I built a router around performance tiers instead of model names.
The tier contract
Four tiers: Speed / Balance / Intelligence / Reasoning. The API is OpenAI-compatible; once base_url and api_key point at the router, model: "tier-2" is the only change a client makes:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tierup.ai/v1",
    api_key="...",
)

resp = client.chat.completions.create(
    model="tier-2",  # 1=speed, 2=balance, 3=intelligence, 4=reasoning
    messages=[{"role": "user", "content": "..."}],
)
Each tier maps to the current best-value model in its class — that mapping is my problem, versioned server-side, so an upgrade reaches every client with zero code changes on their side.
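To make "versioned server-side" concrete, here is a minimal sketch of how such a mapping might be structured. It assumes a plain in-memory table; `TIER_MAP_VERSIONS`, `CURRENT_VERSION`, `resolve_tier`, and every model id are hypothetical placeholders, not the real mapping.

```python
# Hypothetical versioned tier table: version -> {tier name -> upstream model id}.
# All model ids below are made-up placeholders.
TIER_MAP_VERSIONS = {
    1: {
        "tier-1": "vendor-a/fast-v1",
        "tier-2": "vendor-b/balanced-v1",
        "tier-3": "vendor-c/smart-v1",
        "tier-4": "vendor-d/reasoner-v1",
    },
    2: {
        "tier-1": "vendor-a/fast-v1",
        "tier-2": "vendor-b/balanced-v2",  # an "upgrade": only this row changed
        "tier-3": "vendor-c/smart-v1",
        "tier-4": "vendor-d/reasoner-v1",
    },
}
CURRENT_VERSION = 2


def resolve_tier(model: str, version: int = CURRENT_VERSION) -> str:
    """Map a client-facing tier name to the upstream model for a given version."""
    mapping = TIER_MAP_VERSIONS[version]
    if model not in mapping:
        raise ValueError(f"unknown tier: {model!r}")
    return mapping[model]
```

Bumping `CURRENT_VERSION` is the whole upgrade in this sketch: clients keep sending `tier-2`, and older versions stay around for rollback or for replaying logged requests.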
The stack, concretely
One Cloudflare Worker (Hono) fronts everything: auth (API key or Supabase JWT), a D1 database for users/wallets/request logs, KV for rate limits, and OpenRouter as the upstream aggregator. The Worker validates the request, checks the wallet, rewrites tier-N to the mapped model, proxies (streaming or not), then strips provider/model details from the response so the tier abstraction doesn't leak. Usage and cost are logged per request in D1; billing deducts from a prepaid wallet.
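The rewrite and strip steps can be sketched as two pure functions. This is an illustration in Python rather than the Worker's actual code; `TIER_TO_MODEL`, `LEAKY_KEYS`, both function names, and the model ids are assumptions, and the set of response fields that reveal the upstream depends on the aggregator.

```python
import copy

# Hypothetical mapping; the model ids are placeholders.
TIER_TO_MODEL = {
    "tier-1": "vendor-a/fast-model",
    "tier-2": "vendor-b/balanced-model",
    "tier-3": "vendor-c/smart-model",
    "tier-4": "vendor-d/reasoning-model",
}

# Assumed top-level response keys that could reveal the upstream provider.
LEAKY_KEYS = ("provider", "system_fingerprint")


def rewrite_request(body: dict) -> dict:
    """Return a copy of the request with tier-N swapped for the mapped model."""
    tier = body.get("model")
    if tier not in TIER_TO_MODEL:
        raise ValueError(f"unknown tier: {tier!r}")
    upstream = copy.deepcopy(body)
    upstream["model"] = TIER_TO_MODEL[tier]
    return upstream


def scrub_response(resp: dict, tier: str) -> dict:
    """Return a copy of the upstream response that only mentions the tier name."""
    clean = copy.deepcopy(resp)
    clean["model"] = tier  # report the tier, not the concrete model
    for key in LEAKY_KEYS:
        clean.pop(key, None)
    return clean
```

Keeping both steps free of I/O means the same functions can serve the streaming path (applied per chunk) and the non-streaming path (applied once).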
What was genuinely hard
- Streaming + billing: you can't know the cost until the last SSE chunk, so billing runs in waitUntil after the stream closes, and you have to trust (and verify) the usage block in the final chunk.
- Error compatibility: OpenAI-SDK clients break on nonstandard error bodies; every upstream failure has to be reshaped into the OpenAI error schema.
- Health vs function: our /health returned 200 while auth was down (paused upstream DB) and, separately, while completions were broken (a corrupted API-key secret). Reachability lies. We now run a synthetic probe every 6h that signs up a disposable user, logs in, runs a tier-1 completion, and deletes itself. That's the only health check we trust.
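The first two items can be sketched in a few lines. This is a simplified illustration, not the production Worker: `usage_from_sse`, `usage_is_plausible`, and `to_openai_error` are hypothetical names, the plausibility rule (prompt plus completion equals total) is an assumption about the upstream, and the status-to-type mapping is an illustrative choice. Only the outer `{"error": {message, type, param, code}}` shape follows the OpenAI error schema.

```python
import json


def usage_from_sse(lines):
    """Return the last usage block seen in a sequence of SSE lines, or None."""
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed chunks rather than abort billing
        if isinstance(chunk, dict) and chunk.get("usage"):
            usage = chunk["usage"]
    return usage


def usage_is_plausible(usage) -> bool:
    """Sanity-check a usage block before billing on it (assumed rule)."""
    if not usage:
        return False
    counts = [usage.get(k) for k in ("prompt_tokens", "completion_tokens", "total_tokens")]
    if not all(isinstance(v, int) and v >= 0 for v in counts):
        return False
    prompt, completion, total = counts
    return prompt + completion == total


def to_openai_error(status: int, message: str) -> dict:
    """Reshape an upstream failure into an OpenAI-style error body."""
    if status == 429:
        etype = "rate_limit_error"       # illustrative type names
    elif 400 <= status < 500:
        etype = "invalid_request_error"
    else:
        etype = "api_error"
    return {"error": {"message": message, "type": etype, "param": None, "code": None}}
```

In this sketch, a stream that ends without a plausible usage block returns `None` or fails the check, and that case should go to a fallback (for example, re-counting tokens or flagging the request) rather than being billed at zero.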
The economics (disclosure)
This runs on top of OpenRouter and is priced ~50% under retail while we find out whether tier-routing is a thing people want — a subsidized PMF experiment, stated plainly on the site. Tier 1 is currently free. If you want to poke at it: tierup.ai (playground with no signup at tierup.ai/try, $25 credit, no card). I'm more interested in critique of the tier abstraction than in signups — comments very welcome.