Disclosure: I maintain Lynkr, the self-hosted gateway discussed in the second half. OpenRouter and Requesty are good products — this post is about understanding what you're paying for so you can decide whether you need to.
Hosted LLM routers had a huge 2026 — OpenRouter alone pushes 25 trillion tokens a week. The pitch is real: one API key, 400+ models, automatic failover. The price is a ~5% fee on every token you route (5.5% on OpenRouter credits, 5% on Requesty), plus a subtler cost: every prompt, every file your coding agent reads, every secret that leaks into a context window transits their infrastructure.
For a hobby project, 5% of a small bill is nothing and the convenience wins. For an agentic coding workload — where teams routinely spend $500–$2,000 per engineer per month — 5% is real money, and the data-transit question stops being academic. So it's worth asking precisely: what does the hosted router actually do for that fee, and which parts can you self-host?
What the fee buys
- Unified API across providers — one format in, translated per-provider out.
- Failover — a provider 500s, your request retries elsewhere.
- Model marketplace — new models available the day they launch.
- Consolidated billing — one invoice instead of six provider accounts.
-
(Sometimes) smart routing — OpenRouter's
autorouter picks a model per-request.
Items 1, 2, and 5 are software. Items 3 and 4 are genuinely hard to self-host — if you want day-one access to every new model with zero account setup, the marketplace earns its fee. But most coding workloads use a handful of models, not four hundred.
The parts a hosted router structurally can't give you
- Local models as a tier. No hosted router will route your easy requests to the Ollama instance on your own machine — free, private, zero latency to first byte on cached weights. For coding traffic, where (in my instrumented sessions) 70–90% of requests are simple enough for a good local model, this is the single biggest cost lever, and it's only available to something running on your side of the wire.
- Your data staying home. Self-hosted means prompts, code, and keys never transit a third party. For anyone with a compliance requirement — or code they'd rather not ship to a router's logs — this isn't a preference, it's a prerequisite.
- Token optimization before the bill. A hosted router bills you for the tokens you send it — it has no incentive to shrink them. A self-hosted proxy can strip unusable tool schemas (measured: −53% on tool-heavy requests) and compress JSON tool results (measured: 3,458 → 427 tokens on a grep result) before any provider bills you. That's not a routing saving; it stacks on top of routing.
- No availability dependency. Hosted routers go down (OpenRouter's outages have their own HN threads) and offer no SLA at consumer tiers. A local proxy fails independently of anyone's status page.
What self-hosting costs you
Honesty cuts both ways:
-
You run a process.
npm install -g lynkr && lynkr init && lynkr start— but it's yours now: updates, logs, the works. - You manage provider accounts. Two or three API keys instead of one. The consolidated invoice is genuinely gone.
- Model lag. A new provider means waiting for support (or a PR) instead of it appearing in a dropdown.
- Nobody to email. Self-hosted support is a GitHub issue tracker.
If those trade-offs read as "fine," the math is straightforward: the 5% fee disappears, the local-tier routing removes the easy majority of requests from your bill entirely, and compression shrinks what's left.
The hybrid that actually makes sense
This isn't either/or. A pattern I see working:
Coding tool → self-hosted proxy (Lynkr)
├─ SIMPLE/MEDIUM → local Ollama/llama.cpp (free)
├─ COMPLEX → direct provider API keys (no fee)
└─ exotic models → OpenRouter (5% on the long tail only)
Keep a hosted router as one backend for the long tail of models you rarely need, route the bulk directly or locally, and let the proxy's classifier decide per-request. You get the marketplace when you want it without paying the tax on your entire volume.
Lynkr is Apache-2.0, self-hosted, supports 13 providers including Ollama, llama.cpp, LM Studio, Bedrock, Azure, Databricks — and OpenRouter itself as a tier: github.com/Fast-Editor/Lynkr. Benchmarks with methodology are in the repo; run them on your own workload before believing anyone's percentages, including mine.
Top comments (0)