欧阳石景

Posted on Jun 15

The 5.5% Tax of OpenRouter — and Why I Built an Alternative


9 of the world's top 10 open-source LLMs are now Chinese. After GLM-5.2 landed, the
only non-Chinese model still in the top 10 is Llama. If your gateway taxes every call
by 5.5%, you are paying that tax to route to models you could reach at direct prices.

The 5.5% you didn't notice you were paying

OpenRouter's pitch is fair: one key, 400+ models, transparent pricing.

But read the billing page closely. Every credit-card top-up adds 5.5% (minimum $0.80). Crypto top-ups add 5%. Token prices look "at cost" — until you check community benchmarks and notice DeepSeek-R1 routinely runs ~15% above what direct providers charge, depending on which underlying provider OpenRouter routes to that hour.

For a hobby project, this is invisible. For anyone running a real workload, the math gets uncomfortable fast.

$10,000 / month on inference  ->  ~$550 / month in routing tax
$100,000 / month                ->  ~$5,500 / month
$1,000,000 / month              ->  ~$55,000 / month

That's not a rounding error. That's an engineer's salary. And you're paying it to do something you could do with a Cloudflare Worker and ten lines of Go.
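The figures above are plain percentage arithmetic. Here is a minimal sketch of it, assuming the 5.5% rate and $0.80 minimum quoted earlier apply to each card top-up; the function and constant names are my own, not part of any API:

```python
from decimal import Decimal

CARD_FEE_RATE = Decimal("0.055")  # 5.5% card top-up fee quoted above
CARD_FEE_MIN = Decimal("0.80")    # minimum fee per top-up quoted above

def card_topup_fee(amount_usd: str) -> Decimal:
    """Fee on one card top-up: 5.5% of the amount, but never less than $0.80."""
    fee = Decimal(amount_usd) * CARD_FEE_RATE
    return max(fee, CARD_FEE_MIN).quantize(Decimal("0.01"))

# Treat each month's spend as a single top-up, as the table does.
for monthly_spend in ("10000", "100000", "1000000"):
    print(f"${monthly_spend}/month -> ${card_topup_fee(monthly_spend)} in fees")
```

Note that the minimum only matters for small top-ups: anything under about $14.55 pays the flat $0.80, which is an effective rate above 5.5%.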

I built haotokai as an OpenRouter alternative because I'm one of the people who ran that math, and the answer was: this needs a smaller, cheaper, narrower tool.

Reddit has been complaining for a year

This isn't me being clever. The biggest LLM-infra subreddits have been writing the same comment in different words for twelve months:

"Great for trying out niche models, but the 5.5% fee stings at scale."
— r/LocalLLaMA, on OpenRouter pricing

"I use OpenRouter for experimentation, then move to direct API or a BYOK router for production."
— r/LocalLLaMA, in a thread about cost optimization for indie hackers

"It's not actually routing — you still pick the model yourself."
— r/OpenAI, recurring complaint about the "Auto Router" framing

Those quotes are not cherry-picked outliers. Search "OpenRouter fee" on r/LocalLLaMA or the OpenRouter Discord and you will find the same thread, monthly, since the start of 2025. The product is good. The tax is real. Both things are true.

There's a second, quieter complaint that matters more for production:

"Provider returned an error from OpenRouter does not trigger model failover."
— OpenRouter GitHub Issue #45663

"Rate limit errors surfaced to user instead of auto-failover."
— OpenRouter GitHub Issue #50389

So you're paying 5.5% for routing — but a lot of users are reporting the routing doesn't fail over the way they expected. That's the gap I started building into.
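If a gateway does not fail over for you, the fallback can live in your own client. What follows is a sketch of that idea under my own assumptions, not a description of how OpenRouter or haotokai behaves internally. `call_with_failover` and `fake_call` are hypothetical names; in real code `call` would wrap your SDK request, and you would catch the SDK's specific error types rather than a bare `Exception`.

```python
from typing import Callable, Sequence, Tuple

def call_with_failover(models: Sequence[str], call: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model, answer) from the first that succeeds."""
    errors = []
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # narrow this to your SDK's error classes
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))

def fake_call(model: str) -> str:
    """Stand-in for a real API call: the first model always errors."""
    if model == "deepseek-reasoner":
        raise TimeoutError("provider returned an error")
    return f"answer from {model}"

print(call_with_failover(["deepseek-reasoner", "glm-4.6"], fake_call))
```

The order of the list is the failover policy, so the backup model goes last.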

What the 5.5% is actually paying for

To be fair to Alex and the OpenRouter team, that 5.5% covers some real things: card processing, fraud risk, chargebacks, frontend, dashboards, the leaderboard, Auto Router, and a Discord with 50K+ members.

But it also covers 400 models I will never call, a marketplace I don't need, and a level of "everything to everyone" that I, personally, am not the customer for.

I'm an indie hacker. I call four models, ever:

DeepSeek-V4 / DeepSeek-R1 for code and reasoning
Kimi K2 for long-context document work (2M context window, no joke)
Qwen3 for multilingual tasks
GLM-4.6 as a backup reasoning model

That's it. Four Chinese open-source families. If you look at the mid-2026 leaderboards, models from those families hold 9 of the global top 10 open-source spots. The fact that they're all built outside the U.S. is, technically, irrelevant: they sit at the top of the same evals everyone else ranks on.

So the question I asked myself, in October 2025, was: what does an OpenRouter alternative look like if it only has to do those four families well?

haotokai, in one sentence

Here's the positioning I landed on, and the line I keep on the homepage:

The cheapest way to access DeepSeek, Kimi K2 and Qwen from outside China — one base URL, OpenAI-compatible, no markup, no subscription.

That's it. No 400 models. No leaderboard. No Auto Router. No subscription tier. No 5.5% card fee. Pay-as-you-go from $1.

The trade is honest: you give up breadth, you get pass-through pricing on the four model families that, frankly, do most indie work in 2026.

If you want OpenAI, Claude, and Gemini in the same key, OpenRouter is still the right tool. If you want a DeepSeek API proxy that doesn't tax you for the privilege, an OpenRouter alternative is what you want — and you can probably guess where I think you should look.

One line of code to switch

The migration is the same as moving between any two OpenAI-compatible providers. You change the base URL and swap in the new API key. That's the entire story.

  from openai import OpenAI

  client = OpenAI(
-     api_key="sk-or-v1-xxxxxxxx",
-     base_url="https://openrouter.ai/api/v1",
+     api_key="sk-haotokai-xxxxxxxx",
+     base_url="https://api.haotokai.com/v1",
  )

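Streaming works. Tool calling works. JSON mode works. Same Chat Completions schema, same error envelope. If your code worked against OpenRouter, it will work against haotokai with that diff, plus a new model string wherever the two gateways name a model differently (OpenRouter uses provider-prefixed IDs such as deepseek/deepseek-r1, while the examples in this post use deepseek-reasoner).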
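Because the switch is only configuration, it can be kept out of the code entirely. A small sketch, assuming you store one API key per gateway in the environment. The two base URLs come from the diff above; `OPENROUTER_API_KEY` and the helper name `client_kwargs` are my own choices for illustration.

```python
import os

# Base URLs are from the diff above; the env var names are assumptions.
GATEWAYS = {
    "openrouter": {"base_url": "https://openrouter.ai/api/v1", "key_env": "OPENROUTER_API_KEY"},
    "haotokai":   {"base_url": "https://api.haotokai.com/v1",  "key_env": "HAOTOKAI_API_KEY"},
}

def client_kwargs(gateway: str, env=os.environ) -> dict:
    """Return the keyword arguments for OpenAI(...) for the chosen gateway."""
    cfg = GATEWAYS[gateway]
    return {"api_key": env[cfg["key_env"]], "base_url": cfg["base_url"]}
```

With that in place, `OpenAI(**client_kwargs("haotokai"))` builds the client, and moving back to OpenRouter is a one-word change.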

Verifying it with curl

Before wiring it into your stack, hit it with curl. This is the test I run on every gateway I evaluate:

curl https://api.haotokai.com/v1/chat/completions \
  -H "Authorization: Bearer $HAOTOKAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [
      {"role": "user", "content": "In one sentence: why does a 5.5% routing fee compound badly at scale?"}
    ],
    "max_tokens": 120
  }'

If you get a 200 with a normal choices[0].message.content, you're done. The same curl, against the same path, with the same JSON, works for kimi-k2, qwen3-max, and glm-4.6; only the model string changes. That's the whole UX promise of OpenAI-compatible gateways, and it's why calling Kimi K2 from outside China shouldn't require relearning anything.
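The "200 with a normal choices[0].message.content" check is easy to automate once you have the status code and raw body from any HTTP client. A minimal sketch, assuming the standard Chat Completions response shape; `extract_answer` and the sample body are hypothetical, for illustration only.

```python
import json

def extract_answer(status: int, body: str) -> str:
    """Return choices[0].message.content, or raise if the response is not the expected shape."""
    if status != 200:
        raise ValueError(f"unexpected HTTP status {status}: {body[:200]}")
    payload = json.loads(body)
    try:
        content = payload["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("response has no choices[0].message.content") from exc
    if not content:
        raise ValueError("choices[0].message.content is empty")
    return content

# Hand-written sample body in the Chat Completions shape, not real gateway output.
sample = '{"choices": [{"message": {"role": "assistant", "content": "Fees scale with spend."}}]}'
print(extract_answer(200, sample))
```

Running the same function over each model's response gives a quick smoke test before you wire a gateway into anything real.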

A real Python demo: 4 models, one cost report

The thing I actually use this for, daily, is comparing models on a fixed prompt. Here's the script. It asks the same question to four Chinese open-source frontier models in parallel, then prints each answer plus the per-call cost, so you can see what a Qwen call made from outside China actually costs and compare it with the OpenRouter quote.

import os
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["HAOTOKAI_API_KEY"],
    base_url="https://api.haotokai.com/v1",
)

PROMPT = "In 50 words: when is a 5.5% routing fee actually worth paying?"

# Pass-through prices on haotokai (USD per 1M tokens, input / output).
PRICES = {
    "deepseek-reasoner": (0.55, 2.19),
    "kimi-k2":           (0.27, 1.10),
    "qwen3-max":         (0.30, 1.20),
    "glm-4.6":           (0.50, 1.50),
}

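def ask(model: str):
    """Send PROMPT to one model and return (model, cost in USD, answer)."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=120,
    )
    in_tok  = resp.usage.prompt_tokens
    out_tok = resp.usage.completion_tokens
    p_in, p_out = PRICES[model]
    # Prices are per 1M tokens, so scale the token counts down.
    cost = (in_tok * p_in + out_tok * p_out) / 1_000_000
    # message.content is optional in the SDK, so guard before calling strip().
    answer = (resp.choices[0].message.content or "").strip()
    return model, cost, answer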

with ThreadPoolExecutor(max_workers=4) as ex:
    for model, cost, answer in ex.map(ask, PRICES):
        print(f"\n[{model}]  ${cost:.6f}")
        print(answer)

Run that on a Tuesday afternoon and the total bill comes in under a tenth of a cent. Run the same script through OpenRouter and you pay the 5.5% card top-up fee on the credits it burns, plus whatever per-provider markup applies. On a single dev session that gap is invisible. On a CI pipeline that grades 50,000 candidate prompts a week, it isn't.

Why I think the OpenRouter alternative space is real

Three reasons, in order of how much they matter to me:

The model market split is permanent. 9 of the open-source top 10 are Chinese in 2026. That isn't going to flip in 2027. Any global stack needs a clean, OpenAI-shaped path to those models, and OpenRouter is one of the few tools that ships that path today — it just charges 5.5% for it.
Indie economics are different from enterprise economics. When you're a one-person team shipping a product, you don't need 400 models, you need the four you actually use, at pass-through prices, with a $1 minimum top-up. A cheap DeepSeek or Qwen API for indie hackers is a different product from an enterprise gateway with SOC 2 and SSO; both should exist.
OpenAI compatibility commoditizes the gateway. If everyone speaks the same Chat Completions schema, switching gateways is a one-line diff. That's good for users and bad for any gateway that thinks it owns its customers. The right response, as a gateway, is to charge less and stay narrow — not charge more and add features people didn't ask for.

That's the bet behind haotokai. It is the OpenRouter alternative for people who only care about DeepSeek, Kimi K2, Qwen, and GLM, and who would rather not pay a 5.5% tax to get there. It's also, today, the cleanest route I know of to pass-through pricing on R1 specifically: DeepSeek-R1 is listed at $0.55 / 1M input tokens, the published direct rate, with no card surcharge layered on top.

Try it

If any of the above resonated, the offer is small and concrete: sign up at haotokai.com, top up nothing, and you get $1 in free trial credit — enough to hit DeepSeek-R1, Kimi K2, Qwen3, and GLM-4.6 a few hundred times each from the same key. No card. PayPal works. The base URL is https://api.haotokai.com/v1, the SDK is whatever you already use, and the migration from OpenRouter is the diff above.

If you stick around after the free credit, great. If you don't, you've at least seen what the 5.5% was actually buying you — and you'll never look at an aggregator's billing page the same way again.

That alone, I'd argue, is worth ten minutes.

Top comments (1)

HuiXia-Meshs • Jul 2

Good breakdown. One dimension that's often missing from these comparisons: model coverage breadth.

A gateway that's cheaper but only covers 4 models forces a tradeoff — you save on routing but still need separate API keys for everything else. Most teams in production end up needing at least one Western model (Claude/GPT) alongside Chinese ones for different parts of their pipeline.

The market seems to be splitting into narrow pass-through (cheap, limited coverage) vs. broad gateways (30+ models under one key). Worth factoring into any comparison beyond raw $/M token rates.