If you're running AI agents — Claude Code, Cursor, Aider, LangChain chains, or any multi-step workflow — you're burning through tokens fast. A single coding agent session can consume thousands of tokens per task, and those costs compound quickly.
Many developers route these calls through OpenRouter for convenience. But you're paying more than the model provider's list price — even though OpenRouter markets itself as "pass-through pricing."
This article breaks down what that actually costs, when an aggregator is worth the overhead, and when you should go direct.
What OpenRouter Actually Charges
According to OpenRouter's pricing page, the Pay-as-you-go tier applies a 5.5% platform fee on top of model pricing. This is separate from the per-token cost of the models themselves.
Here's what the fee looks like at different spend levels:
| Monthly Spend | 5.5% Fee | Effective Rate |
|---|---|---|
| $100 | $5.50 | 5.5% |
| $500 | $27.50 | 5.5% |
| $1,000 | $55.00 | 5.5% |
| $5,000 | $275.00 | 5.5% |
The fee is a flat 5.5% regardless of volume, with no discount tier for heavy usage. On $1,000/month of API calls, that is $55 going to OpenRouter rather than to tokens.
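The arithmetic behind the table can be reproduced in a few lines. This is a minimal sketch that assumes the 5.5% rate quoted above is applied to the full monthly spend; `platform_fee` is our own helper name, not part of any OpenRouter API.

```python
from decimal import Decimal

# Assumption: the 5.5% rate quoted above, applied to the whole monthly spend.
FEE_RATE = Decimal("0.055")


def platform_fee(monthly_spend: int) -> Decimal:
    """Return the platform fee in dollars for a given monthly spend."""
    return (Decimal(monthly_spend) * FEE_RATE).quantize(Decimal("0.01"))


for spend in (100, 500, 1000, 5000):
    fee = platform_fee(spend)
    print(f"${spend:>5,}/month -> ${fee} fee, ${fee * 12} per year")
```

Using `Decimal` rather than floats keeps the dollar amounts exact, which matters once you start summing fees across months.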
When the Fee Is Worth It
An aggregator fee makes sense when it saves you more than it costs. Here are legitimate reasons to pay it:
1. Multi-model routing without managing multiple API keys. If your app switches between Claude, GPT, and Gemini based on task type, OpenRouter gives you one endpoint for all of them. Managing three separate provider accounts, billing dashboards, and API keys has real operational cost.
2. Model fallback and failover. If Claude is down, your request can automatically route to GPT. Building this yourself requires health checks, retry logic, and provider-specific error handling.
3. Prototyping and model comparison. Testing which model works best for your use case is faster with a single API that lets you swap model parameters. Once you've picked your model, you can go direct.
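To give a sense of what building failover yourself involves, here is a minimal sketch of the retry-then-fall-back logic from point 2. The function and exception names are hypothetical, and the two providers are stand-in callables so the example runs without network access; a real version would wrap actual SDK calls and catch provider-specific errors.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
    retries_per_provider: int = 2,
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, response)."""
    errors: list[str] = []
    for name, call in providers:
        for attempt in range(1, retries_per_provider + 1):
            try:
                return name, call(prompt)
            except Exception as exc:  # real code should catch narrower errors
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers (hypothetical): one always fails, one always succeeds.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")


def healthy_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


name, reply = complete_with_fallback(
    "hello", [("primary", flaky_primary), ("secondary", healthy_secondary)]
)
print(name, reply)
```

Even this toy version leaves out health checks, backoff, and differences in request and response formats between providers, which is where most of the real effort goes.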
When the Fee Doesn't Make Sense
1. Single-model workloads. If 90%+ of your calls go to one model, you're paying 5.5% for routing you don't use. Go direct with that provider.
2. High-volume agent workloads. If you're running coding agents, browser agents, or multi-step workflows, token consumption adds up fast. At $5,000/month, you're paying $275/month in platform fees. At that volume, it's worth asking providers directly about enterprise or committed-use pricing, which can come in below list price.
3. Latency-sensitive applications. Any routing layer adds overhead per request. For interactive applications where every millisecond matters, going direct to the provider eliminates this extra hop.
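Rather than guessing how much the extra hop in point 3 costs, you can measure it. The sketch below times two callables and compares median latency. The `time.sleep` calls are arbitrary placeholders, not measurements of any provider or aggregator; swap in real requests to the direct and routed endpoints.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(call: Callable[[], object], runs: int = 5) -> float:
    """Invoke `call` `runs` times and return the median wall-clock latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


# Placeholders only: replace with a real request to each endpoint.
def direct_call() -> None:
    time.sleep(0.005)


def routed_call() -> None:
    time.sleep(0.030)


direct = median_latency_ms(direct_call)
routed = median_latency_ms(routed_call)
print(f"direct {direct:.1f} ms, routed {routed:.1f} ms, overhead {routed - direct:.1f} ms")
```

The median is used instead of the mean so a single slow request does not dominate the comparison.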
The Alternative: Aggregators That Discount Instead of Mark Up
Disclosure: we run FuturMix, so treat this section as an example rather than a neutral ranking.
The aggregator model doesn't have to mean "provider price + platform fee." Some aggregators negotiate volume rates with providers and pass the savings to users, so the effective price is below going direct — closer to a pass-through discount model than a routing tax.
For example, FuturMix uses this approach: Claude is 10% off the Anthropic rate, GPT is 30% off OpenAI's rate, and Gemini is 20% off Google's rate, across 20+ models with no platform fee on top. You can top up from $10, and the balance does not expire. The API is OpenAI-compatible, so migration is a base URL swap that works with any tool supporting custom endpoints (Claude Code, Cursor, Aider, LangChain, etc.):
```python
from openai import OpenAI

# Before (OpenRouter)
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",
)

# After (FuturMix): only the base URL and API key change
client = OpenAI(
    base_url="https://futurmix.ai/v1",
    api_key="sk-...",
)
```
The tradeoff: FuturMix has fewer models than OpenRouter (20+ vs 200+) and a smaller community. But if you're using mainstream models (Claude, GPT, Gemini), the pricing math favors a discount-model aggregator over a fee-model one.
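To make the pricing math concrete, here is a rough comparison of the two models. It uses the 5.5% fee and the 10%/30%/20% discounts quoted in this article; the usage split is a made-up example, and the function names are ours.

```python
from decimal import Decimal

FEE_RATE = Decimal("0.055")  # fee-model aggregator, as quoted earlier
# Discount-model rates as stated in this section.
DISCOUNTS = {
    "claude": Decimal("0.10"),
    "gpt": Decimal("0.30"),
    "gemini": Decimal("0.20"),
}


def fee_model_cost(spend_at_list: dict[str, Decimal]) -> Decimal:
    """List-price spend plus a flat platform fee on the total."""
    total = sum(spend_at_list.values(), Decimal("0"))
    return (total * (1 + FEE_RATE)).quantize(Decimal("0.01"))


def discount_model_cost(spend_at_list: dict[str, Decimal]) -> Decimal:
    """List-price spend with a per-model discount and no platform fee."""
    total = sum(
        (amount * (1 - DISCOUNTS[model]) for model, amount in spend_at_list.items()),
        Decimal("0"),
    )
    return total.quantize(Decimal("0.01"))


# Hypothetical monthly usage, valued at provider list prices.
usage = {"claude": Decimal("600"), "gpt": Decimal("300"), "gemini": Decimal("100")}

print("direct:        ", sum(usage.values(), Decimal("0")))
print("fee model:     ", fee_model_cost(usage))
print("discount model:", discount_model_cost(usage))
```

The gap depends heavily on your model mix, so plug in your own split before drawing conclusions.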
Decision Framework
| Scenario | Best Option |
|---|---|
| Using 1 model, high volume | Go direct with provider |
| Running agents (Claude Code, Aider, etc.) | Discount aggregator or direct — minimize per-token overhead |
| Using 2-3 models, want lower cost | Discount aggregator (no platform fee) |
| Testing many models, prototyping | OpenRouter (breadth matters more than cost) |
| Need 200+ models including open-source | OpenRouter (largest catalog) |
| Production, need SLA guarantee | Go direct or use aggregator with published SLA |
The Real Question
The question isn't "should I use an aggregator?" — it's "does this aggregator save me more than it costs?"
If you're paying 5.5% for routing convenience you barely use, you're overpaying. If you're paying 5.5% because the multi-model access genuinely saves you engineering time, it's a reasonable trade.
But if an aggregator can offer lower prices than going direct — no fee, just a discount — that changes the math entirely.
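One way to put numbers on that question is a simple break-even check: how many engineering hours per month does the aggregator need to save to pay for itself? This is a sketch under our own assumptions; the $100 hourly cost is a placeholder, and a negative `fee_rate` stands in for a discount.

```python
def break_even_hours(
    monthly_spend: float, hourly_cost: float, fee_rate: float = 0.055
) -> float:
    """Engineering hours per month the aggregator must save to cover its fee."""
    if hourly_cost <= 0:
        raise ValueError("hourly_cost must be positive")
    return monthly_spend * fee_rate / hourly_cost


def net_monthly_value(
    monthly_spend: float,
    hours_saved: float,
    hourly_cost: float,
    fee_rate: float = 0.055,
) -> float:
    """Value of time saved minus the fee; a negative fee_rate models a discount."""
    return hours_saved * hourly_cost - monthly_spend * fee_rate


# Hypothetical inputs: $5,000/month spend, $100/hour engineering cost.
hours = break_even_hours(5000, hourly_cost=100)
print(f"break-even: {hours:.2f} hours/month")
print(f"net value if 5 hours are saved: ${net_monthly_value(5000, 5, 100):.2f}")
```

With a discount instead of a fee, the net value is positive even when no engineering time is saved, which is the shift in the math described above.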
We run FuturMix, an OpenAI-compatible API with 20+ models at 10-30% below provider pricing. Top up from $10 when you're ready; the balance does not expire. This article reflects our honest view of when aggregators (including ours) make sense and when they don't.