Everyone argues about which AI model is "best" using benchmarks. I wanted to know something simpler: which models do people actually run in production, when they pay per token and can pick anything?
So I started tracking OpenRouter's published usage rankings every day. OpenRouter is a neutral marketplace: developers route requests to whatever model they want and pay by the token, so the rankings are a real, money-on-the-line signal of demand. Here is what the last 7 days look like, by token volume routed per model.
The current top 10 by real usage
| Rank | Model | Provider | Tokens/wk | Price (in/out per MTok) |
|---|---|---|---|---|
| 1 | DeepSeek V4 Flash | DeepSeek | 4.72T | $0.09 / $0.18 |
| 2 | MiMo-V2.5 | Xiaomi | 4.38T | $0.11 / $0.28 |
| 3 | MiniMax M3 | MiniMax | 3.68T | $0.30 / $1.20 |
| 4 | Owl Alpha | OpenRouter (free) | 3.55T | Free |
| 5 | Hy3 preview | Tencent | 3.46T | $0.06 / $0.21 |
| 6 | Claude Opus 4.7 | Anthropic | 2.23T | $5 / $25 |
| 7 | GLM 5.2 | Z.ai | 2.20T | $0.94 / $3 |
| 8 | DeepSeek V4 Pro | DeepSeek | 2.08T | $0.44 / $0.87 |
| 9 | Claude Opus 4.8 | Anthropic | 1.92T | $5 / $25 |
| 10 | Claude Sonnet 4.6 | Anthropic | 1.54T | $3 / $15 |
Every model in the top 5 is either from a Chinese lab or open-weight. The first OpenAI model, GPT-5.5, does not show up until #12, and the first Gemini model is #13.
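For a rough sense of how concentrated that is, here is a minimal sketch that sums the weekly token figures from the table. The numbers are copied straight from the rows above. The shares are relative to the top 10 only, not to all OpenRouter traffic, and the variable names are just mine for illustration.

```python
# Weekly token volume in trillions, copied from the top-10 table above.
# Each entry is (model, provider, tokens_in_trillions), in rank order.
WEEKLY_TOKENS_T = [
    ("DeepSeek V4 Flash", "DeepSeek", 4.72),
    ("MiMo-V2.5", "Xiaomi", 4.38),
    ("MiniMax M3", "MiniMax", 3.68),
    ("Owl Alpha", "OpenRouter (free)", 3.55),
    ("Hy3 preview", "Tencent", 3.46),
    ("Claude Opus 4.7", "Anthropic", 2.23),
    ("GLM 5.2", "Z.ai", 2.20),
    ("DeepSeek V4 Pro", "DeepSeek", 2.08),
    ("Claude Opus 4.8", "Anthropic", 1.92),
    ("Claude Sonnet 4.6", "Anthropic", 1.54),
]

# Totals across the listed rows only (not all OpenRouter traffic).
top10_total = sum(tokens for _, _, tokens in WEEKLY_TOKENS_T)
top5_total = sum(tokens for _, _, tokens in WEEKLY_TOKENS_T[:5])
anthropic_total = sum(
    tokens for _, provider, tokens in WEEKLY_TOKENS_T if provider == "Anthropic"
)

# Shares of top-10 volume.
top5_share = top5_total / top10_total
anthropic_share = anthropic_total / top10_total

print(f"Top 10 total: {top10_total:.2f}T tokens")
print(f"Top 5 share of top 10: {top5_share:.1%}")
print(f"Anthropic share of top 10: {anthropic_share:.1%}")
```

On these figures, the top 5 account for about two thirds of top-10 volume, and the three Anthropic models together for a bit under a fifth.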
The honest caveat
Before anyone (rightly) objects: this is OpenRouter only. It measures the open API and router market, where developers pick a model per call. It does NOT include first-party traffic like ChatGPT, the Gemini app, or claude.ai, so the big consumer flagships are heavily undercounted in this view.
So this is not "OpenAI is losing." It is something more specific, and to me more interesting: when developers route through a neutral marketplace and pay their own token bill, they overwhelmingly reach for cheap open and Chinese models.
Why: the price gap is brutal
Compare the top of the table with the Anthropic rows. DeepSeek V4 Flash is $0.09 / $0.18 per million tokens. Claude Opus is $5 / $25. That is roughly a 55x difference on input and closer to 140x on output.
For a chatbot you babysit, quality wins and you happily pay for Opus. But for agent loops, batch pipelines, RAG over big corpora, and anything that burns tokens at scale, a price gap of 50x or more is the whole decision. The open and Chinese models are now good enough for those workloads and often an order of magnitude or more cheaper. That combination is what the usage chart is showing.
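To put numbers on the gap, here is a small sketch that computes the input and output price ratios and the bill for one workload. The prices come from the table. The workload size (1,000 MTok in, 200 MTok out) is a made-up assumption for illustration, not a measurement, and the helper names are hypothetical.

```python
# Prices in USD per million tokens (input, output), copied from the table.
PRICES = {
    "DeepSeek V4 Flash": (0.09, 0.18),
    "Claude Opus 4.7": (5.00, 25.00),
}


def price_ratio(expensive: str, cheap: str) -> tuple[float, float]:
    """Return (input_ratio, output_ratio) of the expensive model over the cheap one."""
    exp_in, exp_out = PRICES[expensive]
    cheap_in, cheap_out = PRICES[cheap]
    return exp_in / cheap_in, exp_out / cheap_out


def workload_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a workload measured in millions of tokens."""
    price_in, price_out = PRICES[model]
    return input_mtok * price_in + output_mtok * price_out


in_ratio, out_ratio = price_ratio("Claude Opus 4.7", "DeepSeek V4 Flash")
print(f"Input price ratio:  {in_ratio:.1f}x")
print(f"Output price ratio: {out_ratio:.1f}x")

# Hypothetical workload (an assumption): 1,000 MTok in, 200 MTok out.
INPUT_MTOK, OUTPUT_MTOK = 1_000, 200
for model in PRICES:
    cost = workload_cost(model, INPUT_MTOK, OUTPUT_MTOK)
    print(f"{model}: ${cost:,.2f}")
```

Under that assumed input-heavy mix, the same job costs about $10,000 on Opus and about $126 on DeepSeek V4 Flash. The blended ratio depends on your own input/output split, so plug in your real numbers.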
What this actually means
- If you are building anything token-heavy, you are probably leaving real money on the table by defaulting to a US flagship.
- The "open models are toys" era is over, at least for production API workloads.
- The gap between benchmark leaderboards and real usage is large. The model that tops your favorite eval can have near-zero real adoption, and the reverse is also true.
I update this leaderboard daily here if you want to watch it move: https://whatstrending.ai/models
What are you actually routing for bulk or agent work right now? DeepSeek, GLM, MiniMax, or something you self-host? Curious how this matches what people see in production.