I'm building a custom coding agent harness that uses DeepSeek models through OpenRouter. The models offer a good price/performance balance (especially with the current Pro discount).
While editing the config, I accidentally removed the provider preference. My OpenRouter workspace had only two providers enabled: Official and ProviderA. And, of course, OpenRouter picked ProviderA! My harness displays session cost in real time, and within about 30 minutes the cost had climbed enough to catch my attention. I pulled the usage breakdown:
| Model | Provider | Calls | Prompt tokens | Cached tokens | Cache % | Completion tokens | Cost |
|---|---|---|---|---|---|---|---|
| flash | ProviderA | 115 | 3,150,998 | 2,790,656 | 88.6% | 66,176 | $0.1471 |
| flash | Official | 718 | 28,198,126 | 27,187,072 | 96.4% | 378,485 | $0.3236 |
| pro | ProviderA | 11 | 306,246 | 93,952 | 30.7% | 8,796 | $0.3994 |
| pro | Official | 1341 | 54,105,295 | 51,849,600 | 95.8% | 737,680 | $1.8110 |
Normalized per 1M total tokens (prompt + completion):
- Flash: ProviderA $0.046, Official $0.011 -> 4x more expensive
- Pro: ProviderA $1.27, Official $0.033 -> 38x more expensive
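For anyone who wants to rerun the normalization on their own usage export, here is a minimal sketch of the arithmetic. It assumes "total tokens" means prompt plus completion tokens, which is what reproduces the figures above; the `ROWS` layout and function names are just illustrative, not anything OpenRouter provides.

```python
# Usage rows copied from the table above:
# (model, provider, prompt tokens, cached tokens, completion tokens, cost in USD)
ROWS = [
    ("flash", "ProviderA", 3_150_998, 2_790_656, 66_176, 0.1471),
    ("flash", "Official", 28_198_126, 27_187_072, 378_485, 0.3236),
    ("pro", "ProviderA", 306_246, 93_952, 8_796, 0.3994),
    ("pro", "Official", 54_105_295, 51_849_600, 737_680, 1.8110),
]


def cost_per_million(prompt_tokens, completion_tokens, cost_usd):
    """Cost in USD per 1M total (prompt + completion) tokens."""
    return cost_usd / (prompt_tokens + completion_tokens) * 1_000_000


def normalized_costs(rows):
    """Map (model, provider) to cost per 1M total tokens."""
    return {
        (model, provider): cost_per_million(prompt, completion, cost)
        for model, provider, prompt, _cached, completion, cost in rows
    }


costs = normalized_costs(ROWS)
for model in ("flash", "pro"):
    provider_a = costs[(model, "ProviderA")]
    official = costs[(model, "Official")]
    ratio = provider_a / official
    print(f"{model}: ProviderA ${provider_a:.3f}, Official ${official:.3f} -> {ratio:.1f}x")
```

Normalizing by total tokens rather than by call count matters here, because the two providers handled very different volumes.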
Why a "small" cache gap hurts so much
With prompt caching, uncached input tokens cost an order of magnitude more than cached ones (two orders of magnitude with the Official provider). Even the 8-point cache gap for Flash (88.6% vs 96.4%) means ProviderA's uncached share of prompt tokens was about 3x higher (11.4% vs 3.6%). Combine that with a higher base price and you get the 4x multiplier. For Pro the cache gap is extreme (30.7% vs 95.8%), and that is the core of the 38x blow-up.
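To make the cache effect concrete, here is a rough sketch that computes the uncached share from the Flash rows in the table and then a blended input price. The 1:100 cached-to-uncached price ratio is an assumption for illustration only (it matches the "two orders of magnitude" figure, not any provider's actual price sheet), and the function names are made up.

```python
def uncached_share(prompt_tokens, cached_tokens):
    """Fraction of prompt tokens that were NOT served from cache."""
    return (prompt_tokens - cached_tokens) / prompt_tokens


def blended_input_price(cache_rate, cached_price, uncached_price):
    """Average price per input token for a given cache hit rate."""
    return cache_rate * cached_price + (1 - cache_rate) * uncached_price


# Flash rows from the usage table: (prompt tokens, cached tokens)
flash_a = uncached_share(3_150_998, 2_790_656)
flash_official = uncached_share(28_198_126, 27_187_072)
share_ratio = flash_a / flash_official
print(f"Flash uncached share: {flash_a:.1%} vs {flash_official:.1%} ({share_ratio:.1f}x)")

# ASSUMPTION: relative prices only, cached = 1 unit, uncached = 100 units.
CACHED_PRICE = 1.0
UNCACHED_PRICE = 100.0
at_964 = blended_input_price(0.964, CACHED_PRICE, UNCACHED_PRICE)
at_886 = blended_input_price(0.886, CACHED_PRICE, UNCACHED_PRICE)
price_ratio = at_886 / at_964
print(f"Blended input price at 88.6% vs 96.4% cache: {price_ratio:.1f}x higher")
```

With a price gap that large, the uncached tokens dominate the bill, so the input cost tracks the uncached share far more closely than the headline cache percentage suggests.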
Currently, official pricing for DeepSeek V4 Pro cached tokens is just $0.0036/M (yes, that's right, the decimal point is in the right place!). For agentic workloads, that drives cost down massively.
The fix: explicitly set provider preference
Always include the provider object in your HTTP request:
```json
{
  "model": "deepseek/deepseek-v4-pro",
  "messages": [...],
  "provider": {
    "order": ["deepseek"],
    "allow_fallbacks": false
  }
}
```
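Since my problem was the preference silently disappearing from the config, a guard in the harness itself seems worthwhile. Below is a minimal sketch of one way to do it: build the `provider` object in code and refuse to send any payload that lacks it. The helper names `pin_provider` and `require_pinned_provider` are hypothetical, and only the `order` and `allow_fallbacks` fields from the request above are used.

```python
import copy


def pin_provider(payload, providers):
    """Return a copy of the payload with an explicit provider order and no fallbacks."""
    if not providers:
        raise ValueError("at least one provider must be listed")
    pinned = copy.deepcopy(payload)
    pinned["provider"] = {"order": list(providers), "allow_fallbacks": False}
    return pinned


def require_pinned_provider(payload):
    """Raise before sending if the payload would let the router choose freely."""
    provider = payload.get("provider") or {}
    if not provider.get("order"):
        raise ValueError("request has no provider order; refusing to send")
    # Treat a missing allow_fallbacks as unsafe, so it must be set to False explicitly.
    if provider.get("allow_fallbacks", True):
        raise ValueError("provider fallbacks are enabled; refusing to send")
    return payload


request_body = pin_provider(
    {"model": "deepseek/deepseek-v4-pro", "messages": []},
    ["deepseek"],
)
require_pinned_provider(request_body)  # passes silently when the pin is present
```

Calling `require_pinned_provider` right before the HTTP call turns a quiet config mistake into a loud error instead of a surprise on the bill.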
Or at least create separate workspace/key guardrails on OpenRouter for each model family.
Did something like that ever happen to you?