DeepSeek V4 pricing is split across two API models: deepseek-v4-pro and deepseek-v4-flash.
The official pricing page lists separate rates for cache-hit input, cache-miss input, and output tokens. That matters because repeated system prompts, reused context, and stable templates can make cache-hit pricing materially cheaper.
Think of Flash and Pro as two pricing lanes: Flash handles volume, while Pro is reserved for prompts where failure cost is higher.
Official API prices
| Model | Cache-hit input | Cache-miss input | Output |
|---|---|---|---|
| DeepSeek V4 Flash | $0.028 / 1M tokens | $0.14 / 1M tokens | $0.28 / 1M tokens |
| DeepSeek V4 Pro | $0.145 / 1M tokens | $1.74 / 1M tokens | $3.48 / 1M tokens |
Source: DeepSeek API pricing.
How to choose
Use DeepSeek V4 Flash when the workload is high-volume: chat, summaries, extraction, classification, routing, and first-pass analysis.
Use DeepSeek V4 Pro when the task has a higher failure cost: difficult code repair, long reasoning, advanced math, agent planning, or final answer synthesis after cheaper models have prepared context.
Credit mapping on this site
This site uses a simple credit layer above the official API:
- Flash chat: 1 credit
- Pro chat: 4 credits
- Thinking: +1 credit
- Web search: +2 credits
This is not DeepSeek's official billing model. It is a product-level abstraction so users can compare Flash, Pro, Thinking, and web search in one interface.
Practical cost advice
Keep reusable instructions stable so prompt caching can work. Route cheap, repetitive prompts to Flash. Escalate to Pro only when the answer needs the stronger reasoning ceiling.
Source article: Read the original post
Homepage: Visit the site
Model pages:

Top comments (0)