DeepSeek API Pricing 2026: Models, Token Costs, and How to Optimize

DeepSeek built its reputation on one thing: frontier-class performance at a fraction of the price of OpenAI, Anthropic, or Google. In 2026 that is still the story, but the lineup changed. On April 24, 2026, the same day OpenAI shipped GPT-5.5, DeepSeek released V4 and collapsed its entire model range into two API options. If you are budgeting an integration or comparing providers, this guide breaks down DeepSeek API pricing in 2026: the current models, the per-token rates, the discount levers, and how it stacks up against the competition.
Key takeaway:
DeepSeek API pricing in 2026 runs on two models. V4 Flash costs $0.14 per million input tokens and $0.28 output, the cheapest frontier-class API available. V4 Pro lists at $1.74/$3.48 with a standing 75% promotional discount that drops it to roughly $0.435/$0.87. Both support a 1M-token context with no long-context surcharge, and cache hits cost about a tenth of the standard input rate.
The DeepSeek Model Lineup in 2026
DeepSeek V4 arrived in April 2026 and replaced the previous lineup of V3.2, R1, and the legacy API aliases. Two models now cover everything, with V4 Flash offering tiered reasoning modes so you only pay for deep reasoning when you need it.
V4 Flash costs $0.14 per million input tokens and $0.28 per million output tokens, making it the best choice for general tasks and the cheapest frontier-class API option available. V4 Pro is priced at $1.74 per million input tokens and $3.48 per million output tokens and is designed for the most demanding reasoning and agentic workloads. With the standing 75% promotional discount applied, V4 Pro effectively costs approximately $0.435 per million input tokens and $0.87 per million output tokens while providing access to the same model capabilities at a substantially lower rate.
V4 Flash supports a Non-Think mode for routine tasks and Think High or Think Max modes for complex reasoning, so a single model spans cheap, fast answers and heavy reasoning. Both V4 Flash and V4 Pro support a 1M-token context window and up to 384k output tokens.
Legacy model migration
If you still call the old aliases, plan to migrate. The legacy deepseek-chat and deepseek-reasoner aliases are scheduled for retirement on July 24, 2026, after which requests return errors. They currently route to V4 Flash non-thinking and thinking modes. Migration is a one-line change to the model parameter (deepseek-v4-flash or deepseek-v4-pro) on the same base URL and API key. Note that deepseek-reasoner maps to Flash, not Pro.
The Free Tier and What It Includes
DeepSeek's consumer chat is genuinely free. Full model access at chat.deepseek.com and in the mobile app costs nothing for individuals, with web search, file uploads, and saved history included, and no Plus or Pro subscription tier at all. The only catch is fair-use throttling, so during peak hours you may see Server Busy warnings.
For developers, every new API account gets a grant of around 5 million free tokens, valid for roughly 30 days, which is enough to prototype before you pay anything. After that it is pure pay-as-you-go with no minimum spend and no monthly fee. This consumption model is exactly the kind of metered AI spend we cover in our token budgeting framework.
The Discount Levers That Cut Your Bill
DeepSeek is already cheap, but two built-in levers cut the bill much further with little effort.
Prompt caching
DeepSeek automatically caches input chunks of 64 tokens or more. Cache hits cost a fraction of cache misses, often around a tenth of the standard input rate, so keeping a stable system prompt or reference content at the start of every request can cut input costs by 80% or more. No code changes are needed beyond structuring the prompt so the prefix stays identical.
Off-peak pricing
DeepSeek has historically applied automatic off-peak discounts during 16:30 to 00:30 UTC, around 50% off the chat model and up to 75% off the reasoner, with no configuration needed. V4 off-peak pricing had not been formally confirmed at the time of writing, so check the official docs before relying on it, but scheduling non-urgent batch work into that window is worth testing.
Stacking the savings: The levers combine. A workload that pins its system prompt for cache hits, routes routine calls to V4 Flash Non-Think mode, and schedules batch jobs into the off-peak window can run at a small fraction of even DeepSeek's already-low list price. The discipline is the same as any token workload: cache hard, route by difficulty, and time-shift what you can.
DeepSeek vs GPT, Claude, and Gemini
DeepSeek's position is simple: frontier-class reasoning at the lowest cost. The comparison below uses representative 2026 rates per million tokens.
DeepSeek V4 Flash costs $0.14 per million input tokens and $0.28 per million output tokens, making it the cheapest frontier-class option. GPT-5 from OpenAI is priced at $1.25 per million input tokens and $10.00 per million output tokens and serves as the flagship model for general-purpose and reasoning workloads. Claude Sonnet 4.6 costs $3.00 per million input tokens and $15.00 per million output tokens, positioning it as a balanced production workhorse. Gemini 2.5 Flash-Lite starts at $0.10 per million input tokens with low output pricing, offering a cheaper input rate but representing a smaller and less capable model overall.
Against the major labs, DeepSeek V4 Flash undercuts the frontier tier by an order of magnitude on output while scoring competitively on coding and reasoning benchmarks. For the full picture on the alternatives, see our ChatGPT pricing in 2026, our Claude AI 2026 guide, and our Google Gemini API pricing guides.
Running DeepSeek Through Third-Party Providers
You can also reach DeepSeek through aggregators and clouds. OpenRouter matches DeepSeek's direct rates for V4 models and adds a free tier for distilled variants. AWS Bedrock and Azure AI Foundry charge a premium but solve data-residency concerns by routing through US and EU infrastructure, which matters for teams that cannot send data to China. Together AI and Fireworks offer competitive rates on Flash-class models but charge more for reasoning models.
For sustained production volume, the direct API generally provides the best cost floor, especially once off-peak and caching are in play. If data residency is your constraint, the hosted routes are worth the premium, the same tradeoff we discuss in our Amazon Bedrock pricing guide.
How to Control DeepSeek API Costs
Route by difficulty. Use V4 Flash for general tasks and Non-Think mode for routine calls; reserve Think modes and V4 Pro for genuinely hard reasoning.
Pin prompts for cache hits. Keep system prompts and reference content identical and at the start of each request so the automatic cache fires.
Time-shift batch work. Schedule non-urgent jobs into the off-peak window where discounts apply, and confirm current off-peak terms in the docs.
Set max output limits and attribute spend. Cap response length and tag calls by team and feature so you can see cost per outcome, as covered in our LLM cost optimization guide.
Conclusion
DeepSeek API pricing in 2026 remains the value benchmark the rest of the market is measured against. Two models cover the range: V4 Flash at $0.14/$0.28 for the cheapest frontier-class inference available, and V4 Pro for the hardest work, with a standing promotion that keeps it inexpensive. Add automatic caching, off-peak discounts, and a 1M-token context with no surcharge, and the effective cost drops well below even the headline rates. Migrate off the legacy aliases before July 24, route by difficulty, cache hard, and time-shift batch jobs. If you want help attributing and controlling AI and cloud spend across providers, that is exactly the discipline Opslyft brings.

DEV Community

DeepSeek API Pricing 2026: Models, Token Costs, and How to Optimize

Top comments (0)