So here's the thing about the deepseek v4-pro pricing schedule that has been making the rounds on hn this week (thread 48002136, 494 pts / 198 comments) — the 75% promotional discount everyone has been routing to is calendar-deterministic. it expires 2026-05-31 at 15:59 UTC. that's 27 days from today. on june 1 the same agent that costs you $87 in tokens will cost you $348.
most teams i've talked to about this have a vague awareness ("yeah we've been saving with deepseek") and zero plan for the cliff. nobody has a runbook for what happens when a long-running agent session starts before the price flip and finishes after it. nobody has tested whether their billing dashboard refreshes mid-session or only on the next invoice. and nobody has checked if their fallback provider is actually configured to take over at 16:00 UTC on the 31st.
this isn't a "scandal" the way openclaw or hermes.md or the copilot 27x pivot were. deepseek published the schedule openly. but the failure mode is identical to a stripe price-update event delivered with no buyer-side notification: you've been routing 60-80% of agent traffic to v4-pro since the 04-26 launch, the meter ticks up at 4x without an alert, and the line on june's bill is the first thing you see.
## what's actually happening
the published schedule (api-docs.deepseek.com/quick_start/pricing) is unambiguous:
| line item | promo (now → may 31 15:59 UTC) | post-cliff (june 1 onward) | multiplier |
|---|---|---|---|
| output tokens | $0.87 / 1M | $3.48 / 1M | 4x |
| input cache miss | $0.435 / 1M | $1.74 / 1M | 4x |
| input cache hit | $0.03625 / 1M | $0.145 / 1M | 4x |
every line is exactly 4x. the cache-hit tier is the one that catches people — at $0.036/M it's basically free and most teams stopped instrumenting after the first week. at $0.145/M it's still cheap but cache-hit volume is usually 5-10x cache-miss volume, so the absolute dollar delta on cache-hit can dwarf the headline output-token delta.
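to see where the dollars actually sit, here's a back-of-envelope sketch. the per-tier rates come from the published schedule; the token volumes are made up purely for illustration, not anyone's real traffic:

```python
# sketch: per-tier dollar delta across the may 31 cliff.
# rates are from the published schedule; volumes are illustrative.

PROMO = {"output": 0.87, "cache_miss": 0.435, "cache_hit": 0.03625}  # $ per 1M tokens
POST = {tier: rate * 4 for tier, rate in PROMO.items()}              # every line is exactly 4x

def monthly_cost(volumes_m, rates):
    """volumes_m: millions of tokens per tier for the month."""
    return {tier: volumes_m[tier] * rates[tier] for tier in volumes_m}

# illustrative month: cache-hit volume dwarfs cache-miss volume
volumes = {"output": 50, "cache_miss": 200, "cache_hit": 1500}  # millions of tokens

promo_cost = monthly_cost(volumes, PROMO)
post_cost = monthly_cost(volumes, POST)

for tier in volumes:
    delta = post_cost[tier] - promo_cost[tier]
    print(f"{tier:>10}: ${promo_cost[tier]:7.2f} -> ${post_cost[tier]:7.2f}  (+${delta:.2f})")
```

with these (invented) volumes the cache-hit delta comes out larger than the output-token delta, which is exactly the "dwarfs the headline number" effect: cheap per token, enormous in volume.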
the deepclaw thread (claude code → deepseek wrapper, hit hn front page same week) is full of "$0.06 per task with v4-pro" anecdotes. every one of those becomes $0.24 on june 1. the $30/mo dev-side budget becomes $120. the $4k/mo agent fleet becomes $16k. nothing on the deepseek side of the wire changes — same model, same endpoint, same response — just a 4x multiplier on the meter.
## what we learned doing the math on our own usage
i ran our last 30 days of llm spend through the cliff scenario assuming we keep the same routing distribution. three observations worth sharing:
1. cache-hit dominance flips the headline number. our v4-pro mix is ~70% cache-hit / 25% cache-miss / 5% output-heavy. on the promo schedule cache-hit is ~6% of total spend; post-cliff it's ~22%. the "output tokens went 4x" story misses where the dollars actually sit.
2. mid-session pricing flips are a real failure mode. we have agent runs that span 4-6 hours. on may 31 a session that starts at 12:00 UTC and finishes at 18:00 UTC will see the promo rate for 3h59m of tokens and the post-cliff rate for the rest. nothing in the deepseek api response surfaces which tier the request was billed at — you find out from the invoice. for a long-running summarization or repo-analysis job that's a $0.40 → $1.60 swing on a single run, but the agent has no way to detect it and slow down.
3. multi-provider failover is necessary but not sufficient. the obvious move is "fail over to anthropic / openai if deepseek crosses some price threshold." but the threshold isn't price (deepseek hasn't changed prices, the schedule has) — it's calendar. you have to encode "after 2026-05-31 15:59 UTC, if i'm still routing to v4-pro, stop" in the routing layer itself. and you have to test it before the day, because if your e2e test environment hits real apis on june 1 you've lost the dry run.
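"encode the calendar in the routing layer" can be sketched in a few lines. the cutoff instant is from the published schedule; the function names and model ids here are placeholders for whatever your routing table actually uses, and in practice you'd set the cutoff earlier than the cliff itself (see below):

```python
# sketch: the cliff as a calendar fact in the router, not a price check.
# model names and the fallback choice are placeholders.

from datetime import datetime, timezone

# last instant billed at the promo rate, per the published schedule
CLIFF = datetime(2026, 5, 31, 15, 59, tzinfo=timezone.utc)

def billed_tier(request_time):
    """which side of the cliff a request lands on. the api response
    doesn't surface this, so derive it from the request timestamp."""
    return "promo" if request_time <= CLIFF else "post-cliff"

def pick_model(now=None):
    now = now or datetime.now(timezone.utc)
    if now <= CLIFF:
        return "deepseek-v4-pro"
    # after the cliff: fall back until a deliberate re-pricing decision
    return "deepseek-v4"
```

tagging every outbound request with `billed_tier(timestamp)` also closes observation point 2: you can attribute a mid-session flip from your own logs instead of waiting for the invoice.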
## practical implications (things you can do this week)
these are the actions, in priority order, that close the cliff exposure for a typical team running 60-80% deepseek-v4-pro traffic:
1. audit your last 30 days of v4-pro spend by tier. group by cache-hit / cache-miss / output. multiply each line by 4x and look at the dollar delta, not the percent. the result is your post-cliff monthly run-rate floor. if it scares you, the rest of this list is mandatory; if it doesn't, you can stop here.
2. add a hard cutoff in your router by may 30, not may 31. "switch off v4-pro at 15:59 UTC on the 31st" is the obvious answer and the wrong one: it puts a billing event on the same calendar minute as a routing event with no headroom. switch off at midnight UTC may 30, run on the post-cliff equivalent (deepseek-v4 base, anthropic haiku, gpt-4.1-mini) for 24-48 hours, validate that quality + latency hold, then make a deliberate "do we want v4-pro at the new price" call with the actual numbers in front of you.
3. instrument the dollar impact of cache hits, not just request count. most observability stacks count cache hits as a vanity metric. on june 1 cache-hit becomes 4x its promo price and the volume doesn't change. plot dollar-cost-per-cache-hit over time so the cliff shows up as a step function and not a slow drift.
4. kill any long-running session that starts after may 31 12:00 UTC. for the 4-hour window before the cliff, force agent runs to stay under 30 minutes wall-clock. it's annoying for one shift but it eliminates the mid-session-pricing-flip class of bug entirely. cheaper than discovering on june 2 that your overnight run got billed at the post-cliff rate for 90% of its tokens.
5. set an out-of-band budget cap that survives provider misbehavior. the cliff is the deterministic case. the worse case is a provider you've never used before issuing a similar schedule on a tuesday with no warning. the only durable defense is a hard $/period ceiling on a layer the provider doesn't control. that's the slot llmeter sits in: count tokens before they leave your network, refuse to forward requests once the cap is hit, fail closed when the meter is uncertain. fwiw it's the same architecture every team rolls themselves on the 2nd of the month after the first surprise bill.
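the cap architecture fits in a few lines. to be clear, this is a minimal in-process sketch of the idea, not llmeter's actual api; the class name and dollar figures are invented:

```python
# sketch: a hard $/period ceiling on a layer the provider doesn't control.
# minimal in-process version; a real deployment puts this at the network
# edge so no request leaves without passing through it.

import threading
from typing import Optional

class BudgetCap:
    def __init__(self, ceiling_usd: float):
        self.ceiling = ceiling_usd
        self.spent = 0.0
        self._lock = threading.Lock()

    def authorize(self, estimated_cost_usd: Optional[float]) -> bool:
        """count cost before the request leaves the network, refuse once
        the cap is hit, and fail closed when the estimate is unknown."""
        with self._lock:
            if estimated_cost_usd is None:   # meter uncertain -> fail closed
                return False
            if self.spent + estimated_cost_usd > self.ceiling:
                return False                 # would blow the cap -> refuse
            self.spent += estimated_cost_usd
            return True

cap = BudgetCap(ceiling_usd=100.0)
assert cap.authorize(60.0)        # fits under the ceiling
assert not cap.authorize(50.0)    # would exceed it, refused
assert not cap.authorize(None)    # unknown cost, fail closed
```

the important design choice is the last branch: when you can't price a request, you don't forward it. failing open there is exactly how the surprise bill happens.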
## tldr
deepseek did the honest thing and published the schedule. the failure mode is still that most teams won't read it until june 1. you have 27 days. the cheap version of preparing is item 1 (run the numbers); the durable version is item 5 (own the cap). the rest is somewhere in between.
if you're running llm spend across deepseek + anthropic + openai and want a dashboard that flags the pre-cliff routing exposure today rather than after the invoice, that's what i've been building at https://llmeter.org/?utm_source=devto&utm_medium=article&utm_campaign=hn-week-deepseek-may-31-cliff. open-source, agpl-3.0, no proxy in the request path. drop the sdk in, point it at your providers, see the cliff exposure as a number.
happy to compare notes if you've already done the audit on your stack — what was the cache-hit delta vs the headline?