DEV Community

Max Quimby

Posted on • Originally published at agentconn.com

DeepClaude vs Claude Code vs Codex Pro: 2026 Cost Stack

This morning, DeepClaude — a four-line shim that points Claude Code at DeepSeek V4 Pro — became the #1 story on Hacker News with 606 points and 257 comments. The repo's tagline is "same UX, 17× cheaper." The decoupling of agent loop from model is now sitting on the front page of the developer internet, and it can be enabled with four export statements.

📖 Read the full version with charts and embedded sources on AgentConn →

DeepClaude HN front page thread, 606 points and 257 comments

Pair this with two other moves over the last 30 days — Codex Pro's $200/mo plan now offering 20× the rate limits of ChatGPT Plus, and Anthropic's continued lock on the "Best Coding AI" Polymarket at 90%+ — and the question coding-agent buyers are now asking is no longer which model. It's which substrate.

This piece compares three: DeepClaude (Claude Code on DeepSeek V4 Pro), Claude Code on Anthropic's API, and Codex Pro on GPT-5.4.

The Three Substrates

DeepClaude

Claude Code's harness, swapped to talk to DeepSeek's Anthropic-compatible endpoint. ~50-line Node proxy plus four export statements. Every feature of the Claude Code harness — sub-agents, MCP, /resume, hooks, IDE plugins — running on V4 Pro at $0.27/M input, $1.10/M output, $0.014/M on cache hits. (DeepSeek pricing.)

aattaran/deepclaude GitHub repo — Same UX, 17x cheaper
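The swap itself is small enough to quote. Below is a sketch of what the env-var override typically looks like: the variable names follow Claude Code's standard overrides, while the endpoint URL and the `deepseek-v4-pro` model identifier are assumptions taken from the article's description, not verified against the repo.

```shell
# Hypothetical env-var swap; variable names follow Claude Code's standard
# overrides, endpoint and model identifier assumed from the article.
export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="${DEEPSEEK_API_KEY:-sk-demo}"
export ANTHROPIC_MODEL="deepseek-v4-pro"
export ANTHROPIC_SMALL_FAST_MODEL="deepseek-v4-pro"
# Claude Code launched in this shell now sends every request to DeepSeek:
echo "routing to: $ANTHROPIC_BASE_URL ($ANTHROPIC_MODEL)"
```

Unsetting the same four variables sends the harness back to api.anthropic.com, which is why the lock-in section below calls migration symmetric.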

Claude Code (Anthropic API)

Same harness, but model calls go to api.anthropic.com. Sonnet 4.6: $3/M input, $15/M output. Opus 4.7: $15/M input, $75/M output. Polymarket has Anthropic at 98.6% to win April's "Best Coding AI".

Codex Pro (GPT-5.4)

OpenAI's substrate. Codex CLI defaults to GPT-5.4. $200/mo for 20× ChatGPT Plus limits with token-based metering. Cloud-orchestrated long-running agents, not local pair-programming.

OpenAI Codex pricing page

The Cost Stack

| Workload | DeepClaude | Sonnet 4.6 | Opus 4.7 | Codex Pro |
| --- | --- | --- | --- | --- |
| Light (5/day) | $0.18 | $1.50 | $7.25 | $200/mo flat |
| Medium (15/day) | $0.55 | $4.50 | $22 | $200/mo flat |
| Heavy (40/day) | $1.50–3 | $20–40 | $100–200 | ~$30 API |
| 8h agent loop | $4–8 | $80–150 | $400–700 | hits 20× cap |

DeepClaude is 10–25× cheaper across every workload. The cache-hit price ($0.014/M) does most of the work — agent loops re-send system prompts every turn, so after the first call you spend almost all your input budget at cache rates.
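To see why the cache rate dominates, here is a back-of-envelope sketch. The prices are the DeepSeek figures quoted above; the workload numbers (turns, context size, tokens per turn) are illustrative assumptions, not measurements.

```python
# Back-of-envelope cost of an agent loop at the DeepSeek prices quoted above.
IN, OUT, CACHE = 0.27, 1.10, 0.014  # $ per 1M tokens

def loop_cost(turns, context_tokens, new_in_per_turn, out_per_turn):
    """First turn pays full input price for the context; later turns hit cache."""
    first = context_tokens * IN                    # context sent cold, once
    later = (turns - 1) * context_tokens * CACHE   # re-sent context at cache rate
    fresh = turns * new_in_per_turn * IN           # genuinely new input each turn
    out   = turns * out_per_turn * OUT
    return (first + later + fresh + out) / 1_000_000

# 200-turn loop, 50k-token context, 1k new input and 2k output per turn
cached = loop_cost(200, 50_000, 1_000, 2_000)
# the same loop if every turn repaid full input price for the context:
uncached = (200 * 50_000 * IN + 200 * 1_000 * IN + 200 * 2_000 * OUT) / 1_000_000
print(f"cached ${cached:.2f} vs uncached ${uncached:.2f}")  # → cached $0.65 vs uncached $3.19
```

Under these assumed numbers, the re-sent context is over 75% of all input tokens, so billing it at $0.014/M instead of $0.27/M is what produces the 5× gap.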

The Quality Stack

| Benchmark | DeepClaude | Sonnet 4.6 | Opus 4.7 | Codex Pro |
| --- | --- | --- | --- | --- |
| SWE-bench Verified | 80.6 | 76.8 | 80.8 | 79.1 |
| SWE-bench Pro | 55.4 | 58.1 | 64.3 | 60.2 |
| Terminal-Bench 2.0 | 67.9 | 65.4 | 71.2 | 77.3 |
| LiveCodeBench | 93.5 | 88.8 | 91.4 | 90.6 |

(Sources: buildfastwithai, benchlm.ai, Builder.io.)

  • 80% of normal feature work: DeepClaude is indistinguishable from Sonnet 4.6, within a hair of Opus 4.7.
  • Multi-file architectural reasoning: Opus 4.7 still has a real edge.
  • Terminal/CLI workflows: Codex Pro leads Terminal-Bench 2.0 at 77.3%.
  • Computer Use: Claude is most mature; DeepSeek's tool-call recovery is noticeably worse.

The Lock-In Stack

  • DeepClaude: lowest. Migrating off is also four lines. Risk: DeepSeek can reprice the Anthropic-compatible endpoint anytime.
  • Claude Code: medium. Anthropic owns harness + model; product surface is mature.
  • Codex Pro: highest. OpenAI owns harness, model, billing model, and Apps SDK distribution.

The Polymarket Contrarian Read

Polymarket Best Coding AI market — Anthropic dominant

Polymarket prices Anthropic at 98.6% on "Best Coding AI." But the question developers actually answer with their wallets isn't "which model is best on SWE-bench" — it's "which substrate gives the best quality-per-dollar-per-month?"

On that metric, Anthropic might still hold the leaderboard while losing 80% of agent-loop volume to DeepClaude-style swaps. The market is asking last year's question.

Which Substrate Should You Pick?

| You are... | Substrate |
| --- | --- |
| Indie / startup, cost-sensitive, feature work | DeepClaude |
| Enterprise with PRC-data-residency concerns | Claude Code on API |
| Browser/Computer Use is core | Claude Code on API (most mature Computer Use) |
| Architecture-heavy, multi-file refactors | Claude Code on Opus 4.7 |
| Burst usage (irregular peaks) | Codex Pro ($200 flat absorbs spikes) |
| Steady high-volume production | DeepClaude (no rate-limit cliff) |
| Want optionality | DeepClaude by default, fall back to Anthropic for hard tasks |

The decoupling means it doesn't have to be one substrate. Run DeepClaude as default; route the hard 20% to Anthropic. The harness is constant; the brain is variable.
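That two-tier setup can be sketched as a routing policy. Everything below is hypothetical: the heuristic, the thresholds, and the config dicts are invented for illustration, and neither harness ships a router like this out of the box.

```python
# Hypothetical two-tier routing: cheap substrate by default, escalate hard
# tasks. Heuristic, thresholds, and backend configs are invented examples.
DEEPSEEK = {"base_url": "https://api.deepseek.com/anthropic", "model": "deepseek-v4-pro"}
ANTHROPIC = {"base_url": "https://api.anthropic.com", "model": "claude-opus-4-7"}

def pick_backend(task: str, files_touched: int) -> dict:
    # "Hard" here means architecture-flavored work or wide blast radius.
    hard = files_touched > 5 or any(
        kw in task.lower() for kw in ("refactor", "architecture", "migration")
    )
    return ANTHROPIC if hard else DEEPSEEK

print(pick_backend("fix typo in README", 1)["model"])    # cheap default
print(pick_backend("refactor auth module", 8)["model"])  # escalated hard task
```

In practice the "router" can be as crude as two shell aliases that set different env vars; the point is that the harness stays constant while the backend varies per task.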

What Anthropic Does Next

Anthropic now has a product (Claude Code) that runs on a competitor's brain and bill, with Anthropic making nothing on inference. Three options:

  1. Tighten the protocol — break the env-var swap. Politically expensive.
  2. Reprice Sonnet by 5–10×. Margin-destructive.
  3. Reframe the moat as the harness. Sub-agents, MCP, hooks, plugins — what DeepSeek can't clone.

We'd bet on option 3. The model becomes a swappable component, but the harness is where value accrues. "Best coding AI" becomes irrelevant; what matters is "best coding agent."

Bottom Line

The harness is the product, the model is a component, and the price discovery on inference has barely begun. Test DeepClaude this week. Cost of being wrong: one Saturday afternoon and $5. Cost of not testing: being on the wrong side of an industry repricing.

