Max Quimby

Originally published at computeleap.com

China's Coding AI Is Closing the Gap Fast

On June 1, MiniMax released M3, an open-weights coding model that scores a vendor-reported 59% on SWE-Bench Pro, edging out GPT-5.5 and Gemini 3.1 Pro. It supports a million-token context window, handles image and video input natively, and costs roughly 5-10% of what Western frontier models charge. And it's out of China.

This isn't a benchmark novelty. It's the latest salvo in a pricing war that's collapsing the economics of AI-assisted coding — and the implications run deeper than any leaderboard position.

The 18-Day Wave That Changed the Math

In the 18 days from April 7 to April 24, four Chinese AI labs shipped competing open-weights coding models:

| Model | Lab | SWE-Bench Pro | Output cost (USD per 1M tokens) | Cost vs. Opus 4.7 |
|---|---|---|---|---|
| GLM-5.1 | Z.ai | 58.4% | $3.50 | 14% |
| MiniMax M2.7 | MiniMax | 56.2% | $1.20 | 5% |
| Kimi K2.6 | Moonshot AI | 58.6% | $2.50 | 10% |
| DeepSeek V4-Flash | DeepSeek | 55.4% | $0.28 | 1.1% |

Claude Opus 4.7 charges $25 per million output tokens and leads at 64.3% on SWE-Bench Pro. Every model in that April wave delivered competitive coding performance at a fraction of the cost: Kimi K2.6 tied with GPT-5.5 at 58.6%, at one-tenth of Opus 4.7's price.
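
The Opus 4.7 cost percentages in the table above follow directly from the listed output prices. A minimal Python sketch of that arithmetic, using only figures quoted in this article (the names `OPUS_47_OUTPUT_PRICE`, `APRIL_WAVE`, and `price_share` are illustrative and not part of any vendor API):

```python
# Output prices in USD per 1M tokens, as quoted in this article.
OPUS_47_OUTPUT_PRICE = 25.00

APRIL_WAVE = {
    "GLM-5.1": 3.50,
    "MiniMax M2.7": 1.20,
    "Kimi K2.6": 2.50,
    "DeepSeek V4-Flash": 0.28,
}


def price_share(price: float, baseline: float = OPUS_47_OUTPUT_PRICE) -> float:
    """Return `price` as a percentage of `baseline`."""
    return price / baseline * 100


for model, price in APRIL_WAVE.items():
    print(f"{model}: {price_share(price):.1f}% of Opus 4.7 output pricing")
```

MiniMax M2.7 works out to 4.8%, which the table rounds to 5%.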

What MiniMax M3 Actually Brings

M3 is the first open-weights model to combine three frontier capabilities in a single architecture: frontier-level coding, a million-token context window, and native multimodality.

  • SWE-Bench Pro: 59.0% (surpasses GPT-5.5, approaches Opus 4.7)
  • Terminal-Bench 2.1: 66.0%
  • MCP Atlas: 74.2%
  • BrowseComp: 83.5% (beats Opus 4.7's 79.3%)

The context window runs on MiniMax Sparse Attention (MSA), which cuts per-token compute at million-token scale to one-twentieth of the previous generation's.

⚠️ All M3 benchmark scores are vendor-reported. Independent scores from Artificial Analysis and LMArena were still pending at launch. Opus 4.8 leads M3 by 10+ points on SWE-Bench Pro (69.2% vs 59.0%).

The Collision Course: Monetize vs. Commoditize

The Western stack is monetizing. GitHub Copilot wraps MAI-Code-1-Flash — Microsoft's first in-house coding model — inside a premium subscription. Anthropic charges $25/M output for Opus.

The Chinese stack is commoditizing. MiniMax, DeepSeek, Qwen, Moonshot, and Z.ai ship open-weights models on a near-weekly cadence, each undercutting the last. Qwen3.7 Plus delivers multimodal agentic coding at $0.40 per million input tokens. DeepSeek V4-Flash hits $0.28 per million output tokens, or 1.1% of Opus pricing.

The Western premium stack needs the capability gap to justify its price. The Chinese open-weights stack needs to close that gap. Both are succeeding — which means the collision is getting closer.

The Benchmark Gap Is Real — But Shrinking

Claude Opus 4.8 scores 69.2% on SWE-Bench Pro — a full 10 points ahead of M3's 59.0%. On Terminal-Bench 2.1, the gap is 8.6 points.

But six months ago, the best Chinese open-weights coding model scored in the low 40s. Today, multiple Chinese models cluster between 55% and 60%. The gap contracted from 20+ points to roughly 10 in half a year.
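
The size of the current gap is simple subtraction on the SWE-Bench Pro scores quoted so far. A rough sketch, assuming all of these figures were measured comparably (several are vendor-reported); the dictionary and function names are illustrative:

```python
# SWE-Bench Pro scores (%) as quoted in this article.
OPUS_48_SCORE = 69.2

CHINESE_MODELS = {
    "MiniMax M3": 59.0,
    "Kimi K2.6": 58.6,
    "GLM-5.1": 58.4,
    "MiniMax M2.7": 56.2,
    "DeepSeek V4-Flash": 55.4,
}


def gap_to_leader(score: float, leader: float = OPUS_48_SCORE) -> float:
    """Percentage-point gap between `leader` and `score`, rounded to 0.1."""
    return round(leader - score, 1)


# Print the models from smallest gap to largest.
for model, score in sorted(CHINESE_MODELS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {gap_to_leader(score)} points behind Opus 4.8")
```

On these figures the gaps range from 10.2 points (MiniMax M3) to 13.8 points (DeepSeek V4-Flash).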

For production deployments, "good enough at a tenth of the cost" often wins over "best at any price."
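
To make that trade-off concrete, here is a back-of-the-envelope cost comparison. It is a sketch under stated assumptions: the monthly volume of 500 million output tokens is hypothetical, input-token charges are ignored, and the prices are the output prices quoted earlier in this article.

```python
# Hypothetical workload: 500M output tokens per month (an assumption for illustration).
MONTHLY_OUTPUT_TOKENS = 500_000_000

# Output prices in USD per 1M tokens, as quoted in this article.
OUTPUT_PRICE_PER_MILLION = {
    "Claude Opus 4.7": 25.00,
    "Kimi K2.6": 2.50,
    "DeepSeek V4-Flash": 0.28,
}


def monthly_output_cost(tokens: int, price_per_million: float) -> float:
    """USD cost of generating `tokens` output tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


for model, price in OUTPUT_PRICE_PER_MILLION.items():
    cost = monthly_output_cost(MONTHLY_OUTPUT_TOKENS, price)
    print(f"{model}: ${cost:,.2f} per month")
```

At that volume the same output would cost $12,500 on Opus 4.7, $1,250 on Kimi K2.6, and $140 on DeepSeek V4-Flash, before accounting for any difference in task success rate.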

The Qwen Factor

Qwen3.7 Max hits 60.6% on SWE-Bench Pro, the strongest showing by a Chinese model on that benchmark. Qwen3-Coder-Next is an 80B-parameter MoE model (3B active) that scores 70.6% on SWE-Bench Verified, competitive with models 10-20x larger.

Even Hacker News commenters discussing Microsoft's MAI-Code-1-Flash benchmark it against Qwen3.6-35B at 49.5%. The Chinese open-weights tier is now the baseline everyone measures against.

The Bottom Line

The West is building the most capable coding AI. China is building the most accessible. Both are right — for now.

But when the capability gap between a $25/M-token model and a $1.20/M-token model narrows from 20 points to 10, the economics start talking. MiniMax M3 isn't the model that closes the gap. It's the model that makes the gap's closure feel inevitable.

The coding moat hasn't fallen yet. But the water level is rising fast.
