Joske Vermeulen

Originally published at aimadetools.com

AI Dev Weekly #12: Opus 4.8 Drops, Anthropic Hits $965B, Chinese AI Goes 99% Cheaper, Microsoft Builds Its Own Coding Model

AI Dev Weekly is a Thursday series where I cover the week's most important AI developer news, with my take as someone who actually uses these tools daily.

The theme this week is divergence. US labs are raising prices and valuations. Chinese labs are racing to zero. Developers are caught in the middle, choosing between the absolute best (Opus 4.8 at $25/M output) and "good enough at about 3.5% of the cost" (DeepSeek/MiMo at $0.87/M output). Meanwhile, Microsoft is hedging by building its own coding model to reduce its OpenAI dependency. Let's break it all down.

1. Claude Opus 4.8: the new #1 coding model

Anthropic released Claude Opus 4.8 on May 28. Same price as 4.7 ($5 input / $25 output per million tokens), better on every number below.

Key numbers:

  • 69.2% SWE-bench Pro: up from 64.3% (4.7) and 10.6 points ahead of GPT-5.5 (58.6%)
  • 74.2% Terminal-Bench 2.1: +8.4 points over 4.7
  • 88.6% SWE-bench Verified: highest of any model
  • 4× fewer unflagged code flaws: the honesty improvement is the real story
  • 61.4 Artificial Analysis Index: takes #1 from GPT-5.5

The biggest new feature is dynamic workflows in Claude Code. Claude can now plan a large task, spawn hundreds of parallel subagents, verify results, and iterate until convergence. Jarred Sumner used it to port Bun from Zig to Rust — 750,000 lines, 11 days, 99.8% test pass rate.
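To make the plan, fan out, verify, iterate loop concrete, here is a minimal sketch of that control flow in plain Python. It is not Claude Code's implementation or API: `plan`, `run_subagent`, and `orchestrate` are hypothetical names, and the subagent is a toy stub that fails "hard" tasks on the first attempt so the retry path is exercised.

```python
import asyncio


def plan(goal: str) -> list[str]:
    """Hypothetical planner: split one goal into independent tasks."""
    return [f"{kind}:{goal}:{i}" for i, kind in enumerate(["easy", "hard", "easy"])]


async def run_subagent(task: str, attempt: int) -> dict:
    """Hypothetical stand-in for one subagent working on one task."""
    await asyncio.sleep(0)  # placeholder for real model and tool calls
    # Toy verification result: "hard" tasks only pass on a retry.
    ok = not (task.startswith("hard") and attempt == 0)
    return {"task": task, "ok": ok}


async def orchestrate(goal: str, max_rounds: int = 3, max_parallel: int = 100) -> dict:
    pending = plan(goal)
    done: list[str] = []
    limiter = asyncio.Semaphore(max_parallel)  # cap on concurrent subagents

    async def bounded(task: str, attempt: int) -> dict:
        async with limiter:
            return await run_subagent(task, attempt)

    rounds = 0
    for attempt in range(max_rounds):
        if not pending:
            break  # converged: nothing left to redo
        rounds += 1
        # Fan out: run every pending task concurrently.
        results = await asyncio.gather(*(bounded(t, attempt) for t in pending))
        # Verify: keep what passed, requeue what failed.
        done += [r["task"] for r in results if r["ok"]]
        pending = [r["task"] for r in results if not r["ok"]]
    return {"done": done, "failed": pending, "rounds": rounds}


if __name__ == "__main__":
    print(asyncio.run(orchestrate("port-module")))
```

The semaphore is the piece worth noticing: "hundreds of subagents" only works if something bounds how many run at once, and the retry loop needs a hard round limit so a task that never passes cannot spin forever.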

Other additions: effort control (low → max), fast mode at one-third of its previous price ($10/$50 per million tokens instead of $30/$150), and system messages mid-conversation in the API.

My take: I run Claude as one of seven agents in the $100 AI Startup Race. The improvement from 4.7 to 4.8 is immediately noticeable: fewer hallucinated progress claims, better self-correction, more efficient tool calling. The dynamic workflows feature is genuinely new territory. I don't know of another tool that can spawn hundreds of coordinated agents from a single prompt. For codebase-scale migrations and audits, this is a step change. The question is whether the $25/M output price is justified when DeepSeek V4-Pro scores within 8 points on SWE-bench Verified for $0.87/M.

2. Anthropic raises $65B, surpasses OpenAI at $965B

Alongside the Opus 4.8 launch, Anthropic closed a $65 billion Series H at a $965 billion post-money valuation. That puts them above OpenAI for the first time.

The numbers:

  • $965B valuation (OpenAI was last valued at ~$900B)
  • $47B annualized revenue run rate — tripled in 3 months
  • Led by Altimeter Capital, Dragoneer, Greenoaks, Sequoia Capital
  • Mythos-class models (higher intelligence than Opus) coming "in weeks"

My take: The valuation flip is symbolic but the revenue growth is real. $47B run rate means Claude is generating serious enterprise revenue. The Mythos tease is interesting — they explicitly said it has "even higher intelligence than Opus" and is currently limited to cybersecurity work under Project Glasswing. If Mythos ships broadly in June, it could be another step change. For developers, the practical implication is: Anthropic has the resources to keep shipping fast. Expect monthly model updates to continue.

3. The Chinese AI pricing war goes nuclear

Two massive price cuts in one week made Chinese frontier models essentially free for cached workloads:

DeepSeek V4-Pro (May 22): The 75% promotional discount is now permanent. Output locked at $0.87/M tokens. Input at $0.435/M. Cache hits at $0.003625/M. This is a model that scores 80.6% on SWE-bench Verified — within 8 points of Opus 4.8.

MiMo V2.5 Pro (May 26): Xiaomi cut prices by up to 99%. Cached input dropped from $0.36/M to $0.0036/M. Standard pricing now matches DeepSeek exactly: $0.435/$0.87. Token Plans upgraded 5-51× (the $100 plan now gets 82 billion tokens).

The technical explanation: both labs achieved architectural breakthroughs in KV cache efficiency. DeepSeek's interleaved attention reduces cache to 10% of standard size. MiMo's hierarchical SWA uses a 1:7 sparsity ratio. Both claim break-even at these prices.

The result: Chinese AI models are now roughly 30× cheaper than American equivalents on standard pricing (for example, $0.87 vs Opus 4.8's $25 per million output tokens, about 29×), and more than 100× cheaper on cached workloads. For agent pipelines with stable system prompts, the effective cost is approaching zero.
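The arithmetic behind those ratios is easy to check yourself. The sketch below uses only the per-million-token prices quoted in this issue; the workload (10M input tokens and 1M output tokens per day) and the cache hit rates are assumptions I picked for illustration, not measurements.

```python
# Prices in USD per million tokens, as quoted in this issue.
OPUS_OUTPUT = 25.00
DEEPSEEK_INPUT = 0.435
DEEPSEEK_CACHED_INPUT = 0.003625
DEEPSEEK_OUTPUT = 0.87


def deepseek_cost_usd(input_tokens: int, output_tokens: int, cache_hit_rate: float) -> float:
    """Cost of a workload when a fraction of input tokens is served from cache."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    total = (
        fresh * DEEPSEEK_INPUT
        + cached * DEEPSEEK_CACHED_INPUT
        + output_tokens * DEEPSEEK_OUTPUT
    )
    return total / 1_000_000


if __name__ == "__main__":
    print(f"Output price ratio: {OPUS_OUTPUT / DEEPSEEK_OUTPUT:.1f}x")
    print(f"Fresh vs cached input: {DEEPSEEK_INPUT / DEEPSEEK_CACHED_INPUT:.0f}x")
    # Hypothetical daily agent workload: 10M input tokens, 1M output tokens.
    for rate in (0.0, 0.5, 0.9):
        cost = deepseek_cost_usd(10_000_000, 1_000_000, rate)
        print(f"cache hit rate {rate:.0%}: ${cost:.2f}/day")
```

For this assumed workload the bill drops from $5.22 with no cache hits to about $1.34 at a 90% hit rate, and at that point the $0.87 of output tokens is most of the cost. Once the system prompt is cached, output volume is the number to watch.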

My take: I tripled the Xiaomi agent's sessions in our race (from 2 to 6 per day) because the cost became negligible. At $0.0036/M cached tokens, running an autonomous agent 24/7 costs less than a cup of coffee per day. The quality gap is real but narrowing — DeepSeek V4-Pro at 80.6% SWE-bench vs Opus 4.8 at 88.6% is meaningful for hard tasks but irrelevant for 80% of routine coding. If you are spending more than $500/month on API calls and haven't tested Chinese models, you are leaving money on the table. We wrote a full migration guide if you want to try.

4. Microsoft building its own coding model for Build 2026

Reuters reported on May 28 that Microsoft will unveil a homegrown coding model at Build 2026 (June 2-3 in San Francisco). It is designed to boost GitHub Copilot and reduce dependency on OpenAI.

Also coming at Build:

  • Transcription model
  • Reasoning model
  • Speech model
  • Image model

This is part of a broader strategic shift. Microsoft is building a self-sufficient AI stack alongside its OpenAI partnership, not instead of it. The competitive pressure from Claude Code (which has been eating Copilot's market share) is the likely catalyst.

My take: This is the most significant signal yet that the OpenAI-Microsoft relationship is evolving. Microsoft investing billions in OpenAI while simultaneously building competing models tells you everything about where the industry is heading: no one wants to be dependent on a single provider. For developers using Copilot, this could mean better performance (a model optimized specifically for code completion rather than a general-purpose one) or it could mean fragmentation (yet another model to evaluate). Watch Build next week for details.

5. StepFun Step 3.7 Flash: 198B MoE at 400 tokens/sec

A new player entered the cheap-and-fast model tier. StepFun released Step 3.7 Flash, a 198B-parameter MoE (mixture-of-experts) model that activates only 11B parameters per token.

The specs:

  • 400 tokens/second — 2× faster than Gemini 3.5 Flash
  • 256K context window
  • Native multimodal — text, images, video
  • 3 reasoning tiers — Low/Medium/High per API call
  • Advisor Mode — achieves 97% of Opus 4.6 coding at $0.19/task
  • Open-weight — self-hostable on 128GB RAM
  • ~$0.20/M input, ~$0.80/M output on OpenRouter
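A quick back-of-envelope check on the 128GB self-hosting figure, as a sketch only. The release notes quoted here don't say which quantization that number assumes, so the bit widths below are my assumptions, and `overhead_fraction` is a hypothetical allowance for KV cache and runtime buffers rather than a measured value.

```python
def weight_footprint_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    # params_billion * 1e9 params * (bits / 8) bytes, divided by 1e9 bytes per GB.
    return params_billion * bits_per_param / 8


def fits_in_ram(params_billion: float, bits_per_param: int, ram_gb: float,
                overhead_fraction: float = 0.10) -> bool:
    """True if weights plus an assumed overhead allowance fit in ram_gb."""
    needed = weight_footprint_gb(params_billion, bits_per_param) * (1 + overhead_fraction)
    return needed <= ram_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_footprint_gb(198, bits)
        verdict = "fits" if fits_in_ram(198, bits, 128) else "does not fit"
        print(f"{bits}-bit weights: {size:.0f} GB, {verdict} in 128 GB")
```

All 198B parameters have to be resident even though only 11B are active per token, so under these assumptions only a roughly 4-bit quantization lands under 128GB. Treat the claim as "self-hostable when quantized" until you have checked the published weights.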

The unique feature is Advisor Mode: Step 3.7 Flash handles routine execution autonomously and only escalates to a stronger model when genuinely stuck. This automated routing achieves near-frontier quality at budget prices.
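The general pattern here (try the cheap model first, escalate only when it is stuck) is simple to sketch. This is a generic escalation router, not StepFun's implementation: `Attempt`, `solve_with_advisor`, and the `stuck` signal are hypothetical names, and how a real model decides it is stuck is exactly the part this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    answer: str
    stuck: bool  # True when the cheap model signals it cannot make progress


def solve_with_advisor(
    task: str,
    cheap_model: Callable[[str], Attempt],
    strong_model: Callable[[str], str],
    max_cheap_tries: int = 2,
) -> tuple[str, str]:
    """Return (answer, which_model_answered)."""
    for _ in range(max_cheap_tries):
        attempt = cheap_model(task)
        if not attempt.stuck:
            return attempt.answer, "cheap"
    # The cheap model gave up on every try: escalate to the stronger model.
    return strong_model(task), "strong"


if __name__ == "__main__":
    # Toy stand-ins for real model calls.
    def cheap(task: str) -> Attempt:
        return Attempt(answer=f"cheap:{task}", stuck="refactor" in task)

    def strong(task: str) -> str:
        return f"strong:{task}"

    print(solve_with_advisor("rename variable", cheap, strong))
    print(solve_with_advisor("refactor auth module", cheap, strong))
```

The economics depend on the escalation rate. If most tasks stay on the cheap model, the average cost per task stays close to the cheap model's price, which is how a per-task figure like the $0.19 quoted above becomes plausible.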

My take: The "Flash" model tier is getting crowded — Gemini 3.5 Flash, Step 3.7 Flash, DeepSeek V4 Flash. All under $1/M output, all fast enough for real-time use, all surprisingly capable. Step 3.7 Flash's video understanding and GUI interaction capabilities set it apart. The 400 t/s throughput is genuinely impressive for a model this capable. If you need speed + multimodal + cheap, this is worth testing.

6. Quick hits

  • OpenAI reportedly dropped GPT-5.3 Codex minutes after Anthropic's Opus 4.8 announcement. It is 25% faster than GPT-5.2 and reportedly helped debug itself during development. It supercharges the Codex agentic coding tool launched earlier this month.
  • Cohere acquired Aleph Alpha, creating a $20B transatlantic sovereign AI company. It is backed by Schwarz Group (Europe's largest retailer) and positions itself as the enterprise alternative to US and Chinese models for European companies with data sovereignty requirements.
  • Canada ruled OpenAI violated privacy laws — regulatory pressure continues to mount on US AI labs internationally.
  • Claude Code shipped detailed usage analytics — you can now see exactly how many tokens each session consumed, broken down by model and effort level.

What I'm watching next week

  • Microsoft Build (June 2-3): The new coding model reveal. Will it compete with Claude Code or just improve Copilot's autocomplete?
  • Mythos timeline: Anthropic said "coming weeks." If it ships in early June, it could leapfrog Opus 4.8 immediately.
  • Gemini CLI shutdown (June 18): About three weeks until the deadline. If you haven't migrated to Antigravity CLI, time is running out.
  • Our race agents: Claude is at 194 blog posts. Xiaomi just got tripled to 6 sessions/day. Gemini is back online after a 4-day auth outage. Follow along →

AI Dev Weekly publishes every Thursday. Subscribe for weekly race updates and AI developer news.

Originally published at https://www.aimadetools.com
