Anthropic shipped Claude Opus 4.8 today, May 28, 2026. If you are building agentic systems, coding assistants, or any product that relies on an AI model to take sustained, multi-step actions in the real world, this release deserves your attention.
Opus 4.8 is not a full generational leap. Anthropic frames it as "a modest but tangible improvement" on its predecessor. But in agentic contexts, where reliability compounds across dozens of sequential steps, even incremental gains in judgment, honesty, and tool-calling precision translate into meaningfully better products.
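To make the compounding argument concrete, here is a minimal sketch of the arithmetic. It assumes every step succeeds independently with the same probability, which real agent runs only approximate, and the per-step reliabilities are hypothetical values chosen for illustration, not figures from Anthropic.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps are independent and share one success probability.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # Hypothetical per-step reliabilities, not measured values.
    for p in (0.97, 0.98, 0.99):
        print(f"per-step {p:.2f} -> 30-step success {chain_success(p, 30):.1%}")
```

Under these assumptions, a gain of one or two points per step changes the odds of finishing a 30-step task by a much larger margin, which is why small improvements matter more in agentic settings than in single-turn use.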
Here is what shipped today and why it matters.
Opus 4.8 Benchmark Breakdown
Anthropic published results on six benchmarks covering the professional and agentic workloads that matter most:
| Benchmark | Claude Opus 4.8 | Claude Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-Bench Pro (agentic coding) | 69.2% | 64.3% | 58.6% | 54.2% |
| Terminal-Bench 2.1 (terminal coding) | 74.6% | — | 78.2% | — |
| Humanity's Last Exam (reasoning, with tools) | 57.9% | — | — | — |
| OSWorld-Verified (computer use) | 83.4% | 82.3% | — | — |
| GDPval-AA (knowledge work) | 1890 | 1753 | — | — |
| Finance Agent v2 | 53.9% | — | — | — |

Source: Anthropic, May 28, 2026
A few important notes:
- Agentic coding (SWE-Bench Pro): Opus 4.8 leads all tested models at 69.2%.
- Terminal coding (Terminal-Bench 2.1): GPT-5.5 leads here at 78.2%. Opus 4.8 scores 74.6%.
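- Computer use (OSWorld-Verified): Opus 4.8 scores 83.4%, ahead of Opus 4.7 at 82.3%, the only other score listed for this benchmark.
- Reasoning (Humanity's Last Exam, with tools): 57.9%, which Anthropic reports as the best of the four tested models. The table lists only the Opus 4.8 score.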
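The margins behind these notes can be checked with a few lines of arithmetic. This sketch copies only the rows of the table that list at least one competing score; the helper name `margin_vs_best_other` is made up for this example.

```python
# Scores copied from the benchmark table; models the table leaves blank are omitted.
scores = {
    "SWE-Bench Pro": {
        "Opus 4.8": 69.2, "Opus 4.7": 64.3, "GPT-5.5": 58.6, "Gemini 3.1 Pro": 54.2,
    },
    "Terminal-Bench 2.1": {"Opus 4.8": 74.6, "GPT-5.5": 78.2},
    "OSWorld-Verified": {"Opus 4.8": 83.4, "Opus 4.7": 82.3},
}


def margin_vs_best_other(row: dict, model: str = "Opus 4.8"):
    """Return (best competing model, margin in percentage points).

    A positive margin means `model` leads; a negative one means it trails.
    """
    others = {name: score for name, score in row.items() if name != model}
    best = max(others, key=others.get)
    return best, round(row[model] - others[best], 1)


if __name__ == "__main__":
    for benchmark, row in scores.items():
        rival, margin = margin_vs_best_other(row)
        print(f"{benchmark}: {margin:+.1f} points vs {rival}")
```

Rows with a single listed score, such as Humanity's Last Exam and Finance Agent v2, are left out because the table gives nothing to compare them against.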
What's New Beyond the Benchmarks
Honesty as a Feature
Opus 4.8 is approximately 4x less likely than Opus 4.7 to let flaws in its own code pass without flagging them. Early testers describe the model as more likely to push back when a plan is not sound, and more likely to ask clarifying questions before making irreversible changes.
Effort Control
Opus 4.8 introduces three effort levels: high (the default), extra, and max. Higher effort gives the model more thinking time and produces better results on difficult tasks. For long-running async workflows, Anthropic recommends the "extra" setting.
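One way to keep that choice in a single place in application code is a small helper like the sketch below. Everything here is hypothetical: the `Effort` enum and `pick_effort` function are names invented for this example, the string values are placeholders, and how an effort level is actually passed to the API is not covered in this post. Only the "extra for long-running async work" rule comes from Anthropic's recommendation; reserving max for exceptionally hard tasks is an assumption.

```python
from enum import Enum


class Effort(str, Enum):
    HIGH = "high"    # the default level described above
    EXTRA = "extra"
    MAX = "max"


def pick_effort(long_running_async: bool = False,
                exceptionally_hard: bool = False) -> Effort:
    """Choose an effort level for a task (illustrative policy only)."""
    if exceptionally_hard:
        # Assumption: reserve the highest level for the hardest tasks.
        return Effort.MAX
    if long_running_async:
        # Matches the recommendation for long-running async workflows.
        return Effort.EXTRA
    return Effort.HIGH


if __name__ == "__main__":
    print(pick_effort().value)
    print(pick_effort(long_running_async=True).value)
```

Centralizing the policy this way means a later change of heart about when to spend extra thinking time touches one function rather than every call site.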
Fast Mode: 2.5x Speed, Now 3x Cheaper
Fast mode for Opus 4.8 runs at 2.5x the speed of regular mode. It is priced at $10 per million input tokens and $50 per million output tokens, one-third of what fast mode cost for prior Opus models.
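For budgeting, the fast-mode prices translate into a simple per-request estimate. This is a rough sketch: the function name and the token counts in the usage example are hypothetical, and it ignores any caching or batch discounts that may apply.

```python
# Fast-mode prices quoted above, in USD per million tokens.
FAST_INPUT_USD_PER_MTOK = 10.0
FAST_OUTPUT_USD_PER_MTOK = 50.0


def fast_mode_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one fast-mode request at the quoted prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * FAST_INPUT_USD_PER_MTOK
             + output_tokens * FAST_OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 20,000 output tokens.
    cost = fast_mode_cost(200_000, 20_000)
    print(f"estimated cost: ${cost:.2f}")
    # Taking the price cut to mean one-third of the earlier fast-mode price,
    # the same request would previously have cost about three times as much.
    print(f"implied earlier cost: ${cost * 3:.2f}")
```

Output tokens cost five times as much as input tokens at these rates, so long generations dominate the bill well before long prompts do.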
Dynamic Workflows in Claude Code
Shipping alongside Opus 4.8 is Dynamic Workflows, available in research preview for Max, Team, and Enterprise plan users.
Instead of a single agent working sequentially, Claude Code can now plan work upfront and spin up tens to hundreds of parallel subagents within a single session. Jarred Sumner, creator of Bun, used Dynamic Workflows to rewrite Bun from Zig to Rust: 750,000 lines of Rust, 99.8% of the test suite passing, shipped in 11 days.
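The plan-then-fan-out pattern described here can be sketched with standard `asyncio` primitives. This is a generic illustration of bounded parallelism, not how Claude Code implements Dynamic Workflows: `run_subagent`, `run_workflow`, and the `max_parallel` limit are hypothetical names, and the subagent body is a placeholder that does no real work.

```python
import asyncio


async def run_subagent(task: str) -> str:
    """Placeholder subagent; a real one would call a model and tools."""
    await asyncio.sleep(0)  # yield control, standing in for real I/O
    return f"done: {task}"


async def run_workflow(tasks: list[str], max_parallel: int = 8) -> list[str]:
    """Run every planned task concurrently, at most max_parallel at a time."""
    limit = asyncio.Semaphore(max_parallel)

    async def bounded(task: str) -> str:
        async with limit:
            return await run_subagent(task)

    # gather returns results in the same order as the planned tasks.
    return await asyncio.gather(*(bounded(task) for task in tasks))


if __name__ == "__main__":
    plan = [f"port module {i}" for i in range(20)]  # hypothetical upfront plan
    results = asyncio.run(run_workflow(plan))
    print(len(results), results[0])
```

The semaphore is the key design choice: planning can produce hundreds of tasks, while the limit keeps concurrent work within whatever rate limits or machine resources apply.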
How Cosmic Uses Opus 4.8
Cosmic now uses Claude Opus 4.8 as the recommended model for high-reasoning content and code operations. Cosmic's AI Agents across Content, Code, Computer Use, and Team types are all powered by Claude models.
Cosmic also ships an MCP Server that connects directly to Claude Code and Cursor. Opus 4.8's improved tool-calling efficiency makes this integration noticeably more responsive in practice.
If you are building on Claude Opus 4.8 and need a content layer that keeps up, start free on Cosmic or book a quick intro.
Sources: Introducing Claude Opus 4.8, Dynamic Workflows in Claude Code