Anthropic shipped Claude Opus 4.8 on May 28, 2026, with same-day availability across the Claude API, Claude apps, Claude Code, and major cloud platforms. The API model ID is claude-opus-4-8. If you are already using Opus 4.7, migration is usually a one-line model change, but you should still validate streaming, tool calls, and response parsing before shipping.
According to Anthropic’s announcement, Opus 4.8 keeps the same pricing and context limits as Opus 4.7, but improves quality. Anthropic says it is about four times less likely than 4.7 to let a code flaw pass unremarked, and it is more explicit about uncertainty. This guide focuses on what changed, how to call the model, and how to test the migration safely.
The short version
Use these values when updating an existing integration:
- Model ID: claude-opus-4-8
- Availability: Claude API, AWS, Vertex AI, Microsoft Foundry, Claude apps, and Claude Code
- Standard pricing: $5 per million input tokens, $25 per million output tokens
- Context window: 1M input tokens
- Max output: 128K tokens
- Default effort: high
What changes in practice:
- Better code review and fewer silent generated-code defects
- More direct uncertainty handling
- More efficient tool calling in agent loops
- Newer effort behavior for controlling token spend across the full response
- Adaptive thinking instead of manual budget_tokens
- Dynamic Workflows in Claude Code for large parallel agent tasks
For cost modeling, see the Opus 4.8 pricing breakdown. For implementation details, start with the Opus 4.8 API guide.
What changed in Opus 4.8
Opus 4.8 keeps the core specs of Opus 4.7. The improvements are in how the model behaves, not in its limits or pricing.
1. Code quality
Anthropic reports roughly a 4x reduction in code flaws that pass review unremarked compared with Opus 4.7.
That matters most when Claude is producing diffs, refactors, migrations, tests, or multi-file changes. For agentic coding workflows, the biggest failure mode is not a visible error. It is a plausible-looking patch with a subtle bug.
2. Honesty and alignment
Opus 4.8 is designed to flag uncertainty more often and make fewer unsupported claims. Anthropic also reports lower rates of deception and of cooperation with misuse compared with 4.7.
If you run autonomous or semi-autonomous agents, this is more important than a simple chat benchmark. Agents need to know when to stop, ask for context, or avoid overconfident action.
3. Tool calling
Opus 4.8 is more efficient at selecting and using tools. In agent loops, this can reduce:
- unnecessary tool invocations
- latency from avoidable calls
- token usage from bloated tool arguments
- retry loops caused by poor tool selection
4. Effort control
The most important API-visible control is output_config.effort.
Effort control: choose the right gear
The effort parameter controls how eagerly Claude spends tokens across the whole response.
It is set inside output_config:
{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": "Refactor this module and explain the main changes."
    }
  ],
  "output_config": {
    "effort": "xhigh"
  }
}
Supported values:
- low
- medium
- high
- xhigh
- max
The default is high.
Important detail: effort affects all output tokens, not only hidden reasoning. That includes:
- final text
- tool calls
- function arguments
- reasoning behavior
Use it like this:
| Workload | Suggested starting effort |
|---|---|
| Simple summarization | low or medium |
| Normal technical Q&A | high |
| Code review | xhigh |
| Agentic coding | xhigh |
| Complex multi-step planning | xhigh or max |
| Cost-sensitive batch work | Start at medium, then evaluate |
Anthropic’s guidance is to start at xhigh for coding and agentic tasks, keep high as the floor for reasoning-heavy work, and only step down after evals confirm quality. See Anthropic’s effort docs for the full behavior.
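As a rough illustration of the table above, the mapping can live in one small helper that builds the request body. This is only a sketch: the workload keys are hypothetical labels invented here, and where the table offers two options ("low or medium", "xhigh or max") the sketch arbitrarily picks the cheaper one.

```python
# Starting effort per workload, copied from the table above.
# The workload keys are hypothetical labels chosen for this sketch.
STARTING_EFFORT = {
    "simple_summarization": "low",     # table says "low or medium"
    "technical_qa": "high",
    "code_review": "xhigh",
    "agentic_coding": "xhigh",
    "multi_step_planning": "xhigh",    # table says "xhigh or max"
    "cost_sensitive_batch": "medium",  # starting point, then evaluate
}


def build_request(prompt: str, workload: str, max_tokens: int = 4096) -> dict:
    """Return a Messages API request body with effort chosen by workload."""
    # Unknown workloads fall back to the documented default, "high".
    effort = STARTING_EFFORT.get(workload, "high")
    return {
        "model": "claude-opus-4-8",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},
    }
```

Keeping the mapping in one place makes it easier to step a single workload down after evals without touching call sites.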
Adaptive thinking replaces manual thinking budgets
Opus 4.8 uses adaptive thinking.
Instead of manually setting a thinking token budget, you use:
{
  "thinking": {
    "type": "adaptive"
  }
}
The model then decides how much reasoning the request needs.
At high, xhigh, and max effort, it usually reasons more deeply. At lower effort levels, it can skip deeper thinking for simple tasks.
A basic request shape looks like this:
{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "xhigh"
  },
  "messages": [
    {
      "role": "user",
      "content": "Review this pull request for correctness and edge cases."
    }
  ]
}
Migration note: manual extended thinking with budget_tokens is not supported on Opus 4.8 and returns a 400 error. If your current integration uses that pattern, replace it with adaptive thinking plus output_config.effort.
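If you have many stored request bodies, the migration note above can be applied mechanically. The helper below is a minimal sketch, assuming your existing requests carry a thinking object with a budget_tokens key. The function name and the xhigh fallback are choices made for this example, not part of any SDK.

```python
import copy


def migrate_thinking(request: dict, effort: str = "xhigh") -> dict:
    """Return a copy of `request` that uses adaptive thinking on Opus 4.8."""
    migrated = copy.deepcopy(request)  # leave the caller's dict untouched
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Manual budgets are rejected on Opus 4.8, so drop the budget entirely.
        migrated["thinking"] = {"type": "adaptive"}
        # Keep an effort the caller already set; otherwise use the fallback.
        migrated.setdefault("output_config", {}).setdefault("effort", effort)
    migrated["model"] = "claude-opus-4-8"
    return migrated
```

Requests without a manual budget only get the model ID swapped, so the helper is safe to run over a mixed set.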
The exact request structure is covered in the Opus 4.8 API guide.
Dynamic Workflows in Claude Code
Dynamic Workflows are available in Claude Code. They let one session launch hundreds of parallel subagents for large, branching tasks.
Under the hood, this combines:
- xhigh effort
- mid-conversation system messages
- orchestration logic that can spawn worker agents as the task evolves
The Messages API update matters because system entries can now appear partway through a conversation instead of only at the start. That gives an orchestrator agent a way to adjust instructions while work is already in progress.
For the mechanics and API-level orchestration pattern, see the Claude Code Dynamic Workflows deep-dive. For background on Claude Code agent execution, read the Claude Code agent harness breakdown.
Benchmark highlights
Anthropic’s published results focus on agentic workloads:
- Beats GPT-5.5 on the Super-Agent benchmark, which measures end-to-end task completion
- Tops the Legal Agent Benchmark and is the first model to break 10% overall on it
- 84% on Online-Mind2Web, a web-navigation agent benchmark
These are agent scores, not general chat scores. That is the intended positioning: Opus 4.8 is for difficult coding, tool use, and autonomous work.
For a broader comparison, read Opus 4.8 vs GPT-5.5 vs Gemini 3.5. The older Gemini 3.5 vs GPT-5.5 vs Opus 4.7 comparison is still useful as a 4.7 baseline.
Opus 4.8 vs Opus 4.7
| Attribute | Opus 4.7 | Opus 4.8 |
|---|---|---|
| API ID | claude-opus-4-7 | claude-opus-4-8 |
| Input price | $5 / 1M tokens | $5 / 1M tokens |
| Output price | $25 / 1M tokens | $25 / 1M tokens |
| Context window | 1M tokens | 1M tokens |
| Max output | 128K tokens | 128K tokens |
| Effort levels | low to max | low to max |
| Code defects passed | baseline | ~4x fewer |
| Honesty / alignment | baseline | improved |
| Knowledge cutoff | Jan 2026 | Jan 2026 |
The specs are intentionally similar. The migration value is quality at the same standard token price. Still, validate your prompts, tool schemas, and parsers before changing production traffic.
How to access Claude Opus 4.8
You can access Opus 4.8 through four main surfaces.
1. Claude API
Use claude-opus-4-8 with the Messages endpoint.
Example request body:
{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "xhigh"
  },
  "messages": [
    {
      "role": "user",
      "content": "Analyze this codebase change and identify correctness risks."
    }
  ]
}
Start with the Opus 4.8 API guide.
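To turn the request body above into an actual HTTP call, something like the following standard-library sketch should work. The endpoint URL and the anthropic-version value are assumptions here, so confirm both against the API guide. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed endpoint and version header for the Messages API.
# Confirm both against the API guide before relying on them.
API_URL = "https://api.anthropic.com/v1/messages"
API_VERSION = "2023-06-01"


def build_http_request(api_key: str, body: dict) -> urllib.request.Request:
    """Wrap a request body in a POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": API_VERSION,
            "content-type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it. Keeping construction separate from sending lets you inspect or log the exact payload during migration tests.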
2. Claude apps
Opus 4.8 is available at claude.ai for paid plans, with limited access on the free plan.
3. Claude Code
Claude Code supports Opus 4.8 as the top model. Dynamic Workflows are available when you opt into the high-effort mode.
4. Cloud platforms
Opus 4.8 is available through:
- AWS Bedrock: anthropic.claude-opus-4-8
- Vertex AI: claude-opus-4-8
- Microsoft Foundry, where the context window is capped at 200K tokens
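If one codebase targets several of these platforms, a small lookup table keeps the per-platform differences in one place. This is a sketch under assumptions: the platform keys are hypothetical labels, the 1M cap for Bedrock and Vertex AI is inferred from the general spec and not stated per platform, and the Foundry model ID is left empty because it is not given here.

```python
from typing import NamedTuple, Optional


class Platform(NamedTuple):
    model_id: Optional[str]  # None where no ID is given in this article
    context_tokens: int      # maximum input context, in tokens


# Keys are hypothetical labels for this sketch.
PLATFORMS = {
    "claude_api": Platform("claude-opus-4-8", 1_000_000),
    "bedrock": Platform("anthropic.claude-opus-4-8", 1_000_000),  # cap assumed
    "vertex": Platform("claude-opus-4-8", 1_000_000),             # cap assumed
    "foundry": Platform(None, 200_000),
}


def fits_context(platform: str, input_tokens: int) -> bool:
    """Check whether a prompt of `input_tokens` fits the platform's cap."""
    return input_tokens <= PLATFORMS[platform].context_tokens
```

A check like this is most useful before routing long-context requests to Foundry, where a 250K-token prompt would exceed the cap.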
If you want to test it before committing to API usage, see the how to use Opus 4.8 for free guide.
Who should use Opus 4.8
Use Opus 4.8 when quality matters more than raw cost or latency.
Good fits:
- long agentic coding sessions
- code review where silent bugs are expensive
- multi-step tool workflows
- autonomous agents that need judgment
- large refactors or migration planning
- legal, research, or web-navigation agents
- tasks that require frontier-level reasoning
Use a smaller model or a lower effort level for:
- simple classification
- short extraction tasks
- high-volume routing
- latency-sensitive UI completions
- low-risk summarization
The practical workflow is:
- Start with claude-opus-4-8 and effort: "xhigh" for difficult coding or agent tasks.
- Run evals on real prompts.
- Try high, then medium, if quality holds.
- Keep the lowest effort level that passes your evals.
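The step-down workflow above can be sketched as a short loop. Here run_eval is a hypothetical callable you supply that returns a quality score for a given effort level, and threshold is whatever pass mark your own evals use.

```python
from typing import Callable, Optional

# Step-down order from the workflow above: xhigh, then high, then medium.
LADDER = ("xhigh", "high", "medium")


def lowest_passing_effort(
    run_eval: Callable[[str], float], threshold: float
) -> Optional[str]:
    """Return the lowest effort whose score stays at or above the threshold."""
    chosen = None
    for effort in LADDER:
        if run_eval(effort) < threshold:
            break  # quality no longer holds, so stop stepping down
        chosen = effort
    return chosen  # None means even xhigh failed the evals
```

The loop stops at the first failing level and does not test every level, which mirrors the "if quality holds" condition and avoids paying for eval runs you would not act on.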
Test Opus 4.8 before production
A model ID swap is easy:
- "model": "claude-opus-4-7"
+ "model": "claude-opus-4-8"
But production behavior can still change.
Before rollout, test:
- response shape
- streamed chunks
- tool-call arguments
- schema validation
- retry behavior
- max token handling
- adaptive-thinking responses
- output differences across effort levels
A safe migration checklist:
- Copy your existing Opus 4.7 request.
- Change only the model ID to claude-opus-4-8.
- Run the same prompt set against both models.
- Diff outputs and tool calls.
- Add output_config.effort.
- Re-run evals at high, xhigh, and any lower effort level you plan to use.
- Validate streaming and downstream parsers.
- Roll out gradually.
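For the "diff outputs and tool calls" step, a small comparison helper is often enough to spot behavioral drift. The sketch below assumes each response is a parsed JSON dict whose content list contains blocks with type "tool_use", a name, and an input. Adjust the field names if your responses differ.

```python
def tool_calls(response: dict) -> list:
    """Extract (name, input) pairs from tool_use blocks in a response."""
    return [
        (block["name"], block["input"])
        for block in response.get("content", [])
        if block.get("type") == "tool_use"
    ]


def diff_tool_calls(old: dict, new: dict) -> dict:
    """Report tool calls that appear in only one of the two responses."""
    old_calls, new_calls = tool_calls(old), tool_calls(new)
    return {
        "only_old": [call for call in old_calls if call not in new_calls],
        "only_new": [call for call in new_calls if call not in old_calls],
    }
```

Tool-call IDs are deliberately ignored, since they differ between runs even when the calls themselves are identical.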
Apidog can help you test the Messages API in one workspace:
- Save the Opus 4.8 endpoint as a request
- Attach your x-api-key
- Send the same request to Opus 4.7 and Opus 4.8
- Diff responses
- Inspect streamed chunks with timings
- Add assertions for schema drift
- Mock the endpoint to test downstream code without spending credits
Download Apidog, point a request at the Messages endpoint, and paste in the curl snippet from the API guide.
FAQ
Is Claude Opus 4.8 better than Opus 4.7?
Yes, for quality. Anthropic reports roughly 4x fewer code defects passing unremarked, better uncertainty handling, and more efficient tool calling. Pricing, context size, and max output are unchanged.
How much does Opus 4.8 cost?
In standard mode, Opus 4.8 costs $5 per million input tokens and $25 per million output tokens. Fast mode runs at $10 per million input tokens and $50 per million output tokens for 2.5x faster output. See the pricing breakdown.
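For quick budgeting, the per-token prices above reduce to simple arithmetic. A minimal sketch, with the mode names chosen for this example:

```python
# Prices in USD per million tokens, taken from the answer above.
PRICES = {
    "standard": {"input": 5.0, "output": 25.0},
    "fast": {"input": 10.0, "output": 50.0},
}


def estimate_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Estimate the USD cost of one request at the listed prices."""
    price = PRICES[mode]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
```

For example, a request with 200,000 input tokens and 10,000 output tokens comes to $1.25 in standard mode and $2.50 in fast mode.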
What is the context window for Opus 4.8?
Opus 4.8 supports 1M input tokens and up to 128K output tokens on the synchronous Messages API. The Batch API supports up to 300K output tokens with a beta header. On Microsoft Foundry, the context window is 200K tokens.
Does Opus 4.8 support extended thinking?
It supports adaptive thinking with:
{
  "thinking": {
    "type": "adaptive"
  }
}
Manual budget_tokens thinking is not supported and returns a 400 error.
What is the effort parameter?
effort is a setting inside output_config that controls how many tokens Claude spends across text, tool calls, and reasoning.
Supported levels are:
low, medium, high, xhigh, max
The default is high.
Can I use Opus 4.8 for free?
There is no free API tier, but you can try Opus 4.8 on the free plan at claude.ai with limits, or through trial credits. See the free access guide.
What are Dynamic Workflows?
Dynamic Workflows are a Claude Code feature that launches many parallel subagents in one session. They are powered by xhigh effort and mid-conversation system messages. Details are in the Dynamic Workflows guide.

