Hassann

Posted on May 29 • Originally published at apidog.com

What Is Claude Opus 4.8? Everything Developers Need to Know

Anthropic shipped Claude Opus 4.8 on May 28, 2026, with same-day availability across the Claude API, Claude apps, Claude Code, and major cloud platforms. The API model ID is claude-opus-4-8. If you are already using Opus 4.7, migration is usually a one-line model change, but you should still validate streaming, tool calls, and response parsing before shipping.

Try Apidog today

According to Anthropic’s announcement, Opus 4.8 keeps the same pricing and context limits as Opus 4.7, but improves quality. Anthropic says it is about four times less likely than 4.7 to let a code flaw pass unremarked, and it is more explicit about uncertainty. This guide focuses on what changed, how to call the model, and how to test the migration safely.

The short version

Use these values when updating an existing integration:

Model ID: claude-opus-4-8
Availability: Claude API, AWS, Vertex AI, Microsoft Foundry, Claude apps, and Claude Code
Standard pricing: $5 per million input tokens, $25 per million output tokens
Context window: 1M input tokens
Max output: 128K tokens
Default effort: high

What changes in practice:

Better code review and fewer silent generated-code defects
More direct uncertainty handling
More efficient tool calling in agent loops
Newer effort behavior for controlling token spend across the full response
Adaptive thinking instead of manual budget_tokens
Dynamic Workflows in Claude Code for large parallel agent tasks

For cost modeling, see the Opus 4.8 pricing breakdown. For implementation details, start with the Opus 4.8 API guide.

What changed in Opus 4.8

Opus 4.8 keeps the core specs of Opus 4.7 and improves behavior underneath.

1. Code quality

Anthropic reports roughly a 4x reduction in code flaws that pass review unremarked compared with Opus 4.7.

That matters most when Claude is producing diffs, refactors, migrations, tests, or multi-file changes. For agentic coding workflows, the biggest failure mode is not a visible error. It is a plausible-looking patch with a subtle bug.

2. Honesty and alignment

Opus 4.8 is designed to flag uncertainty more often and make fewer unsupported claims. Anthropic also reports lower rates of deception and misuse cooperation compared with 4.7.

If you run autonomous or semi-autonomous agents, this is more important than a simple chat benchmark. Agents need to know when to stop, ask for context, or avoid overconfident action.

3. Tool calling

Opus 4.8 is more efficient at selecting and using tools. In agent loops, this can reduce:

unnecessary tool invocations
latency from avoidable calls
token usage from bloated tool arguments
retry loops caused by poor tool selection

4. Effort control

The most important API-visible control is output_config.effort.

Effort control: choose the right gear

The effort parameter controls how eagerly Claude spends tokens across the whole response.

It is set inside output_config:

{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "messages": [
    {
      "role": "user",
      "content": "Refactor this module and explain the main changes."
    }
  ],
  "output_config": {
    "effort": "xhigh"
  }
}

Supported values:

low
medium
high
xhigh
max

The default is high.

Important detail: effort affects all output tokens, not only hidden reasoning. That includes:

final text
tool calls
function arguments
reasoning behavior

Use it like this:

Workload	Suggested starting effort
Simple summarization	`low` or `medium`
Normal technical Q&A	`high`
Code review	`xhigh`
Agentic coding	`xhigh`
Complex multi-step planning	`xhigh` or `max`
Cost-sensitive batch work	Start at `medium`, then evaluate

Anthropic’s guidance is to start at xhigh for coding and agentic tasks, keep high as the floor for reasoning-heavy work, and only step down after evals confirm quality. See Anthropic’s effort docs for the full behavior.

Adaptive thinking replaces manual thinking budgets

Opus 4.8 uses adaptive thinking.

Instead of manually setting a thinking token budget, you use:

{
  "thinking": {
    "type": "adaptive"
  }
}

The model then decides how much reasoning the request needs.

At high, xhigh, and max effort, it usually reasons more deeply. At lower effort levels, it can skip deeper thinking for simple tasks.

A basic request shape looks like this:

{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "xhigh"
  },
  "messages": [
    {
      "role": "user",
      "content": "Review this pull request for correctness and edge cases."
    }
  ]
}

Migration note: manual extended thinking with budget_tokens is not supported on Opus 4.8 and returns a 400 error. If your current integration uses that pattern, replace it with adaptive thinking plus output_config.effort.

The exact request structure is covered in the Opus 4.8 API guide.

Dynamic Workflows in Claude Code

Dynamic Workflows are available in Claude Code. They let one session launch hundreds of parallel subagents for large, branching tasks.

Under the hood, this combines:

xhigh effort
mid-conversation system messages
orchestration logic that can spawn worker agents as the task evolves

The Messages API update matters because system entries can now appear partway through a conversation instead of only at the start. That gives an orchestrator agent a way to adjust instructions while work is already in progress.

For the mechanics and API-level orchestration pattern, see the Claude Code Dynamic Workflows deep-dive. For background on Claude Code agent execution, read the Claude Code agent harness breakdown.

Benchmark highlights

Anthropic’s published results focus on agentic workloads:

Beats GPT-5.5 on the Super-Agent benchmark, which measures end-to-end task completion
Tops the Legal Agent Benchmark and is the first model to break 10% overall on it
84% on Online-Mind2Web, a web-navigation agent benchmark

These are agent scores, not general chat scores. That is the intended positioning: Opus 4.8 is for difficult coding, tool use, and autonomous work.

For a broader comparison, read Opus 4.8 vs GPT-5.5 vs Gemini 3.5. The older Gemini 3.5 vs GPT-5.5 vs Opus 4.7 comparison is still useful as a 4.7 baseline.

Opus 4.8 vs Opus 4.7

Attribute	Opus 4.7	Opus 4.8
API ID	`claude-opus-4-7`	`claude-opus-4-8`
Input price	$5 / 1M tokens	$5 / 1M tokens
Output price	$25 / 1M tokens	$25 / 1M tokens
Context window	1M tokens	1M tokens
Max output	128K tokens	128K tokens
Effort levels	low to max	low to max
Code defects passed	baseline	~4x fewer
Honesty / alignment	baseline	improved
Knowledge cutoff	Jan 2026	Jan 2026

The specs are intentionally similar. The migration value is quality at the same standard token price. Still, validate your prompts, tool schemas, and parsers before changing production traffic.

How to access Claude Opus 4.8

You can access Opus 4.8 through four main surfaces.

1. Claude API

Use claude-opus-4-8 with the Messages endpoint.

Example request body:

{
  "model": "claude-opus-4-8",
  "max_tokens": 4096,
  "thinking": {
    "type": "adaptive"
  },
  "output_config": {
    "effort": "xhigh"
  },
  "messages": [
    {
      "role": "user",
      "content": "Analyze this codebase change and identify correctness risks."
    }
  ]
}

Start with the Opus 4.8 API guide.

2. Claude apps

Opus 4.8 is available at claude.ai for paid plans, with limited access on the free plan.

3. Claude Code

Claude Code supports Opus 4.8 as the top model. Dynamic Workflows are available when you opt into the high-effort mode.

4. Cloud platforms

Opus 4.8 is available through:

AWS Bedrock: anthropic.claude-opus-4-8
Vertex AI: claude-opus-4-8
Microsoft Foundry, where the context window is capped at 200K tokens

If you want to test it before committing to API usage, see the how to use Opus 4.8 for free guide.

Who should use Opus 4.8

Use Opus 4.8 when quality matters more than raw cost or latency.

Good fits:

long agentic coding sessions
code review where silent bugs are expensive
multi-step tool workflows
autonomous agents that need judgment
large refactors or migration planning
legal, research, or web-navigation agents
tasks that require frontier-level reasoning

Use a smaller model or a lower effort level for:

simple classification
short extraction tasks
high-volume routing
latency-sensitive UI completions
low-risk summarization

The practical workflow is:

Start with claude-opus-4-8 and effort: "xhigh" for difficult coding or agent tasks.
Run evals on real prompts.
Try high, then medium, if quality holds.
Keep the lowest effort level that passes your evals.

Test Opus 4.8 before production

A model ID swap is easy:

- "model": "claude-opus-4-7"
+ "model": "claude-opus-4-8"

But production behavior can still change.

Before rollout, test:

response shape
streamed chunks
tool-call arguments
schema validation
retry behavior
max token handling
adaptive-thinking responses
output differences across effort levels

A safe migration checklist:

Copy your existing Opus 4.7 request.
Change only the model ID to claude-opus-4-8.
Run the same prompt set against both models.
Diff outputs and tool calls.
Add output_config.effort.
Re-run evals at high, xhigh, and any lower effort level you plan to use.
Validate streaming and downstream parsers.
Roll out gradually.

Apidog can help you test the Messages API in one workspace:

Save the Opus 4.8 endpoint as a request
Attach your x-api-key
Send the same request to Opus 4.7 and Opus 4.8
Diff responses
Inspect streamed chunks with timings
Add assertions for schema drift
Mock the endpoint to test downstream code without spending credits

Download Apidog, point a request at the Messages endpoint, and paste in the curl snippet from the API guide.

FAQ

Is Claude Opus 4.8 better than Opus 4.7?

Yes, for quality. Anthropic reports roughly 4x fewer code defects passing unremarked, better uncertainty handling, and more efficient tool calling. Pricing, context size, and max output are unchanged.

How much does Opus 4.8 cost?

In standard mode, Opus 4.8 costs $5 per million input tokens and $25 per million output tokens. Fast mode runs at $10 per million input tokens and $50 per million output tokens for 2.5x faster output. See the pricing breakdown.

What is the context window for Opus 4.8?

Opus 4.8 supports 1M input tokens and up to 128K output tokens on the synchronous Messages API. The Batch API supports up to 300K output tokens with a beta header. On Microsoft Foundry, the context window is 200K tokens.

Does Opus 4.8 support extended thinking?

It supports adaptive thinking with:

{
  "thinking": {
    "type": "adaptive"
  }
}

Manual budget_tokens thinking is not supported and returns a 400 error.

What is the effort parameter?

effort is a setting inside output_config that controls how many tokens Claude spends across text, tool calls, and reasoning.

Supported levels are:

low, medium, high, xhigh, max