Hassann

Posted on May 29 • Originally published at apidog.com

How to Use the Claude Opus 4.8 API?

The Claude Opus 4.8 API went live with the model launch on May 28, 2026. The model ID is claude-opus-4-8, and it uses the same Messages API as earlier Claude models. This guide shows how to get an API key, make your first request, configure the new effort parameter, enable adaptive thinking, stream responses, use tools, and test the integration in Apidog.


If you have already called a Claude model, the main migration step is changing the model string. The key new concept is effort control, which replaces the old thinking-budget pattern. If you are new to the Claude API, you can get a working Opus 4.8 request running in about ten minutes. For model background, see the guide on what Claude Opus 4.8 is.

What you get with the Opus 4.8 API

The integration-relevant details:

claude-opus-4-8: 1M token input context, 128K token output
Same Messages endpoint: drop-in for projects already using Opus 4.7
effort control: five request-level settings from low to max
Adaptive thinking: the model decides how deeply to reason
Standard pricing: $5 per million input tokens, $25 per million output tokens

For cost examples and fast-mode rates, see the Opus 4.8 pricing guide. If you do not have a paid plan yet, the free access guide covers your options.
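To get a rough sense of what a request costs at the standard rates listed above, you can do the arithmetic in a few lines. This is a minimal sketch: `estimate_cost_usd` is a hypothetical helper, and it ignores fast mode, caching, and batch discounts.

```python
INPUT_USD_PER_MTOK = 5.00    # standard input rate quoted above
OUTPUT_USD_PER_MTOK = 25.00  # standard output rate quoted above


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request in USD at the standard rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# 200K input tokens and 10K output tokens: $1.00 + $0.25
print(estimate_cost_usd(200_000, 10_000))  # prints 1.25
```

Output tokens cost five times as much as input tokens, so long generations and high effort levels tend to dominate the bill.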

Step 1: Get your Claude API key

Go to console.anthropic.com
Sign in or create an account
Open Settings
Open API Keys
Click Create Key
Name the key and copy it

Store the key as an environment variable so it does not end up in your source code:

export ANTHROPIC_API_KEY="sk-ant-..."

New accounts get trial credits for testing before adding billing. The key works with claude-opus-4-8 immediately.
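If you want your application to fail fast when the variable is missing, instead of surfacing the problem later as a 401, a small guard helps. This is a sketch only: `load_api_key` is a hypothetical helper, not part of the Anthropic SDK.

```python
import os


def load_api_key(env=None):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set; export it before running.")
    return key
```

Call it once at startup, before you construct the client, so a misconfigured deployment stops immediately.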

Step 2: Install the SDK

Anthropic provides official SDKs for Python, TypeScript, Go, Java, C#, Ruby, and PHP.

Install the SDK for your runtime:

# Python
pip install anthropic

# Node.js / TypeScript
npm install @anthropic-ai/sdk

You can also skip the SDK and call the REST endpoint directly with curl. If you need exact Python types, use the Python SDK source as the reference.

Step 3: Make your first Opus 4.8 call

Python

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs.",
        }
    ],
)

print(message.content[0].text)

Node.js

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs.",
    },
  ],
});

console.log(message.content[0].text);

curl

curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-opus-4-8",
    "max_tokens": 4096,
    "messages": [
      {
        "role": "user",
        "content": "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs."
      }
    ]
  }'

That is the baseline request. Add the features below as your integration needs them.

Configure effort control

The effort parameter controls how many tokens Opus 4.8 spends across the full response, including text, tool calls, and reasoning. It lives inside output_config.

Supported values:

low
medium
high
xhigh
max

The default is high, so omitting output_config.effort is equivalent to setting it to high.

Python

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=8192,
    messages=[
        {
            "role": "user",
            "content": "Refactor this 600-line module for testability.",
        }
    ],
    output_config={"effort": "xhigh"},
)

Node.js

const message = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 8192,
  messages: [
    {
      role: "user",
      content: "Refactor this 600-line module for testability.",
    },
  ],
  output_config: { effort: "xhigh" },
});

Use Anthropic’s effort docs as the baseline:

Level	Use it for
`low`	Classification, quick lookups, high-volume jobs, subagents
`medium`	Balanced agentic work where cost matters
`high`	Default. Complex reasoning where quality beats speed
`xhigh`	Coding and long-horizon agentic tasks; the recommended starting point
`max`	Genuinely frontier problems where you have measured headroom

Practical defaults:

Use xhigh for coding tasks and agentic loops.
When using xhigh or max, set a large max_tokens value. 64000 is a reasonable starting point so the model has room to reason and call tools.
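One way to apply these defaults consistently is to derive the request options from the kind of task. The sketch below is illustrative: the task names in `EFFORT_BY_TASK` and the helper `build_request_options` are hypothetical, and the 4096-token fallback is simply the value used in the baseline request earlier in this guide.

```python
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Hypothetical task categories mapped to the levels in the table above.
EFFORT_BY_TASK = {
    "classification": "low",
    "lookup": "low",
    "agentic_budget": "medium",
    "reasoning": "high",
    "coding": "xhigh",
    "agent_loop": "xhigh",
}


def build_request_options(task, default_effort="high"):
    """Return keyword arguments for the effort-related parts of a request."""
    effort = EFFORT_BY_TASK.get(task, default_effort)
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    # Give xhigh and max room to reason and call tools (64000 per the guide).
    max_tokens = 64000 if effort in ("xhigh", "max") else 4096
    return {"output_config": {"effort": effort}, "max_tokens": max_tokens}
```

You would then spread the result into the call, for example `client.messages.create(model="claude-opus-4-8", messages=msgs, **build_request_options("coding"))`.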

Enable adaptive thinking

Opus 4.8 supports adaptive thinking. Set:

"thinking": {
  "type": "adaptive"
}

With adaptive thinking enabled, the model decides when and how much to reason. Without it, requests run with no thinking.

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[
        {
            "role": "user",
            "content": "Find the race condition in this scheduler.",
        }
    ],
)

for block in message.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200])
    elif block.type == "text":
        print(block.text)

Migration warning: manual extended thinking with budget_tokens is not supported on Opus 4.8 and returns a 400 error. If you used budget_tokens with Opus 4.5 or earlier, remove it and use adaptive thinking plus effort.
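If your codebase builds request arguments as dictionaries, you can apply this migration in one place. The sketch below assumes the older requests used the `{"type": "enabled", "budget_tokens": N}` shape; `migrate_thinking_config` is a hypothetical helper, and the default effort it fills in is your choice.

```python
import copy


def migrate_thinking_config(request, effort="high"):
    """Return a copy of `request` with a manual thinking budget replaced.

    Requests without budget_tokens are returned unchanged (as a copy).
    """
    migrated = copy.deepcopy(request)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Swap the manual budget for adaptive thinking plus an effort level.
        migrated["thinking"] = {"type": "adaptive"}
        migrated.setdefault("output_config", {}).setdefault("effort", effort)
    return migrated


old = {
    "model": "claude-opus-4-8",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
}
new = migrate_thinking_config(old, effort="xhigh")
print(new["thinking"], new["output_config"])
```

An effort value already present in `output_config` is left alone, so the helper is safe to run over requests that were partially migrated.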

Stream responses

Streaming is useful for chat UIs, long generations, and agent status output.

Python

with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Write a 5-step guide to writing a REST client in Go.",
        }
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Node.js

const stream = client.messages.stream({
  model: "claude-opus-4-8",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: "Write a 5-step guide to writing a REST client in Go.",
    },
  ],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}

For raw REST, add "stream": true to the request body and read the server-sent events.
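If you read the event stream yourself, the text arrives in content_block_delta events, the same ones the Node.js loop above filters on. Here is a minimal parsing sketch, assuming each event's JSON payload fits on a single `data:` line; `iter_text_deltas` is a hypothetical helper, and the sample lines are hand-written for illustration.

```python
import json


def iter_text_deltas(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            yield delta.get("text", "")


sample = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]
print("".join(iter_text_deltas(sample)))  # prints Hello
```

In a real client you would feed this function the decoded lines of the HTTP response body as they arrive.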

Use tools and function calling

Opus 4.8 can call tools, and the effort level affects how it plans and groups calls.

Define tools with an input_schema:

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                },
            },
            "required": ["city"],
        },
    }
]

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    tools=tools,
    messages=[
        {
            "role": "user",
            "content": "What's the weather in Singapore right now?",
        }
    ],
)

for block in message.content:
    if block.type == "tool_use":
        print(f"Call: {block.name}")
        print(f"Args: {block.input}")

The typical tool loop is:

Send the user request with the tool definitions.
Detect a tool_use block in the response.
Execute the tool in your application.
Append a tool_result block.
Call the Messages API again to continue.
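The steps above can be sketched as a small driver function. This is one possible shape, not a definitive implementation: `run_tool_loop`, `create`, and `handlers` are hypothetical names, and `create` is any callable that wraps `client.messages.create` so the loop can be tested without network access.

```python
def run_tool_loop(create, messages, handlers, max_turns=5):
    """Call the model until it stops requesting tools.

    create:   callable taking the message list and returning a Messages response
    handlers: dict mapping tool names to local Python functions
    Returns (final_response, full_message_history).
    """
    messages = list(messages)
    for _ in range(max_turns):
        response = create(messages)
        tool_calls = [b for b in response.content if b.type == "tool_use"]
        if not tool_calls:
            return response, messages
        # Echo the assistant turn, then answer each tool_use block by its id.
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for call in tool_calls:
            result = {"type": "tool_result", "tool_use_id": call.id}
            try:
                result["content"] = str(handlers[call.name](**call.input))
            except Exception as exc:  # unknown tool or tool failure
                result["content"] = f"Tool failed: {exc}"
                result["is_error"] = True
            results.append(result)
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_turns")
```

With the SDK, `create` could be `lambda msgs: client.messages.create(model="claude-opus-4-8", max_tokens=1024, tools=tools, messages=msgs)`. The `max_turns` cap is a guard against a loop that never settles.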

Lower effort can make Claude batch operations into fewer calls. Higher effort can make it explain its plan first.

If you are building multi-agent systems, the managed agents vs Agent SDK guide covers architecture choices.

Use mid-conversation system messages

Opus 4.8 includes a Messages API change: you can place a system entry partway through the messages array, not only at the start.

This lets you inject new instructions or permissions during a task. That pattern is the foundation for Claude Code’s Dynamic Workflows.

If you orchestrate subagents through the API, read the Dynamic Workflows deep-dive.
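As a rough illustration of the idea, the helper below appends an instruction at the current point in a conversation. Treat the entry shape as an assumption: this guide does not show the exact format, so `{"role": "system", "content": ...}` is a guess, and `inject_system_message` is a hypothetical helper. Check the Messages API reference before relying on it.

```python
def inject_system_message(messages, instruction):
    """Return a new message list with a system entry added at the end.

    ASSUMPTION: the {"role": "system", ...} shape is not confirmed by this
    guide; verify it against the official API reference.
    """
    return [*messages, {"role": "system", "content": instruction}]


history = [
    {"role": "user", "content": "Audit the repo for unused dependencies."},
    {"role": "assistant", "content": "I found three candidates."},
]
updated = inject_system_message(history, "You may now edit package.json.")
print(len(updated))  # prints 3
```

The original list is left untouched, which keeps earlier turns reusable if you branch the conversation.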

Test your Opus 4.8 integration with Apidog

A successful SDK call is only the first step. Production integrations also need to handle:

streamed chunks
tool-call validation
the new output_config shape
adaptive-thinking blocks in responses
retries and error responses
response-shape drift when prompts or effort levels change

Apidog lets you test the Messages API in one workspace.

A practical setup:

Create a request for https://api.anthropic.com/v1/messages
Add the x-api-key header
Add the anthropic-version header
Paste one of the JSON request bodies from this guide
Send the request and inspect the response
Save variants for different effort levels and model versions

You can use Apidog to:

Save the endpoint as a request: paste https://api.anthropic.com/v1/messages, attach headers, and send.
Replay across model versions: change claude-opus-4-7 to claude-opus-4-8 and compare outputs.
Stream responses inline: inspect streamed chunks as they arrive, including per-chunk timings.
Validate response shape: add assertions for output_config, thinking blocks, and tool calls.
Mock the endpoint: generate a mock Messages response to test downstream code without spending credits.
Build agent-loop scenarios: chain requests with tool-call validation between steps.

To start, download Apidog, create a request for the Messages endpoint, and import the earlier curl snippet. The same testing flow works for the Gemini 3.5 API and Qwen 3.7 API if you use multiple providers.
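The same response-shape checks can also live in your own test suite. The sketch below validates a decoded Messages response for the block types used in this guide; `check_message_shape` is a hypothetical helper, and it checks only a handful of fields rather than the full schema.

```python
def check_message_shape(response):
    """Return a list of problems found in a decoded Messages API response."""
    problems = []
    if response.get("type") != "message":
        problems.append("type is not 'message'")
    if response.get("role") != "assistant":
        problems.append("role is not 'assistant'")
    content = response.get("content")
    if not isinstance(content, list):
        problems.append("content is not a list")
        return problems
    for i, block in enumerate(content):
        kind = block.get("type")
        if kind == "text":
            if not isinstance(block.get("text"), str):
                problems.append(f"content[{i}]: text block missing 'text' string")
        elif kind == "thinking":
            if not isinstance(block.get("thinking"), str):
                problems.append(f"content[{i}]: thinking block missing 'thinking' string")
        elif kind == "tool_use":
            if not isinstance(block.get("id"), str):
                problems.append(f"content[{i}]: tool_use block missing 'id'")
            if not isinstance(block.get("name"), str):
                problems.append(f"content[{i}]: tool_use block missing 'name'")
            if not isinstance(block.get("input"), dict):
                problems.append(f"content[{i}]: tool_use block missing 'input' object")
    return problems
```

An empty list means the response passed. Unknown block types are ignored on purpose, so the check does not break if new block types appear.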

Handle errors and rate limits

Claude’s API uses a consistent error model. These are the common cases to handle:

Status	Error	What to check
`400`	`invalid_request_error`	Malformed body, unsupported `budget_tokens`, or invalid `effort` value
`401`	`authentication_error`	Missing or invalid API key
`403`	`permission_error`	Key cannot access the model
`429`	`rate_limit_error`	Back off and retry
`500`	`api_error`	Server-side error; retry with backoff
`529`	`overloaded_error`	API temporarily overloaded; retry with backoff
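The table above boils down to one decision: retry or fail. A minimal sketch of that decision follows, with `is_retryable` and `backoff_seconds` as hypothetical helpers and the 30-second cap as an arbitrary choice.

```python
# Statuses the table marks as "retry" or "retry with backoff".
RETRYABLE_STATUSES = {429, 500, 529}


def is_retryable(status_code):
    """Return True when a failed request is worth sending again."""
    return status_code in RETRYABLE_STATUSES


def backoff_seconds(attempt, base=1.0, cap=30.0):
    """Exponential backoff: base * 2^attempt, limited to `cap` seconds."""
    return min(cap, base * (2 ** attempt))


print(is_retryable(429), is_retryable(400))  # prints True False
print([backoff_seconds(n) for n in range(4)])  # prints [1.0, 2.0, 4.0, 8.0]
```

The 400, 401, and 403 cases are deliberately excluded: retrying them repeats the same failure until you fix the request or the key.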

A minimal Python retry wrapper:

import time
import anthropic

client = anthropic.Anthropic()

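def call_with_retry(prompt, max_retries=4):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-opus-4-8",
                max_tokens=4096,
                messages=[
                    {
                        "role": "user",
                        "content": prompt,
                    }
                ],
            )
        except anthropic.APIStatusError as exc:
            # Retry only the statuses the table marks as retryable;
            # re-raise everything else, and give up after the last attempt.
            if exc.status_code not in (429, 500, 529) or attempt == max_retries - 1:
                raise

            # Exponential backoff: 1s, 2s, 4s between attempts.
            time.sleep(2 ** attempt)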

Rate limits depend on your usage tier. For high-throughput workloads that do not need real-time latency, consider the Batch API, which also supports up to 300K output tokens with a beta header.

Migrate from Opus 4.7 to 4.8

For most projects, migration starts with one string change:

# Before
model="claude-opus-4-7"

# After
model="claude-opus-4-8"

After the swap, verify:

Effort levels: rerun evals at the effort level your app uses.
Thinking config: remove budget_tokens if you used it; Opus 4.8 rejects it with 400.
Tool schemas: existing schemas carry forward, but rerun tool-use evals.
Cost: per-token rates match Opus 4.7, so there should be no billing surprise from the model swap alone.

FAQ

What is the Claude Opus 4.8 API model ID?

Use claude-opus-4-8 on the Claude API and Vertex AI. Use anthropic.claude-opus-4-8 on AWS Bedrock.

Is there a free tier for the Opus 4.8 API?

There is no standing free API tier, but new accounts get trial credits. See the free access guide for other low-cost paths.

How do I set the effort level?

Pass output_config: {"effort": "xhigh"} in the request. You can also use low, medium, high, or max. The default is high.

Why does my request return a 400 about budget_tokens?

Opus 4.8 does not support manual extended thinking. Remove budget_tokens and use thinking: {type: "adaptive"} with the effort parameter.

Does Opus 4.8 work with the OpenAI-compatible SDK?

Anthropic provides a compatibility layer for the OpenAI SDK. Point the base URL at the Anthropic endpoint, use your Anthropic key, and keep the model string as claude-opus-4-8.

What max_tokens should I set for agentic work?

Start at 64000 when running xhigh or max effort so the model has room to think and chain tool calls. Tune down after you measure real usage.

How do I test streaming responses in Apidog?

Open the request, enable streaming in the body, and Apidog renders the server-sent event chunks as they arrive. This makes incomplete responses easier to spot.
