
Hassann

Originally published at apidog.com

How to Use the Claude Opus 4.8 API?

The Claude Opus 4.8 API went live with the model launch on May 28, 2026. The model ID is claude-opus-4-8, and it uses the same Messages API as earlier Claude models. This guide shows how to get an API key, make your first request, configure the new effort parameter, enable adaptive thinking, stream responses, use tools, and test the integration in Apidog.


If you have already called a Claude model, the main migration step is changing the model string. The key new concept is effort control, which, together with adaptive thinking, replaces the old thinking-budget (budget_tokens) pattern. If you are new to the Claude API, you can get a working Opus 4.8 request running in about ten minutes. For model background, see the "What is Claude Opus 4.8" guide.

What you get with the Opus 4.8 API

The integration-relevant details:

  • claude-opus-4-8: 1M-token context window, up to 128K output tokens
  • Same Messages endpoint: drop-in for projects already using Opus 4.7
  • effort control: five request-level settings from low to max
  • Adaptive thinking: the model decides how deeply to reason
  • Standard pricing: $5 per million input tokens, $25 per million output tokens
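Thinking tokens are billed as output tokens, so count them on the output side. A minimal sketch of a per-request estimate at the standard rates in the list above; the constant and function names are illustrative, and prompt caching, batch, and fast-mode pricing are not modeled:

```python
# Standard Opus 4.8 rates from the list above, in USD per million tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the standard-rate cost of one request.

    output_tokens should include thinking tokens, which are billed as output.
    Caching, batch, and fast-mode rates are ignored.
    """
    return (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# 10,000 input tokens + 2,000 output tokens: $0.05 + $0.05 = $0.10
print(f"${estimate_cost_usd(10_000, 2_000):.2f}")
```

With the Python SDK, the counts come from message.usage.input_tokens and message.usage.output_tokens.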

For cost examples and fast-mode rates, see the Opus 4.8 pricing guide. If you do not have a paid plan yet, the free access guide covers your options.

Step 1: Get your Claude API key

  1. Go to console.anthropic.com
  2. Sign in or create an account
  3. Open Settings
  4. Open API Keys
  5. Click Create Key
  6. Name the key and copy it

Store the key as an environment variable so it does not end up in your source code:

export ANTHROPIC_API_KEY="sk-ant-..."

New accounts get trial credits for testing before adding billing. The key works with claude-opus-4-8 immediately.

Step 2: Install the SDK

Anthropic provides official SDKs for Python, TypeScript, Go, Java, C#, Ruby, and PHP.

Install the SDK for your runtime:

# Python
pip install anthropic

# Node.js / TypeScript
npm install @anthropic-ai/sdk

You can also skip the SDK and call the REST endpoint directly with curl. If you need exact Python types, use the Python SDK source as the reference.
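Calling the endpoint without an SDK needs only an HTTPS POST with three headers. A minimal sketch using Python's standard library that builds, but does not send, the same request as the curl example in Step 3; build_messages_request is an illustrative name, not part of any Anthropic library:

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_messages_request(
    api_key: str, prompt: str, max_tokens: int = 4096
) -> urllib.request.Request:
    """Build (but do not send) a Messages API request equivalent to the curl call."""
    body = {
        "model": "claude-opus-4-8",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


# Sending it requires network access and a valid key:
# with urllib.request.urlopen(build_messages_request(key, "Hello")) as resp:
#     print(json.load(resp)["content"])
```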

Step 3: Make your first Opus 4.8 call

Python

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs.",
        }
    ],
)

print(message.content[0].text)

Node.js

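import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY

// Top-level await requires an ES module (.mjs or "type": "module")
const message = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs.",
    },
  ],
});

// content is a list of typed blocks; narrow to text blocks before reading .text
for (const block of message.content) {
  if (block.type === "text") {
    console.log(block.text);
  }
}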

curl

curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-opus-4-8",
    "max_tokens": 4096,
    "messages": [
      {
        "role": "user",
        "content": "Explain the OAuth 2.0 PKCE flow in 3 short paragraphs."
      }
    ]
  }'

That is the baseline request. Add the features below as your integration needs them.
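The Python example above reads message.content[0].text, which assumes the first content block is text. Once thinking or tool use is enabled, the first block may be a thinking or tool_use block instead. A minimal sketch of a safer helper that joins every text block in a raw JSON response (the dict shape returned by the curl call); response_text and the sample response are illustrative:

```python
from typing import Any


def response_text(response: dict[str, Any]) -> str:
    """Concatenate all text blocks in a Messages API response, skipping other types."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Illustrative response with a thinking block ahead of the text.
example = {
    "type": "message",
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "...", "signature": "..."},
        {"type": "text", "text": "PKCE adds a one-time secret "},
        {"type": "text", "text": "to the authorization code flow."},
    ],
    "stop_reason": "end_turn",
}
print(response_text(example))
```

With the Python SDK objects, the equivalent is "".join(b.text for b in message.content if b.type == "text").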

Configure effort control

The effort parameter controls how many tokens Opus 4.8 spends across the full response, including text, tool calls, and reasoning. It lives inside output_config.

Supported values:

  • low
  • medium
  • high
  • xhigh
  • max

The default is high, so omitting output_config.effort gives you high behavior.

Python

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=8192,
    messages=[
        {
            "role": "user",
            "content": "Refactor this 600-line module for testability.",
        }
    ],
    output_config={"effort": "xhigh"},
)

Node.js

const message = await client.messages.create({
  model: "claude-opus-4-8",
  max_tokens: 8192,
  messages: [
    {
      role: "user",
      content: "Refactor this 600-line module for testability.",
    },
  ],
  output_config: { effort: "xhigh" },
});

Anthropic’s effort documentation suggests these starting points:

| Level | Use it for |
|---|---|
| low | Classification, quick lookups, high-volume jobs, subagents |
| medium | Balanced agentic work where cost matters |
| high | The default; complex reasoning where quality matters more than speed |
| xhigh | Coding and long-horizon agentic tasks; the recommended starting point for that work |
| max | Genuinely frontier problems where you have measured that the extra effort pays off |

Practical defaults:

  • Use xhigh for coding tasks and agentic loops.
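  • When using xhigh or max, set a large max_tokens value; 64000 is a reasonable starting point so the model has room to reason and call tools. Stream these requests: the official SDKs may reject non-streaming calls with a max_tokens this large because they could run past the HTTP timeout.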
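The effort levels and the max_tokens advice above fit in a small request builder. A minimal sketch, assuming you pass the result to the SDK as keyword arguments; build_effort_params is hypothetical, the 64000 value comes from the practical defaults above, and 8192 for the lower levels is an illustrative placeholder rather than an Anthropic recommendation. Checking the effort string locally catches typos before they become a 400 invalid_request_error:

```python
from typing import Any

EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_effort_params(prompt: str, effort: str = "high") -> dict[str, Any]:
    """Return Messages API keyword arguments for one effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    # Leave room for reasoning and tool calls at the highest levels.
    # 8192 for the other levels is a placeholder; tune it from real usage.
    max_tokens = 64000 if effort in ("xhigh", "max") else 8192
    return {
        "model": "claude-opus-4-8",
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


params = build_effort_params("Refactor this module for testability.", effort="xhigh")
print(params["max_tokens"], params["output_config"])
```

For the xhigh and max cases, pass it to client.messages.stream(**params) rather than create().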

Enable adaptive thinking

Opus 4.8 supports adaptive thinking. To enable it, add this field to the request body:

"thinking": {
  "type": "adaptive"
}

With adaptive thinking enabled, the model decides when and how much to reason. Without it, requests run with no thinking.

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[
        {
            "role": "user",
            "content": "Find the race condition in this scheduler.",
        }
    ],
)

for block in message.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200])
    elif block.type == "text":
        print(block.text)

Migration warning: manual extended thinking with budget_tokens is not supported on Opus 4.8 and returns a 400 error. If you used budget_tokens with Opus 4.5 or earlier, remove it and use adaptive thinking plus effort.

Stream responses

Streaming is useful for chat UIs, long generations, and agent status output.

Python

with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Write a 5-step guide to writing a REST client in Go.",
        }
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

Node.js

const stream = client.messages.stream({
  model: "claude-opus-4-8",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: "Write a 5-step guide to writing a REST client in Go.",
    },
  ],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}

For raw REST, add "stream": true to the request body and read the server-sent events.
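With raw REST streaming, the body arrives as server-sent events: an event: line, then a data: line carrying JSON, then a blank line. A minimal sketch of extracting the text with Python's standard library, assuming you already have the body as decoded lines (for example, by iterating over a urllib.request.urlopen response and decoding each line); it ignores every event except text deltas and errors, and iter_text_deltas is an illustrative name:

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each text_delta in a Messages API server-sent event stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):])
        if event.get("type") == "error":
            raise RuntimeError(f"stream error: {event.get('error')}")
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta["text"]


# Illustrative stream excerpt.
sample = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Step 1: "}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "define the client."}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]
print("".join(iter_text_deltas(sample)))
```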

Use tools and function calling

Opus 4.8 can call tools, and the effort level affects how it plans and groups calls.

Define tools with an input_schema:

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                },
            },
            "required": ["city"],
        },
    }
]

message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    tools=tools,
    messages=[
        {
            "role": "user",
            "content": "What's the weather in Singapore right now?",
        }
    ],
)

for block in message.content:
    if block.type == "tool_use":
        print(f"Call: {block.name}")
        print(f"Args: {block.input}")

The typical tool loop is:

  1. Send the user request with the tool definitions.
  2. Detect a tool_use block in the response.
  3. Execute the tool in your application.
  4. Append a tool_result block.
  5. Call the Messages API again to continue.

Lower effort can make Claude batch operations into fewer calls. Higher effort can make it explain its plan first.
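The five-step loop above can be sketched as a function that keeps calling the model until it stops asking for tools. To stay self-contained and testable, this sketch assumes responses are plain dicts in the Messages API JSON shape and takes the API call as a create function you supply; run_tool_loop, fake_create, and the get_weather stub are all illustrative:

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


def run_tool_loop(
    create: Callable[[list[Message]], Message],
    tool_impls: dict[str, Callable[..., Any]],
    prompt: str,
    max_turns: int = 8,
) -> str:
    """Run the send / detect / execute / append / continue loop; return the final text."""
    messages: list[Message] = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        response = create(messages)  # steps 1 and 5: call the Messages API
        # Keep the assistant turn, including its tool_use blocks, in the history.
        messages.append({"role": "assistant", "content": response["content"]})
        if response.get("stop_reason") != "tool_use":
            return "".join(b["text"] for b in response["content"] if b["type"] == "text")

        results = []
        for block in response["content"]:  # step 2: find tool_use blocks
            if block["type"] != "tool_use":
                continue
            impl = tool_impls.get(block["name"])
            if impl is None:
                # Report unknown tools back to the model instead of crashing.
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block["id"],
                    "content": f"Unknown tool: {block['name']}",
                    "is_error": True,
                })
                continue
            output = impl(**block["input"])  # step 3: run the tool in your app
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": json.dumps(output),
            })
        # Step 4: tool results go back to the model in a user turn.
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer after {max_turns} turns")


# Usage with a fake create() that first requests get_weather, then answers.
def fake_create(messages: list[Message]) -> Message:
    if len(messages) == 1:
        return {"stop_reason": "tool_use", "content": [{
            "type": "tool_use", "id": "toolu_1",
            "name": "get_weather", "input": {"city": "Singapore"},
        }]}
    tool_result = messages[-1]["content"][0]
    return {"stop_reason": "end_turn",
            "content": [{"type": "text", "text": f"Tool said: {tool_result['content']}"}]}


answer = run_tool_loop(
    fake_create,
    {"get_weather": lambda city, unit="celsius": {"city": city, "temp_c": 31}},
    "What's the weather in Singapore right now?",
)
print(answer)  # Tool said: {"city": "Singapore", "temp_c": 31}
```

To run it against the real API, create could wrap client.messages.create(model="claude-opus-4-8", max_tokens=1024, tools=tools, messages=messages) and return .model_dump(exclude_none=True) so the response is a plain dict; check that conversion against your SDK version.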

If you are building multi-agent systems, the managed agents vs Agent SDK guide covers architecture choices.

Use mid-conversation system messages

Opus 4.8 includes a Messages API change: you can place a system entry partway through the messages array, not only at the start.

This lets you inject new instructions or permissions during a task. That pattern is the foundation for Claude Code’s Dynamic Workflows.

If you orchestrate subagents through the API, read the Dynamic Workflows deep-dive.

Test your Opus 4.8 integration with Apidog

A successful SDK call is only the first step. Production integrations also need to handle:

  • streamed chunks
  • tool-call validation
  • the new output_config shape
  • adaptive-thinking blocks in responses
  • retries and error responses
  • response-shape drift when prompts or effort levels change

Apidog lets you test the Messages API in one workspace.

A practical setup:

  1. Create a request for https://api.anthropic.com/v1/messages
  2. Add the x-api-key header
  3. Add the anthropic-version header
  4. Paste one of the JSON request bodies from this guide
  5. Send the request and inspect the response
  6. Save variants for different effort levels and model versions

You can use Apidog to:

  • Save the endpoint as a request: paste https://api.anthropic.com/v1/messages, attach headers, and send.
  • Replay across model versions: change claude-opus-4-7 to claude-opus-4-8 and compare outputs.
  • Stream responses inline: inspect streamed chunks as they arrive, including per-chunk timings.
  • Validate response shape: add assertions for output_config, thinking blocks, and tool calls.
  • Mock the endpoint: generate a mock Messages response to test downstream code without spending credits.
  • Build agent-loop scenarios: chain requests with tool-call validation between steps.

To start, download Apidog, create a request for the Messages endpoint, and import the earlier curl snippet. The same testing flow works for the Gemini 3.5 API and Qwen 3.7 API if you use multiple providers.
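Outside Apidog, the same response-shape checks can run in an ordinary test suite. A minimal sketch in Python; shape_problems is an illustrative name, and KNOWN_BLOCK_TYPES covers the block types discussed in this guide plus redacted_thinking, so extend it for any other block types your integration receives:

```python
from typing import Any

KNOWN_BLOCK_TYPES = {"text", "thinking", "redacted_thinking", "tool_use"}


def shape_problems(response: dict[str, Any]) -> list[str]:
    """Return shape problems in a Messages API response; an empty list means OK."""
    problems = []
    if response.get("type") != "message":
        problems.append(f"type is {response.get('type')!r}, expected 'message'")
    if response.get("role") != "assistant":
        problems.append(f"role is {response.get('role')!r}, expected 'assistant'")
    content = response.get("content")
    if not isinstance(content, list):
        return problems + ["content is not a list"]
    for i, block in enumerate(content):
        kind = block.get("type")
        if kind not in KNOWN_BLOCK_TYPES:
            problems.append(f"content[{i}] has unexpected type {kind!r}")
        elif kind == "tool_use" and not all(k in block for k in ("id", "name", "input")):
            problems.append(f"content[{i}] tool_use block is missing id, name, or input")
    if "stop_reason" not in response:
        problems.append("missing stop_reason")
    return problems
```

In a pytest suite, assert shape_problems(response_json) == [] after each recorded call, so a failure lists every mismatch at once.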

Handle errors and rate limits

Claude’s API uses a consistent error model. These are the common cases to handle:

| Status | Error type | What to check |
|---|---|---|
| 400 | invalid_request_error | Malformed body, unsupported budget_tokens, or invalid effort value |
| 401 | authentication_error | Missing or invalid API key |
| 403 | permission_error | Key cannot access the model |
| 429 | rate_limit_error | Rate limit hit; back off and retry |
| 500 | api_error | Server-side error; retry with backoff |
| 529 | overloaded_error | API temporarily overloaded; retry with backoff |

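The official SDKs already retry some failures on their own (the Python client's max_retries defaults to 2). A minimal Python wrapper that adds a longer, jittered backoff for 429, 500, and 529 responses:

import random
import time

import anthropic

client = anthropic.Anthropic()

# Rate limits, server errors, and overload are worth retrying; other 4xx errors are not.
RETRYABLE_STATUSES = {429, 500, 529}


def call_with_retry(prompt, max_retries=4):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-opus-4-8",
                max_tokens=4096,
                messages=[
                    {
                        "role": "user",
                        "content": prompt,
                    }
                ],
            )
        except anthropic.APIStatusError as err:
            # Re-raise non-retryable errors (400, 401, 403, ...) and the final failure.
            if err.status_code not in RETRYABLE_STATUSES or attempt == max_retries - 1:
                raise

            # Exponential backoff with jitter: about 1s, 2s, 4s between attempts.
            time.sleep(2 ** attempt + random.random())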

Rate limits depend on your usage tier. For high-throughput batch workloads that do not need real-time responses, the Message Batches API is the better fit; with a beta header, it also supports up to 300K output tokens.
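Batch jobs use their own request format: each entry pairs a custom_id with the usual Messages parameters. A minimal sketch that builds that list for the Python SDK's client.messages.batches.create; build_batch_requests, the prompt-N custom_id scheme, and max_tokens=1024 are illustrative, and passing output_config inside batch params is an assumption to verify against the batch documentation:

```python
from typing import Any


def build_batch_requests(prompts: list[str], effort: str = "low") -> list[dict[str, Any]]:
    """Build one Message Batches entry per prompt; custom_id maps results back to inputs."""
    return [
        {
            "custom_id": f"prompt-{i}",
            "params": {
                "model": "claude-opus-4-8",
                "max_tokens": 1024,
                # Assumption: batch params accept output_config like a normal request.
                "output_config": {"effort": effort},
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


batch_requests = build_batch_requests(["Classify: 'refund please'", "Classify: 'love it'"])
# batch = anthropic.Anthropic().messages.batches.create(requests=batch_requests)
```

The default effort of low follows the classification row in the effort table; results come back asynchronously and are matched to inputs by custom_id.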

Migrate from Opus 4.7 to 4.8

For most projects, migration starts with one string change:

# Before
model="claude-opus-4-7"

# After
model="claude-opus-4-8"

After the swap, verify:

  1. Effort levels: rerun evals at the effort level your app uses.
  2. Thinking config: remove budget_tokens if you used it; Opus 4.8 rejects it with 400.
  3. Tool schemas: existing schemas carry forward, but rerun tool-use evals.
  4. Cost: per-token rates match Opus 4.7, so there should be no billing surprise from the model swap alone.
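The model swap and the budget_tokens removal can be applied mechanically to stored request parameters. A minimal sketch, assuming your parameters live in a plain dict like the JSON bodies in this guide; migrate_params is a hypothetical helper, and it only rewrites the request, so rerunning your evals is still required:

```python
import copy
from typing import Any


def migrate_params(params: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of Opus 4.7-era request params adjusted for Opus 4.8."""
    new = copy.deepcopy(params)  # leave the caller's dict untouched
    if new.get("model") == "claude-opus-4-7":
        new["model"] = "claude-opus-4-8"
    thinking = new.get("thinking")
    # Manual extended thinking ({"type": "enabled", "budget_tokens": N}) returns a 400
    # on Opus 4.8, so switch to adaptive thinking. Reasoning depth is then steered by
    # output_config.effort, which defaults to "high" when omitted.
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        new["thinking"] = {"type": "adaptive"}
    return new


old = {
    "model": "claude-opus-4-7",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "Find the race condition in this scheduler."}],
}
print(migrate_params(old)["thinking"])
```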

FAQ

What is the Claude Opus 4.8 API model ID?

Use claude-opus-4-8 on the Claude API and Vertex AI. Use anthropic.claude-opus-4-8 on AWS Bedrock.

Is there a free tier for the Opus 4.8 API?

There is no standing free API tier, but new accounts get trial credits. See the free access guide for other low-cost paths.

How do I set the effort level?

Pass output_config: {"effort": "xhigh"} in the request. You can also use low, medium, high, or max. The default is high.

Why does my request return a 400 about budget_tokens?

Opus 4.8 does not support manual extended thinking. Remove budget_tokens and use thinking: {type: "adaptive"} with the effort parameter.

Does Opus 4.8 work with the OpenAI-compatible SDK?

Anthropic provides a compatibility layer for the OpenAI SDK. Point the OpenAI client's base URL at https://api.anthropic.com/v1/, use your Anthropic API key, and keep the model string as claude-opus-4-8.

What max_tokens should I set for agentic work?

Start at 64000 when running xhigh or max effort so the model has room to think and chain tool calls. Tune down after you measure real usage.

How do I test streaming responses in Apidog?

Open the request, set "stream": true in the JSON body, and send it. Apidog renders the server-sent event chunks as they arrive, which makes truncated or incomplete responses easier to spot.
