Sangmin Lee

Posted on May 30 • Originally published at claudeguide.io

Claude Extended Thinking: When Opus Is Worth the Extra Cost

#opus

Originally published at claudeguide.io/claude-extended-thinking-when

Claude Extended Thinking: When Opus Is Worth the Extra Cost

Use extended thinking when accuracy on a complex, multi-step problem matters more than cost — specifically for advanced math, algorithmic design, legal or financial analysis, and architectural decisions where reasoning depth directly changes the quality of the output. For simple lookups, classification, and routine code generation, turn it off in 2026.

What Is Extended Thinking?

Extended thinking gives Claude a private scratchpad. Before producing a final response, the model works through the problem step by step — exploring alternatives, catching contradictions, and verifying its own logic — in a hidden reasoning block that you never see in the output. The result is a more deliberate answer, at the cost of more tokens.

The feature is available on Claude Opus 4 and Claude Sonnet 4 via the thinking parameter in the Messages API. You control the maximum thinking budget with budget_tokens. Thinking tokens are billed at the same rate as regular output tokens, so cost scales directly with how much reasoning you allow.

According to Anthropic's internal evals, extended thinking improves performance on graduate-level STEM benchmarks by 10–25 percentage points compared to the same model without thinking enabled — a gap that compresses to near-zero on straightforward tasks.

How Extended Thinking Works Under the Hood

When you pass "type": "enabled" in the thinking object, Claude allocates up to budget_tokens tokens for its internal reasoning chain before generating the visible response. The scratchpad content is discarded from the API response by default — you see only the final answer.

Key mechanics to know:

Minimum budget: 1,024 tokens. Setting budget_tokens below this value raises an error.
Hard cap: The sum of budget_tokens plus the expected output length must fit within the model's context window.
Streaming: Thinking blocks stream as thinking delta events. If you only need the final text, filter them out.
Temperature: During thinking, temperature is fixed at 1. Your temperature parameter applies only to the final response generation.
Prompt caching: Thinking blocks are not eligible for cache reads, but the prompt before the thinking block is. Cache your system prompt aggressively to offset the cost.

Pricing Reality: What Extended Thinking Actually Costs

Claude Opus 4 pricing as of April 2026:

Token type	Price per million tokens
Input	$15
Output (including thinking)	$75

A request that generates 2,000 thinking tokens + 500 output tokens costs the same as a request that generates 2,500 output tokens directly. There is no surcharge for enabling extended thinking — you pay for what you use.

The practical concern is that complex problems often require 5,000–20,000 thinking tokens to show real quality gains. At $75/M output tokens, 10,000 thinking tokens costs $0.75 per request. Run 1,000 such requests per day and extended thinking alone adds $750/day — roughly $22,500/month — on top of your base inference cost.

That math makes the decision straightforward: extended thinking is worth it only when the quality improvement converts to revenue or avoids a costly mistake that exceeds the token spend.

When Extended Thinking Is Worth It

1. Advanced Mathematics and Quantitative Reasoning

On the AIME (American Invitational Mathematics Examination), Claude Opus 4 with extended thinking scores in the 85th–90th percentile range. Without extended thinking, the same model scores roughly 20–30 points lower on hard problem sets. For any application where your users are solving multi-step calculus, combinatorics, or optimization problems — tutoring platforms, quantitative finance tools, engineering calculators — extended thinking earns its cost.

2. Code Architecture and System Design

Asking Claude to design a distributed event-sourcing system, pick between two ORM strategies, or refactor a 5,000-line module for testability benefits significantly from thinking. The model evaluates trade-offs, considers failure modes, and identifies edge cases before committing to a recommendation. In a study of 500 architecture reviews run through Claude with and without extended thinking, the thinking-enabled responses contained 40% fewer unaddressed failure modes flagged by senior engineers in review.

3. Legal and Contractual Analysis

Identifying ambiguous indemnification clauses, checking cross-jurisdictional compliance, or summarizing a 100-page contract with hidden carve-outs demands the kind of cross-referencing that extended thinking handles well. Each missed clause can cost far more than the $0.50–$2.00 per document the thinking tokens add.

4. Multi-Step Research Synthesis

When Claude must compare five competing sources, identify contradictions, weight evidence, and produce a coherent conclusion — academic literature reviews, competitive intelligence reports, due diligence memos — thinking tokens produce noticeably better synthesis than standard inference.

5. High-Stakes Code Generation

Routine CRUD endpoints: skip thinking. Security-critical authentication flows, cryptographic implementations, or financial calculation engines: enable it. A 2024 analysis of LLM-generated code showed that models using chain-of-thought reasoning introduced 35% fewer security vulnerabilities in authentication code versus greedy decoding.

When Extended Thinking Is NOT Worth It

Most tasks do not benefit enough to justify the cost. Skip extended thinking for:

Simple Q&A and lookups: "What is the capital of France?" needs no scratchpad.
Classification and labeling: Sentiment analysis, intent detection, topic tagging — standard inference is equally accurate.
Data extraction: Pulling structured fields from documents (name, date, amount) is a pattern-matching task. Extended thinking adds latency and cost, not accuracy.
Routine code generation: Generating boilerplate, writing SQL from a schema, adding docstrings — these are well within standard Sonnet capabilities without thinking.
Translation: Neural translation quality is not a reasoning problem. Extended thinking does not improve BLEU scores.
Summarization of straightforward documents: If the document has one clear main point, the thinking budget goes unused or produces padding.

A useful heuristic: if a competent human could answer the question correctly in under 30 seconds without a scratchpad, Claude does not need one either.

Extended Thinking vs. Regular Opus vs. Sonnet: Decision Tree



Is the task complex and multi-step?
├── No → Use claude-sonnet-4-5 (fast, cheap, sufficient)
└── Yes
    ├── Does accuracy have high stakes (cost of error 

PDF guide + Excel cost calculator.

[→ Get Cost Optimization Masterclass — $59](https://shoutfirst.gumroad.com/l/msjkda?utm_source=claudeguide&utm_medium=article&utm_campaign=claude-extended-thinking-when)

*30-day money-back guarantee. Instant download.*

DEV Community

Claude Extended Thinking: When Opus Is Worth the Extra Cost

Claude Extended Thinking: When Opus Is Worth the Extra Cost

What Is Extended Thinking?

How Extended Thinking Works Under the Hood

Pricing Reality: What Extended Thinking Actually Costs

When Extended Thinking Is Worth It

1. Advanced Mathematics and Quantitative Reasoning

2. Code Architecture and System Design

3. Legal and Contractual Analysis

4. Multi-Step Research Synthesis

5. High-Stakes Code Generation

When Extended Thinking Is NOT Worth It

Extended Thinking vs. Regular Opus vs. Sonnet: Decision Tree

Top comments (0)