Yurukusa
Extended Thinking Returns Two Blocks, Not One (Anthropic Academy Part 2)

Part 2 of what Anthropic Academy's quizzes taught me I didn't know. Part 1 was about caching.

This one almost broke my streaming parser.

Two Blocks, Not One

I'd been treating Extended Thinking responses like regular responses. Parse the text, move on.

But Extended Thinking returns two distinct blocks: a thinking block (the model's internal reasoning) and a text block (the final answer). They're separate content items in the response array.

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me reason through this step by step...",
      "signature": "eyJhbGciOiJFZDI1NTE5..."
    },
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ]
}

This matters for streaming. You get thinking_delta events first, then text_delta events. If your streaming parser doesn't distinguish the block types, you'll concatenate raw reasoning into your user-facing output.
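Here's a minimal sketch of that routing, using simplified stand-in dicts for the streaming events (real events also carry a block index and more structure — this only shows the type-based split):

```python
def route_deltas(events):
    """Split streamed deltas into internal reasoning and user-facing text."""
    thinking, answer = [], []
    for event in events:
        delta = event.get("delta", {})
        if delta.get("type") == "thinking_delta":
            # internal reasoning: log it or show it in a collapsible panel,
            # but never concatenate it into the answer
            thinking.append(delta["thinking"])
        elif delta.get("type") == "text_delta":
            # final answer: this is what belongs in the UI
            answer.append(delta["text"])
    return "".join(thinking), "".join(answer)
```

The point is simply that the branch exists at all — a parser that appends every delta to one buffer mixes the two streams.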

My code happened to only read the last content block. So it worked — by accident. Lucky, not correct.

The Signature Is a Tamper Seal

Notice the signature field in the thinking block? It's a cryptographic token. When you send this message back to Claude as part of the conversation history, the API uses that signature to verify you haven't modified the thinking text.

Why? Because Claude relies heavily on its previous thinking during response generation. If developers could modify the thinking text, they could steer Claude in unsafe directions. The signature prevents that.

This has a practical implication: you can't edit thinking blocks. If you're building a conversation UI that lets users edit messages, thinking blocks need to be treated as read-only.
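A sketch of what that means for an edit feature — the function below is a hypothetical helper, not an SDK API, and it assumes content blocks shaped like the JSON above. User edits apply only to text blocks; thinking blocks pass through untouched:

```python
def apply_user_edit(content_blocks, new_text):
    """Replace the text block's content; pass thinking blocks through verbatim."""
    edited = []
    for block in content_blocks:
        if block["type"] == "text":
            edited.append({"type": "text", "text": new_text})
        else:
            # thinking / redacted_thinking must round-trip unchanged,
            # signature included, or verification fails on the next request
            edited.append(block)
    return edited
```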

Redacted Thinking Is a Thing

Sometimes you'll get a thinking block with no text — just encrypted data:

{
  "type": "redacted_thinking",
  "data": "encrypted_content_here..."
}

This happens when Claude's internal safety systems flag the thinking content. The encrypted data is returned so you can include it in follow-up messages without losing context — you can't read it, but the API can still make use of it when the conversation continues.

For testing, there's a magic string you can send to force a redacted thinking response. Useful for making sure your app doesn't crash when it receives one unexpectedly.
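The defensive pattern is small: treat redacted_thinking as just another non-displayable block type. A sketch (again assuming content blocks shaped like the JSON examples above):

```python
def displayable_text(content_blocks):
    """Collect only text that's safe to show the user."""
    parts = []
    for block in content_blocks:
        if block["type"] == "text":
            parts.append(block["text"])
        # "thinking" and "redacted_thinking" are skipped for display,
        # but still belong in the history you replay to the API
    return "".join(parts)
```

A parser that assumes every block has a text field will crash the first time a redacted block arrives; this one won't.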

The Budget Is a Minimum, Not a Target

Extended Thinking takes a budget_tokens parameter — the maximum tokens Claude can spend thinking. The minimum is 1,024 tokens. Claude might use fewer, but you can't set the budget below 1,024.

Here's the constraint I missed: max_tokens must be greater than budget_tokens. If your thinking budget is 1,024, max_tokens must be at least 1,025 — leaving exactly 1 token for the actual response. Not useful.

In practice, set max_tokens well above budget_tokens. A thinking budget of 1,024 with max_tokens of 4,000 leaves 2,976 tokens for the visible response.
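Both constraints are cheap to check before sending a request. A minimal validator (a hypothetical helper, not part of the SDK):

```python
MIN_THINKING_BUDGET = 1024  # API minimum for budget_tokens

def validate_thinking_params(budget_tokens, max_tokens):
    """Enforce: budget_tokens >= 1,024 and max_tokens > budget_tokens.

    Returns the tokens left over for the visible response.
    """
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be >= {MIN_THINKING_BUDGET}")
    if max_tokens <= budget_tokens:
        raise ValueError("max_tokens must be greater than budget_tokens")
    return max_tokens - budget_tokens
```

Failing fast here beats discovering the constraint as an API error in production.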

When to Enable It

The course's advice was surprisingly restrained: don't reach for Extended Thinking first. Improve your prompt. Run evals. If accuracy still isn't where you need it, then consider enabling thinking.

Extended Thinking tokens are charged as output tokens. They add latency. They add cost. For straightforward tasks, they're overhead without benefit.

One more thing: when Extended Thinking is enabled, temperature is locked at 1.0. You can't change it. The thinking process needs the full probability distribution to reason effectively.


Next in the series: RAG re-ranking isn't what you think it is.

Anthropic Academy is free: anthropic.skilljar.com

Related:
🛠 Free: claude-code-hooks — 16 production hooks, open source.

Running Claude Code autonomously? Claude Code Ops Kit ($19) — 16 hooks + 5 templates + 3 tools. Production-ready in 15 minutes.

What's your experience with Extended Thinking? When has it actually helped versus added unnecessary latency?
