DEV Community

Mukunda Rao Katta
Mukunda Rao Katta

Posted on

Claude's extended thinking is useful. Sending raw <thinking> tags to users is not.

Hermes Agent Challenge Submission: Build With Hermes Agent

This is a submission for the Hermes Agent Challenge.

I turned on Claude's extended thinking for my Hermes research agent. The reasoning quality improved noticeably — longer chains of thought, better error detection, fewer hallucinated citations. But I was passing the raw response content downstream to a formatting step, and suddenly I was seeing <thinking> tags in places they shouldn't appear.

The fix is a clean separation: extract the thinking, use it internally if needed, strip it before it goes anywhere else.

One function, clean output

from llm_thought_strip import strip_thoughts

text = """
<thinking>
Let me reason through this carefully.
Step 1: Check the sources.
Step 2: Cross-reference dates.
</thinking>

Based on my analysis, the answer is Paris.
"""

result = strip_thoughts(text)
print(result.text)
# "Based on my analysis, the answer is Paris."
Enter fullscreen mode Exit fullscreen mode

The thinking is gone from the output. But it's not lost:

print(result.thoughts)
# ["Let me reason through this carefully.\nStep 1: Check the sources.\nStep 2: Cross-reference dates."]

print(result.stripped_count)  # 1
print(result.had_thoughts)    # True
Enter fullscreen mode Exit fullscreen mode

I log the extracted thoughts to my trace file. They're not shown to the user, but they're there for debugging and auditing.

Works on Anthropic content blocks

Claude's extended thinking mode sends thinking as native content blocks — {"type": "thinking", "thinking": "..."}. The library handles those too:

from llm_thought_strip import strip_thoughts_from_blocks

response_content = [
    {"type": "thinking", "thinking": "internal step-by-step reasoning..."},
    {"type": "text", "text": "The answer is 42."},
]

cleaned, thoughts = strip_thoughts_from_blocks(response_content)
# cleaned = [{"type": "text", "text": "The answer is 42."}]
# thoughts = ["internal step-by-step reasoning..."]
Enter fullscreen mode Exit fullscreen mode

Native thinking blocks are removed entirely. Inline <thinking> tags inside text blocks are also stripped.

One-liner for the simple case

from llm_thought_strip import visible_text

clean = visible_text(response.text)
# Just the text, no thinking blocks, no wrapper object
Enter fullscreen mode Exit fullscreen mode

Tags it handles

By default: <thinking>, <think>, <scratchpad>, <reasoning>, <thought> — all case-insensitive. For a model that uses different tags:

result = strip_thoughts(text, tags=("internal", "scratch", "think"))
Enter fullscreen mode Exit fullscreen mode

Whitespace cleanup

After stripping, multiple consecutive blank lines are collapsed to one. A response that was:

<thinking>long thought</thinking>


The answer...
Enter fullscreen mode Exit fullscreen mode

Becomes just:

The answer...
Enter fullscreen mode Exit fullscreen mode

Use keep_whitespace=True to disable this.

How I use it in my Hermes agent

response = client.messages.create(
    model="claude-sonnet-4-6",
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=messages,
    max_tokens=4096,
)

# Strip thinking for downstream use
cleaned_blocks, thoughts = strip_thoughts_from_blocks(response.content)

# Log thoughts to trace (for debugging only)
trace.log("thinking", {"thoughts": thoughts, "turn": turn})

# Use clean blocks for the visible response
visible = " ".join(b["text"] for b in cleaned_blocks if b.get("type") == "text")
send_to_user(visible)
Enter fullscreen mode Exit fullscreen mode

Thinking is preserved in the trace. Users never see it.

Zero dependencies

Standard library only: re, dataclasses. No third-party packages.

pip install llm-thought-strip
Enter fullscreen mode Exit fullscreen mode

Repo: https://github.com/MukundaKatta/llm-thought-strip

Top comments (0)