This is a submission for the Hermes Agent Challenge.
I turned on Claude's extended thinking for my Hermes research agent. The reasoning quality improved noticeably — longer chains of thought, better error detection, fewer hallucinated citations. But I was passing the raw response content downstream to a formatting step, and suddenly I was seeing <thinking> tags in places they shouldn't appear.
The fix is a clean separation: extract the thinking, use it internally if needed, strip it before it goes anywhere else.
One function, clean output
from llm_thought_strip import strip_thoughts
text = """
<thinking>
Let me reason through this carefully.
Step 1: Check the sources.
Step 2: Cross-reference dates.
</thinking>
Based on my analysis, the answer is Paris.
"""
result = strip_thoughts(text)
print(result.text)
# "Based on my analysis, the answer is Paris."
The thinking is gone from the output. But it's not lost:
print(result.thoughts)
# ["Let me reason through this carefully.\nStep 1: Check the sources.\nStep 2: Cross-reference dates."]
print(result.stripped_count) # 1
print(result.had_thoughts) # True
I log the extracted thoughts to my trace file. They're not shown to the user, but they're there for debugging and auditing.
Works on Anthropic content blocks
Claude's extended thinking mode sends thinking as native content blocks — {"type": "thinking", "thinking": "..."}. The library handles those too:
from llm_thought_strip import strip_thoughts_from_blocks
response_content = [
{"type": "thinking", "thinking": "internal step-by-step reasoning..."},
{"type": "text", "text": "The answer is 42."},
]
cleaned, thoughts = strip_thoughts_from_blocks(response_content)
# cleaned = [{"type": "text", "text": "The answer is 42."}]
# thoughts = ["internal step-by-step reasoning..."]
Native thinking blocks are removed entirely. Inline <thinking> tags inside text blocks are also stripped.
One-liner for the simple case
from llm_thought_strip import visible_text
clean = visible_text(response.text)
# Just the text, no thinking blocks, no wrapper object
Tags it handles
By default: <thinking>, <think>, <scratchpad>, <reasoning>, <thought> — all case-insensitive. For a model that uses different tags:
result = strip_thoughts(text, tags=("internal", "scratch", "think"))
Whitespace cleanup
After stripping, multiple consecutive blank lines are collapsed to one. A response that was:
<thinking>long thought</thinking>
The answer...
Becomes just:
The answer...
Use keep_whitespace=True to disable this.
How I use it in my Hermes agent
response = client.messages.create(
model="claude-sonnet-4-6",
thinking={"type": "enabled", "budget_tokens": 2000},
messages=messages,
max_tokens=4096,
)
# Strip thinking for downstream use
cleaned_blocks, thoughts = strip_thoughts_from_blocks(response.content)
# Log thoughts to trace (for debugging only)
trace.log("thinking", {"thoughts": thoughts, "turn": turn})
# Use clean blocks for the visible response
visible = " ".join(b["text"] for b in cleaned_blocks if b.get("type") == "text")
send_to_user(visible)
Thinking is preserved in the trace. Users never see it.
Zero dependencies
Standard library only: re, dataclasses. No third-party packages.
pip install llm-thought-strip
Top comments (0)