Claude's extended thinking is useful. Sending raw <thinking> tags to users is not.

#hermesagentchallenge #devchallenge #agents #python

Hermes Agent Challenge Submission: Build With Hermes Agent

This is a submission for the Hermes Agent Challenge.

I turned on Claude's extended thinking for my Hermes research agent. The reasoning quality improved noticeably — longer chains of thought, better error detection, fewer hallucinated citations. But I was passing the raw response content downstream to a formatting step, and suddenly I was seeing <thinking> tags in places they shouldn't appear.

The fix is a clean separation: extract the thinking, use it internally if needed, strip it before it goes anywhere else.

One function, clean output

from llm_thought_strip import strip_thoughts

text = """
<thinking>
Let me reason through this carefully.
Step 1: Check the sources.
Step 2: Cross-reference dates.
</thinking>

Based on my analysis, the answer is Paris.
"""

result = strip_thoughts(text)
print(result.text)
# "Based on my analysis, the answer is Paris."

The thinking is gone from the output. But it's not lost:

print(result.thoughts)
# ["Let me reason through this carefully.\nStep 1: Check the sources.\nStep 2: Cross-reference dates."]

print(result.stripped_count)  # 1
print(result.had_thoughts)    # True

I log the extracted thoughts to my trace file. They're not shown to the user, but they're there for debugging and auditing.

Works on Anthropic content blocks

Claude's extended thinking mode sends thinking as native content blocks — {"type": "thinking", "thinking": "..."}. The library handles those too:

from llm_thought_strip import strip_thoughts_from_blocks

response_content = [
    {"type": "thinking", "thinking": "internal step-by-step reasoning..."},
    {"type": "text", "text": "The answer is 42."},
]

cleaned, thoughts = strip_thoughts_from_blocks(response_content)
# cleaned = [{"type": "text", "text": "The answer is 42."}]
# thoughts = ["internal step-by-step reasoning..."]

Native thinking blocks are removed entirely. Inline <thinking> tags inside text blocks are also stripped.

One-liner for the simple case

from llm_thought_strip import visible_text

clean = visible_text(response.text)
# Just the text, no thinking blocks, no wrapper object

Tags it handles

By default: <thinking>, <think>, <scratchpad>, <reasoning>, <thought> — all case-insensitive. For a model that uses different tags:

result = strip_thoughts(text, tags=("internal", "scratch", "think"))

Whitespace cleanup

After stripping, multiple consecutive blank lines are collapsed to one. A response that was:

<thinking>long thought</thinking>


The answer...

Becomes just:

The answer...

Use keep_whitespace=True to disable this.

How I use it in my Hermes agent

response = client.messages.create(
    model="claude-sonnet-4-6",
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=messages,
    max_tokens=4096,
)

# Strip thinking for downstream use
cleaned_blocks, thoughts = strip_thoughts_from_blocks(response.content)

# Log thoughts to trace (for debugging only)
trace.log("thinking", {"thoughts": thoughts, "turn": turn})

# Use clean blocks for the visible response
visible = " ".join(b["text"] for b in cleaned_blocks if b.get("type") == "text")
send_to_user(visible)