Mukunda Rao Katta

Posted on May 25

tool-output-truncate-py: Trim Tool Output Before It Eats Your Context Window

#hermeschallenge #ai #python #agents

I gave an agent a file-reading tool. The agent read a log file. The log file was 180 kilobytes. The agent sent that entire 180KB string back to the model as a tool result. The model's context filled up and the call failed. The agent had no plan for that, so it crashed.

The obvious fix is to limit what the file-reading tool returns. I added a [:4000] slice at the end of the tool function. That worked until the agent read a file with multibyte UTF-8 characters and I sliced in the middle of a character, producing a string that crashed the JSON encoder with a UnicodeEncodeError. So I switched to encoding first, slicing bytes, and decoding back. That worked for head-truncation. But then I had a case where the important information was at the end of the file, not the start, so head-truncation was throwing away the part I needed. I added a tail mode. Then I wanted to keep both the start and end of a large API response and drop the middle. I added a middle mode. Each of these was written in a slightly different spot in the codebase by slightly different people.

There are four reasonable truncation strategies for tool output. They are not hard to write individually, but they show up in every agent codebase that uses tools returning large strings, and writing them correctly (UTF-8 safe, no mid-character cuts, correct marker text, line-aware variants) takes more care than it looks. tool-output-truncate-py packages all four in one place.

Shape of the fix

from tool_output_truncate_py import truncate, Strategy

# fetch_file_content stands in for your own tool function; here it returns a ~200KB str
big_output = fetch_file_content("/var/log/syslog")

# Keep the start, at most 4000 chars including the marker (default strategy):
short = truncate(big_output, max_chars=4000, strategy=Strategy.HEAD)

# Keep the end, at most 4000 chars including the marker:
short = truncate(big_output, max_chars=4000, strategy=Strategy.TAIL)

# Keep the start and the end (about 2000 chars each, minus room for the marker), cut the middle:
short = truncate(big_output, max_chars=4000, strategy=Strategy.MIDDLE)
# Output: "<first ~2000 chars>\n... [truncated ~196000 chars] ...\n<last ~2000 chars>"

# Line-aware middle: same logic, but split and rejoin on newlines
# so you never get a truncated partial line at the boundary:
short = truncate(big_output, max_chars=4000, strategy=Strategy.MIDDLE_LINES)

# Custom marker text for the cut point:
short = truncate(
    big_output,
    max_chars=4000,
    strategy=Strategy.MIDDLE,
    marker="[... {count} chars removed ...]"
)
# {count} is replaced with the actual number of removed characters

# Already short enough: returns the string unchanged, no marker added
short = truncate("hello", max_chars=4000, strategy=Strategy.HEAD)
assert short == "hello"

All four strategies are UTF-8 safe: truncation boundaries are always calculated in characters (Python str), not bytes, so you never produce a broken character at a cut point. The marker text is included in the character count so the output never exceeds max_chars.

What it does NOT do

It does not decide which strategy to use for you. You pass the strategy. If you want to auto-select based on content type (structured JSON, logs, prose), that logic belongs in your tool function, not here.

It does not compress or summarize: it cuts. If you need the model to see a summary of a large output rather than a truncated version, use an LLM to summarize before passing the result back.

It does not chunk. If you want to split a large output across multiple tool result messages, that is also not here. It truncates to one string under the limit.

It does not handle binary content. Pass it a Python str. If you have bytes, decode first.
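
If you do want auto-selection in your tool function, a minimal sketch might look like the following. Everything here is hypothetical: pick_strategy is an illustrative name, the thresholds are arbitrary, and the Strategy enum is a local stand-in that only mirrors the four names used in this post so the snippet runs on its own.

```python
import json
from enum import Enum


class Strategy(Enum):
    """Local stand-in mirroring the four strategy names from this post."""
    HEAD = "head"
    TAIL = "tail"
    MIDDLE = "middle"
    MIDDLE_LINES = "middle_lines"


def pick_strategy(output: str) -> Strategy:
    """Guess a truncation strategy from the shape of the tool output."""
    stripped = output.lstrip()
    # Structured JSON: keep the envelope at the start and the last results.
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return Strategy.MIDDLE
        except ValueError:
            pass  # looked like JSON but was not; fall through
    # Many newlines: treat it as line-oriented text such as logs.
    if output.count("\n") >= 20:
        return Strategy.MIDDLE_LINES
    # Anything else (short prose, single lines): keep the start.
    return Strategy.HEAD
```

The point of keeping this outside the library is that the heuristics are specific to your tools; a log line that begins with a bracket, for example, fails the JSON parse here and falls through to the line-count check.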

Inside the lib

All four strategies operate on the Python str object directly, using character indices. Python str is indexed by Unicode code point, so a slice boundary always falls between whole code points, never inside a multibyte UTF-8 sequence. The library encodes to UTF-8 internally only to verify that the result encodes without errors; the actual slicing is index-based.
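
To see why index-based slicing on str matters, here is a small standalone illustration using only built-in str and bytes behavior. It does not use the library; the sample text is made up.

```python
text = "na\u00efve caf\u00e9"  # "naïve café": two characters that take 2 bytes each in UTF-8

# Slicing the str cuts between code points, so the result always encodes cleanly.
head = text[:3]
encoded_head = head.encode("utf-8")

# Slicing the encoded bytes can land inside a multibyte sequence.
raw = text.encode("utf-8")
broken = raw[:3]  # b"na" plus only the first byte of the 2-byte "ï"

try:
    broken.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Note that a code point is not always what a reader perceives as one character: a base letter followed by a combining mark is two code points, so a str slice can still separate them even though the result encodes without errors.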

HEAD and TAIL are single cuts. They slice at max_chars - len(marker) to reserve space for the marker and then append it. MIDDLE splits the character budget: half to the head portion and half to the tail portion, then joins them with the marker in between. The marker's {count} placeholder is replaced with the exact character count of the removed middle section before the join.
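
The budget arithmetic described above can be sketched roughly as follows. This is not the library's source: sketch_truncate, the mode strings, and the default marker are illustrative, and placing the marker before the kept text in tail mode is an assumption. Because the rendered marker's length depends on the count, the sketch reserves room for the largest possible count up front.

```python
DEFAULT_MARKER = "\n... [truncated {count} chars] ...\n"


def sketch_truncate(text: str, max_chars: int, mode: str = "head",
                    marker: str = DEFAULT_MARKER) -> str:
    """Illustrative head/tail/middle truncation with the marker inside the budget."""
    if len(text) <= max_chars:
        return text  # already fits: returned unchanged, no marker

    # Reserve space for the marker rendered with the largest possible count,
    # so the reservation is an upper bound on the real marker length.
    reserve = len(marker.replace("{count}", str(len(text))))
    keep = max(max_chars - reserve, 0)  # if max_chars < reserve, only the marker is returned
    removed = len(text) - keep
    rendered = marker.replace("{count}", str(removed))

    if mode == "head":
        return text[:keep] + rendered
    if mode == "tail":
        # len(text) - keep avoids the text[-0:] pitfall when keep is 0
        return rendered + text[len(text) - keep:]
    if mode == "middle":
        head_len = keep // 2
        tail_len = keep - head_len
        return text[:head_len] + rendered + text[len(text) - tail_len:]
    raise ValueError(f"unknown mode: {mode!r}")
```

Reserving for the worst-case count wastes at most a few characters of budget, which seemed a fair trade for avoiding a loop that re-renders the marker until the lengths settle.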

MIDDLE_LINES does the same split as MIDDLE, but instead of cutting at a character index it splits the string on \n first and then takes whole lines from the head and tail respectively until the character budget is exhausted. This avoids the case where a MIDDLE cut lands in the middle of a log line and the model sees a broken partial line at each boundary. It is slower than MIDDLE because it does two split("\n") calls, but for log output and structured line-oriented text the readability improvement is worth the cost.
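
A line-aware variant along these lines could be sketched as below. Again this is an assumption-laden illustration, not the library's code: sketch_middle_lines is a hypothetical name, and joining head, marker, and tail with single newlines is a choice made for this sketch.

```python
def sketch_middle_lines(text: str, max_chars: int,
                        marker: str = "... [truncated {count} chars] ...") -> str:
    """Keep whole lines from the start and end; drop whole lines from the middle."""
    if len(text) <= max_chars:
        return text

    # Reserve the worst-case marker length plus the two newlines that join the parts.
    reserve = len(marker.replace("{count}", str(len(text)))) + 2
    budget = max(max_chars - reserve, 0)
    lines = text.split("\n")

    head, head_used, i = [], 0, 0
    while i < len(lines):
        cost = len(lines[i]) + (1 if head else 0)  # +1 for the joining newline
        if head_used + cost > budget // 2:
            break
        head.append(lines[i])
        head_used += cost
        i += 1

    tail, tail_used, j = [], 0, len(lines) - 1
    while j >= i:
        cost = len(lines[j]) + (1 if tail else 0)
        if tail_used + cost > budget - head_used:
            break
        tail.append(lines[j])
        tail_used += cost
        j -= 1
    tail.reverse()

    head_text, tail_text = "\n".join(head), "\n".join(tail)
    removed = len(text) - len(head_text) - len(tail_text)
    rendered = marker.replace("{count}", str(removed))
    # Skip an empty head or tail so the output has no stray blank line.
    return "\n".join(part for part in (head_text, rendered, tail_text) if part)
```

If a single line is longer than its share of the budget, this sketch keeps nothing from that side rather than cutting the line, which matches the goal of never showing the model a partial line.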

26 tests cover: all four strategies on short strings (no truncation), all four on strings that require truncation, the {count} marker substitution, multibyte character safety at boundaries, marker length included in the output size guarantee, already-at-limit strings, empty strings, and the custom marker API.

When useful

File-reading tools where the file might be arbitrarily large and you want to give the model the start and end without feeding the whole thing
API tools that return verbose JSON where you want to keep the structure envelope (head) and the last few results (tail) and drop the middle
Log-reading tools where MIDDLE_LINES keeps you from cutting in the middle of a structured log line
Any codebase where multiple tools each have their own hand-rolled truncation with slightly different behavior
Wrapping tool calls in a dispatch layer: add one truncate call in the tool result path and all tools get consistent output limits
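
The dispatch-layer case in the last item can be sketched as follows. The dispatch function, the tools registry, and head_cut are all hypothetical names for illustration; head_cut is a stand-in so the snippet runs without the package, and in real code you would pass a function that calls the library's truncate instead.

```python
from typing import Any, Callable, Dict


def head_cut(text: str, max_chars: int) -> str:
    """Stand-in truncator for this sketch: plain head cut, no marker."""
    return text if len(text) <= max_chars else text[:max_chars]


def dispatch(tools: Dict[str, Callable[..., Any]], name: str, args: Dict[str, Any],
             max_chars: int = 4000,
             truncate_fn: Callable[[str, int], str] = head_cut) -> str:
    """Run one tool and apply the same output limit to whatever it returns."""
    result = tools[name](**args)
    # One truncation call in the result path covers every registered tool.
    return truncate_fn(str(result), max_chars)


# Example registry with one noisy tool and one quiet tool.
tools = {
    "read_file": lambda path: "a" * 10000,
    "ping": lambda: "pong",
}
long_result = dispatch(tools, "read_file", {"path": "/tmp/example.log"})
short_result = dispatch(tools, "ping", {})
```

With this shape, individual tools no longer need their own slicing logic, and changing the limit or the strategy is a one-line change in the dispatcher.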

When not useful

Binary data or bytes: decode to str first
Situations where you need the model to see a coherent summary of the full content: truncation is not summarization
Tools where the output is always short: the overhead of calling truncate is trivial but pointless if you know the output fits
Cases where you want to split the output across multiple tool result messages: use a chunking library for that

Install

pip install tool-output-truncate-py

Zero dependencies. Python 3.9+. Published on PyPI as v0.1.0 on 2026-05-24.

Siblings

Library	Language	What it does
tool-output-truncate	Rust	Original Rust implementation with the same four strategies
tool-output-format	Python	Render tool output as LLM-friendly markdown before truncating
llm-content-blocks	Python	Build Anthropic content-block arrays with tool results
agentfit	Python/npm	Agent-level context budget tracking and alerting
agent-message-window	Python	Sliding message window that pairs tool_use and tool_result

What's next

A chars_removed field on the return value would make it easy to log how often truncation is actually happening and by how much. Right now you can compute it from the input and output lengths, but having it returned directly saves the caller one subtraction and makes it explicit that truncation occurred. A token_budget variant that estimates token count instead of character count is also worth considering, since character count and token count are not 1:1 for most models. For now, character count is the safe, dependency-free baseline that works for all providers.

Part of the Hermes Agent Challenge sprint. Source at github.com/MukundaKatta/tool-output-truncate-py. PyPI: pip install tool-output-truncate-py.

DEV Community