I gave an agent a file-reading tool. The agent read a log file. The log file was 180 kilobytes. The agent sent that entire 180KB string back to the model as a tool result. The model's context filled up and the call failed. The agent had no plan for that, so it crashed.
The obvious fix is to limit what the file-reading tool returns. I added a [:4000] slice at the end of the tool function. That worked until the agent read a file with multibyte UTF-8 characters: the slice was running on raw bytes, it cut in the middle of a character, and decoding the result raised a UnicodeDecodeError. So I switched to decoding first and slicing the resulting str. That worked for head-truncation. But then I had a case where the important information was at the end of the file, not the start, so head-truncation was throwing away the part I needed. I added a tail mode. Then I wanted to keep both the start and end of a large API response and drop the middle. I added a middle mode. Each of these was written in a slightly different spot in the codebase by slightly different people.
There are four reasonable truncation strategies for tool output. They are not hard to write individually, but they show up in every agent codebase that uses tools returning large strings, and writing them correctly (UTF-8 safe, no mid-character cuts, correct marker text, line-aware variants) takes more care than it looks. tool-output-truncate-py packages all four in one place.
Shape of the fix
```python
from tool_output_truncate_py import truncate, Strategy

# fetch_file_content stands in for your own tool function
big_output = fetch_file_content("/var/log/syslog")  # 200KB string

# Keep the first 4000 chars (default strategy):
short = truncate(big_output, max_chars=4000, strategy=Strategy.HEAD)

# Keep the last 4000 chars:
short = truncate(big_output, max_chars=4000, strategy=Strategy.TAIL)

# Keep the start and the end, cut the middle. Each side gets about
# 2000 chars, minus the room the marker takes:
short = truncate(big_output, max_chars=4000, strategy=Strategy.MIDDLE)
# Output: "<head>\n... [truncated N chars] ...\n<tail>"

# Line-aware middle: same logic, but split and rejoin on newlines
# so you never get a truncated partial line at the boundary:
short = truncate(big_output, max_chars=4000, strategy=Strategy.MIDDLE_LINES)

# Custom marker text for the cut point:
short = truncate(
    big_output,
    max_chars=4000,
    strategy=Strategy.MIDDLE,
    marker="[... {count} chars removed ...]",
)
# {count} is replaced with the actual number of removed characters

# Already short enough: returns the string unchanged, no marker added
short = truncate("hello", max_chars=4000, strategy=Strategy.HEAD)
assert short == "hello"
```
All four strategies are UTF-8 safe: truncation boundaries are always calculated in characters (Python str), not bytes, so you never produce a broken character at a cut point. The marker text is included in the character count so the output never exceeds max_chars.
What it does NOT do
It does not decide which strategy to use for you. You pass the strategy. If you want to auto-select based on content type (structured JSON, logs, prose), that logic belongs in your tool function, not here. It does not compress or summarize: it cuts. If you need the model to see a summary of a large output rather than a truncated version, use an LLM to summarize before passing the result back. It does not chunk: if you want to split a large output into multiple tool result calls, that is also not here. It truncates to one string under the limit. It does not handle binary content. Pass it a Python str. If you have bytes, decode first.
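If you do want auto-selection in your own tool function, it could look roughly like the sketch below. Everything here is an assumption for illustration: `pick_strategy`, the `min_lines` threshold, and the mapping from content type to strategy are hypothetical, and the `Strategy` enum is a local stand-in that only mirrors the four names used in this post so the snippet runs without the library installed.

```python
import json
from enum import Enum


class Strategy(Enum):
    # Local stand-in mirroring the four strategy names used in this post.
    HEAD = "head"
    TAIL = "tail"
    MIDDLE = "middle"
    MIDDLE_LINES = "middle_lines"


def pick_strategy(text: str, min_lines: int = 10) -> Strategy:
    """Hypothetical heuristic: choose a strategy from the shape of the output."""
    stripped = text.lstrip()
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return Strategy.MIDDLE  # keep the envelope and the last results
        except ValueError:
            pass  # looks like JSON but does not parse; fall through
    if text.count("\n") >= min_lines:
        return Strategy.MIDDLE_LINES  # line-oriented output such as logs
    return Strategy.HEAD  # prose and everything else
```

The tool function would then pass the returned value as the `strategy` argument. The thresholds are worth tuning per tool rather than treating as defaults.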
Inside the lib
All four strategies operate on the Python str object directly using character indices. Python str is indexed by Unicode code point, so s[n] is always a whole code point, never a byte in the middle of a multibyte sequence. UTF-8 encoding plays no part in counting or slicing; the result is encoded only to verify that no encoder errors would occur, and the actual slicing is index-based.
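The difference between index-based and byte-based cuts is easy to see with plain Python, no library involved. This is only a demonstration of the language behavior described above, with a made-up sample string:

```python
text = "na\u00efve caf\u00e9 log line"  # "\u00ef" and "\u00e9" are 2 bytes each in UTF-8

# Slicing the str cuts between code points, so the result always encodes cleanly.
head = text[:3]
head.encode("utf-8")  # no error

# Slicing the encoded bytes can land inside a multibyte sequence.
raw = text.encode("utf-8")
broken = raw[:3]  # keeps only the first byte of the 2-byte "\u00ef"
try:
    broken.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# A byte budget can still be made safe by dropping the incomplete tail.
safe = broken.decode("utf-8", errors="ignore")
```

Here `head` keeps all three characters, the strict decode fails, and the `errors="ignore"` decode silently loses the partial character. Counting in code points avoids the question entirely.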
HEAD and TAIL are single cuts. They slice at max_chars - len(marker) to reserve space for the marker and then append it. MIDDLE splits the character budget: half to the head portion and half to the tail portion, then joins them with the marker in between. The marker's {count} placeholder is replaced with the exact character count of the removed middle section before the join.
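The budget arithmetic can be sketched as follows. This is not the library's source: `truncate_sketch` and `_render_marker` are hypothetical names, the default marker is inferred from the example output earlier in this post, and placing the marker before the kept text in tail mode is an assumption. The one subtle part is that the rendered marker's length depends on the removed count, which in turn depends on the marker's length, so the sketch loops until the two agree.

```python
DEFAULT_MARKER = "\n... [truncated {count} chars] ...\n"  # assumed default


def _render_marker(marker: str, total: int, max_chars: int):
    """Return (rendered_marker, keep): marker text and how many original chars fit."""
    keep = max_chars
    while True:
        rendered = marker.replace("{count}", str(total - keep))
        new_keep = max(max_chars - len(rendered), 0)
        if new_keep == keep:  # count and marker length now agree
            return rendered, keep
        keep = new_keep


def truncate_sketch(text: str, max_chars: int, strategy: str = "head",
                    marker: str = DEFAULT_MARKER) -> str:
    if len(text) <= max_chars:
        return text  # already short enough: unchanged, no marker
    rendered, keep = _render_marker(marker, len(text), max_chars)
    if len(rendered) > max_chars:
        raise ValueError("marker does not fit within max_chars")
    if strategy == "head":
        return text[:keep] + rendered
    if strategy == "tail":
        return rendered + text[len(text) - keep:]
    if strategy == "middle":
        head_len = keep // 2          # half the budget to the head
        tail_len = keep - head_len    # the rest to the tail
        return text[:head_len] + rendered + text[len(text) - tail_len:]
    raise ValueError(f"unknown strategy: {strategy}")
```

With a 300-character input and `max_chars=100`, the rendered marker takes 31 characters, 69 original characters are kept, and the marker reports 231 removed.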
MIDDLE_LINES does the same split as MIDDLE, but instead of cutting at a character index it splits the string on \n first and then takes whole lines from the head and tail respectively until the character budget is exhausted. This avoids the case where a MIDDLE cut lands in the middle of a log line and the model sees a broken partial line at each boundary. It is slower than MIDDLE because it does two split("\n") calls, but for log output and structured line-oriented text the readability improvement is worth the cost.
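A line-aware variant can be sketched like this. Again, this is an illustration under assumptions rather than the library's implementation: `middle_lines_sketch` is a hypothetical name, the marker sits on its own line, and the sketch reserves room for the largest possible count up front instead of iterating.

```python
def middle_lines_sketch(text: str, max_chars: int,
                        marker: str = "... [truncated {count} chars] ...") -> str:
    if len(text) <= max_chars:
        return text
    lines = text.split("\n")
    # Reserve the marker line plus its two joining newlines, sized for the
    # worst case where the count has as many digits as len(text).
    reserve = len(marker.replace("{count}", str(len(text)))) + 2
    if reserve > max_chars:
        raise ValueError("marker does not fit within max_chars")
    budget = max_chars - reserve
    head_budget = budget // 2
    tail_budget = budget - head_budget

    # Take whole lines from the top; each line costs its length plus one newline.
    head, used, i = [], 0, 0
    while i < len(lines) and used + len(lines[i]) + 1 <= head_budget:
        used += len(lines[i]) + 1
        head.append(lines[i])
        i += 1

    # Take whole lines from the bottom, never crossing into the head.
    tail, used, j = [], 0, len(lines) - 1
    while j >= i and used + len(lines[j]) + 1 <= tail_budget:
        used += len(lines[j]) + 1
        tail.append(lines[j])
        j -= 1
    tail.reverse()

    removed = len("\n".join(lines[i:j + 1]))
    rendered = marker.replace("{count}", str(removed))
    return "\n".join(head + [rendered] + tail)
```

One consequence of taking whole lines is that a single line longer than its half of the budget is dropped entirely, so the output can be well under `max_chars`. Whether that is acceptable depends on how long your log lines get.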
26 tests cover: all four strategies on short strings (no truncation), all four on strings that require truncation, the {count} marker substitution, multibyte character safety at boundaries, marker length included in the output size guarantee, already-at-limit strings, empty strings, and the custom marker API.
When useful
- File-reading tools where the file might be arbitrarily large and you want to give the model the start and end without feeding the whole thing
- API tools that return verbose JSON where you want to keep the structure envelope (head) and the last few results (tail) and drop the middle
- Log-reading tools where MIDDLE_LINES keeps you from cutting in the middle of a structured log line
- Any codebase where multiple tools each have their own hand-rolled truncation with slightly different behavior
- Wrapping tool calls in a dispatch layer: add one truncate call in the tool result path and all tools get consistent output limits
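The dispatch-layer case from the list above might look something like this. `ToolDispatcher` and `head_truncate` are hypothetical names made up for the sketch, and `head_truncate` is a simple stand-in so the example runs without the library installed.

```python
from typing import Callable, Dict


def head_truncate(text: str, max_chars: int) -> str:
    """Stand-in truncation function; swap in the library call in real code."""
    marker = "[truncated]"
    if len(text) <= max_chars:
        return text
    return text[: max(max_chars - len(marker), 0)] + marker


class ToolDispatcher:
    """Routes tool calls by name and truncates every result in one place."""

    def __init__(self, max_chars: int,
                 truncate_fn: Callable[[str, int], str] = head_truncate) -> None:
        self._max_chars = max_chars
        self._truncate = truncate_fn
        self._tools: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs) -> str:
        result = self._tools[name](**kwargs)
        # The single truncation point: no tool needs its own slicing logic.
        return self._truncate(str(result), self._max_chars)


dispatcher = ToolDispatcher(max_chars=50)
dispatcher.register("read_file", lambda path: "x" * 1000)
dispatcher.register("ping", lambda: "ok")
```

In a real codebase the `truncate_fn` argument would be a small wrapper around `truncate(text, max_chars=..., strategy=...)` from the library, so the limit and strategy live in one place instead of in each tool.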
When not useful
- Binary data or bytes: decode to str first
- Situations where you need the model to see a coherent summary of the full content: truncation is not summarization
- Tools where the output is always short: the overhead of calling truncate is trivial but pointless if you know the output fits
- Cases where you want to split the output across multiple tool result messages: use a chunking library for that
Install
pip install tool-output-truncate-py
Zero dependencies. Python 3.9+. Published on PyPI as v0.1.0 on 2026-05-24.
Siblings
| Library | Language | What it does |
|---|---|---|
| tool-output-truncate | Rust | Original Rust implementation with the same four strategies |
| tool-output-format | Python | Render tool output as LLM-friendly markdown before truncating |
| llm-content-blocks | Python | Build Anthropic content-block arrays with tool results |
| agentfit | Python/npm | Agent-level context budget tracking and alerting |
| agent-message-window | Python | Sliding message window that pairs tool_use and tool_result |
What's next
A chars_removed field on the return value would make it easy to log how often truncation is actually happening and by how much. Right now you can compute it from the input and output lengths, but having it returned directly saves the caller one subtraction and makes it explicit that truncation occurred. A token_budget variant that estimates token count instead of character count is also worth considering, since character count and token count are not 1:1 for most models. For now, character count is the safe, dependency-free baseline that works for all providers.
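To make the `chars_removed` idea concrete, here is one possible shape for such a return value. None of this exists in the library today: `TruncationResult` and `head_truncate_with_stats` are hypothetical names, and only head truncation with a fixed marker is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TruncationResult:
    text: str
    chars_removed: int

    @property
    def truncated(self) -> bool:
        return self.chars_removed > 0


def head_truncate_with_stats(text: str, max_chars: int,
                             marker: str = "[truncated]") -> TruncationResult:
    if len(text) <= max_chars:
        return TruncationResult(text, 0)
    keep = max(max_chars - len(marker), 0)
    # Clip the final string so the size guarantee holds even for a tiny max_chars.
    out = (text[:keep] + marker)[:max_chars]
    return TruncationResult(out, len(text) - keep)


result = head_truncate_with_stats("x" * 1000, max_chars=100)
```

Here `result.chars_removed` is 911 while the input and output lengths differ by 900, because the marker occupies 11 characters of the output. Returning the number directly spares the caller from accounting for the marker length.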
Part of the Hermes Agent Challenge sprint. Source at github.com/MukundaKatta/tool-output-truncate-py. PyPI: pip install tool-output-truncate-py.