The Blob the LLM Could Not Parse
The API returned a list of 40 records. Each record had 12 fields. The agent stuffed the raw JSON into a tool_result block and sent it back to the model.
The model read it. Then it hallucinated a summary. The JSON was there, technically. But 40 records times 12 fields is 480 key-value pairs crammed into a single content block. The model skimmed it and guessed.
Structured data presented as raw JSON is not easy for language models to parse reliably. The format they trained on, the format they reason over well, is closer to markdown. Tables for record sets. Code blocks for config or logs. Indented text for nested structures.
tool-output-format takes your tool return values and renders them into markdown the model can actually use. It is not a pretty-printer for humans. It is a serialization layer tuned for LLM consumption.
Shape of the Fix
from tool_output_format import Formatter
fmt = Formatter()
# A list of dicts becomes a markdown table:
rows = [
{"name": "Alice", "role": "engineer", "team": "infra"},
{"name": "Bob", "role": "PM", "team": "product"},
{"name": "Carol", "role": "designer", "team": "product"},
]
output = fmt.format("list_team", rows)
print(output)
Output:
| name | role | team |
|-------|----------|---------|
| Alice | engineer | infra |
| Bob | PM | product |
| Carol | designer | product |
The column order matches the key order in the first dict. Column widths are set to the widest value in each column. No trailing spaces.
For a single dict:
record = {"id": "msg_001", "status": "delivered", "latency_ms": 142}
output = fmt.format("get_message", record)
print(output)
Output:
**get_message result**
- id: msg_001
- status: delivered
- latency_ms: 142
For a long string:
logs = "line 1\nline 2\n..." * 300 # 300 lines of logs
output = fmt.format("fetch_logs", logs)
print(output)
Output:
[Truncated to 100 lines of 300 total]
line 1
line 2
...
The truncation default is 100 lines. You can override it globally or per tool.
---
## What It Does NOT Do
It does not summarize. It does not compress using an LLM. The output is still the full data, just rendered differently. If the data is too large for the context window, use `tool-output-truncate` first, then format the truncated result.
It does not auto-detect the best format. You call `fmt.format(tool_name, result)`, and the library picks a format based on the Python type of `result`. List of dicts is always a table. Dict is always a bullet list. String is always a code block. You can override this per tool name.
It does not handle binary data, file objects, or non-serializable types. Stick to dicts, lists, strings, numbers, and booleans.
It does not produce HTML or rich text. Markdown only. If the LLM's context supports markdown rendering, this is fine. If not, the raw markdown is still readable plain text.
---
## Inside the Lib
The formatter maintains a registry keyed by tool name. If a tool name has a registered override, that formatter runs. Otherwise, type-based dispatch runs.
python
class Formatter:
def init(self, max_lines=100):
self._registry = {}
self._max_lines = max_lines
def register(self, tool_name, fn):
self._registry[tool_name] = fn
return self
def format(self, tool_name, result):
if tool_name in self._registry:
return self._registry[tool_name](result)
return self._dispatch(result)
def _dispatch(self, result):
if isinstance(result, list) and result and isinstance(result[0], dict):
return self._table(result)
if isinstance(result, dict):
return self._bullet_list(result)
if isinstance(result, str):
return self._code_block(result)
return f"
"
The _table function aligns columns without external dependencies. It measures each column's max width, pads every cell, and builds the separator row.
python
def _table(self, rows):
keys = list(rows[0].keys())
widths = {k: max(len(k), max(len(str(r.get(k, ""))) for r in rows))
for k in keys}
header = "| " + " | ".join(k.ljust(widths[k]) for k in keys) + " |"
sep = "| " + " | ".join("-" * widths[k] for k in keys) + " |"
body = "\n".join(
"| " + " | ".join(str(r.get(k, "")).ljust(widths[k]) for k in keys) + " |"
for r in rows
)
return f"{header}\n{sep}\n{body}"
No third-party dependencies. Python 3.10 and above. The whole library is under 250 lines.
The register method returns self, so you can chain registrations:
python
fmt = (Formatter()
.register("fetch_user", lambda r: f"User: {r['name']} ({r['email']})")
.register("list_errors", lambda r: "\n".join(f"- {e}" for e in r)))
When Useful / When Not
Useful when your tools return structured data and the model reasons over that data in subsequent steps. Useful when you notice the model making mistakes that look like misreading dense JSON. Useful when you want a consistent, testable serialization format for tool results across a multi-tool agent.
Not useful when the tool result is a simple string the model reads once and discards. Not useful when you need the raw JSON for another code path that parses it. In that case, format only for the model, keep the original for your code.
Not useful as a replacement for truncation. Very large tool outputs should be truncated before formatting. Formatting a 10,000-row table into markdown produces a very long markdown table.
Install
bash
pip install tool-output-format
PyPI publish is pending. Clone from GitHub in the meantime:
bash
git clone https://github.com/MukundaKatta/tool-output-format
cd tool-output-format
pip install -e .
No runtime dependencies. Run the tests with:
bash
pytest tests/
The test suite has 31 tests. They cover table alignment with unicode, single-dict formatting, string truncation at boundaries, and custom formatter registration.
Siblings
| Library | What it does | Language |
|---|---|---|
tool-output-truncate |
Trim tool output by byte or line count | Rust |
tool-output-truncate-py |
Python port of tool-output-truncate | Python |
llm-content-blocks |
Build Anthropic content block structures | Python |
tool-error-classify |
Closed ErrorKind enum for tool exceptions | Python |
tool-schema-from-fn |
Function signature to tool schema | Python |
tool-result-validator |
Validate tool output against schema | Python |
The closest sibling is tool-output-truncate-py. That library cuts output by size. This library renders what remains. They are designed to chain: truncate first, then format.
llm-content-blocks handles the Anthropic-specific content block format. If you are targeting the Anthropic API, you might use both: format the result with this library, then wrap it in a content block with llm-content-blocks.
What Is Next
v0.2.0 targets:
- Nested dict rendering. The current bullet list flattens nested dicts into a single level. A tree-style indented renderer for nested structures would be more useful.
- Configurable column order. Right now columns follow the key order of the first dict. An explicit ordering option would help when some columns are more important than others.
- Integration with
tool-output-truncate-py. A combinedformat_and_truncatehelper that applies a character budget after formatting. - CSV output mode. Some models reason better over CSV than over markdown tables for large datasets. An optional output format flag would cover this without changing the default.
The underlying insight is simple: the format you return from a tool call affects model reasoning. Raw JSON is not wrong, but it is not optimal. A little markdown goes a long way.
Pull requests welcome at MukundaKatta/tool-output-format. Part of the Hermes Agent Challenge sprint.
Top comments (0)