tool-output-format: Render Tool Results as LLM-Friendly Markdown

#hermeschallenge #ai #python #agents

## The Blob the LLM Could Not Parse

The API returned a list of 40 records. Each record had 12 fields. The agent stuffed the raw JSON into a tool_result block and sent it back to the model.

The model read it. Then it hallucinated a summary. The JSON was there, technically. But 40 records times 12 fields is 480 key-value pairs crammed into a single content block. The model skimmed it and guessed.

Dense raw JSON is hard for language models to read reliably. They tend to reason better over formats that are closer to markdown. Tables for record sets. Code blocks for config or logs. Indented text for nested structures.

`tool-output-format` takes your tool return values and renders them into markdown the model can actually use. It is not a pretty-printer for humans. It is a serialization layer tuned for LLM consumption.

## Shape of the Fix

```python
from tool_output_format import Formatter

fmt = Formatter()

# A list of dicts becomes a markdown table:
rows = [
    {"name": "Alice", "role": "engineer", "team": "infra"},
    {"name": "Bob", "role": "PM", "team": "product"},
    {"name": "Carol", "role": "designer", "team": "product"},
]

output = fmt.format("list_team", rows)
print(output)
```

Output:

| name  | role     | team    |
| ----- | -------- | ------- |
| Alice | engineer | infra   |
| Bob   | PM       | product |
| Carol | designer | product |

The column order matches the key order in the first dict. Each column is padded to the width of its widest cell, header included. Rows carry no trailing whitespace.

For a single dict:

```python
record = {"id": "msg_001", "status": "delivered", "latency_ms": 142}
output = fmt.format("get_message", record)
print(output)
```

Output:

**get_message result**

- id: msg_001
- status: delivered
- latency_ms: 142

For a long string:

```python
# Build exactly 300 lines of logs: "line 1" through "line 300".
logs = "\n".join(f"line {i}" for i in range(1, 301))
output = fmt.format("fetch_logs", logs)
print(output)
```

Output:

[Truncated to 100 lines of 300 total]

line 1
line 2
...


The truncation default is 100 lines. You can override it globally or per tool.
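
The cap itself is easy to picture. Here is a minimal sketch of line-based truncation that reproduces the notice format shown above. `truncate_lines` is a hypothetical helper written for illustration, not the library's internal function, and the exact behavior at the boundary is an assumption:

```python
def truncate_lines(text: str, max_lines: int = 100) -> str:
    """Keep the first max_lines lines; prepend a notice if lines were dropped."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text  # short enough, return unchanged
    notice = f"[Truncated to {max_lines} lines of {len(lines)} total]"
    return notice + "\n\n" + "\n".join(lines[:max_lines])


logs = "\n".join(f"line {i}" for i in range(1, 301))
short = truncate_lines(logs)
print(short.splitlines()[0])
```

Passing a different `max_lines` per call is one way a per-tool override could work; the library's own override mechanism may differ.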

---

## What It Does NOT Do

It does not summarize. It does not compress using an LLM. Apart from the line cap on long strings, the output is still the full data, just rendered differently. If the data is too large for the context window, use `tool-output-truncate` first, then format the truncated result.

It does not auto-detect the best format. You call `fmt.format(tool_name, result)`, and the library picks a format based on the Python type of `result`. List of dicts is always a table. Dict is always a bullet list. String is always a code block. You can override this per tool name.

It does not handle binary data, file objects, or non-serializable types. Stick to dicts, lists, strings, numbers, and booleans.

It does not produce HTML or rich text. Markdown only. If your interface renders markdown, the output displays as formatted text. If not, the raw markdown is still readable as plain text.

---

## Inside the Lib

The formatter maintains a registry keyed by tool name. If a tool name has a registered override, that formatter runs. Otherwise, type-based dispatch runs.

```python
class Formatter:
    def __init__(self, max_lines=100):
        self._registry = {}          # tool name -> custom formatter
        self._max_lines = max_lines  # line cap for long strings

    def register(self, tool_name, fn):
        self._registry[tool_name] = fn
        return self

    def format(self, tool_name, result):
        # A registered override wins over type-based dispatch.
        if tool_name in self._registry:
            return self._registry[tool_name](result)
        return self._dispatch(result)

    def _dispatch(self, result):
        if isinstance(result, list) and result and isinstance(result[0], dict):
            return self._table(result)
        if isinstance(result, dict):
            return self._bullet_list(result)
        if isinstance(result, str):
            return self._code_block(result)
        return f"
```

The `_table` function aligns columns without external dependencies. It measures each column's max width, pads every cell, and builds the separator row.


```python
    def _table(self, rows):
        # Column order comes from the first row's keys.
        keys = list(rows[0].keys())
        # Each column is as wide as its header or its widest cell.
        widths = {k: max(len(k), max(len(str(r.get(k, ""))) for r in rows))
                  for k in keys}
        header = "| " + " | ".join(k.ljust(widths[k]) for k in keys) + " |"
        sep = "| " + " | ".join("-" * widths[k] for k in keys) + " |"
        # Rows missing a key get an empty cell for that column.
        body = "\n".join(
            "| " + " | ".join(str(r.get(k, "")).ljust(widths[k]) for k in keys) + " |"
            for r in rows
        )
        return f"{header}\n{sep}\n{body}"
```
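
One thing the listing does not show is what happens when a cell contains a pipe or a newline, either of which would break a markdown row. Below is a minimal sketch of one way to guard against that, written as standalone functions. `escape_cell` and `render_table` are hypothetical names for illustration, and the escaping rule is an assumption, not something the library is confirmed to do:

```python
def escape_cell(value) -> str:
    """Render a value as a single-line cell that cannot break the table."""
    text = str(value)
    # A literal pipe would start a new column; a newline would end the row.
    return text.replace("|", "\\|").replace("\n", " ")


def render_table(rows: list[dict]) -> str:
    keys = list(rows[0].keys())  # column order follows the first row
    cells = [[escape_cell(r.get(k, "")) for k in keys] for r in rows]
    widths = [
        max([len(key)] + [len(row[i]) for row in cells])
        for i, key in enumerate(keys)
    ]

    def line(values):
        padded = (v.ljust(w) for v, w in zip(values, widths))
        return "| " + " | ".join(padded) + " |"

    sep = "| " + " | ".join("-" * w for w in widths) + " |"
    return "\n".join([line(keys), sep] + [line(row) for row in cells])


rows = [
    {"name": "Alice", "note": "on call | backup"},
    {"name": "Bob"},  # missing "note" renders as an empty cell
]
print(render_table(rows))
```

Widths are measured after escaping, so the added backslash counts toward the column width and the rows stay aligned.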

No third-party dependencies. Python 3.10 and above. The whole library is under 250 lines.

The `register` method returns `self`, so you can chain registrations:


```python
fmt = (Formatter()
    .register("fetch_user", lambda r: f"User: {r['name']} ({r['email']})")
    .register("list_errors", lambda r: "\n".join(f"- {e}" for e in r)))
```
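
To see why the chain works without installing anything, here is a stripped-down stand-in that keeps only the registry logic from the class listing. `MiniFormatter` is a hypothetical class for illustration, and its `str(result)` fallback is a placeholder, not the library's type-based dispatch:

```python
class MiniFormatter:
    """Stand-in that keeps only the registry behavior."""

    def __init__(self):
        self._registry = {}

    def register(self, tool_name, fn):
        self._registry[tool_name] = fn
        return self  # returning self is what makes chaining possible

    def format(self, tool_name, result):
        if tool_name in self._registry:
            return self._registry[tool_name](result)
        return str(result)  # placeholder for the real dispatch


fmt = (MiniFormatter()
    .register("fetch_user", lambda r: f"User: {r['name']} ({r['email']})")
    .register("list_errors", lambda r: "\n".join(f"- {e}" for e in r)))

print(fmt.format("fetch_user", {"name": "Alice", "email": "alice@example.com"}))
print(fmt.format("list_errors", ["timeout", "bad gateway"]))
```

Tool names without a registered override fall through to the fallback, which mirrors the order of checks in `format`.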

## When Useful / When Not

Useful when your tools return structured data and the model reasons over that data in subsequent steps. Useful when you notice the model making mistakes that look like misreading dense JSON. Useful when you want a consistent, testable serialization format for tool results across a multi-tool agent.

Not useful when the tool result is a simple string the model reads once and discards. Not useful when you need the raw JSON for another code path that parses it. In that case, format only for the model, keep the original for your code.
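
Keeping both versions can be as simple as carrying them side by side. A minimal sketch, assuming you control the tool-calling wrapper. `ToolResult`, `run_tool`, and `bullet_render` are hypothetical names for illustration; in practice the `render` argument would be the formatter's `format` method:

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolResult:
    raw: Any       # original value, for code paths that parse it
    rendered: str  # markdown text, sent to the model only


def run_tool(tool_name, tool_fn, render, **kwargs) -> ToolResult:
    """Call the tool once and keep both the raw and the rendered result."""
    raw = tool_fn(**kwargs)
    return ToolResult(raw=raw, rendered=render(tool_name, raw))


def get_message(message_id):
    # Example tool returning structured data.
    return {"id": message_id, "status": "delivered", "latency_ms": 142}


def bullet_render(tool_name, value):
    # Placeholder renderer standing in for the formatter.
    return "\n".join(f"- {k}: {v}" for k, v in value.items())


result = run_tool("get_message", get_message, bullet_render, message_id="msg_001")
print(result.rendered)         # goes into the tool result sent to the model
print(json.dumps(result.raw))  # stays available to downstream code
```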

Not useful as a replacement for truncation. Very large tool outputs should be truncated before formatting. Formatting a 10,000-row table into markdown produces a very long markdown table.
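
The order matters: cut first, render second. A hedged sketch of that pipeline for a large row set follows. `truncate_rows`, `truncate_then_format`, and `render_bullets` are hypothetical helpers, and the notice wording is made up for this example. It does not use the API of `tool-output-truncate-py`, which is not shown in this post:

```python
def truncate_rows(rows, max_rows):
    """Keep the first max_rows rows; return them with a notice if any were dropped."""
    if len(rows) <= max_rows:
        return rows, None
    return rows[:max_rows], f"[Showing {max_rows} of {len(rows)} rows]"


def truncate_then_format(rows, max_rows, render):
    """Cut first, render second, so the renderer never sees the full data set."""
    kept, notice = truncate_rows(rows, max_rows)
    body = render(kept)
    return f"{notice}\n\n{body}" if notice else body


def render_bullets(rows):
    # Placeholder renderer; in practice this would be the formatter call.
    return "\n".join(f"- {r['id']}: {r['status']}" for r in rows)


rows = [{"id": i, "status": "ok"} for i in range(10_000)]
text = truncate_then_format(rows, 50, render_bullets)
print(text.splitlines()[0])
```

Telling the model how many rows were dropped keeps it from treating the visible rows as the complete result.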

## Install


```bash
pip install tool-output-format
```

The PyPI release is still pending, so the `pip install` command will not work until it is published. Clone from GitHub in the meantime:


```bash
git clone https://github.com/MukundaKatta/tool-output-format
cd tool-output-format
pip install -e .
```

No runtime dependencies. Run the tests with:


```bash
pytest tests/
```

The test suite has 31 tests. They cover table alignment with unicode, single-dict formatting, string truncation at boundaries, and custom formatter registration.

## Siblings

| Library | What it does | Language |
|---|---|---|
| `tool-output-truncate` | Trim tool output by byte or line count | Rust |
| `tool-output-truncate-py` | Python port of tool-output-truncate | Python |
| `llm-content-blocks` | Build Anthropic content block structures | Python |
| `tool-error-classify` | Closed ErrorKind enum for tool exceptions | Python |
| `tool-schema-from-fn` | Function signature to tool schema | Python |
| `tool-result-validator` | Validate tool output against schema | Python |

The closest sibling is `tool-output-truncate-py`. That library cuts output by size. This library renders what remains. They are designed to chain: truncate first, then format.

`llm-content-blocks` handles the Anthropic-specific content block format. If you are targeting the Anthropic API, you might use both: format the result with this library, then wrap it in a content block with `llm-content-blocks`.

## What Is Next

v0.2.0 targets:

- Nested dict rendering. The current bullet list flattens nested dicts into a single level. A tree-style indented renderer for nested structures would be more useful.
- Configurable column order. Right now columns follow the key order of the first dict. An explicit ordering option would help when some columns are more important than others.
- Integration with `tool-output-truncate-py`. A combined `format_and_truncate` helper that applies a character budget after formatting.
- CSV output mode. Some models reason better over CSV than over markdown tables for large datasets. An optional output format flag would cover this without changing the default.
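
As a rough idea of what the nested renderer could look like, here is a sketch. It is one possible shape, not the planned v0.2.0 implementation, and `render_nested` is a hypothetical name:

```python
def render_nested(value: dict, indent: int = 0) -> list[str]:
    """Render a dict as an indented bullet tree, one level per nested dict."""
    pad = "  " * indent
    lines = []
    for key, item in value.items():
        if isinstance(item, dict):
            lines.append(f"{pad}- {key}:")
            lines.extend(render_nested(item, indent + 1))
        else:
            lines.append(f"{pad}- {key}: {item}")
    return lines


record = {
    "id": "msg_001",
    "delivery": {"status": "delivered", "latency_ms": 142},
}
tree = "\n".join(render_nested(record))
print(tree)
```

This sketch only descends into dicts. Lists inside the structure are printed with their default string form, which a real renderer would likely want to handle separately.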

The underlying insight is simple: the format you return from a tool call affects model reasoning. Raw JSON is not wrong, but it is not optimal. A little markdown goes a long way.

Pull requests welcome at MukundaKatta/tool-output-format. Part of the Hermes Agent Challenge sprint.