Mukunda Rao Katta

Posted on May 25

Format Tool Output for LLMs Before It Becomes a Tool Result: tool-output-format

#hermeschallenge #ai #python #agents

The search tool returned a result. The model's next turn was not to use it.

The model spent the entire turn parsing it.

The tool had returned a list of records: Python dicts serialized to JSON. The model received a tool result that looked like this:

[{"id": "r_001", "name": "OpenAI GPT-5", "category": "model", "status": "active", "last_updated": "2026-04-10"}, {"id": "r_002", "name": "Claude Sonnet 4.6", "category": "model", "status": "active", "last_updated": "2026-04-22"}, ...]

That is valid data. It is also exhausting to read in raw form. The model had to spend tokens picking out field names, tracking which value belonged to which key, and mentally reconstructing the tabular structure before it could do anything useful.

I was paying for an extra round-trip on every search call. Not because the data was wrong. Because the shape was hard to read.

That is the problem tool-output-format solves.

The shape of the fix

Install it:

pip install tool-output-format

Wrap your tool function with the @format_output decorator:

from tool_output_format import format_output

@format_output()
def search_records(query: str) -> list[dict]:
    return [
        {"id": "r_001", "name": "OpenAI GPT-5", "category": "model", "status": "active"},
        {"id": "r_002", "name": "Claude Sonnet 4.6", "category": "model", "status": "active"},
    ]

The model now receives this as the tool result:

| id    | name              | category | status |
|-------|-------------------|----------|--------|
| r_001 | OpenAI GPT-5      | model    | active |
| r_002 | Claude Sonnet 4.6 | model    | active |

Same data. Different shape. The model reads it instead of reconstructing it.
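To make the transformation concrete, here is a minimal standard-library sketch of how a list of dicts could be turned into a padded Markdown table. It is not the library's renderer: `render_table` is a hypothetical name, and the sketch assumes a non-empty list whose records share the keys of the first one.

```python
def render_table(rows: list[dict]) -> str:
    """Render a non-empty list of dicts as a padded Markdown table (illustrative only)."""
    columns = list(rows[0].keys())  # column order follows the first record
    cells = [[str(row.get(col, "")) for col in columns] for row in rows]
    # Each column is as wide as its longest header or cell.
    widths = [
        max(len(col), *(len(line[i]) for line in cells))
        for i, col in enumerate(columns)
    ]

    def fmt(values):
        return "| " + " | ".join(v.ljust(w) for v, w in zip(values, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(columns), separator, *(fmt(line) for line in cells)])


records = [
    {"id": "r_001", "name": "OpenAI GPT-5", "category": "model", "status": "active"},
    {"id": "r_002", "name": "Claude Sonnet 4.6", "category": "model", "status": "active"},
]
table = render_table(records)
print(table)
```

Padding every cell to the column width costs a few extra characters, but it keeps the columns visually aligned in traces as well as in the tool result.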

You can also call the formatter directly without a decorator:

from tool_output_format import format_tool_output

def search_tool(query: str) -> str:
    # your_tool_function stands in for whatever tool you already have
    raw = your_tool_function(query=query)
    return format_tool_output(raw)

Or register a custom formatter for a specific tool:

from tool_output_format import register_formatter

def my_formatter(data):
    lines = [f"- **{item['name']}**: {item['status']}" for item in data]
    return "\n".join(lines)

register_formatter("search_records", my_formatter)

After that, any call routed through the library for search_records uses your formatter instead of the default table renderer.
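The per-tool override can be pictured as a name-to-function lookup with a fallback. The sketch below is an assumption about the general pattern, not the library's internals: `_formatters`, `register`, and `render` are hypothetical names, and `str` stands in for the default renderer.

```python
from typing import Any, Callable

Formatter = Callable[[Any], str]

# Hypothetical registry mapping tool name -> formatter.
_formatters: dict[str, Formatter] = {}


def register(tool_name: str, formatter: Formatter) -> None:
    _formatters[tool_name] = formatter


def render(tool_name: str, data: Any, default: Formatter = str) -> str:
    """Use the tool's registered formatter if there is one, else the default."""
    return _formatters.get(tool_name, default)(data)


def status_list(data: list[dict]) -> str:
    return "\n".join(f"- **{item['name']}**: {item['status']}" for item in data)


register("search_records", status_list)

rows = [{"name": "OpenAI GPT-5", "status": "active"}]
custom = render("search_records", rows)  # registered formatter wins
fallback = render("other_tool", rows)    # no entry, so the default applies
```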

For a single dict, the library renders structured key-value output:

@format_output()
def get_config() -> dict:
    return {"model": "claude-sonnet-4-6", "temperature": 0.3, "max_tokens": 2048}

Output:

- model: claude-sonnet-4-6
- temperature: 0.3
- max_tokens: 2048

Plain text that already reads as formatted output passes through unchanged. Text that looks like raw code or data is wrapped in a fenced code block.

You can set a max output length in characters. When the formatted result exceeds that, the library truncates and appends a note:

@format_output(max_chars=500)
def fetch_document(doc_id: str) -> str:
    ...

For richer truncation strategies (head, tail, middle), pair it with tool-output-truncate-py.
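The max_chars behavior described above amounts to a head truncation plus a note. A minimal sketch, assuming the note is appended after the cut and is not counted against the limit; the wording of the note and the name `truncate` are made up for illustration and may differ from what the library emits.

```python
def truncate(text: str, max_chars: int) -> str:
    """Keep the head of the text and append a note saying how much was cut."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return f"{text[:max_chars]}\n[truncated: {omitted} more characters omitted]"


short = truncate("hello", max_chars=500)
long = truncate("x" * 600, max_chars=500)
```

Telling the model how much was dropped matters as much as the cut itself: without the note, a truncated document looks like a complete one.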

What it does NOT do

- It does not modify the tool's return value in your application logic. It formats the string representation that becomes the tool result in the LLM message.
- It does not validate tool arguments before execution. For that, use agentvet.
- It does not cache the formatted output. For caching, use tool-result-cache.
- It does not build Anthropic content blocks. For that, use llm-content-blocks.

Inside the lib: the empty list decision

The most deliberate design choice in this library is what happens when a tool returns an empty list.

A naive implementation would render an empty Markdown table:

| id | name | category |
|----|------|----------|

That is technically valid Markdown, but several models handle it poorly. Some treat an empty table as a signal to ask a clarifying question rather than proceeding. Others produce a muddled summary of the zero results. In practice, an empty table is worse than no table.

The library detects an empty list before rendering and returns a clear message instead:

No results returned.

If you prefer a different empty message, you can set it:

@format_output(empty_message="No records matched your query.")
def search_records(query: str) -> list[dict]:
    ...

Clear text beats an empty table. Every model tested understood "No results returned" immediately and moved on without an extra clarifying turn.
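The empty-list guard is small enough to sketch in a few lines. This is an illustration of the idea rather than the library's code: `format_rows` and `one_per_line` are hypothetical names, and the renderer is passed in so the guard stays independent of how non-empty results are drawn.

```python
from typing import Callable


def format_rows(
    rows: list[dict],
    render: Callable[[list[dict]], str],
    empty_message: str = "No results returned.",
) -> str:
    """Return a plain sentence for an empty list; render everything else."""
    if not rows:
        return empty_message
    return render(rows)


def one_per_line(rows: list[dict]) -> str:
    # Stand-in renderer, used only to keep the example short.
    return "\n".join(str(row) for row in rows)


empty_default = format_rows([], one_per_line)
empty_custom = format_rows([], one_per_line, empty_message="No records matched your query.")
non_empty = format_rows([{"id": "r_001"}], one_per_line)
```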

Column order

For list-of-dicts, column order is inferred from the keys of the first item in the list. Python dicts preserve insertion order since 3.7, so the column order in the table matches the order the tool author defined. No sorting, no alphabetization. The columns appear in the order the data was built.

If you want a specific order, control it at the tool level by returning dicts with keys in the intended order. The formatter follows the data.
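Controlling the order at the tool level can be as simple as rebuilding each record before returning it. A small sketch, where `COLUMNS` and `reorder` are hypothetical helpers in your own tool code and not part of the library:

```python
COLUMNS = ("name", "status", "id")  # hypothetical preferred order


def reorder(record: dict, columns: tuple = COLUMNS) -> dict:
    """Rebuild a record with the preferred keys first, then any remaining keys."""
    ordered = {key: record[key] for key in columns if key in record}
    for key, value in record.items():
        ordered.setdefault(key, value)  # keeps leftover keys in their original order
    return ordered


record = {"id": "r_001", "name": "OpenAI GPT-5", "category": "model", "status": "active"}
reordered = reorder(record)
print(list(reordered))
```

Because dicts keep insertion order, a formatter that reads the keys of the first record would then see name, status, id, category.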

When this is useful

Any agent that calls tools returning structured data benefits from this pattern. Search tools, database queries, API list endpoints, config readers, file index tools. The formatted result is more readable in traces too, which helps debugging.

It is especially useful when you have a mix of tools with different return shapes. Rather than writing a formatter for each one, you get sensible defaults (table for list-of-dicts, key-value for dict, code block for raw text) that work for most cases without configuration.
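The shape-based defaults can be pictured as one dispatcher that inspects the return value. The sketch below is an assumption about the general approach, not the library's code; in particular, the check for text that "looks like raw data" is a deliberately crude heuristic made up for this example, and `render_default` is a hypothetical name.

```python
from typing import Any

FENCE = "`" * 3  # built at runtime so this example does not nest code fences


def render_default(value: Any) -> str:
    """Pick a rendering from the shape of the tool's return value."""
    if isinstance(value, list) and not value:
        return "No results returned."
    if isinstance(value, list) and all(isinstance(v, dict) for v in value):
        columns = list(value[0])
        lines = [
            "| " + " | ".join(columns) + " |",
            "|" + "|".join(["---"] * len(columns)) + "|",
        ]
        lines += [
            "| " + " | ".join(str(row.get(c, "")) for c in columns) + " |"
            for row in value
        ]
        return "\n".join(lines)
    if isinstance(value, dict):
        return "\n".join(f"- {k}: {v}" for k, v in value.items())
    if isinstance(value, str):
        # Assumed heuristic: text starting with a bracket or tag is raw data.
        looks_raw = value.lstrip().startswith(("{", "[", "<"))
        return f"{FENCE}\n{value}\n{FENCE}" if looks_raw else value
    return str(value)


as_table = render_default([{"id": 1}])
as_pairs = render_default({"a": 1})
as_text = render_default("All good.")
as_code = render_default('{"a": 1}')
```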

It also compounds with caching. When you cache formatted results with tool-result-cache, you are caching the already-formatted string. Subsequent cache hits skip both the tool call and the formatting step.
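The effect is easy to demonstrate with the standard library. The sketch below uses `functools.lru_cache` as a stand-in for tool-result-cache, whose API is not shown here; `search`, `fmt`, and the `calls` counter are hypothetical and exist only to make the skipped work visible.

```python
from functools import lru_cache

calls = {"tool": 0, "format": 0}


def search(query: str) -> list[dict]:
    calls["tool"] += 1  # count real tool executions
    return [{"id": "r_001", "query": query}]


def fmt(rows: list[dict]) -> str:
    calls["format"] += 1  # count formatting passes
    return "\n".join(f"- {row['id']}: {row['query']}" for row in rows)


@lru_cache(maxsize=128)
def cached_search(query: str) -> str:
    # The cached value is the already-formatted string.
    return fmt(search(query))


first = cached_search("models")
second = cached_search("models")  # cache hit: neither search nor fmt runs again
print(calls)
```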

When NOT to use this

If your tool already returns a human-readable string, adding the formatter adds overhead for no benefit. The library does handle that case (it passes clean text through), but wrapping a tool that already formats its own output is noise.

If your tool returns binary data, images, or deeply nested recursive structures, a custom formatter will do better than the defaults. The built-in renderers assume tabular or shallow structured data.

If you need precise control over Anthropic content blocks (image blocks, document blocks, tool result arrays), use llm-content-blocks directly. This library focuses on text rendering.

Install

pip install tool-output-format

Source: MukundaKatta/tool-output-format

31 tests, zero dependencies.

Siblings

| Lib | Boundary | Repo |
|-----|----------|------|
| tool-output-truncate-py | Truncate formatted output with head/tail/middle strategies | MukundaKatta/tool-output-truncate-py |
| tool-result-cache | Cache the formatted tool result by canonical args | MukundaKatta/tool-result-cache |
| agentvet | Validate tool args before the tool runs | MukundaKatta/agentvet |
| llm-content-blocks | Build typed Anthropic content blocks | MukundaKatta/llm-content-blocks |

What is next

A few things on the list for the next version:

- HTML table output mode for tools that feed web UIs.
- A max_rows parameter for list-of-dicts that truncates rows and appends a count note, separate from max_chars.
- Support for nested list-of-dicts (one level of nesting rendered as grouped sections rather than flattened).

The core pattern, render first then return, is stable. Tool output that is readable for a human is also more reliably readable for a model. This library makes that the default with one decorator.
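For readers who want to see the render-first-then-return pattern without the library, here is a minimal decorator sketch. It is written from scratch under the assumption that the pattern is "call the tool, render the result, return the string"; `formatted` and `key_values` are hypothetical names, not the library's API.

```python
import functools
from typing import Any, Callable


def formatted(render: Callable[[Any], str] = str):
    """Decorator factory: run the tool, render its result, return the string."""

    def decorator(tool: Callable[..., Any]) -> Callable[..., str]:
        @functools.wraps(tool)
        def wrapper(*args, **kwargs) -> str:
            return render(tool(*args, **kwargs))

        return wrapper

    return decorator


def key_values(data: dict) -> str:
    return "\n".join(f"- {key}: {value}" for key, value in data.items())


@formatted(render=key_values)
def get_config() -> dict:
    return {"model": "claude-sonnet-4-6", "temperature": 0.3}


result = get_config()              # rendered string for the model
raw = get_config.__wrapped__()     # functools.wraps keeps the undecorated function
```

In this sketch, application code that needs the raw dict can still reach the original function through `__wrapped__`, which `functools.wraps` sets automatically.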

DEV Community