Render Tool Output as LLM-Friendly Markdown: Give the Model Readable Data, Not Raw JSON


Your search tool returns a list of results. You give the model this:

[{"id":"r1","title":"Q3 Report","score":0.92,"url":"...","excerpt":"Revenue grew 18%..."},{"id":"r2","title":"Q2 Report","score":0.81,"url":"...","excerpt":"Revenue grew 12%..."}]

The model reads it. It works. But the brackets, quotes, commas, and repeated key names all cost tokens. You are spending context window on JSON syntax that carries little semantic value for the model.

tool-output-format renders tool outputs as clean, token-efficient markdown that the model reads better.

The Shape of the Fix

from tool_output_format import ToolOutputFormat

fmt = ToolOutputFormat()

raw_results = [
    {"title": "Q3 Report", "score": 0.92, "excerpt": "Revenue grew 18%", "url": "..."},
    {"title": "Q2 Report", "score": 0.81, "excerpt": "Revenue grew 12%", "url": "..."},
]

formatted = fmt.format_list(
    items=raw_results,
    fields=["title", "score", "excerpt"],
    title="Search Results",
    item_label="Result",
)

Output:

## Search Results (2 results)

**Result 1: Q3 Report**
- Score: 0.920
- Excerpt: Revenue grew 18%

**Result 2: Q2 Report**
- Score: 0.810
- Excerpt: Revenue grew 12%

The model reads this as structured data. The structure of the JSON is preserved semantically (heading, label, field values) but expressed in markdown, which typically uses fewer tokens for the same fields.
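
One rough way to check the savings on your own payloads is to render both forms and compare them. The sketch below uses only the standard library. `rough_token_count` is a hypothetical stand-in that counts word runs and punctuation marks, not a real tokenizer, and the markdown is built by a hand-rolled loop in the same style rather than by the library's formatter, so number formatting may differ. Treat the result as a relative indication and re-measure with your model's tokenizer.

```python
import json
import re


def rough_token_count(text: str) -> int:
    """Crude proxy: each word run and each punctuation mark counts as one piece."""
    return len(re.findall(r"\w+|[^\w\s]", text))


records = [
    {"title": "Q3 Report", "score": 0.92, "excerpt": "Revenue grew 18%"},
    {"title": "Q2 Report", "score": 0.81, "excerpt": "Revenue grew 12%"},
]

# Compact JSON, as a tool might hand it to the model.
as_json = json.dumps(records, separators=(",", ":"))

# Hand-rolled markdown in the style shown above (not the library's formatter).
lines = [f"## Search Results ({len(records)} results)", ""]
for i, record in enumerate(records, 1):
    lines += [
        f"**Result {i}: {record['title']}**",
        f"- Score: {record['score']}",
        f"- Excerpt: {record['excerpt']}",
        "",
    ]
as_markdown = "\n".join(lines).strip()

json_count = rough_token_count(as_json)
markdown_count = rough_token_count(as_markdown)
print(f"JSON: {json_count} pieces, markdown: {markdown_count} pieces")
```

On a payload this small the gap is modest; it grows with the number of records, because JSON repeats the quoted key names and delimiters for every record.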

What It Does NOT Do

tool-output-format does not change the information content of the fields it renders. The same values are present in both formats; only the representation changes from JSON to markdown. If your downstream code expects JSON tool results (for logging or replay), keep the raw JSON for that path and format only the LLM message.

It does not know which fields are important. You specify fields=["title", "score", "excerpt"] to control which fields appear. The first field in the list becomes the item's label (the bold line), and the rest become bullets. Fields not in the list are excluded from the formatted output. If you do not specify fields, all fields are included.

It does not handle deeply nested objects. It renders flat dicts and lists of flat dicts. If your tool returns deeply nested JSON, flatten it first or use format_json(), which renders the full JSON as a fenced code block (less efficient than markdown, but it handles arbitrary depth).
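
A minimal sketch of the "flatten it first" step. `flatten` is a hypothetical helper, not part of tool-output-format. It joins nested keys with an underscore, on the assumption that the formatter turns underscores into spaces when it builds labels (as the listing in this article does), and it joins lists of scalars into a single string.

```python
def flatten(record: dict, parent_key: str = "", sep: str = "_") -> dict:
    """Collapse nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse, carrying the accumulated key as the prefix.
            flat.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            # Join list items so the value stays on one line.
            flat[full_key] = ", ".join(str(v) for v in value)
        else:
            flat[full_key] = value
    return flat


nested = {
    "title": "Q3 Report",
    "author": {"name": "Ada", "team": "Finance"},
    "tags": ["revenue", "quarterly"],
}
flat = flatten(nested)
print(flat)
```

Lists of dicts are stringified as-is by this sketch; if your tool returns those, it is probably better to format the inner list separately.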

Inside the Library

The formatters handle common output shapes:

class ToolOutputFormat:
    def format_list(
        self,
        items: list[dict],
        fields: list[str] | None = None,
        title: str | None = None,
        item_label: str = "Item",
    ) -> str:
        lines = []

        if title:
            lines.append(f"## {title} ({len(items)} {'result' if len(items)==1 else 'results'})\n")

        for i, item in enumerate(items, 1):
            display_fields = fields or list(item.keys())

            if display_fields:
                first_field = display_fields[0]
                label_text = item.get(first_field, f"{item_label} {i}")
            else:
                label_text = f"{item_label} {i}"

            lines.append(f"**{item_label} {i}: {label_text}**")

            for field in display_fields[1:]:
                if field in item:
                    value = item[field]
                    if isinstance(value, float):
                        lines.append(f"- {field.replace('_', ' ').title()}: {value:.3f}")
                    else:
                        lines.append(f"- {field.replace('_', ' ').title()}: {value}")

            lines.append("")  # blank line between items

        return "\n".join(lines).strip()

    def format_dict(self, data: dict, title: str | None = None) -> str:
        lines = []
        if title:
            lines.append(f"## {title}\n")
        for key, value in data.items():
            display_key = key.replace('_', ' ').title()
            lines.append(f"- **{display_key}**: {value}")
        return "\n".join(lines)

    def format_table(self, rows: list[dict], columns: list[str]) -> str:
        header = " | ".join(c.replace('_', ' ').title() for c in columns)
        separator = " | ".join("---" for _ in columns)
        data_rows = [
            " | ".join(str(row.get(c, "")) for c in columns)
            for row in rows
        ]
        return "\n".join([header, separator] + data_rows)

    def format_error(self, error: str, tool: str | None = None) -> str:
        # Keep the closing ** flush against the text so the bold span is valid markdown.
        prefix = f"Error in {tool}:" if tool else "Error:"
        return f"**{prefix}** {error}"

    def format_success(self, message: str) -> str:
        return f"Success: {message}"

The table format is the most compact option for tabular data, because each field name appears once in the header instead of once per row:

Title | Score | Excerpt
--- | --- | ---
Q3 Report | 0.92 | Revenue grew 18%
Q2 Report | 0.81 | Revenue grew 12%
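
One thing to watch with pipe-delimited rows: a cell value that itself contains a pipe or a newline will break the table layout. The sketch below shows one way to guard against that. `escape_cell` and `table_row` are hypothetical helpers written for illustration, not functions from the library.

```python
def escape_cell(value: object) -> str:
    """Make a value safe to place in a single markdown table cell."""
    text = str(value)
    # A literal pipe would be read as a column boundary; a newline would end the row.
    return text.replace("|", "\\|").replace("\n", " ")


def table_row(row: dict, columns: list[str]) -> str:
    """Render one row, escaping every cell; missing keys become empty cells."""
    return " | ".join(escape_cell(row.get(c, "")) for c in columns)


row = {"title": "Q3 | Final", "excerpt": "Revenue grew 18%\nMargins held"}
print(table_row(row, ["title", "excerpt"]))
```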

When to Use It

Use it for tools that return lists of results: search results, database query results, document lists, API response arrays. These are the most common tool output shapes and the ones where markdown provides the biggest token savings.

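Use it when your agent processes many search results per run. If you are doing RAG with 5 chunks of 1,000 tokens each, formatting those chunks as markdown can save 15-20% of the tokens spent on tool results. The actual saving depends on how much of each payload is JSON syntax and metadata rather than chunk text, so measure it with your model's tokenizer.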

Use it for error messages. format_error() produces a short, clear error message that the model handles better than a raw exception dict with a full traceback.
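
One caveat when reducing an exception to a string: for some exception types the message alone is terse. `str(KeyError("missing"))` is just `'missing'`, and an exception raised without arguments has an empty message. A small sketch that keeps the class name; `describe_exception` is a hypothetical helper, not part of the library, and its result could be passed to format_error() as the error text.

```python
def describe_exception(exc: BaseException) -> str:
    """One-line summary: class name plus message, with no traceback."""
    message = str(exc)
    name = type(exc).__name__
    return f"{name}: {message}" if message else name


try:
    {"a": 1}["missing"]
except KeyError as exc:
    summary = describe_exception(exc)

print(summary)
```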

Skip it for tools that return data that must remain structured for downstream processing. If your agent's tool results are also used by non-LLM code that expects JSON, maintain the JSON for that path and only format for the LLM message content.
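
A sketch of that two-path arrangement: the raw JSON goes to the log and to any non-LLM consumer, and only the message content is rendered as markdown. `render_for_model` is a simplified stand-in for a dict formatter, and `handle_tool_result`, the logger name, and the sample payload are all hypothetical.

```python
import json
import logging

logger = logging.getLogger("agent.tools")


def render_for_model(result: dict) -> str:
    """Simplified stand-in for a dict formatter: one bullet per key."""
    return "\n".join(
        f"- **{key.replace('_', ' ').title()}**: {value}"
        for key, value in result.items()
    )


def handle_tool_result(name: str, result: dict) -> tuple[str, str]:
    """Return (raw_json, markdown). Log and replay use the first, the model the second."""
    raw_json = json.dumps(result, sort_keys=True)
    logger.info("tool=%s raw=%s", name, raw_json)
    return raw_json, render_for_model(result)


raw, for_model = handle_tool_result(
    "get_invoice", {"invoice_id": "inv_1", "amount_due": 120}
)
print(for_model)
```

Because the raw string is kept unchanged, replay code can still call `json.loads` on it regardless of how the markdown rendering evolves.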

Install

pip install git+https://github.com/MukundaKatta/tool-output-format

# Or from PyPI
pip install tool-output-format

A typical integration formats each tool result just before it goes back to the model:

from tool_output_format import ToolOutputFormat

fmt = ToolOutputFormat()

# Assumes `tools` is your own registry: a dict mapping tool names to callables.
def execute_and_format_tool(name: str, args: dict) -> str:
    try:
        raw_result = tools[name](**args)

        if isinstance(raw_result, list) and raw_result:
            if isinstance(raw_result[0], dict):
                # List of records -> formatted list
                return fmt.format_list(raw_result, title=f"{name} Results")
            else:
                # List of strings -> bullet list
                return "- " + "\n- ".join(str(item) for item in raw_result)

        elif isinstance(raw_result, dict):
            return fmt.format_dict(raw_result, title=name.replace('_', ' ').title())

        else:
            return str(raw_result)

    except Exception as e:
        return fmt.format_error(str(e), tool=name)

# In the agent loop, `block` is assumed to be a tool_use content block from the model's response.
# The tool result goes back to the model as formatted markdown.
tool_result_message = {
    "type": "tool_result",
    "tool_use_id": block.id,
    "content": execute_and_format_tool(block.name, block.input),
}

Sibling Libraries

Library | What it solves
--- | ---
`llm-content-blocks` | Build typed content blocks for Anthropic messages
`tool-output-truncate` | Truncate oversized tool outputs before formatting
`agent-step-log` | Log both raw and formatted versions
`llm-prompt-compress` | Further reduce formatted output if still too large
`agent-message-sanitize` | Validate the formatted message structure

The output pipeline: tool-output-truncate for size limiting, tool-output-format for markdown rendering, llm-content-blocks for assembling the message structure.

What's Next

Auto-format detection: fmt.auto_format(result) that detects the shape of the result (list of dicts, single dict, list of strings, primitive) and chooses the appropriate formatter automatically. Reduces boilerplate in the tool dispatch layer.

Token budget mode: fmt.format_list(items, max_tokens=500) that automatically truncates the formatted output to fit within a token budget. This is more intelligent than tool-output-truncate because it can drop lower-priority fields rather than truncating at a hard character count.
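
Until that lands, the idea can be approximated outside the library. The sketch below is only an illustration of field dropping under stated assumptions: `render` is a simplified stand-in for a list formatter, `estimate_tokens` uses a rough four-characters-per-token heuristic rather than a real tokenizer, and `fit_to_budget` treats the order of `fields_by_priority` (which must contain at least one field) as most to least important. None of these names exist in tool-output-format.

```python
def render(items: list[dict], fields: list[str]) -> str:
    """Simplified stand-in for a list formatter: the first field is the label."""
    lines = []
    for i, item in enumerate(items, 1):
        lines.append(f"**Item {i}: {item.get(fields[0], '')}**")
        lines += [f"- {f.title()}: {item[f]}" for f in fields[1:] if f in item]
        lines.append("")
    return "\n".join(lines).strip()


def estimate_tokens(text: str) -> int:
    """Rough heuristic of about 4 characters per token; swap in a real tokenizer."""
    return (len(text) + 3) // 4


def fit_to_budget(items: list[dict], fields_by_priority: list[str], max_tokens: int) -> str:
    """Drop the lowest-priority field until the rendering fits the budget."""
    fields = list(fields_by_priority)
    while True:
        text = render(items, fields)
        # Stop when it fits, or when only the label field is left.
        if estimate_tokens(text) <= max_tokens or len(fields) == 1:
            return text
        fields.pop()  # the last entry is treated as the least important


items = [
    {"title": "Q3 Report", "score": 0.92, "excerpt": "Revenue grew 18%"},
    {"title": "Q2 Report", "score": 0.81, "excerpt": "Revenue grew 12%"},
]
compact = fit_to_budget(items, ["title", "score", "excerpt"], max_tokens=20)
print(compact)
```

With a budget of 20 the excerpt is dropped and the titles and scores remain. If even the label-only rendering exceeds the budget, this sketch returns it anyway, so a hard truncation step would still be needed after it.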

Custom templates: fmt.register_template("search_results", template_string) for domain-specific output shapes. Some domains (code search, medical records, financial data) have specific formatting conventions that a generic markdown renderer does not capture.

Built as part of the agent-stack family: composable Python primitives for production LLM agents.