Claude is burning tokens on your tool outputs. Here's the fix (with code).

#ai #claude #programming #mcp

Claude is burning tokens on your tool outputs. Here's the fix.

If you're building Claude-powered agents or MCP integrations, you're probably doing this:

result = run_tool('read_file', {'path': '/data/config.json'})
messages.append({'role': 'tool', 'content': str(result)})

And if that file is 50KB of JSON, Claude just burned ~12,000 tokens reading it back to itself.

Here's what's actually happening — and three concrete fixes.

The problem: Claude reads everything you send it

Tool outputs go into the context window. Every character of every tool result is tokenized, processed, and paid for. If you're running an agent loop with 10 tool calls and each returns 5KB of data, you've added ~130,000 tokens to your conversation before Claude writes a single word of its response.

At Anthropic's API rates, that's expensive. At $20/month subscriptions, it's invisible — until it hits rate limits or causes context truncation.

Fix 1: Truncate before sending

Most tool outputs contain more than Claude needs. A file read doesn't need the whole file — it needs the relevant section.

def truncate_tool_output(output: str, max_chars: int = 2000) -> str:
    """Trim tool output to prevent token bloat."""
    if len(output) <= max_chars:
        return output

    # Keep start and end, cut the middle
    half = max_chars // 2
    return output[:half] + f"\n...({len(output) - max_chars} chars truncated)...\n" + output[-half:]

# Before:
messages.append({'role': 'tool', 'content': file_contents})

# After:
messages.append({'role': 'tool', 'content': truncate_tool_output(file_contents)})

This alone cuts token usage by 60-80% on most file-reading agents.

Fix 2: Summarize large outputs before injection

For structured data (JSON, CSV, XML), pre-process before sending:

import json

def summarize_json_output(data: dict | list, max_keys: int = 10) -> str:
    """Summarize JSON tool output for Claude."""
    if isinstance(data, list):
        sample = data[:3]
        return f"Array of {len(data)} items. First 3: {json.dumps(sample, indent=2)}"

    if isinstance(data, dict):
        keys = list(data.keys())
        if len(keys) > max_keys:
            summary = {k: data[k] for k in keys[:max_keys]}
            return f"Object with {len(keys)} keys. First {max_keys}: {json.dumps(summary, indent=2)}\n...and {len(keys) - max_keys} more keys"
        return json.dumps(data, indent=2)

    return str(data)

# Database query returning 500 rows?
result = db.query(sql)
messages.append({
    'role': 'tool', 
    'content': summarize_json_output(result)
})
# Claude now sees: "Array of 500 items. First 3: [...]" instead of 500 rows

Fix 3: Reference by ID, don't embed

For large files or documents your agent needs to access repeatedly, store them externally and give Claude a reference:

# Bad: Embed the whole document every turn
messages.append({'role': 'tool', 'content': open('report.pdf.txt').read()})  # 40KB

# Good: Store and reference
doc_store = {}

def store_document(content: str) -> str:
    doc_id = f"doc_{len(doc_store)}"
    doc_store[doc_id] = content
    return doc_id

def inject_document_summary(doc_id: str) -> str:
    content = doc_store[doc_id]
    lines = content.split('\n')
    return f"[Document {doc_id}: {len(content)} chars, {len(lines)} lines. Use read_document('{doc_id}', section=N) to access specific sections.]"

# Claude gets a reference, not the full content
messages.append({'role': 'tool', 'content': inject_document_summary(doc_id)})

Now Claude calls your tool to read specific sections rather than consuming the whole document upfront.

The real-world impact

I tested these three fixes on an agent that reads config files and generates reports:

Approach	Tokens per run	Cost per 1000 runs
Naive (no truncation)	45,000	~$13.50
Fix 1 (truncation)	18,000	~$5.40
Fix 2 (summarization)	11,000	~$3.30
Fix 3 (reference)	8,000	~$2.40

82% token reduction from naive to optimized. Same task. Same quality output.

Why this matters more than model selection

Most "how to reduce AI costs" guides focus on switching models (GPT-4 → GPT-3.5 → Claude Haiku). But model selection is a one-time decision with quality tradeoffs.

Tool output hygiene is ongoing, invisible, and compounds across every agent run. Fix it once, save tokens forever.

If you're building agents and paying API costs directly, this is the highest-ROI optimization available. If you're using a subscription AI service, this is the reason your context hits limits mid-conversation on complex tasks.

Both problems have the same root cause: treating tool outputs as free.

Building something with Claude? SimplyLouie's developer API gives you direct Claude access for $2/month — useful for prototyping before you commit to full Anthropic API pricing.