Most MCP tools have a context budget problem nobody talks about.
You call a tool. Claude gets back 4,000 tokens of raw JSON. It processes them, summarizes them, and the user gets a one-line answer. You just burned the equivalent of 3 pages of reading for a sentence.
I built this mistake into my first version of FinanceKit. Then I fixed it. Here's what I learned.
## The problem: raw data ≠ useful context
A typical bad MCP tool response looks like this:
```json
{
  "symbol": "AAPL",
  "regularMarketPrice": 213.49,
  "regularMarketChange": -1.23,
  "regularMarketChangePercent": -0.57,
  "regularMarketVolume": 48293847,
  "regularMarketOpen": 214.20,
  "regularMarketDayHigh": 214.85,
  "regularMarketDayLow": 212.90,
  "fiftyTwoWeekHigh": 237.49,
  "fiftyTwoWeekLow": 164.08,
  "marketCap": 3214938472000,
  "trailingPE": 34.2,
  "forwardPE": 28.7,
  "dividendYield": 0.0051,
  "beta": 1.24,
  "averageVolume": 52847393,
  "averageVolume10days": 49827364,
  "currency": "USD",
  "exchange": "NMS",
  "quoteType": "EQUITY",
  ...
}
```
That's 500+ tokens. Claude has to read all of it, figure out what matters, and synthesize an answer. For every tool call.
Now imagine a user asks "should I buy Apple?" Claude calls `stock_quote`, gets this blob, then calls `technical_analysis`, gets another blob, then `risk_metrics`, gets another blob. You've burned 2,000+ tokens on raw data before Claude writes a single word of analysis.
## The fix: structured verdicts, not raw data
The key insight that changed how I think about MCP tool design:
> Claude is a reasoning engine, not a data parser. Give it conclusions to reason about, not numbers to parse.
Here's the same data after the fix:
```json
{
  "symbol": "AAPL",
  "price": 213.49,
  "change_pct": -0.57,
  "verdict": "NEUTRAL",
  "signals": {
    "trend": "DOWNTREND — below 20-day MA for 3 sessions",
    "momentum": "WEAKENING — RSI at 42, approaching oversold",
    "volume": "NORMAL — 8% below 10-day average",
    "risk": "MODERATE — beta 1.24, 52-week range position: 37%"
  },
  "one_line": "AAPL in mild downtrend, watching 212 support. Not a buy today.",
  "raw": { ... }  // full data available if Claude needs it
}
```
Same information. 60% fewer tokens. Claude gets a verdict and signals it can reason from directly. The `one_line` field is the synthesis the LLM was going to produce anyway — might as well compute it server-side.
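Here's a minimal sketch of how a payload like that could be assembled server-side. The thresholds, helper name (`build_quote_payload`), and signal wording are illustrative assumptions, not FinanceKit's actual implementation:

```python
def build_quote_payload(raw: dict, rsi: float, ma20: float) -> dict:
    """Condense a raw quote blob into a verdict-first response."""
    price = raw["regularMarketPrice"]
    trend = "DOWNTREND" if price < ma20 else "UPTREND"
    momentum = "WEAKENING" if rsi < 50 else "STRENGTHENING"
    # Simple RSI bands: <30 oversold, >70 overbought, else neutral
    if rsi < 30:
        verdict = "OVERSOLD"
    elif rsi > 70:
        verdict = "OVERBOUGHT"
    else:
        verdict = "NEUTRAL"
    return {
        "symbol": raw["symbol"],
        "price": price,
        "change_pct": raw["regularMarketChangePercent"],
        "verdict": verdict,
        "signals": {
            "trend": trend,
            "momentum": f"{momentum} — RSI at {rsi:.0f}",
        },
        "raw": raw,  # full data still available if Claude needs it
    }
```

The point is that the condensation is cheap, deterministic Python — you pay for it once on the server instead of on every model call.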
## The 3 patterns I use in FinanceKit and SiteAudit

### Pattern 1: Verdict fields
Every tool that returns analysis gets a `verdict` or `status` field:
```python
# In the technical_analysis tool
if rsi > 70 and macd_positive:
    verdict = "OVERBOUGHT — consider taking profits"
elif rsi < 30 and not macd_positive:
    verdict = "OVERSOLD — potential entry point"
elif above_20ma and above_50ma:
    verdict = "UPTREND — momentum confirmed"
elif below_20ma and below_50ma:
    verdict = "DOWNTREND — avoid new longs"
else:
    verdict = "NEUTRAL — no clear signal"
```
Claude doesn't compute RSI from numbers. It reads "OVERSOLD — potential entry point" and reasons about it. Token cost for this field: ~8 tokens. Token cost for Claude to derive it from raw data: ~200 tokens.
### Pattern 2: Tiered detail
Return summaries by default. Offer detail on request.
```python
from typing import Literal

@mcp.tool()
def seo_audit(url: str, detail: Literal["summary", "full"] = "summary") -> dict:
    """Run SEO audit. Use detail='summary' for quick overview."""
    results = run_full_audit(url)
    if detail == "summary":
        return {
            "score": results["score"],
            "grade": results["grade"],                 # "B+" or "D-"
            "top_issues": results["issues"][:3],       # only the top 3
            "quick_win": results["issues"][0]["fix"],  # one actionable fix
        }
    return results  # full 3,000-token response only when asked
```
In SiteAudit, `seo_audit` with `detail="summary"` returns ~300 tokens. With `detail="full"` it returns ~2,500 tokens. Claude asks for full detail only when the user explicitly wants deep analysis.
### Pattern 3: Pre-computed comparisons
If the user is going to compare things, do the comparison server-side:
```python
# Instead of:
#   tool_response_asset_1: {500 tokens}
#   tool_response_asset_2: {500 tokens}
#   Claude computes: who wins?

# Do this:
@mcp.tool()
def compare_assets(symbols: list[str]) -> dict:
    """Compare multiple assets. Returns ranked list with reasoning."""
    assets = [analyze(s) for s in symbols]
    ranked = sorted(assets, key=lambda x: x["score"], reverse=True)
    best = ranked[0]
    return {
        "ranked": ranked,
        "winner": best["symbol"],
        "reasoning": f"{best['symbol']} leads on momentum (+RSI) and lower drawdown",
        "matrix": build_comparison_matrix(assets),  # structured, not prose
    }
```
One tool call. One response. Claude doesn't re-derive the comparison.
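A comparison matrix in the "structured, not prose" spirit can be as simple as a metric-by-symbol pivot. Here's one possible sketch of a `build_comparison_matrix` helper — the metric names are illustrative assumptions, not the actual FinanceKit fields:

```python
def build_comparison_matrix(assets: list[dict]) -> dict:
    """Pivot per-asset metrics into a metric -> {symbol: value} table."""
    metrics = ["score", "rsi", "max_drawdown"]  # assumed metric names
    return {
        metric: {a["symbol"]: a.get(metric) for a in assets}
        for metric in metrics
    }
```

A table keyed by metric is easy for the model to scan row-by-row ("who has the best score?") without re-reading each asset's full blob.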
## How to measure your context efficiency
Quick mental model: for each tool call, estimate the token ratio:
```python
efficiency = information_density / token_count
# Bad:  1 useful insight / 800 tokens = 0.00125
# Good: 1 useful insight / 80 tokens  = 0.0125
```
You want Claude spending its context budget on reasoning, not reading raw API responses.
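For a quick estimate without a real tokenizer, a rough character-count heuristic works — roughly 4 characters per token for English/JSON text. The helper names below are illustrative, not part of any library:

```python
import json

def estimate_tokens(payload: dict) -> int:
    # Rough heuristic: ~4 characters per token for JSON-ish English text
    return max(1, len(json.dumps(payload)) // 4)

def efficiency(insights: int, payload: dict) -> float:
    """Useful insights per token of response."""
    return insights / estimate_tokens(payload)
```

Run it against your tool's summary and full responses; if the summary isn't several times more efficient, the summary isn't summarizing.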
A few things I check now when designing a new tool:

1. Can Claude derive the key insight from this response without reading the whole thing? If yes, add a `verdict` field. If it always needs everything, rethink the tool scope.
2. Does this tool return data the LLM will almost never need? Add an `include_raw=false` default. Let the LLM request it when it actually matters.
3. Is this tool a sub-step of a workflow? Consider merging it. Three 300-token tool calls vs. one 500-token call that returns the same composite insight.
## The meta-lesson
MCP servers aren't just API wrappers with a protocol layer. The best ones pre-process data for reasoning, not just for display.
When I rebuilt FinanceKit's `technical_analysis` tool with verdict fields, the average Claude response quality went up noticeably — less hedging, more specific recommendations, faster answers. Not because the data changed. Because Claude was spending its context on analysis instead of parsing.
Context budget is the new API rate limit. Design for it.
If you want to see these patterns in a production MCP server:
- FinanceKit (17 financial analysis tools): https://mcpize.com/mcp/financekit-mcp/playground?ref=MSGX — free tier, no install needed
- SiteAudit (11 web audit tools): https://mcpize.com/mcp/siteaudit-mcp/playground?ref=MSGX — free tier, run a real audit in 10 seconds
Or browse the source: FinanceKit · SiteAudit