DEV Community

gentic news
gentic news

Posted on • Originally published at gentic.news

How Claude Code's Tool Search Saves 90% of Your Context Window

Tool search automatically defers MCP tool definitions, replacing them with a single search tool that loads tools on-demand, preserving your context window for actual work.

How Claude Code's Tool Search Saves 90% of Your Context Window

What Tool Search Actually Does

Every MCP tool you connect to Claude Code comes with a definition — name, description, and JSON schema — that costs 200-800 tokens. With multiple MCP servers (GitHub, Slack, Jira, etc.), you can easily burn 60,000+ tokens on tool definitions alone, every single turn, before Claude even reads your message.

Tool search solves this by deferring most tool definitions. Instead of sending all 147 tool schemas, Claude Code sends:

  • ~25 built-in tool definitions
  • A single ToolSearch tool
  • A prompt telling Claude: "147 deferred tools available — use ToolSearch to load them"

This reduces token overhead from ~90,000 to ~15,000 tokens immediately.

How It Works In Practice

When Claude needs a specific tool, it calls:

ToolSearch({"query": "github create issue"})
Enter fullscreen mode Exit fullscreen mode

The system returns a tool_reference for mcp__github__create_issue. On the next turn, that tool's full schema is available, and Claude can call it normally.

The cost? One extra turn and ~200 tokens for discovery.
The savings? ~1.5 million tokens over a 20-turn conversation.

Which Tools Get Deferred?

The system uses a priority-ordered checklist:

  1. Explicit opt-out first: MCP tools can declare _meta['anthropic/alwaysLoad'] to force loading every turn
  2. MCP tools deferred by default: Most MCP tools are workflow-specific and numerous
  3. ToolSearch never deferred: It's the bootstrap mechanism
  4. Core communication tools never deferred: Agent, Brief — Claude needs these immediately
  5. Built-in tools with shouldDefer flag: Rarely used but available

This follows Anthropic's principle of "fail closed, fail toward asking." If anything is uncertain, the system loads all tools rather than hiding them.

Three Modes You Can Configure

Tool search operates in three modes controlled by ENABLE_TOOL_SEARCH:

Mode 1: tst (Default)

Always defer MCP and shouldDefer tools. This is the right default — if you're using MCP tools, you've already accepted the latency tradeoff for a larger effective context window.

Mode 2: tst-auto

Threshold-based deferral. Only defer when tools exceed a token budget. Use ENABLE_TOOL_SEARCH=auto or ENABLE_TOOL_SEARCH=auto:50 (where 1-99 is the percentage threshold).

Mode 3: standard

Never defer. Use ENABLE_TOOL_SEARCH=false to disable completely.

The Snapshot Mechanism

Discovered tools are preserved across context compaction through a snapshot system. When compaction occurs, the system takes a snapshot of:

  • All currently loaded tools
  • The ToolSearch tool
  • The deferral state

This ensures Claude doesn't lose access to tools it's already discovered, even as the conversation context gets trimmed.

What This Means For Your MCP Servers

If you're building MCP servers, consider:

  1. Mark critical tools with alwaysLoad: If a tool is needed on nearly every turn (like a primary database query), opt it out of deferral
  2. Write clear tool descriptions: Tool search uses semantic matching, so good descriptions improve discovery accuracy
  3. Group related tools: Tools with similar prefixes or descriptions will be discovered together

Try It Now

Check your current configuration:

echo $ENABLE_TOOL_SEARCH
Enter fullscreen mode Exit fullscreen mode

If you're not seeing tool search benefits, ensure:

  1. You have MCP tools configured
  2. You're not in standard mode
  3. Your MCP servers aren't all marked alwaysLoad

For maximum savings with minimal latency impact, use the default tst mode. The one-turn discovery overhead is negligible compared to the context window preservation.

When To Use Each Mode

  • Default (tst): Most users — balances savings with discovery latency
  • tst-auto:20: If you're sensitive to latency but still want some savings
  • standard: Only if you have very few MCP tools or need every tool available immediately

This system represents a fundamental shift in how Claude Code handles tool ecosystems. Instead of paying the token cost for every possible tool upfront, you now pay only for what you actually use.


Originally published on gentic.news

Top comments (0)