If you’ve ever hit a rate limit in your AI assistant or felt the sting of regret after checking your usage bill, you’re not alone. Whether you’re exploring an open source repo or triaging issues for a sprint, running into token walls is disruptive. It breaks your flow and burns your time and money.
Turns out, there’s a hidden cost in many of today’s AI-enhanced dev workflows: tool metadata bloat. When dozens (or hundreds) of tools get injected into each prompt, it drives up token usage and slows down responses. Input tokens aren’t free, and cluttering the context window with irrelevant content degrades model performance.
At Stacklok, we’ve been working with the Model Context Protocol (MCP) and discovered something surprising. A significant chunk of the tokens burned during AI coding sessions doesn’t come from your prompt, or even the code. It comes from tool descriptions.
MCP Optimizer, now available in ToolHive, tackles this problem at the root. It reduces token waste by acting as a smart broker between your AI assistant and MCP servers.
Where the waste comes from
Let’s say you’ve installed MCP servers for GitHub, Grafana, and Notion. You ask your assistant:
“List the 10 most recent issues from my GitHub repo.”
That simple prompt uses 102,000 tokens (total input & output), not because the task is complex, but because the model receives metadata for 114 tools, most of which have nothing to do with the request.
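To see where those tokens go, here's roughly what a single tool's metadata looks like when it's injected into the context. This is an illustrative sketch following the MCP tool shape (name, description, JSON Schema input), not the GitHub server's exact schema:

```typescript
// Illustrative MCP tool descriptor. The protocol defines the shape
// (name, description, inputSchema); the field values here are made up
// for illustration, not copied from the GitHub MCP server.
const exampleTool = {
  name: "list_issues",
  description:
    "List issues in a GitHub repository. Supports filtering by state, " +
    "labels, and assignee, with paginated results.",
  inputSchema: {
    type: "object",
    properties: {
      owner: { type: "string", description: "Repository owner" },
      repo: { type: "string", description: "Repository name" },
      state: { type: "string", enum: ["open", "closed", "all"] },
      per_page: { type: "number", description: "Results per page" },
    },
    required: ["owner", "repo"],
  },
};
// A descriptor like this costs a few hundred tokens. Multiply by 114
// tools, and every prompt carries tens of thousands of tokens of
// overhead before the model reads a single word of your request.
```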
Other common prompts create similar waste:
- “Summarize my meeting notes from October 19, 2025” uses 240,600 tokens, again with 114 tools injected, even though only the Notion server is relevant
- “Search dashboards related to RDS” consumes 93,600 tokens
In each case, only a small fraction of those tokens are relevant to the task. Even saying “hello” burns more than 46,000 tokens.
Multiply that across even a few dozen prompts per day, and you’re burning millions of tokens on context the model doesn’t need. That’s not just expensive, it’s disruptive. In rate-limited enterprise environments or time-sensitive projects, this inefficiency slows down responses, breaks flow, and cuts directly into productivity.
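As a back-of-the-envelope check (the per-prompt figure comes from the GitHub example above; the prompt volume is an assumed workload, not a measurement):

```typescript
// Rough daily token burn for one developer. tokensPerPrompt is the
// observed cost of the GitHub example; promptsPerDay is an assumption.
const tokensPerPrompt = 102_000;
const promptsPerDay = 50; // "a few dozen" prompts
const dailyTokens = tokensPerPrompt * promptsPerDay; // 5,100,000
console.log(`~${(dailyTokens / 1e6).toFixed(1)}M tokens per day`); // ~5.1M
```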
Introducing MCP Optimizer: Smarter tool selection for leaner prompts
Instead of flooding the model with all available tools, MCP Optimizer introduces two lightweight primitives:
- `find_tool`: Searches for the most relevant tools using hybrid semantic + keyword search
- `call_tool`: Routes the selected tool request to the appropriate MCP server
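As a rough sketch of what these two primitives might look like on the wire (the parameter and result field names here are hypothetical, not ToolHive's documented schema):

```typescript
// Hypothetical request/result shapes for the two primitives.
// Only the tool names find_tool and call_tool come from the feature
// itself; everything else is an assumption for illustration.
interface FindToolRequest {
  query: string;   // natural-language description of the task
  limit?: number;  // max tools to return (up to 8 by default)
}

interface FindToolResult {
  tools: Array<{
    server: string;      // which MCP server hosts the tool
    name: string;        // tool name to pass to call_tool
    description: string; // short description for the model
  }>;
}

interface CallToolRequest {
  name: string;                        // tool chosen from find_tool's results
  arguments: Record<string, unknown>;  // forwarded to the backing MCP server
}
```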
Here’s how it works:
- You send a prompt that requires tool assistance (for example, interacting with a GitHub repo)
- The assistant calls `find_tool`
- MCP Optimizer returns the most relevant tools (up to 8 by default, but this is configurable)
- Only those tools are included in the context
- The assistant uses `call_tool` to execute the task
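In MCP terms, the assistant's side of that flow could look something like this sketch using the TypeScript MCP client SDK. The connection URL and the tool arguments are placeholders; ToolHive normally wires the client up for you:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Connect to the optimizer as if it were any other MCP server.
// The URL is a placeholder; ToolHive manages this connection for you.
const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("http://localhost:8080/mcp"))
);

// Step 1: ask the optimizer which tools are relevant to the task.
// The argument name "query" is an assumption for illustration.
const found = await client.callTool({
  name: "find_tool",
  arguments: { query: "list the 10 most recent issues in a GitHub repo" },
});

// Step 2: route the actual request through call_tool to the right
// MCP server. The argument shape here is likewise illustrative.
const result = await client.callTool({
  name: "call_tool",
  arguments: {
    name: "list_issues", // a tool surfaced by find_tool
    arguments: { owner: "stacklok", repo: "toolhive", per_page: 10 },
  },
});
```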
The results are dramatic. Using the GitHub, Grafana, and Notion MCP servers from the example above:
| Prompt | MCP server used | Without MCP Optimizer* | With MCP Optimizer* | Token reduction |
|---|---|---|---|---|
| Hello | None | 46.8k tokens, 114 tools sent | 11.2k tokens, 3 tools sent | 76% |
| List the latest 10 issues from the stacklok/toolhive repository. | GitHub | 102k tokens, 114 tools sent | 32.4k tokens, 11 tools sent | 68% |
| Summarize my meeting notes from Oct 19th 2025 | Notion | 240.6k tokens, 114 tools sent | 86.8k tokens, 11 tools sent | 64% |
| Search the dashboards related to "RDS" in my Grafana workspace | Grafana | 93.6k tokens, 114 tools sent | 13.7k tokens, 11 tools sent | 85% |

\* Total input & output tokens for the request
By sending only what’s needed, MCP Optimizer reduces total token usage, shortens response times, and prevents the assistant from thrashing through irrelevant tools.
No tokens wasted on excessive metadata. No LLMs spiraling as they try to reason through 100+ tools. Just fast, efficient execution.
Try it now
MCP Optimizer is available today as an experimental feature in the ToolHive desktop app. Here’s how to get started:
- Download ToolHive for your platform.
- Follow the Quickstart guide and MCP usage guides to install a few MCP servers into the `default` group (or another group of your choice).
- In the Settings (⚙️) screen, enable MCP Optimizer under Experimental Features.
- On the MCP Servers screen, click MCP Optimizer and enable optimization for the `default` group.
- Open the `default` group and click Manage Clients to connect your favorite AI client.
- The optimizer discovers the MCP servers and tools in the `default` group, and ToolHive automatically connects your clients to the optimizer MCP server.
- In your AI client, send prompts that require tool usage, like: “Find a good first issue in the stacklok/toolhive repo to start working on.”
For more, see the full tutorial in the ToolHive documentation.
What’s next
We’re building ToolHive and MCP Optimizer in the open, and your feedback helps shape what comes next.
Explore the project at toolhive.dev and join our community on Discord to share your experiences, suggest features, and help make tool-driven AI workflows faster, safer, and more developer-friendly.

