Dan Barr for Stacklok

Cut token waste from your AI workflow with the ToolHive MCP Optimizer

If you’ve ever hit a rate limit in your AI assistant or felt the sting of regret after checking your usage bill, you’re not alone. Whether you’re exploring an open source repo or triaging issues for a sprint, running into token walls is disruptive. It breaks your flow and burns your time and money.

Turns out, there’s a hidden cost in many of today’s AI-enhanced dev workflows: tool metadata bloat. When dozens (or hundreds) of tools get injected into each prompt, it drives up token usage and slows down responses. Input tokens aren’t free, and cluttering the context window with irrelevant content degrades model performance.

At Stacklok, we’ve been working with the Model Context Protocol (MCP) and discovered something surprising. A significant chunk of the tokens burned during AI coding sessions doesn’t come from your prompt, or even the code. It comes from tool descriptions.

MCP Optimizer, now available in ToolHive, tackles this problem at the root. It reduces token waste by acting as a smart broker between your AI assistant and MCP servers.

Where the waste comes from

Let’s say you’ve installed MCP servers for GitHub, Grafana, and Notion. You ask your assistant:

“List the 10 most recent issues from my GitHub repo.”

That simple prompt uses 102,000 tokens (total input & output), not because the task is complex, but because the model receives metadata for 114 tools, most of which have nothing to do with the request.

Other common prompts create similar waste:

  • “Summarize my meeting notes from October 19, 2025”
    uses 240,600 tokens, again with 114 tools injected, even though only the Notion server is relevant

  • “Search dashboards related to RDS”
    consumes 93,600 tokens

In each case, only a small fraction of those tokens are relevant to the task. Even saying “hello” burns more than 46,000 tokens.

Multiply that across even a few dozen prompts per day, and you’re burning millions of tokens on context the model doesn’t need. That’s not just expensive; it’s disruptive. In rate-limited enterprise environments or on time-sensitive projects, this inefficiency slows down responses, breaks flow, and cuts directly into productivity.
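To make the scale concrete, here’s a back-of-envelope estimate. The prompt count, per-prompt total, and “relevant fraction” below are illustrative assumptions, not measured values:

```python
# Rough daily waste from tool-metadata bloat. All inputs are assumptions
# for illustration, not measurements from the examples above.
prompts_per_day = 50
tokens_per_prompt = 100_000   # total tokens with full tool metadata injected
relevant_fraction = 0.1       # rough share of the context actually useful

wasted_per_day = prompts_per_day * tokens_per_prompt * (1 - relevant_fraction)
print(f"Wasted tokens per day: {wasted_per_day:,.0f}")
# → Wasted tokens per day: 4,500,000
```

Even with conservative numbers, the overhead lands in the millions of tokens per developer per day.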

Introducing MCP Optimizer: Smarter tool selection for leaner prompts

Instead of flooding the model with all available tools, MCP Optimizer introduces two lightweight primitives:

  1. find_tool: Searches for the most relevant tools using hybrid semantic + keyword search
  2. call_tool: Routes the selected tool request to the appropriate MCP server
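To give a feel for what “hybrid semantic + keyword search” means, here is a toy sketch. This is not ToolHive’s implementation: the tool names, descriptions, scoring functions, and 50/50 weighting are all assumptions, and the “semantic” signal is a bag-of-words cosine standing in for real embeddings:

```python
# Illustrative hybrid tool search: keyword overlap blended with a
# cosine-similarity stand-in for semantic (embedding) search.
# Tool names/descriptions and weights are hypothetical, not ToolHive's.
from collections import Counter
from math import sqrt

TOOLS = {
    "list_issues": "list issues in a github repository",
    "search_dashboards": "search grafana dashboards by name or tag",
    "get_page": "fetch a notion page and its content",
}

def keyword_score(query: str, description: str) -> float:
    # Fraction of query words that appear in the description.
    q, d = set(query.lower().split()), set(description.split())
    return len(q & d) / len(q) if q else 0.0

def semantic_score(query: str, description: str) -> float:
    # Stand-in for embedding similarity: cosine over word counts.
    qc, dc = Counter(query.lower().split()), Counter(description.split())
    dot = sum(qc[w] * dc[w] for w in qc)
    norm = sqrt(sum(v * v for v in qc.values())) * sqrt(sum(v * v for v in dc.values()))
    return dot / norm if norm else 0.0

def find_tool(query: str, limit: int = 8) -> list[str]:
    # Blend both signals; the equal weighting is arbitrary.
    ranked = sorted(
        TOOLS,
        key=lambda n: 0.5 * keyword_score(query, TOOLS[n])
        + 0.5 * semantic_score(query, TOOLS[n]),
        reverse=True,
    )
    return ranked[:limit]

print(find_tool("list the latest issues in my github repo"))
```

The key design point survives the simplification: the model never sees the full catalog, only the top-ranked handful.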

Here’s how it works:

  1. You send a prompt that requires tool assistance (for example, interacting with a GitHub repo)
  2. The assistant calls find_tool
  3. MCP Optimizer returns the most relevant tools (up to 8 by default, but this is configurable)
  4. Only those tools are included in the context
  5. The assistant uses call_tool to execute the task
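The five steps above can be simulated in a few lines. The registry contents and routing logic here are hypothetical stand-ins for the broker, just to show the shape of the two-call flow:

```python
# Toy simulation of the find_tool → call_tool flow. The registry,
# matching, and dispatch are assumptions for illustration only.
REGISTRY = {
    "list_issues": "github",
    "search_dashboards": "grafana",
    "get_page": "notion",
}

def find_tool(query: str, limit: int = 8) -> list[str]:
    # The real broker ranks by relevance; this naive version just
    # filters tool names on query keywords.
    words = query.lower().split()
    matches = [name for name in REGISTRY if any(w in name for w in words)]
    return matches[:limit]

def call_tool(name: str, **args) -> str:
    # Route the request to the MCP server that owns the tool.
    server = REGISTRY[name]
    return f"dispatched {name}({args}) to the {server} server"

tools = find_tool("list the latest issues")          # step 2–3: narrow the set
print(call_tool(tools[0], repo="stacklok/toolhive")) # step 5: execute
```

Because only the matched tools enter the context in step 4, the prompt carries a handful of tool descriptions instead of all 114.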

The results are dramatic. Using the GitHub, Grafana, and Notion MCP servers from the example above:

| Prompt | MCP server used | Without MCP Optimizer | With MCP Optimizer | Token reduction |
| --- | --- | --- | --- | --- |
| Hello | None | Tokens*: 46.8k, tools sent: 114 | Tokens: 11.2k, tools sent: 3 | 76% |
| List the latest 10 issues from the stacklok/toolhive repository. | GitHub | Tokens: 102k, tools sent: 114 | Tokens: 32.4k, tools sent: 11 | 68% |
| Summarize my meeting notes from Oct 19th 2025 | Notion | Tokens: 240.6k, tools sent: 114 | Tokens: 86.8k, tools sent: 11 | 64% |
| Search the dashboards related to "RDS" in my Grafana workspace | Grafana | Tokens: 93.6k, tools sent: 114 | Tokens: 13.7k, tools sent: 11 | 85% |

* Total input & output tokens for the request
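The reduction column follows directly from the before/after totals, which you can verify yourself:

```python
# Recompute the "Token reduction" column from the table's before/after totals.
measurements = {
    "Hello": (46_800, 11_200),
    "List issues": (102_000, 32_400),
    "Summarize notes": (240_600, 86_800),
    "Search dashboards": (93_600, 13_700),
}
for prompt, (before, after) in measurements.items():
    reduction = round(100 * (before - after) / before)
    print(f"{prompt}: {reduction}% fewer tokens")
# → Hello: 76%, List issues: 68%, Summarize notes: 64%, Search dashboards: 85%
```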

By sending only what’s needed, MCP Optimizer reduces total token usage, shortens response times, and prevents the assistant from thrashing through irrelevant tools.

Bar chart comparing token usage before and after the MCP Optimizer

No tokens wasted on excessive metadata. No LLMs spiraling as they try to reason through 100+ tools. Just fast, efficient execution.

Try it now

MCP Optimizer is available today as an experimental feature in the ToolHive desktop app. Here’s how to get started:

  1. Download ToolHive for your platform.
  2. Follow the Quickstart guide and MCP usage guides to install a few MCP servers into the default group (or another group of your choice).
  3. In the Settings (⚙️) screen, enable MCP Optimizer under Experimental Features.
  4. On the MCP Servers screen, click MCP Optimizer, and enable optimization for the default group.
  5. Open the default group and click Manage Clients to connect your favorite AI client.
  6. The optimizer discovers the MCP servers and tools in the default group, and ToolHive automatically connects your clients to the optimizer MCP server.
  7. In your AI client, send prompts that require tool usage, like: “Find a good first issue in the stacklok/toolhive repo to start working on.”

For more, see the full tutorial in the ToolHive documentation.

What’s next

We’re building ToolHive and MCP Optimizer in the open, and your feedback helps shape what comes next.

Explore the project at toolhive.dev and join our community on Discord to share your experiences, suggest features, and help make tool-driven AI workflows faster, safer, and more developer-friendly.
