Token usage dropped. Accuracy improved. And I built a 200-line Python proxy to prove it.
The Problem Nobody Talks About
MCP (Model Context Protocol) was supposed to be the universal remote for AI agents. Connect once, and your agent can interact with GitHub, Jira, Slack, filesystems, databases—you name it.
But here's what nobody tells you: connect four MCP servers, and your agent burns 60,000 tokens before you even say "hello."
Redis ran the numbers. A typical setup with Redis, GitHub, Jira, and Grafana—four servers, 167 tools—consumes ~60,000 tokens upfront just loading tool descriptions. In production, it's often 150,000+ tokens.
Atlassian found their own MCP server alone consumes ~10,000 tokens for Jira and Confluence. GitHub's official server exposes 94 tools and chews through ~17,600 tokens per request. Combine several, and you hit 30,000+ tokens of pure metadata—before your agent solves anything.
Every extra tool is a chance to pick the wrong one. Redis measured 42% tool selection accuracy without filtering. The model gets lost in the noise, grabs the wrong tool, overwrites data, or sends requests into the void.
We gave agents unlimited power. And they became slower, dumber, and more expensive.
The Solutions (and Why They're Not Enough)
The industry noticed. Multiple solutions emerged:
| Approach | Example | Core Problem |
|---|---|---|
| Regex-based filtering |
mcpwrapped, Tool Filter MCP
|
You must manually configure which tools to hide. 167 tools? Good luck. |
| Schema compression | Atlassian mcp-compressor (97% reduction) |
Strips descriptions to save tokens, but accuracy drops—models can't tell create_jira_issue from create_confluence_page. |
| Tool Search (Anthropic) | Claude Code built-in | 85% token reduction, but only 34% selection accuracy in independent testing. |
| Vector search (Redis) | Redis Tool Filtering | 98% token reduction, 8x faster, 2x accuracy—but requires Redis infrastructure. |
| Hybrid search (Stacklok) | MCP Optimizer | 94% accuracy on 2,792 tools, but closed-source commercial product. |
All of them fall into one of two traps:
- Manual configuration: You have to know in advance which tools to hide.
- Heavy infrastructure: You need Redis, a cloud service, or a commercial license.
What I wanted was simple: zero-config, 100% local, and smart enough to figure out what tools I actually need.
So I built it.
Introducing shutup-mcp
shutup is an MCP proxy that shows your agent only the tools it actually needs—zero config, 100% local, no API keys.
```shell
shutup --config ~/claude_desktop_config.json --intent "read and write files"
```
That's it. Behind the scenes, shutup:
- Reads your MCP config and discovers all connected servers—filesystem, GitHub, Jira, whatever.
- Fetches all tool definitions and builds a local embedding index using `all-MiniLM-L6-v2` (~80MB, runs entirely offline).
- Watches for changes—add a new MCP server, and `shutup` rebuilds the index automatically.
- Filters tools by intent—when your agent requests tools, `shutup` intercepts and returns only the top-K most relevant ones.
Your agent never knows the other 79,997 tools exist.
Why This Approach Wins
1. Zero Config, Actually
No regex. No YAML. No manual whitelists. You already have a claude_desktop_config.json. shutup reads it directly.
```json
{
  "mcpServers": {
    "filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"] },
    "github": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"] },
    "fetch": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-fetch"] }
  }
}
```
shutup connects to all three, aggregates their tools, and filters them intelligently. No extra configuration files needed.
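The discovery step can be sketched in a few lines. This assumes the standard `mcpServers` layout shown above; `discover_servers` is a hypothetical helper name, not shutup's actual API:

```python
import json

def discover_servers(config_path: str) -> dict[str, list[str]]:
    """Read a Claude Desktop config and return {server_name: launch argv}."""
    with open(config_path) as f:
        config = json.load(f)
    servers = {}
    for name, entry in config.get("mcpServers", {}).items():
        # Each entry specifies a command plus its args, e.g. npx + package name.
        servers[name] = [entry["command"], *entry.get("args", [])]
    return servers
```

Because the proxy reuses the config your client already has, adding a server to `claude_desktop_config.json` is all the "configuration" there is.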
2. Intent-Based Filtering
Most proxies hide tools based on names or regex patterns. shutup hides tools based on what you're actually trying to do.
Say "read and write files"—shutup returns filesystem tools, hiding GitHub and fetch tools.
Say "create a GitHub issue"—shutup surfaces GitHub tools while hiding filesystem operations.
It treats tool selection as a retrieval problem, not a reasoning one—the same insight that drove Redis to 98% token reduction.
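The retrieval framing is easy to see in miniature. Here is a toy sketch where a bag-of-words counter stands in for the real MiniLM sentence embeddings (tool names and descriptions are illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" standing in for all-MiniLM-L6-v2;
    # the real proxy uses dense sentence embeddings instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(intent: str, tools: dict[str, str], k: int = 3) -> list[str]:
    """Rank tools by similarity between the intent and '{name}: {description}'."""
    q = embed(intent)
    ranked = sorted(tools, key=lambda n: cosine(q, embed(f"{n}: {tools[n]}")),
                    reverse=True)
    return ranked[:k]
```

With a tool set like `{"read_file": "read a file from disk", "create_issue": "create a GitHub issue in a repository"}`, the intent "read and write files" ranks `read_file` first, while "create a GitHub issue" ranks `create_issue` first; no reasoning step involved, just nearest neighbors.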
3. Multi-Server Aggregation
This is where shutup differs from most open-source alternatives. It doesn't just filter one MCP server—it aggregates all of them.
When Stacklok analyzed 2,792 tools, they found 94% selection accuracy using hybrid search. But their Optimizer is a commercial product. shutup brings the same pattern—semantic retrieval across multiple servers—to an open-source, zero-dependency tool.
4. Privacy-First, 100% Local
Two embedding backends:
- `sentence-transformers` (default): downloads `all-MiniLM-L6-v2` once (~80MB), runs entirely offline.
- `ollama`: use `nomic-embed-text` or any Ollama embedding model. Completely air-gapped.
No API keys. No telemetry. No cloud dependencies.
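A backend switch like this is commonly implemented as a small factory with lazy imports, so installing one backend never requires the other. This is a hypothetical sketch, not shutup's actual internals:

```python
from typing import Callable, List

def make_embedder(backend: str = "sentence-transformers") -> Callable[[List[str]], list]:
    """Return a function mapping a batch of texts to embedding vectors."""
    if backend == "sentence-transformers":
        # ~80MB model, downloaded once and cached; runs fully offline afterwards.
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer("all-MiniLM-L6-v2")
        return lambda texts: model.encode(texts).tolist()
    if backend == "ollama":
        # Talks only to a local Ollama daemon; air-gapped by construction.
        import ollama
        return lambda texts: [
            ollama.embeddings(model="nomic-embed-text", prompt=t)["embedding"]
            for t in texts
        ]
    raise ValueError(f"unknown embedder backend: {backend!r}")
```

Because the heavy imports live inside each branch, choosing `--embedder ollama` never touches `sentence-transformers` at all.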
Benchmark Context (Why This Matters)
Let's put numbers to the problem.
| Scenario | Tools Loaded | Token Overhead (Est.) | Selection Accuracy |
|---|---|---|---|
| Single MCP server (GitHub) | 94 | ~17,600 | 79-88% (Opus 4.5) |
| Four servers (Redis+GitHub+Jira+Grafana) | 167 | ~60,000 | ~42% (without filtering) |
| Enterprise setup (10+ servers) | 500+ | 150,000+ | < 30% |
Sources: Atlassian, Redis, Stacklok, Anthropic
Now look at what filtering achieves:
| Solution | Token Reduction | Selection Accuracy | Infrastructure Required |
|---|---|---|---|
| Anthropic Tool Search | 85% | 34% (2,792 tools) | Built into Claude |
| Atlassian mcp-compressor | 70-97% | Drops at high compression | Proxy only |
| Redis Tool Filtering | 98% | 85% | Redis + vector DB |
| Stacklok MCP Optimizer | 60-85% | 94% | Commercial platform |
| shutup-mcp | ~98% (projected) | TBD (benchmarking) | Zero |
shutup uses the same architectural pattern as Redis (vector embeddings + semantic search) but without the Redis dependency. It's the "Redis approach" in a single pip install.
How It Works (Under the Hood)
Architecture
```
Agent (Claude Code / Cursor / Windsurf)
        ↓
shutup-mcp (stdio proxy)
        ↓
┌─────────────────────────┐
│ ServerManager           │
│ - Parses mcp.json       │
│ - Manages connections   │
│ - Watches for changes   │
└─────────────────────────┘
        ↓
┌─────────────────────────┐
│ ToolEmbedder            │
│ - Builds local index    │
│ - Cosine similarity     │
│ - Returns top-K tools   │
└─────────────────────────┘
        ↓
Upstream MCP Servers (filesystem, github, fetch, …)
```
Core Loop
1. Startup: parse `claude_desktop_config.json`, connect to each MCP server, fetch tool definitions.
2. Embed: for each tool, create the text `"{name}: {description}"` and embed it using the chosen backend.
3. Request: the user provides an intent (e.g., `--intent "read and write files"`).
4. Filter: compute cosine similarity and return the top-K tools (default K=5).
5. Proxy: forward `tools/list` and `tools/call` requests transparently.
Example
```shell
$ shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
    --intent "create a GitHub issue about the API outage" \
    --top-k 3

[shutup] Loading config: claude_desktop_config.json
[shutup] Connected to 3 MCP servers (filesystem, github, fetch)
[shutup] Fetched 47 total tools
[shutup] Intent: "create a GitHub issue about the API outage"
[shutup] Returning 3/47 tools:
  - github__create_issue
  - github__list_issues
  - github__get_repo
```
The agent only sees 3 tools. Token overhead drops from ~25,000 to ~300.
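Savings like this are easy to sanity-check with the common rough heuristic of ~4 characters per token. This is an order-of-magnitude estimator only; real counts depend on the model's tokenizer:

```python
def estimate_tokens(tool_schemas: list[str]) -> int:
    # Rough heuristic: ~4 characters per token. Treat the result as an
    # order-of-magnitude estimate, not an exact tokenizer count.
    return sum(len(s) for s in tool_schemas) // 4

# A 47-tool catalog with ~2,000-character schemas lands in the tens of
# thousands of tokens; 3 short schemas land in the hundreds.
```

Run it against your own server's tool JSON to see what your agent is paying before it does any work.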
Getting Started
Install
```shell
pip install shutup-mcp
```
Run
```shell
# Default: sentence-transformers (auto-downloads model)
shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
    --intent "your task description"

# Privacy mode: use Ollama
shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
    --intent "read and write files" \
    --embedder ollama
```
Integrate with Claude Code
In your claude_desktop_config.json, replace direct MCP server entries with shutup as a proxy, or run shutup as a standalone gateway. Full integration docs are on the GitHub repo.
What's Next?
This is v0.1.0—a minimal, functional proxy that proves the pattern works. I'm actively working on:
- Benchmarking: Head-to-head comparison with Anthropic Tool Search, mcp-compressor, and Stacklok Optimizer (public dataset, reproducible).
- Hybrid search: BM25 + embeddings for better exact-match performance.
- Rust rewrite: Move embedding and similarity computation to Rust for sub-millisecond latency at scale.
- Tool usage analytics: Show which tools your agent actually uses vs. what gets filtered out.
Why I Built This
I was tired of watching my agent burn tokens on tools it would never use. Tired of "pick the wrong tool" errors. Tired of configuring regex filters every time I added a new MCP server.
The Redis team proved the pattern: treat tool selection as retrieval. 98% token reduction. 8x faster. Double the accuracy.
But their solution required Redis. Stacklok's required a commercial platform. Anthropic's couldn't reliably find the right tools.
I wanted something that worked out of the box, completely local, with zero configuration.
So I built it. In 200 lines of Python.
Try It Yourself
- GitHub: github.com/hjs-spec/shutup-mcp
- PyPI: `pip install shutup-mcp`
Star the repo if this solves a problem for you. PRs welcome—especially if you want to help with benchmarking or the Rust rewrite.
Your agent doesn't need 167 tools. It needs 3. Tell it to shutup.