Token usage dropped. Accuracy improved. And I built a 200-line Python proxy to prove it.
The Problem Nobody Talks About
MCP (Model Context Protocol) was supposed to be the universal remote for AI agents. Connect once, and your agent can interact with GitHub, Jira, Slack, filesystems, databases—you name it.
But here's what nobody tells you: connect four MCP servers, and your agent burns 60,000 tokens before you even say "hello."
Redis ran the numbers. A typical setup with Redis, GitHub, Jira, and Grafana—four servers, 167 tools—consumes ~60,000 tokens upfront just loading tool descriptions. In production, it's often 150,000+ tokens.
Atlassian found their own MCP server alone consumes ~10,000 tokens for Jira and Confluence. GitHub's official server exposes 94 tools and chews through ~17,600 tokens per request. Combine several, and you hit 30,000+ tokens of pure metadata—before your agent solves anything.
Every extra tool is a chance to pick the wrong one. Redis measured 42% tool selection accuracy without filtering. The model gets lost in the noise, grabs the wrong tool, overwrites data, or sends requests into the void.
We gave agents unlimited power. And they became slower, dumber, and more expensive.
The Solutions (and Why They're Not Enough)
The industry noticed. Multiple solutions emerged:
| Approach | Example | Core Problem |
|---|---|---|
| Regex-based filtering |
mcpwrapped, Tool Filter MCP
|
You must manually configure which tools to hide. 167 tools? Good luck. |
| Schema compression | Atlassian mcp-compressor (97% reduction) |
Strips descriptions to save tokens, but accuracy drops—models can't tell create_jira_issue from create_confluence_page. |
| Tool Search (Anthropic) | Claude Code built-in | 85% token reduction, but only 34% selection accuracy in independent testing. |
| Vector search (Redis) | Redis Tool Filtering | 98% token reduction, 8x faster, 2x accuracy—but requires Redis infrastructure. |
| Hybrid search (Stacklok) | MCP Optimizer | 94% accuracy on 2,792 tools, but closed-source commercial product. |
All of them fall into one of two traps:
- Manual configuration: You have to know in advance which tools to hide.
- Heavy infrastructure: You need Redis, a cloud service, or a commercial license.
What I wanted was simple: zero-config, 100% local, and smart enough to figure out what tools I actually need.
So I built it.
Introducing shutup-mcp
shutup is an MCP proxy that shows your agent only the tools it actually needs—zero config, 100% local, no API keys.
```shell
shutup --config ~/claude_desktop_config.json --intent "read and write files"
```
That's it. Behind the scenes, shutup:
- Reads your MCP config and discovers all connected servers—filesystem, GitHub, Jira, whatever.
- Fetches all tool definitions and builds a local embedding index using `all-MiniLM-L6-v2` (~80MB, runs entirely offline).
- Watches for changes—add a new MCP server, and `shutup` rebuilds the index automatically.
- Filters tools by intent—when your agent requests tools, `shutup` intercepts and returns only the top-K most relevant ones.
Your agent never knows the other 79,997 tools exist.
Why This Approach Wins
1. Zero Config, Actually
No regex. No YAML. No manual whitelists. You already have a claude_desktop_config.json. shutup reads it directly.
```json
{
  "mcpServers": {
    "filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"] },
    "github": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"] },
    "fetch": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-fetch"] }
  }
}
```
shutup connects to all three, aggregates their tools, and filters them intelligently. No extra configuration files needed.
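The discovery step can be sketched in a few lines. This assumes the standard `mcpServers` layout shown above; `discover_servers` is a hypothetical helper name, not shutup's actual API:

```python
import json

def discover_servers(config_path: str) -> dict[str, list[str]]:
    """Read a Claude Desktop config and return {server_name: launch argv}."""
    with open(config_path) as f:
        config = json.load(f)
    servers = {}
    for name, entry in config.get("mcpServers", {}).items():
        # Each entry specifies a command plus its args, e.g. npx + package name.
        servers[name] = [entry["command"], *entry.get("args", [])]
    return servers
```

Because the proxy reuses the config your client already has, adding a server to `claude_desktop_config.json` is all the "configuration" there is.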
2. Intent-Based Filtering
Most proxies hide tools based on names or regex patterns. shutup hides tools based on what you're actually trying to do.
Say "read and write files"—shutup returns filesystem tools, hiding GitHub and fetch tools.
Say "create a GitHub issue"—shutup surfaces GitHub tools while hiding filesystem operations.
It treats tool selection as a retrieval problem, not a reasoning one—the same insight that drove Redis to 98% token reduction.
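The retrieval framing is easy to see in miniature. Here is a toy sketch where a bag-of-words counter stands in for the real MiniLM sentence embeddings (tool names and descriptions are illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" standing in for all-MiniLM-L6-v2;
    # the real proxy uses dense sentence embeddings instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(intent: str, tools: dict[str, str], k: int = 3) -> list[str]:
    """Rank tools by similarity between the intent and '{name}: {description}'."""
    q = embed(intent)
    ranked = sorted(tools, key=lambda n: cosine(q, embed(f"{n}: {tools[n]}")),
                    reverse=True)
    return ranked[:k]
```

With a tool set like `{"read_file": "read a file from disk", "create_issue": "create a GitHub issue in a repository"}`, the intent "read and write files" ranks `read_file` first, while "create a GitHub issue" ranks `create_issue` first; no reasoning step involved, just nearest neighbors.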
3. Multi-Server Aggregation
This is where shutup differs from most open-source alternatives. It doesn't just filter one MCP server—it aggregates all of them.
When Stacklok analyzed 2,792 tools, they found 94% selection accuracy using hybrid search. But their Optimizer is a commercial product. shutup brings the same pattern—semantic retrieval across multiple servers—to an open-source, zero-dependency tool.
4. Privacy-First, 100% Local
Two embedding backends:
- `sentence-transformers` (default): downloads `all-MiniLM-L6-v2` once (~80MB), runs entirely offline.
- `ollama`: use `nomic-embed-text` or any Ollama embedding model. Completely air-gapped.
No API keys. No telemetry. No cloud dependencies.
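A backend switch like this is commonly implemented as a small factory with lazy imports, so installing one backend never requires the other. This is a hypothetical sketch, not shutup's actual internals:

```python
from typing import Callable, List

def make_embedder(backend: str = "sentence-transformers") -> Callable[[List[str]], list]:
    """Return a function mapping a batch of texts to embedding vectors."""
    if backend == "sentence-transformers":
        # ~80MB model, downloaded once and cached; runs fully offline afterwards.
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer("all-MiniLM-L6-v2")
        return lambda texts: model.encode(texts).tolist()
    if backend == "ollama":
        # Talks only to a local Ollama daemon; air-gapped by construction.
        import ollama
        return lambda texts: [
            ollama.embeddings(model="nomic-embed-text", prompt=t)["embedding"]
            for t in texts
        ]
    raise ValueError(f"unknown embedder backend: {backend!r}")
```

Because the heavy imports live inside each branch, choosing `--embedder ollama` never touches `sentence-transformers` at all.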
Benchmark Context (Why This Matters)
Let's put numbers to the problem.
| Scenario | Tools Loaded | Token Overhead (Est.) | Selection Accuracy |
|---|---|---|---|
| Single MCP server (GitHub) | 94 | ~17,600 | 79-88% (Opus 4.5) |
| Four servers (Redis+GitHub+Jira+Grafana) | 167 | ~60,000 | ~42% (without filtering) |
| Enterprise setup (10+ servers) | 500+ | 150,000+ | < 30% |
Sources: Atlassian, Redis, Stacklok, Anthropic
Now look at what filtering achieves:
| Solution | Token Reduction | Selection Accuracy | Infrastructure Required |
|---|---|---|---|
| Anthropic Tool Search | 85% | 34% (2,792 tools) | Built into Claude |
| Atlassian mcp-compressor | 70-97% | Drops at high compression | Proxy only |
| Redis Tool Filtering | 98% | 85% | Redis + vector DB |
| Stacklok MCP Optimizer | 60-85% | 94% | Commercial platform |
| shutup-mcp | ~98% (projected) | TBD (benchmarking) | Zero |
shutup uses the same architectural pattern as Redis (vector embeddings + semantic search) but without the Redis dependency. It's the "Redis approach" in a single pip install.
How It Works (Under the Hood)
Architecture
```
Agent (Claude Code / Cursor / Windsurf)
        ↓
shutup-mcp (stdio proxy)
        ↓
┌─────────────────────────┐
│ ServerManager           │
│ - Parses mcp.json       │
│ - Manages connections   │
│ - Watches for changes   │
└─────────────────────────┘
        ↓
┌─────────────────────────┐
│ ToolEmbedder            │
│ - Builds local index    │
│ - Cosine similarity     │
│ - Returns top-K tools   │
└─────────────────────────┘
        ↓
Upstream MCP Servers (filesystem, github, fetch, …)
```
Core Loop
1. Startup: parse `claude_desktop_config.json`, connect to each MCP server, fetch tool definitions.
2. Embed: for each tool, create the text `"{name}: {description}"` and embed it using the chosen backend.
3. Request: the user provides an intent (e.g., `--intent "read and write files"`).
4. Filter: compute cosine similarity and return the top-K tools (default K=5).
5. Proxy: forward `tools/list` and `tools/call` requests transparently.
Example
```shell
$ shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
    --intent "create a GitHub issue about the API outage" \
    --top-k 3

[shutup] Loading config: claude_desktop_config.json
[shutup] Connected to 3 MCP servers (filesystem, github, fetch)
[shutup] Fetched 47 total tools
[shutup] Intent: "create a GitHub issue about the API outage"
[shutup] Returning 3/47 tools:
  - github__create_issue
  - github__list_issues
  - github__get_repo
```
The agent only sees 3 tools. Token overhead drops from ~25,000 to ~300.
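Savings like this are easy to sanity-check with the common rough heuristic of ~4 characters per token. This is an order-of-magnitude estimator only; real counts depend on the model's tokenizer:

```python
def estimate_tokens(tool_schemas: list[str]) -> int:
    # Rough heuristic: ~4 characters per token. Treat the result as an
    # order-of-magnitude estimate, not an exact tokenizer count.
    return sum(len(s) for s in tool_schemas) // 4

# A 47-tool catalog with ~2,000-character schemas lands in the tens of
# thousands of tokens; 3 short schemas land in the hundreds.
```

Run it against your own server's tool JSON to see what your agent is paying before it does any work.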
Getting Started
Install
```shell
pip install shutup-mcp
```
Run
```shell
# Default: sentence-transformers (auto-downloads model)
shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
    --intent "your task description"

# Privacy mode: use Ollama
shutup --config ~/Library/Application\ Support/Claude/claude_desktop_config.json \
    --intent "read and write files" \
    --embedder ollama
```
Integrate with Claude Code
In your claude_desktop_config.json, replace direct MCP server entries with shutup as a proxy, or run shutup as a standalone gateway. Full integration docs are on the GitHub repo.
What's Next?
This is v0.1.0—a minimal, functional proxy that proves the pattern works. I'm actively working on:
- Benchmarking: Head-to-head comparison with Anthropic Tool Search, mcp-compressor, and Stacklok Optimizer (public dataset, reproducible).
- Hybrid search: BM25 + embeddings for better exact-match performance.
- Rust rewrite: Move embedding and similarity computation to Rust for sub-millisecond latency at scale.
- Tool usage analytics: Show which tools your agent actually uses vs. what gets filtered out.
Why I Built This
I was tired of watching my agent burn tokens on tools it would never use. Tired of "pick the wrong tool" errors. Tired of configuring regex filters every time I added a new MCP server.
The Redis team proved the pattern: treat tool selection as retrieval. 98% token reduction. 8x faster. Double the accuracy.
But their solution required Redis. Stacklok's required a commercial platform. Anthropic's couldn't reliably find the right tools.
I wanted something that worked out of the box, completely local, with zero configuration.
So I built it. In 200 lines of Python.
Try It Yourself
- GitHub: github.com/hjs-spec/shutup-mcp
- PyPI: `pip install shutup-mcp`
Star the repo if this solves a problem for you. PRs welcome—especially if you want to help with benchmarking or the Rust rewrite.
Your agent doesn't need 167 tools. It needs 3. Tell it to shutup.