You add a few MCP servers. GitHub for code, Notion for docs, maybe Slack for notifications. Suddenly Claude feels... slower. Less helpful. It misses context you explicitly provided. It gives generic answers to specific questions.
The Numbers Behind MCP Token Usage
Here's a stat that stopped me cold: the GitHub MCP server alone consumes 55,000 tokens across its 93 tool definitions. That's before you ask Claude anything. Before you paste any code. Before the conversation even starts.
Developer Scott Spence measured his MCP setup and found 66,000 tokens consumed at conversation start - one third of Claude Sonnet's 200k context window, gone. As the CodeRabbit team put it: "Most of us are now drowning in the context we used to beg for."
MCP token usage has become the silent killer of AI productivity. You install more tools hoping for more capability, and end up with less.
The Math: How MCP Token Consumption Adds Up
Every MCP server you connect loads its tool definitions into Claude's context. The formula is brutal:
servers × tools per server × tokens per tool = context consumed
Real numbers from popular MCP servers:
- GitHub MCP: 55,000 tokens (93 tools)
- Notion MCP: ~8,000 tokens (15+ tools)
- Filesystem MCP: ~4,000 tokens (10 tools)
Average tool definition: 300-600 tokens. That includes the tool name, description, parameter schemas, and examples.
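Where do those tokens go? Here's a rough sketch of what a single tool definition looks like when it gets serialized into context. The field names follow MCP's tools/list format; the tool itself is a simplified, hypothetical GitHub-style example - real definitions are usually longer:

```typescript
// Illustrative shape of one MCP tool definition - roughly what the client
// loads into the model's context for every tool on every connected server.
// Hypothetical, trimmed-down example; real schemas carry more fields.
const exampleTool = {
  name: "create_issue",
  description:
    "Create a new issue in a GitHub repository. Requires the repo owner " +
    "and name plus an issue title; body and labels are optional.",
  inputSchema: {
    type: "object",
    properties: {
      owner: { type: "string", description: "Repository owner" },
      repo: { type: "string", description: "Repository name" },
      title: { type: "string", description: "Issue title" },
      body: { type: "string", description: "Issue body in Markdown" },
      labels: { type: "array", items: { type: "string" } },
    },
    required: ["owner", "repo", "title"],
  },
};
```

Multiply something like that by 93 tools and the 55,000-token figure stops looking surprising.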
Run the math on a typical power user setup: 10 servers × 15 tools average × 500 tokens = 75,000 tokens gone.
That's over a third of your context window consumed by tool descriptions you might never use.
The tipping point comes faster than you'd expect. Cursor enforces a hard limit of 40 tools - they learned that more causes problems. Claude's output quality visibly degrades after 50+ tools. The model starts chasing tangents, referencing tools instead of your actual question.
MCP token limits aren't theoretical. They're the reason your AI assistant "forgot" what you told it three messages ago.
Why This Hurts More Than Your Wallet
Token bloat isn't just expensive. It actively makes your AI worse.
But let's talk money first. Real numbers.
As of January 2026, Claude Opus 4.5 costs $5 per million input tokens. Say you have a DevOps team of 5 people. Each developer's MCP setup consumes 75,000 tokens at conversation start. Each dev runs maybe 10 AI conversations per day.
The math:
- Per day: 75,000 tokens × 5 devs × 10 conversations = 3.75 million input tokens
- Daily cost: $18.75 - just for MCP tool definitions nobody asked for
- Monthly (20 work days): $375 burned on context that describes tools you might never use
That's $375/month your team pays for token overhead. Not for actual AI work. For tool descriptions sitting in context.
With hierarchical routing (1,400 tokens instead of 75,000):
- Monthly tokens: 1.4 million instead of 75 million
- Monthly cost: $7 instead of $375
- Savings: $368/month, or 98%
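Want to sanity-check those numbers against your own team? The whole model fits in a few lines - a sketch using the assumptions above; swap in your own constants:

```typescript
// Back-of-the-envelope cost model from the numbers above.
// Assumptions (from this article): 5 devs, 10 conversations per day each,
// 20 work days per month, $5 per million input tokens.
const DEVS = 5;
const CONVERSATIONS_PER_DAY = 10;
const WORK_DAYS = 20;
const PRICE_PER_MILLION_TOKENS = 5; // USD, input tokens

function monthlyCost(tokensPerConversation: number): number {
  const monthlyTokens =
    tokensPerConversation * DEVS * CONVERSATIONS_PER_DAY * WORK_DAYS;
  return (monthlyTokens / 1_000_000) * PRICE_PER_MILLION_TOKENS;
}

console.log(monthlyCost(75_000)); // 375 -> every tool definition loaded up front
console.log(monthlyCost(1_400));  // 7   -> two meta-tools via hierarchical routing
```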
Now, cost is one thing. But the real damage isn't your API bill.
Relevance decay: When 100 tool definitions compete with your actual prompt, the signal drowns. Claude sees "create_github_issue" and "update_notion_page" when you're asking about a Python bug. Irrelevant context dilutes relevant context.
Model confusion: Large language models have finite attention. Force them to process 75,000 tokens of tool schemas, and less attention remains for your code, your question, your context.
The dependency hell parallel: Developer Jamie Duncan nailed this analogy. "Treating context windows as infinite resources creates unsustainable systems, just as infinite dependency installation historically created bloated software."
We've been here before. npm node_modules memes exist for a reason. MCP is following the same pattern: install everything, worry about consequences later.
Except with MCP, the consequence is immediate. Your AI gets dumber in real-time.
The Problem Nobody's Solving: Teams
Here's where it gets interesting.
Individual developers have options. Tools like code-mode, ToolHive, and Lazy Router implement hierarchical routing - exposing two meta-tools instead of hundreds. Token usage drops 90-98%. Problem solved, for solo developers.
But then you hire a second developer.
Developer one has their MCP config dialed in. Twenty servers, custom settings, credentials stored locally. Developer two joins and asks: "How do I set up MCP?"
"Copy my config file. Oh, and you'll need these API keys. I'll Slack them to you."
Sound familiar?
At five developers, chaos emerges:
Five different configurations. Dev 1 uses GitHub MCP v2.3. Dev 4 is on v2.1. Dev 5 couldn't get it working and disabled it. Nobody knows whose setup is "correct."
Credentials everywhere. API keys in Slack DMs. Tokens in .env files. Secrets in password managers, sticky notes, and "that doc I shared last month." You have no idea what credentials exist, who has access to what, or which ones are still active.
No visibility. Which MCP tools access customer data? Which have write permissions to production systems? Nobody knows. Nobody's tracking it.
Onboarding nightmare. New developer joins. Two hours minimum to replicate the team's MCP setup. More if something breaks - and something always breaks.
The security time bomb. Developer leaves the company. Their laptop has API keys to GitHub, Notion, Slack, and your internal tools. Who rotates those credentials? Who even knows what they had access to?
Every MCP token reduction tool solves the individual problem. None of them solve the team problem.
No credential vault. No role-based access control. No audit logging. No team isolation. No centralized configuration.
The gap in the market isn't token reduction. It's team management.
How Hierarchical Routing Solves Token Bloat
The technical fix is elegant. Instead of loading 100 tools into context, you expose just two:
- discover_mcp_tools(query) - Search across all your MCP servers
- execute_mcp_tool(tool_path, args) - Run the specific tool you need
That's it. The AI searches for relevant tools on demand, then executes them. No upfront loading of everything.
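Here's a minimal sketch of the pattern in TypeScript - not DeployStack's (or any other router's) actual implementation, just the shape of the idea. The full tool catalog lives in a local registry that never gets serialized into the prompt; the model only ever sees the two meta-tools:

```typescript
// Simplified sketch of hierarchical routing. All names are illustrative.
type ToolDef = {
  path: string;                                   // e.g. "github/create_issue"
  description: string;
  run: (args: Record<string, unknown>) => Promise<unknown>;
};

// One entry per tool across every connected MCP server - kept locally,
// never loaded into the model's context.
const registry: ToolDef[] = [];

// Meta-tool 1: keyword search over tool paths and descriptions.
// Only the few matches get surfaced to the model, on demand.
function discoverMcpTools(query: string): Pick<ToolDef, "path" | "description">[] {
  const terms = query.toLowerCase().split(/\s+/);
  return registry
    .filter((t) =>
      terms.some((term) =>
        (t.path + " " + t.description).toLowerCase().includes(term)
      )
    )
    .slice(0, 5)
    .map(({ path, description }) => ({ path, description }));
}

// Meta-tool 2: run the specific tool the model picked.
async function executeMcpTool(
  toolPath: string,
  args: Record<string, unknown>
): Promise<unknown> {
  const tool = registry.find((t) => t.path === toolPath);
  if (!tool) throw new Error(`Unknown tool: ${toolPath}`);
  return tool.run(args);
}
```

A real router would do fuzzier matching in discover and proxy execute calls to the underlying MCP servers, but the context-window win comes entirely from the registry staying out of the prompt.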
The math transforms:
- Before: 10 servers × 15 tools × 500 tokens = 75,000 tokens
- After: 2 meta-tools × ~700 tokens = 1,400 tokens
A 98% reduction in MCP token usage. Context window reclaimed.
This method is now table stakes. Multiple tools implement it. DeployStack's implementation is documented in detail at docs.deploystack.io/development/satellite/hierarchical-router.
But token reduction alone doesn't solve the team problem.
What makes MCP tooling team-ready:
- Credential vault: API keys stored encrypted, auto-injected at runtime. No more tokens in Slack. (There's a sketch of what that looks like after this list.)
- One URL for the whole team: Add a single endpoint to your config. Everyone gets the same servers, same settings, same tools.
- Role-based access: Control who can use which MCP servers. Interns don't need production database access.
- Audit logging: Know which tools accessed what data, when, by whom.
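What does "auto-injected at runtime" actually look like? Roughly this - a sketch with hypothetical interfaces (SecretStore, isAllowed, and callUpstreamServer are placeholders, not a real API). The execute step resolves the credential from a team vault per call, checks the caller's role, and writes an audit record. The key never lives on a laptop or in the prompt:

```typescript
// Sketch of runtime credential injection with role checks and audit logging.
// Hypothetical interfaces - illustrative only, not any product's API.
interface SecretStore {
  getSecret(server: string, role: string): Promise<string>;
}

async function executeWithInjectedCredentials(
  store: SecretStore,
  user: { id: string; role: string },
  toolPath: string,
  args: Record<string, unknown>
): Promise<unknown> {
  const [server] = toolPath.split("/");            // e.g. "github/create_issue"

  // Role-based access: check before anything touches the upstream server.
  if (!(await isAllowed(user.role, server))) {
    throw new Error(`${user.role} may not use ${server}`);
  }

  // Credential resolved at call time, scoped to this server and role.
  const apiKey = await store.getSecret(server, user.role);

  // Audit trail: who called what, when.
  console.log(
    JSON.stringify({ at: new Date().toISOString(), user: user.id, toolPath })
  );

  return callUpstreamServer(server, toolPath, args, apiKey);
}

// Placeholders for the pieces a real gateway would provide.
declare function isAllowed(role: string, server: string): Promise<boolean>;
declare function callUpstreamServer(
  server: string,
  toolPath: string,
  args: Record<string, unknown>,
  apiKey: string
): Promise<unknown>;
```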
Individual developers can get by with local configs and manual credential management. Teams can't.
Where to Go From Here
If you're a solo developer hitting MCP token limits, you have options. Code-mode and ToolHive work well. Pick whichever fits your workflow.
If you're running a team - five developers, ten, twenty - you need more than token reduction. You need credential management. Access control. Visibility into what's actually happening across your MCP setup.
One URL. Everyone gets the same setup. No more "works on my machine" for MCP.
MCP token limits are a solved problem. Team MCP management isn't - until now.

