I Enabled MCP on My AI Coding Agent and My Token Bill Tripled: Here's the Math

#ai #llm #agents #llmtools

I turned on three MCP servers for my coding agent last month. Everything felt faster, smarter, better. Then the monthly API bill arrived — 3x higher than the month before. The irony: I wasn't even using most of what those servers offered.

That gap between what MCP feels like and what it costs is what I call the MCP context tax. And it's quietly wrecking budgets across teams that enabled "just one more tool."

The Numbers Behind the Feeling

Here's what actually happens when you connect an MCP server to your agent.

Every MCP tool call wraps your prompt in a structured shell — the tool name, arguments, descriptions, and response schemas. A simple filesystem.read call that returns 200 characters of file content might add 800 tokens to your context window. Multiply that by dozens of calls per task, and you're burning tokens on metadata your agent doesn't even reason about.

The data from the field backs this up. Iternal's March 2026 benchmark series found that most models reliably use only 50 to 65% of their advertised context window effectively. Your million-token context isn't a million tokens of reasoning — it's a million tokens of overhead, tool definitions, and retrieval artifacts your model is filtering through.

For MCP specifically, the tax is even starker. Independent analysis from QCode.cc and ShareUhack both measured 10 to 32x more tokens consumed per MCP-assisted task compared to the equivalent direct API call. A task that would cost you $0.02 in raw API calls costs $0.20 to $0.64 with MCP middleware in the loop.

A Real Example: The Repo Analysis Task

I run a weekly code health check across a 12-repository monorepo. Here's the comparison:

Without MCP:

Direct API calls: ~12,000 tokens per repo
12 repos × 5 agents in parallel: ~72,000 tokens
Cost at $0.01/1K tokens: $0.72

With Filesystem + GitHub MCP servers:

Tool definitions: ~4,000 tokens (loaded once, shared — but still)
Per-call overhead including schema metadata: ~2,800 tokens per call
~40 tool calls per repo across 5 parallel agents: ~96,000 tokens
Cost: $2.18 — or 3x the baseline

The agent was smarter about which files to read. But the overhead cost more than the savings in reduced API calls.

The Context Tax in Practice

Here's what it looks like when you actually run this:

Task: "Find all TODOs in the auth service that are older than 90 days"
Model: Claude Opus 4.6
Without MCP (direct API): 14,200 tokens, $0.14
With MCP filesystem server: 38,400 tokens, $0.38
Tax: 24,200 extra tokens, 2.7x cost

The MCP overhead isn't linear either. Each additional MCP server you add to a single agent compounds the tool definition overhead. Three servers × their schemas × the round-trip formatting = a non-trivial chunk of every context window you pay for.

Three Fixes That Actually Work

I'm not saying don't use MCP. I'm saying use it with your wallet open.

1. Profile before you optimize. Run one task with and without MCP. Measure the actual token delta. If the delta is larger than the savings from smarter tool use, you're losing money. Budget $5-10 in API calls to get a real baseline.

2. Choose servers that reduce calls, not just improve quality. A GitHub MCP server that lets your agent navigate repos without 40 exploratory API calls is worth the overhead. A weather MCP server in a coding agent is pure cost with no ROI.

3. Use MCP gateways to share connections. If you run multiple agents, one shared MCP gateway connection (Linux Foundation's AAIF gateway is the reference) avoids loading tool definitions into every agent's context independently. This drops the per-agent overhead from N × schema_size to schema_size + N × call_overhead.

The Tradeoff Is Real But Solvable

MCP solved a real problem: tool interoperability across AI agents. Before it, every agent had its own way of calling external tools. Now Claude, Cursor, ChatGPT, Windsurf, and Gemini can all share the same server ecosystem. That's genuinely valuable.

But "14,000+ MCP servers" is not a sign that you should enable 14,000 MCP servers. It's a sign the ecosystem is mature enough that curation — not discovery — is the skill that separates a cost-efficient agent from a budget hemorrhage.

The question isn't "can I connect this?" It's "does this connection pay for itself?"

My three MCP servers are still enabled. I've just become deliberate about which tasks trigger them. And my token bill is back to where it was in March.

What I learned: The MCP context tax is real and measurable. The fix isn't disabling MCP — it's being honest about which MCP integrations actually reduce total work versus which ones just make the work feel better. The 10-32x overhead figures are averages; your actual tax depends on call frequency, schema size, and how much of your tool response you actually use. Profile your own usage before assuming you're optimized.