DEV Community

Ken Imoto

Posted on • Originally published at zenn.dev

Your MCP server eats 55,000 tokens before your agent says a word -- I measured the real cost

The invisible bill

I was debugging why my Claude Code sessions felt sluggish after connecting a few MCP servers. Token usage was through the roof -- but I hadn't even asked the agent to do anything yet. I rewrote my prompts three times before I thought to check where the tokens were actually going.

Turns out, the moment you connect an MCP server, every tool definition gets loaded into the context window. Names, descriptions, parameter schemas, enum values -- all of it, on every single conversation turn. Not just when you call a tool. Every turn.

Think of it like walking into a library to read one book, but the librarian insists you read the entire catalog first. Every time you walk in.

The measurement: 4 servers, 1,500x cost difference

I measured the tool-definition token overhead for four MCP servers, from minimal to massive:

| MCP Server | Tools | Est. tokens | Monthly cost (10 calls) |
| --- | --- | --- | --- |
| PostgreSQL | 1 | ~35 | ~$0.0005 |
| Google Maps | 7 | ~704 | ~$0.009 |
| GitHub | 26 | ~4,242 | ~$0.06 |
| GitHub (full) | 93 | ~55,000 | ~$0.74 |

PostgreSQL to full GitHub: a 1,500x difference. Same protocol, same "MCP server" label, radically different cost profiles.

And this is just the definition overhead. The actual tool calls consume additional tokens on top.

Where the tokens go

A single MCP tool definition looks harmless:

```json
{
  "name": "gmail_create_draft",
  "description": "Creates a draft email...",
  "inputSchema": {
    "type": "object",
    "properties": {
      "to": { "type": "string", "description": "..." },
      "subject": { "type": "string", "description": "..." },
      "body": { "type": "string", "description": "..." }
    }
  }
}
```

That single tool? 820 tokens. More than the entire PostgreSQL MCP server with its one tool.
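If you want to sanity-check numbers like these yourself, a crude estimate gets you surprisingly close: serialize the definition and divide by roughly 4 characters per token. This is a heuristic sketch, not the model's real tokenizer, and the schema below is a paraphrase of the abbreviated example above -- real definitions, with full descriptions, run far longer:

```python
import json

def estimate_tokens(obj) -> int:
    # Rough heuristic: ~4 characters per token. Real counts depend on
    # the model's tokenizer, so treat this as a ballpark only.
    return len(json.dumps(obj)) // 4

tool = {
    "name": "gmail_create_draft",
    "description": "Creates a draft email...",
    "inputSchema": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "Recipient address"},
            "subject": {"type": "string", "description": "Subject line"},
            "body": {"type": "string", "description": "Message body"},
        },
    },
}

print(estimate_tokens(tool))  # abbreviated schema, so this undershoots 820
```

Run this over every definition a server returns and you get the per-turn overhead that server silently adds.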

Now multiply. A business API like a full accounting platform might expose 270+ tools across invoicing, HR, payroll, time tracking, and sales management. At ~65 tokens per tool average, that's 17,500 tokens consumed before your first question.

Stack a few heavyweight servers like that simultaneously, and you can burn 143,000 out of 200,000 tokens on schema definitions alone. That's 71% of your context window, gone. Your agent is trying to think inside a closet.

At scale, the math gets uncomfortable: 1,000 requests/day with heavy MCP overhead = roughly $170/day = $5,100/month -- just for loading tool schemas.
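The daily figure checks out if you assume Claude-class input pricing of roughly $3 per million tokens (an assumption on my part -- adjust for your model) and a full-GitHub-sized 55,000-token overhead resent on every request:

```python
overhead_tokens = 55_000      # tool definitions re-sent on every request
price_per_mtok = 3.00         # USD per 1M input tokens (assumed pricing)
requests_per_day = 1_000

cost_per_request = overhead_tokens / 1_000_000 * price_per_mtok
daily = cost_per_request * requests_per_day

print(f"${cost_per_request:.3f}/request -> ${daily:.0f}/day -> ${daily * 30:,.0f}/month")
```

That lands at about $165/day and $4,950/month -- in line with the article's rounded figures, and all of it spent before the model does any work.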

The quality cliff

Token cost isn't even the worst part. Claude's output quality visibly degrades after 50+ tool definitions are loaded. The model starts chasing tangents, referencing tools instead of answering your actual question.

More tools in context doesn't mean more capability. Past a threshold, it means worse capability. I confirmed this firsthand -- five servers connected, and my agent started recommending create_github_issue as the fix for a database timeout. Very confident. Very wrong.

Three strategies to cut 95%

Strategy 1: Expose only what you need

If you're using an accounting platform's 270 tools but only need 10 for your tax filing workflow:

```json
{
  "mcpServers": {
    "accounting": {
      "allowedTools": [
        "create_transaction",
        "list_transactions",
        "get_trial_balance",
        "list_account_items",
        "list_partners"
      ]
    }
  }
}
```

10 tools instead of 270: ~650 tokens instead of ~17,500. 96% reduction.
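If your client doesn't support an allowedTools option, you can get the same effect with a thin proxy that filters the tools/list response before it reaches the model. A minimal sketch -- the `tools` field follows MCP's tools/list result shape, and the whitelist names come from the config above:

```python
ALLOWED = {
    "create_transaction",
    "list_transactions",
    "get_trial_balance",
    "list_account_items",
    "list_partners",
}

def filter_tools(tools_list_result: dict) -> dict:
    # Keep only whitelisted tools from a tools/list response,
    # leaving the rest of the response untouched.
    tools = tools_list_result.get("tools", [])
    kept = [t for t in tools if t["name"] in ALLOWED]
    return {**tools_list_result, "tools": kept}
```

Every tool dropped here is tokens the model never pays for -- on every single turn.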

Strategy 2: Write tighter descriptions

API docs make terrible tool descriptions. They're written for humans who read documentation; LLMs need the compressed version.

```jsonc
// Before: ~80 tokens
{
  "description": "Uses the accounting API to create a new
    transaction (journal entry) for the specified company ID.
    You can specify amount, date, account item, partner name,
    memo, and more. Tax category is auto-determined."
}

// After: ~20 tokens
{
  "description": "Create transaction. Args: amount, date, account_item, partner"
}
```

75% fewer tokens, same functionality. The model doesn't need a paragraph to understand what create_transaction does.
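You can estimate the savings with the same chars-to-tokens heuristic (roughly 4 characters per token -- a ballpark, not the real tokenizer):

```python
before = (
    "Uses the accounting API to create a new transaction (journal entry) "
    "for the specified company ID. You can specify amount, date, account "
    "item, partner name, memo, and more. Tax category is auto-determined."
)
after = "Create transaction. Args: amount, date, account_item, partner"

def est(s: str) -> int:
    return len(s) // 4  # rough ~4 chars/token heuristic

saving = 1 - est(after) / est(before)
print(f"{est(before)} -> {est(after)} tokens ({saving:.0%} saved)")
```

Multiply that saving across dozens of tools, paid on every turn, and tight descriptions become one of the cheapest optimizations available.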

Strategy 3: Connect only when needed

Don't keep all MCP servers connected during every conversation. Connect the accounting server when you're doing accounting work. Disconnect it when you're writing code. This alone zeroes out overhead for unrelated tasks.

MCP Tool Search: the protocol-level fix

In January 2026, a protocol-level solution arrived: MCP Tool Search. When tool definitions exceed 10% of your context window, the client automatically defers loading them. Instead of dumping every schema into context, the model discovers and loads tools on-demand via search.

Early reports show a 95% reduction in startup token cost. The schema bloat problem is being solved at the infrastructure level, not just through workarounds.

But Tool Search isn't universally deployed yet. Until it is, the three strategies above are your defense.

What to check right now

1. Count your tools. Run tools/list against each connected MCP server and count the total. If you're above 30 tools across all servers, you're likely paying a meaningful overhead tax.

2. Audit descriptions. Look at the JSON schemas your servers return. Are the descriptions essay-length? Trim them. Every token in a description is paid on every conversation turn.

3. Use allowedTools. Most MCP clients support filtering which tools are exposed. Use it. There's no reason to load 270 tools when you need 10.

4. Measure before/after. Token usage is visible in most LLM clients. Check your per-turn consumption before and after connecting each MCP server. The numbers will tell you exactly which servers are expensive.
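To automate step 1 and part of step 4, capture each server's tools/list response as JSON and run a quick audit over it. A sketch, assuming you've saved the response to a file; the ~4 chars/token figure is the same heuristic as above, not an exact tokenizer count:

```python
import json

def audit_tools(result: dict) -> tuple[int, int]:
    # Count tools in a tools/list response and roughly estimate the
    # per-turn token cost of shipping their definitions (~4 chars/token).
    tools = result.get("tools", [])
    est_tokens = len(json.dumps(tools)) // 4
    return len(tools), est_tokens

# Example: audit a captured response
# with open("tools_list.json") as f:
#     count, tokens = audit_tools(json.load(f))
#     print(f"{count} tools, ~{tokens} definition tokens per turn")
```

Run it per server and the expensive ones identify themselves immediately.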

The irony of MCP: the protocol designed to extend AI capabilities can end up crippling them -- if you load too many tools and leave no room for actual thinking.


This article is based on Chapter 3 of MCP Security in Practice: What OWASP Won't Tell You About AI Tool Integrations. The book covers the full token cost analysis across services, OWASP MCP Top 10 security risks, file upload limitations, and production hardening patterns.
