DEV Community

Ali Al-Jaafari

Your MCP servers are burning 50k+ tokens before you type a word

Here is something I did not realize about the Model Context Protocol until my context window kept feeling full for no reason.

Every MCP server you connect loads its full set of tool definitions into the context window on every single request. Those schemas are not free. Each tool costs a few hundred tokens, and they are sent before the model reads a word of your prompt.

Five typical servers, with a dozen or more tools each, commonly add up to 50,000 to 75,000 tokens of overhead per request. That is real money on every call, and latency you feel on every turn. It also crowds out the context you actually want the model to use.
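To put a dollar figure on that overhead, here is a minimal sketch of the arithmetic. The price of $3 per million input tokens and the 1,000 requests are placeholder assumptions for illustration, not figures from any provider's price list; plug in your own.

```python
def overhead_cost(overhead_tokens: int, requests: int, usd_per_million: float) -> float:
    """Cost in USD of sending `overhead_tokens` extra input tokens on each request."""
    return overhead_tokens * requests * usd_per_million / 1_000_000


if __name__ == "__main__":
    # Assumed numbers: 50,000 tokens of tool definitions, 1,000 requests,
    # and a hypothetical price of $3 per million input tokens.
    print(overhead_cost(50_000, 1_000, 3.0))
```

This ignores prompt caching and output tokens, so treat it as an upper-bound estimate for the tool-definition overhead alone.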

Measure it first

You cannot cut what you cannot see. A rough rule is about 200 tokens per tool plus a small per-server overhead, though tools with long descriptions or large input schemas cost more. I built a tiny tool that prints an estimate for your real config (and checks security while it is at it):

pipx install git+https://github.com/alih552/mcp-audit
mcp-audit
# -> 7 server(s) - ~13,160 context tokens - score 0/100

It runs fully locally, has zero dependencies, and is MIT licensed.
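If you just want a back-of-the-envelope number, the rough rule above can be sketched in a few lines of Python. The 200 tokens per tool comes from the rule of thumb; the 60-token per-server overhead, the server names, and the tool counts are placeholder assumptions. This is not how mcp-audit computes its estimate.

```python
# Rough estimate only: ~200 tokens per tool (the rule of thumb above)
# plus an assumed flat per-server overhead. Both numbers are guesses.
TOKENS_PER_TOOL = 200
TOKENS_PER_SERVER = 60  # assumption standing in for "a small per-server overhead"


def estimate_overhead(tool_counts: dict[str, int]) -> int:
    """Return the estimated context tokens for a set of connected servers.

    `tool_counts` maps a server name to the number of tools it exposes.
    """
    return sum(
        TOKENS_PER_SERVER + TOKENS_PER_TOOL * n_tools
        for n_tools in tool_counts.values()
    )


if __name__ == "__main__":
    # Hypothetical setup: server names and tool counts are made up.
    servers = {"files": 12, "search": 8, "git": 15}
    print(estimate_overhead(servers))
```

A real tokenizer count of the actual tool schemas will differ, so use this only to decide whether a closer look is worth it.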

Then cut it

  1. Turn off what you are not using. The biggest lever and the easiest. Most people leave servers connected that they touched once. Going from seven always-on servers to the two you actually use can reclaim tens of thousands of tokens.
  2. Remove redundant servers. Two search servers, two file servers. Pick one per capability.
  3. Trim tool surface on servers you build. Ten focused tools beat thirty overlapping ones, both for token cost and for the model picking the right one. Keep descriptions tight.
  4. Load niche servers on demand rather than keeping everything always on.
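The first step in the list above can be sketched as a small config filter. This assumes your client stores servers in a JSON file under a top-level `mcpServers` object, which several MCP clients do, but check your own client's format; the server names and commands below are made up.

```python
import json


def prune_servers(config: dict, keep: set[str]) -> dict:
    """Return a copy of `config` with only the servers named in `keep`.

    Assumes servers live under a top-level "mcpServers" key; other
    settings are passed through untouched.
    """
    servers = config.get("mcpServers", {})
    pruned = dict(config)  # shallow copy so the caller's dict is not modified
    pruned["mcpServers"] = {
        name: spec for name, spec in servers.items() if name in keep
    }
    return pruned


if __name__ == "__main__":
    # Hypothetical config with three always-on servers.
    config = {
        "mcpServers": {
            "files": {"command": "files-server"},
            "search": {"command": "search-server"},
            "calendar": {"command": "calendar-server"},
        }
    }
    # Keep only the two servers in daily use.
    print(json.dumps(prune_servers(config, {"files", "search"}), indent=2))
```

Writing the pruned result to a separate file rather than overwriting the original makes it easy to switch a niche server back on when you need it.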

The default of "everything connected all the time" is what creates the bloat. A few minutes of cleanup pays for itself on every request after.

Repo and the full writeup: https://github.com/alih552/mcp-audit

Curious what context-token number people get on their setups.

Top comments (1)

UnitBuilds

Worth noting, MCPs are a tradeoff. If you're using context caching, then it becomes really cheap. You essentially pay full token price once, then 5% of the price for each request after. Though yes, the MCP standard is a bloated mess... It's slow and it doesn't even prevent hallucinations. That is why I had to redesign it as NMCP for VELOCITY OS. A standard Node.js-hosted JSON MCP takes over a millisecond just to deserialize, whereas NMCP runs in nanoseconds and doesn't need to serialize or deserialize. It's deterministic, so there is no label + description; it's logically written in a way the LLM natively understands.