The real MCP context tax isn't the schemas — it's the responses

#ai #claude #mcp #llm

If you've spent any time around MCP lately, you've seen the headline: your MCP server is eating your context window. The number that gets quoted is scary — GitHub's MCP server costs you ~55,000 tokens before you type a single word, a quarter of Claude's 200K window gone to tool definitions you haven't even used yet.

It's a great hook. It's also two things at once: imprecise, and pointed at the wrong half of the problem.

I built a small tool (ContextTax) to measure the real number — with Claude's own tokenizer, not an estimate — and the answer surprised me twice.

First: the schema cost, measured precisely

The "55K" figure comes from community counts (one put it at 55,000 across 93 tool definitions; another at ~42,000). None of them state which tokenizer they used, whether they counted just the schemas or also the server's injected instructions, or which version of the server. The per-tool breakdowns are explicitly "illustrative, not precise."

So I measured the current official github-mcp-server with Anthropic's count_tokens endpoint — Claude's actual tokenizer, the same one billed at inference:

MCP server	Tools	Schema cost (`count_tokens`)	% of 200K window
Playwright	23	4,633	2.3%
GitHub — default toolset	43	10,928	5.5%
GitHub — all toolsets	82	20,404	10.2%
Azure	65	18,983	9.5%

GitHub's default toolset is 10,928 tokens — 5.5% of the window; turn on every toolset and it's 20,404 (10.2%). Both real, significant, and reproducible. The 42–55K community figures counted more tools (93, versus today's 82) and — with no stated method — likely the server's instructions and client framing on top. So they aren't so much wrong as unsourced and broader. The point isn't a gotcha: a number you can cite should ship with a tokenizer, a scope, and a version. These do — they count the Anthropic tools payload (the canonical schema cost), so read them as the floor of what a given client actually injects.

That's the boring half of the story. Here's the half nobody's measuring.

Second surprise: the schemas were never the main cost

Every "MCP is eating your context" post — and there are a lot of them — talks about tool schemas, and proposes the same fix: load tools lazily, give the agent a search_tools call, trim the definitions. All true, all about the fixed cost you pay once per session.

But tools don't just sit there. They return things — and every tool call drops its response into your context, where it stays. Those responses dwarf the schemas.

One browser_snapshot from the Playwright MCP server — a single accessibility-tree dump of one GitHub page — measures 38,831 tokens. 19.4% of your context window. From one call. That's roughly 8× Playwright's entire tool schema. And unlike the schema, which you load once, every call adds another response — they accumulate for the rest of the session.

We've been optimizing the appetizer. The meal is the responses, and almost nobody is measuring it.

This is the part that should change how you think about MCP cost:

Schema bloat is a one-time tax. Lazy-loading helps, and it's worth doing.
Response bloat is a recurring tax. A handful of verbose tool calls — a page snapshot, a directory listing, a 50-row query result — can blow past your entire schema budget in a single turn, and keep doing it.

If you're paginating, truncating, or summarizing anything, the responses are where the leverage is.

How the measurement works (so you can trust it)

Two principles, because a benchmark you can't reproduce is just a vibe:

Ground truth, not estimates. ContextTax counts with Anthropic's count_tokens — Claude's real tokenizer — not a tiktoken/o200k approximation. That matters more than you'd think: I checked the offline o200k estimate against count_tokens and it's wrong in both directions — it undercounts tool schemas by 16–43%, but overcounts that big snapshot response by ~7%. A proxy tokenizer isn't a safe substitute for any number you intend to cite.

Marginal delta. Every figure is the difference a payload makes to a real request — count(with it) − count(without it) — so you're measuring exactly what occupies your context window, framing and all: what the API counts as input. (Prompt caching can lower the price of those tokens on repeat calls, but not the space they take up.) Pin the server version and the model, and anyone re-running the command gets the same number.

# install (single binary, no .NET needed)
curl -fsSL https://github.com/PavelTkachenk0/ContextTax/releases/latest/download/install.sh | sh

# schema cost of a live server from your config
contexttax measure --server github

# the cost of any tool response — pipe it straight in
pbpaste | contexttax response

There's a keyless offline mode too (-e), clearly labelled ≈, for when you don't have an API key.

Measure your own stack

The headline numbers above are a starting point, not a verdict — your toolset, your servers, and your typical responses are what actually matter. So measure them.

ContextTax is a single-file CLI for macOS, Linux, and Windows (MIT). There's also a community leaderboard of servers by context tax — PRs welcome; measure a server, send the number.

The "55K" panic got one thing right: MCP servers quietly cost you a lot. It just pointed at the schemas. Once you start measuring the responses, the picture — and where you should optimize — changes.

Repo + reproduce: github.com/PavelTkachenk0/ContextTax. Numbers measured with count_tokens, claude-sonnet-4-5, 200K window, against pinned server versions.