<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: PavelTkachenk0</title>
    <description>The latest articles on DEV Community by PavelTkachenk0 (@paveltkachenk0).</description>
    <link>https://dev.to/paveltkachenk0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3977660%2Fd1b19fd5-1f75-4fa6-af73-62860d82a37a.jpeg</url>
      <title>DEV Community: PavelTkachenk0</title>
      <link>https://dev.to/paveltkachenk0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/paveltkachenk0"/>
    <language>en</language>
    <item>
      <title>The real MCP context tax isn't the schemas — it's the responses</title>
      <dc:creator>PavelTkachenk0</dc:creator>
      <pubDate>Wed, 10 Jun 2026 13:30:40 +0000</pubDate>
      <link>https://dev.to/paveltkachenk0/the-real-mcp-context-tax-isnt-the-schemas-its-the-responses-20g1</link>
      <guid>https://dev.to/paveltkachenk0/the-real-mcp-context-tax-isnt-the-schemas-its-the-responses-20g1</guid>
      <description>&lt;p&gt;If you've spent any time around MCP lately, you've seen the headline: &lt;em&gt;your MCP server is eating your context window.&lt;/em&gt; The number that gets quoted is scary: &lt;strong&gt;GitHub's MCP server costs you ~55,000 tokens before you type a single word&lt;/strong&gt;, more than a quarter of Claude's 200K window gone to tool definitions you haven't even used yet.&lt;/p&gt;

&lt;p&gt;It's a great hook. It's also two things at once: &lt;strong&gt;imprecise&lt;/strong&gt;, and &lt;strong&gt;pointed at the wrong half of the problem.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I built a small tool (&lt;a href="https://github.com/PavelTkachenk0/ContextTax" rel="noopener noreferrer"&gt;ContextTax&lt;/a&gt;) to measure the real number — with Claude's &lt;em&gt;own&lt;/em&gt; tokenizer, not an estimate — and the answer surprised me twice.&lt;/p&gt;

&lt;h2&gt;First: the schema cost, measured precisely&lt;/h2&gt;

&lt;p&gt;The "55K" figure comes from community counts (one put it at 55,000 across 93 tool definitions; another at ~42,000). None of them state which tokenizer they used, whether they counted just the schemas or also the server's injected instructions, or which version of the server. The per-tool breakdowns are explicitly &lt;em&gt;"illustrative, not precise."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So I measured the current official &lt;code&gt;github-mcp-server&lt;/code&gt; with Anthropic's &lt;code&gt;count_tokens&lt;/code&gt; endpoint — Claude's actual tokenizer, the same one billed at inference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;MCP server&lt;/th&gt;
&lt;th&gt;Tools&lt;/th&gt;
&lt;th&gt;Schema cost (&lt;code&gt;count_tokens&lt;/code&gt;)&lt;/th&gt;
&lt;th&gt;% of 200K window&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Playwright&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;4,633&lt;/td&gt;
&lt;td&gt;2.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub — default toolset&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10,928&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.5%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub — all toolsets&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;td&gt;20,404&lt;/td&gt;
&lt;td&gt;10.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;td&gt;18,983&lt;/td&gt;
&lt;td&gt;9.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GitHub's default toolset is &lt;strong&gt;10,928 tokens, or 5.5% of the window&lt;/strong&gt;; turn on every toolset and it's &lt;strong&gt;20,404 (10.2%)&lt;/strong&gt;. Both numbers are real, significant, and reproducible. The 55K community count covered &lt;em&gt;more&lt;/em&gt; tools (93, versus today's 82), and with no stated method, both community figures likely include the server's instructions and client framing on top. So they aren't so much &lt;em&gt;wrong&lt;/em&gt; as &lt;strong&gt;unsourced and broader&lt;/strong&gt;. The point isn't a gotcha: a number you can cite should ship with a tokenizer, a scope, and a version. Mine do: they count only the Anthropic &lt;code&gt;tools&lt;/code&gt; payload (the canonical schema cost), so read them as the &lt;em&gt;floor&lt;/em&gt; of what a given client actually injects.&lt;/p&gt;
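
&lt;p&gt;To make the last column of the table checkable, here is a minimal Python sketch of the arithmetic: measured tokens divided by a 200K window. The name &lt;code&gt;window_share&lt;/code&gt; is illustrative and not part of ContextTax; the token counts are the ones from the table above.&lt;/p&gt;

```python
WINDOW = 200_000  # context window assumed throughout this post

# Schema costs copied from the table above (count_tokens measurements).
MEASURED = {
    "Playwright": 4_633,
    "GitHub (default toolset)": 10_928,
    "GitHub (all toolsets)": 20_404,
    "Azure": 18_983,
}


def window_share(tokens: int, window: int = WINDOW) -> float:
    """Return the share of the window used, as a percentage with one decimal."""
    return round(tokens * 100 / window, 1)


for server, tokens in MEASURED.items():
    print(f"{server}: {tokens:,} tokens = {window_share(tokens)}% of {WINDOW:,}")
```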

&lt;p&gt;That's the boring half of the story. Here's the half nobody's measuring.&lt;/p&gt;

&lt;h2&gt;Second surprise: the schemas were never the main cost&lt;/h2&gt;

&lt;p&gt;Every "MCP is eating your context" post — and there are a &lt;em&gt;lot&lt;/em&gt; of them — talks about &lt;strong&gt;tool schemas&lt;/strong&gt;, and proposes the same fix: load tools lazily, give the agent a &lt;code&gt;search_tools&lt;/code&gt; call, trim the definitions. All true, all about the &lt;strong&gt;fixed&lt;/strong&gt; cost you pay once per session.&lt;/p&gt;

&lt;p&gt;But tools don't just sit there. They &lt;strong&gt;return&lt;/strong&gt; things — and every tool call drops its &lt;em&gt;response&lt;/em&gt; into your context, where it stays. Those responses dwarf the schemas.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfrlrk9rs47tpqan5yx5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfrlrk9rs47tpqan5yx5.png" alt="Bar chart: MCP tool schemas cost 4,633–20,404 tokens, while one browser_snapshot response is 38,831 — about 8× the schema" width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One &lt;code&gt;browser_snapshot&lt;/code&gt; from the Playwright MCP server — a single accessibility-tree dump of one GitHub page — measures &lt;strong&gt;38,831 tokens. 19.4% of your context window. From one call.&lt;/strong&gt; That's roughly &lt;strong&gt;8× Playwright's entire tool schema&lt;/strong&gt;. And unlike the schema, which you load once, every call adds &lt;em&gt;another&lt;/em&gt; response — they accumulate for the rest of the session.&lt;/p&gt;

&lt;p&gt;We've been optimizing the appetizer. The meal is the responses, and almost nobody is measuring it.&lt;/p&gt;

&lt;p&gt;This is the part that should change how you think about MCP cost:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema bloat is a one-time tax.&lt;/strong&gt; Lazy-loading helps, and it's worth doing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response bloat is a recurring tax.&lt;/strong&gt; A handful of verbose tool calls — a page snapshot, a directory listing, a 50-row query result — can blow past your entire schema budget in a single turn, and keep doing it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're going to paginate, truncate, or summarize anything, start with the responses: that's where the leverage is.&lt;/p&gt;
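
&lt;p&gt;The one-time versus recurring split can be sketched with the two Playwright numbers from this post. This is a rough Python model, assuming every snapshot costs the same 38,831 tokens and that nothing else (prompts, model output) enters the context; real sessions will differ.&lt;/p&gt;

```python
WINDOW = 200_000          # context window used in this post
SCHEMA_TOKENS = 4_633     # Playwright tool schema, paid once per session
SNAPSHOT_TOKENS = 38_831  # one browser_snapshot response, paid on every call


def context_used(calls: int) -> int:
    """Tokens occupied after `calls` snapshots, if none are ever dropped."""
    return SCHEMA_TOKENS + calls * SNAPSHOT_TOKENS


def calls_that_fit() -> int:
    """How many same-sized snapshots fit next to the schema in the window."""
    return (WINDOW - SCHEMA_TOKENS) // SNAPSHOT_TOKENS


for calls in range(7):
    used = context_used(calls)
    print(f"{calls} calls: {used:,} tokens ({used * 100 / WINDOW:.1f}% of window)")

print("snapshots that fit:", calls_that_fit())
```

&lt;p&gt;Under those assumptions five snapshots fit and the sixth overflows the window, while the schema never costs more than its initial 4,633 tokens.&lt;/p&gt;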

&lt;h2&gt;How the measurement works (so you can trust it)&lt;/h2&gt;

&lt;p&gt;Two principles, because a benchmark you can't reproduce is just a vibe:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ground truth, not estimates.&lt;/strong&gt; ContextTax counts with Anthropic's &lt;code&gt;count_tokens&lt;/code&gt; — Claude's real tokenizer — not a &lt;code&gt;tiktoken&lt;/code&gt;/&lt;code&gt;o200k&lt;/code&gt; approximation. That matters more than you'd think: I checked the offline o200k estimate against &lt;code&gt;count_tokens&lt;/code&gt; and it's wrong in &lt;em&gt;both&lt;/em&gt; directions — it &lt;strong&gt;undercounts&lt;/strong&gt; tool schemas by 16–43%, but &lt;strong&gt;overcounts&lt;/strong&gt; that big snapshot response by ~7%. A proxy tokenizer isn't a safe substitute for any number you intend to cite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Marginal delta.&lt;/strong&gt; Every figure is the &lt;em&gt;difference&lt;/em&gt; a payload makes to a real request, &lt;code&gt;count(with it) − count(without it)&lt;/code&gt;, so you're measuring exactly what &lt;strong&gt;occupies your context window&lt;/strong&gt;, framing and all, as the API counts it in input tokens. (Prompt caching can lower the &lt;em&gt;price&lt;/em&gt; of those tokens on repeat calls, but not the space they take up.) Pin the server version and the model, and anyone re-running the command gets the same number.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# install (single binary, no .NET needed)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://github.com/PavelTkachenk0/ContextTax/releases/latest/download/install.sh | sh

&lt;span class="c"&gt;# schema cost of a live server from your config&lt;/span&gt;
contexttax measure &lt;span class="nt"&gt;--server&lt;/span&gt; github

&lt;span class="c"&gt;# the cost of any tool response — pipe it straight in&lt;/span&gt;
pbpaste | contexttax response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a keyless offline mode too (&lt;code&gt;-e&lt;/code&gt;), clearly labelled &lt;code&gt;≈&lt;/code&gt;, for when you don't have an API key.&lt;/p&gt;
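
&lt;p&gt;The marginal-delta principle described in this section does not depend on the tool: count a request twice, once with the payload and once without, and subtract. A minimal Python sketch follows. &lt;code&gt;marginal_delta&lt;/code&gt; and &lt;code&gt;stub_count&lt;/code&gt; are hypothetical names, and the stub counter only stands in for a real call to Anthropic's &lt;code&gt;count_tokens&lt;/code&gt; endpoint so the example runs offline. It is not ContextTax's implementation.&lt;/p&gt;

```python
from typing import Any, Callable, Mapping

Request = Mapping[str, Any]


def marginal_delta(count: Callable[[Request], int], base: Request, extra: Request) -> int:
    """Tokens added by `extra`: count(base plus extra) minus count(base)."""
    with_payload = {**base, **extra}
    return count(with_payload) - count(base)


def stub_count(request: Request) -> int:
    """Stand-in counter: 3 tokens of fixed framing plus 1 per whitespace-separated word.

    Assumption: a real counter would send the request to Anthropic's
    count_tokens endpoint and return the reported input token count.
    """
    words = sum(len(str(value).split()) for value in request.values())
    return 3 + words


base = {"messages": "hello"}
extra = {"tools": "get_issue list_issues create_issue"}

delta = marginal_delta(stub_count, base, extra)
print("marginal cost of the tools payload:", delta)
```

&lt;p&gt;Because the fixed framing appears in both counts, it cancels out of the difference, and only what the payload itself adds is left.&lt;/p&gt;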

&lt;h2&gt;Measure your own stack&lt;/h2&gt;

&lt;p&gt;The headline numbers above are a starting point, not a verdict — your toolset, your servers, and your typical responses are what actually matter. So measure them.&lt;/p&gt;

&lt;p&gt;ContextTax is a single-file CLI for macOS, Linux, and Windows (&lt;a href="https://github.com/PavelTkachenk0/ContextTax" rel="noopener noreferrer"&gt;MIT&lt;/a&gt;). There's also a community &lt;a href="https://github.com/PavelTkachenk0/ContextTax/blob/main/LEADERBOARD.md" rel="noopener noreferrer"&gt;leaderboard&lt;/a&gt; of servers by context tax — PRs welcome; measure a server, send the number.&lt;/p&gt;

&lt;p&gt;The "55K" panic got one thing right: MCP servers quietly cost you a lot. It just pointed at the schemas. Once you start measuring the &lt;em&gt;responses&lt;/em&gt;, the picture — and where you should optimize — changes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Repo + reproduce: &lt;a href="https://github.com/PavelTkachenk0/ContextTax" rel="noopener noreferrer"&gt;github.com/PavelTkachenk0/ContextTax&lt;/a&gt;. Numbers measured with &lt;code&gt;count_tokens&lt;/code&gt;, &lt;code&gt;claude-sonnet-4-5&lt;/code&gt;, 200K window, against pinned server versions.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>mcp</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
