Every AI agent re-sends its entire system prompt and every tool/function schema on every single turn. That fixed payload is billed as input tokens on each request — invisibly — until the bill arrives. I measured exactly how much across 13 real open-source agents and MCP servers (79 tools total).
The headline
The GitHub MCP server carries 3,546 tokens of tool schemas every turn — about $12.89 per 1,000 turns on Claude Sonnet, paid before the model reads a word of the user's question. 26 tools is all it takes to make the plumbing cost more than the work.
What I found
- Median per-turn overhead: 547 tokens. Against a realistic 57-token user request, that is 9.6× the size of the question itself, or ~91% of the input.
- The fattest single tool is 827 tokens (`sequentialthinking` from the official MCP servers repo), bigger than the entire toolset of 8 of the 12 tool-providers measured.
- A 35× spread separates the leanest agent (101 tok) from the most bloated (3,546). Same pricing, same tokenizer: bloat is a design choice, not a tax of nature.
- ~20% of typical schema bytes is non-semantic cruft: pretty-printing, Pydantic's auto-added `title`, zod's `$schema`/`additionalProperties`. Invisible tokens nobody wrote, billed every turn.
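The ratios in the list above are simple arithmetic, and a short sketch makes them easy to re-check against your own numbers. The function names here are hypothetical, and the price passed to `cost_per_1k_turns` is a placeholder you would replace with your provider's current input-token rate. It is not the rate behind any dollar figure in this post.

```python
def overhead_stats(overhead_tokens: int, user_tokens: int) -> tuple[float, float]:
    """Return (overhead-to-request ratio, overhead share of total input)."""
    ratio = overhead_tokens / user_tokens
    share = overhead_tokens / (overhead_tokens + user_tokens)
    return ratio, share


def cost_per_1k_turns(tokens_per_turn: int, usd_per_million_input_tokens: float) -> float:
    """Project the input cost of a fixed per-turn payload over 1,000 turns.

    The price is an assumption supplied by the caller; look it up for your model.
    """
    return tokens_per_turn * 1000 * usd_per_million_input_tokens / 1_000_000


# Median overhead (547 tokens) against the 57-token sample request.
ratio, share = overhead_stats(547, 57)
print(f"{ratio:.1f}x the request, {share:.0%} of input")  # 9.6x the request, 91% of input

# Leanest (101 tok) versus most bloated (3,546 tok) agent.
print(f"spread: {3546 / 101:.0f}x")  # spread: 35x
```

Keeping the price as a parameter rather than a constant is deliberate: token counts are stable for a given schema, while pricing changes by model and over time.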
Full dataset — every number reproducible, repos pinned to commits: The Hidden Token Tax.
Measure your own agent
I built a free, in-browser tool that does this for your setup — paste your system prompt + tool schemas + sample outputs and it shows the per-turn breakdown, a $ projection across Claude / Haiku / OpenRouter, and flags the tool inflating your context (including the serialization cruft): Agent Token Profiler — no signup, no key.
How to fix it
Strip the cruft (compact JSON, drop auto-added fields), trim tool descriptions to what actually aids tool selection, load tools on demand (progressive disclosure), and route easy turns to a cheaper model like Haiku. The schema overhead is the half nobody meters — until they do.
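The first fix, stripping cruft, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the profiler's implementation: `strip_cruft` and `CRUFT_KEYS` are hypothetical names, and the set of keys to drop is a guess you should adjust. `additionalProperties` is deliberately left out of the default set because removing it can change validation behavior; drop it only if you know your provider and validator do not rely on it.

```python
import json
from typing import Any

# Assumption: these annotation-only keys are safe to drop for your provider.
CRUFT_KEYS = {"title", "$schema"}


def strip_cruft(node: Any, in_properties: bool = False) -> Any:
    """Recursively drop annotation-only keys from a JSON Schema.

    Keys directly under "properties" are parameter names, not schema
    keywords, so a parameter that happens to be called "title" survives.
    Simplification: other name maps (e.g. "$defs") are not special-cased.
    """
    if isinstance(node, list):
        return [strip_cruft(item) for item in node]
    if not isinstance(node, dict):
        return node
    cleaned = {}
    for key, value in node.items():
        if not in_properties and key in CRUFT_KEYS:
            continue
        # Only the map directly under a "properties" keyword holds parameter names.
        child_is_property_map = key == "properties" and not in_properties
        cleaned[key] = strip_cruft(value, in_properties=child_is_property_map)
    return cleaned


# Hypothetical tool schema in the shape Pydantic tends to emit.
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "SearchArgs",
    "type": "object",
    "properties": {
        "title": {"title": "Title", "type": "string"},
        "limit": {"title": "Limit", "type": "integer"},
    },
    "required": ["title"],
}

pretty = json.dumps(schema, indent=2)
# Compact separators remove the whitespace that pretty-printing adds.
compact = json.dumps(strip_cruft(schema), separators=(",", ":"))
print(len(pretty), "->", len(compact), "bytes")
print(compact)
```

Byte savings are only a proxy for token savings, so it is worth re-measuring with your model's tokenizer after stripping, and checking that tool selection still behaves the same.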
Measured while building AgentLoop, a readable, MIT-licensed Claude-agent starter. The production patterns (token metering, retries, multi-provider routing) are drop-in code in AgentLoop Pro.
Top comments (1)
Good measurement. Those schema tokens sit in the same window the model reasons in, so a bloated toolset costs you answer quality on top of the dollars. Loading tools on demand has been the fix for me, it keeps the context free for the actual task. More on treating context as finite here: prickles.org/tenet/working-context...