Two posts on Reddit this month independently measured MCP's token overhead. Both reached the same number: 30–40% more tokens than the CLI equivalent.
"I added Notion, Sentry and Shortcut MCPs and was surprised to see every session starting off with 40% of the context used."
— NoSlicedMushrooms (28 upvotes), r/ClaudeAI

"A batch job with 4 MCP servers blew through our token budget in 2 hours. The schema injection on every turn is the killer."
— tom_mathews, r/ClaudeAI
The "MCP is dead, just use CLI" take followed immediately. But three independent users, in three separate threads, arrived at the same conclusion: the problem isn't MCP. It's using MCP for the wrong job.
"MCP for the main orchestrator, CLI for sub-agents. Both hit the same backend."
— raphasouthall, r/mcp (48 upvotes)

"MCP makes sense for discovery, not for known workflows."
— tom_mathews, r/ClaudeAI

"Development Tool versus Production Tool. MCP the shit you serve to clients and CLI while building."
— mat8675, r/ClaudeAI
They're all describing the same architecture. And it's the architecture Tap has used from day one.
The Two-Layer Model
Layer 1: MCP (Authoring)
forge.inspect → AI analyzes the site
forge.verify → AI tests the program
forge.save → program saved to disk
AI participates. Tokens consumed. One-time cost.
─────────────────────────────────────────────
Layer 2: CLI (Execution)
tap.run → program executes
Zero AI. Zero tokens. Deterministic. Forever.
MCP is the authoring layer. It's where AI discovers what the site looks like, what API endpoints are available, which selectors match the data, and how to structure the extraction. This is a one-time process — forge — that produces a .tap.js file.
After that, tap.run executes the program directly. No MCP. No schema injection. No token overhead. The program is JavaScript. It runs in less than a second.
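The post never shows what a forged .tap.js actually contains, so the following is a hypothetical sketch of the key property: a plain JavaScript module whose selectors were discovered by AI at forge time and are now frozen, so execution needs no model at all. Field names and selectors here are invented for illustration.

```javascript
// Hypothetical sketch of a forged .tap.js program (the real file format is
// not shown in the post). Selectors were discovered during forge; at run
// time this is ordinary JavaScript with no AI in the loop.
const FIELDS = {
  title: "h1.product-title", // invented selectors, for illustration only
  price: "span.price",
};

// Pure extraction over a DOM-like object, so the same logic can run in a
// browser context or against a test stub.
function extract(doc) {
  const result = {};
  for (const [field, selector] of Object.entries(FIELDS)) {
    const el = doc.querySelector(selector);
    result[field] = el ? el.textContent.trim() : null;
  }
  return result;
}

module.exports = { FIELDS, extract };
```

Because the extraction is a pure function over the page, it runs in milliseconds and behaves identically on every execution.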
The Numbers
raphasouthall measured MCP overhead precisely for a 21-tool server:
                  MCP                               CLI / tap.run
Upfront cost      ~1,300 tokens (schema injection)  0
Per-call cost     ~800 tokens                       ~750 tokens
After 10 calls    ~880 tokens/call (amortized)      750 tokens/call
For a single forge session (one-time), ~1,300 tokens of overhead is nothing. For 1,000 daily executions? It's the difference between $0 and $135/month.
Tap's architecture makes this explicit: pay the MCP overhead once during forge, then run at zero overhead forever.
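The monthly figure is easy to sanity-check with back-of-envelope arithmetic. The token price below is my assumption (roughly $3.50 per million input tokens), not a number from the post:

```javascript
// Sanity check on the "1,000 daily executions" cost claim. The per-million
// token price is an assumption for illustration; the post does not state one.
const overheadPerCall = 1300;    // tokens of MCP schema injection per run
const callsPerDay = 1000;
const daysPerMonth = 30;
const usdPerMillionTokens = 3.5; // assumed input-token price

const monthlyTokens = overheadPerCall * callsPerDay * daysPerMonth;
const monthlyCost = (monthlyTokens / 1e6) * usdPerMillionTokens;

console.log(monthlyTokens);           // 39000000
console.log(monthlyCost.toFixed(2));  // "136.50", in line with ~$135/month
```

The same 1,300 tokens paid once during forge is a rounding error; paid 30,000 times a month, it is the whole bill.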
How Tap's 40 MCP Tools Don't Blow Up Your Context
The obvious concern: Tap ships 40 MCP tools. With 21 tools costing ~1,300 tokens of schema, 40 tools should cost ~2,500+. That's over 1% of a 200k context window before you even ask a question.
Tap uses deferred tool loading. Only 12 core tools load at session start (~600 tokens). The other 28 — forge, fix, trace, watch, explain — load on demand, only when the agent actually needs them.
# What loads at session start (Tier 1 — always available)
tap.list tap.run tap.doctor tap.nav
tap.click tap.type tap.eval tap.find
tap.screenshot tap.runtime tap.pressKey tap.upload
# What loads on demand (Tier 2 — disclosed via hints)
tap.fix tap.explain tap.trace tap.watch
tap.refresh tap.cookies tap.wait ...
# Forge tools — only load during forge sessions
forge.inspect forge.draft forge.verify forge.save
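The tier split above can be sketched as a lazy registry. This is illustrative code, not Tap's internals: only Tier 1 schemas exist at session start, and a Tier 2 tool pays its schema cost the first time it is requested.

```javascript
// Illustrative sketch of deferred tool loading (not Tap's actual code).
const TIER1 = ["tap.list", "tap.run", "tap.doctor", "tap.nav"]; // abbreviated
const TIER2_LOADERS = {
  // Each loader builds a schema only when its tool is first requested.
  "tap.fix": () => ({ name: "tap.fix", description: "repair a program" }),
  "tap.trace": () => ({ name: "tap.trace", description: "trace a run" }),
};

class ToolRegistry {
  constructor() {
    this.loaded = new Map(TIER1.map((name) => [name, { name }]));
  }
  // What gets injected into the context window at session start: Tier 1 only.
  sessionSchemas() {
    return [...this.loaded.values()];
  }
  // Tier 2 tools resolve, and pay their schema cost, on first use.
  get(name) {
    if (!this.loaded.has(name)) {
      const load = TIER2_LOADERS[name];
      if (!load) throw new Error(`unknown tool: ${name}`);
      this.loaded.set(name, load());
    }
    return this.loaded.get(name);
  }
}
```

Sessions that never touch a Tier 2 tool never pay for its schema, which is exactly where the waste showed up in the community measurements.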
This is the same pattern the community arrived at independently:
"Splitting tools into a tiny default set and a second on-demand pack, because dumping every possible tool into session start was where the waste really showed up."
— Organic-Bid-8298, r/mcp
Why Not Just Use CLI for Everything?
Because authoring requires tool discovery. When AI is figuring out how to scrape a site it's never seen before, it needs typed parameters, rich descriptions, and structured responses. That's what MCP does well.
"The one thing MCP does well is when it's tightly integrated (like Claude Code's built-in tools) — that feels natural because they control both sides."
— SmartYogurtcloset715 (8 upvotes), r/ClaudeAI
Tap controls both sides. The MCP server and the CLI are the same binary. The MCP tools call the same functions the CLI calls. The difference is when each is used:
Forge (one-time): MCP tools, because AI needs to discover and iterate
Run (every time): CLI, because the program already exists
Doctor (periodic): either — MCP for interactive diagnosis, CLI for scheduled health checks
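"Same binary, same functions" can be sketched like this, with invented names (the post describes the pattern, not the code): one implementation backs both an MCP tool handler and a CLI subcommand.

```javascript
// Sketch of one implementation behind two entry points (names invented).
function doctor(program) {
  // Shared health-check logic for a forged program.
  return { program, healthy: true };
}

// MCP side: typed schema plus handler, used for interactive diagnosis.
const mcpTools = {
  "tap.doctor": {
    schema: { params: { program: "string" } },
    handler: ({ program }) => doctor(program),
  },
};

// CLI side: the same function, no schema in any context window,
// suitable for scheduled health checks.
function cliMain(argv) {
  const [command, arg] = argv;
  if (command === "doctor") return doctor(arg);
  throw new Error(`unknown command: ${command}`);
}
```

Both entry points return the same result because they are the same function; only the transport, and therefore the token cost, differs.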
The Implication for Browser Automation
Most browser MCP tools are execution-layer tools. They run in the browser on every call. That's where the token cost comes from — not just schema overhead, but the entire page state (accessibility tree, screenshot bytes, console output) flowing into the context window on every interaction.
"Every browser_navigate + browser_snapshot call costs ~1,500 tokens in JSON schema framing — even though the actual useful output is just a few lines of text."
— BagNervous, r/ClaudeAI (Browser CLI author)
Tap's browser tools exist in MCP for authoring only. During forge, AI uses tap.nav, tap.eval, tap.screenshot to understand the page. After forge produces a .tap.js, execution calls the browser directly — no MCP framing, no token overhead, no context window pollution.
The 1,500-token-per-call problem doesn't exist for tap.run. It's not an MCP call. It's a function call.
Related
Health Contracts Catch What Pydantic Can't — semantic validation for scraper output
Programs Beat Prompts — why AI should write code, not run it
The Interface Protocol — 8 operations that replace every browser automation SDK