DEV Community

Leon

Posted on • Originally published at taprun.dev

MCP is the authoring layer. Execution should cost zero tokens.

Two posts on Reddit this month independently measured MCP's token overhead. Both landed in the same range: 30–40% more tokens than the CLI equivalent.

"I added Notion, Sentry and Shortcut MCPs and was surprised to see every session starting off with 40% of the context used."
— NoSlicedMushrooms (28 upvotes), r/ClaudeAI

"A batch job with 4 MCP servers blew through our token budget in 2 hours. The schema injection on every turn is the killer."
— tom_mathews, r/ClaudeAI

The "MCP is dead, just use CLI" take followed immediately. But three independent users — in three different threads, on three different subreddits — arrived at the same conclusion: the problem isn't MCP. It's using MCP for the wrong job.

"MCP for the main orchestrator, CLI for sub-agents. Both hit the same backend."
— raphasouthall, r/mcp (48 upvotes)

"MCP makes sense for discovery, not for known workflows."
— tom_mathews, r/ClaudeAI

"Development Tool versus Production Tool. MCP the shit you serve to clients and CLI while building."
— mat8675, r/ClaudeAI

They're all describing the same architecture. And it's the architecture Tap has used from day one.

The Two-Layer Model

Layer 1: MCP (Authoring)
forge.inspect    → AI analyzes the site
forge.verify     → AI tests the program
forge.save       → program saved to disk

AI participates. Tokens consumed. One-time cost.

─────────────────────────────────────────────

Layer 2: CLI (Execution)
tap.run          → program executes

Zero AI. Zero tokens. Deterministic. Forever.

MCP is the authoring layer. It's where AI discovers what the site looks like, what API endpoints are available, which selectors match the data, and how to structure the extraction. This is a one-time process — forge — that produces a .tap.js file.

After that, tap.run executes the program directly. No MCP. No schema injection. No token overhead. The program is JavaScript. It runs in less than a second.
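To make the split concrete, here is a sketch of what a forged program could look like. The actual `.tap.js` format is Tap's own; the selector, field names, and function below are invented for illustration. The point is what forge bakes in: the selectors AI discovered once, frozen into plain JavaScript that runs with no AI in the loop.

```javascript
// Hypothetical sketch of a forged program (NOT Tap's real .tap.js format).
// The selector below is what an AI forge session would have discovered once.
const PRICE_SELECTOR = /<span class="price">\$([\d.]+)<\/span>/g;

// Pure extraction: deterministic, token-free, runs in milliseconds.
function extractPrices(html) {
  const prices = [];
  let match;
  while ((match = PRICE_SELECTOR.exec(html)) !== null) {
    prices.push(Number(match[1]));
  }
  return prices;
}

module.exports = { extractPrices };
```

Once this file exists, there is nothing left for a model to decide; re-running it is a function call, not a conversation.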

The Numbers

raphasouthall measured MCP overhead precisely for a 21-tool server:

| | MCP | CLI / tap.run |
|---|---|---|
| Upfront cost | ~1,300 tokens (schema injection) | 0 |
| Per-call cost | ~800 tokens | ~750 tokens |
| After 10 calls | ~880 tokens/call (amortized) | ~750 tokens/call |

For a single forge session (one-time), ~1,300 tokens of overhead is nothing. For 1,000 daily executions? It's the difference between $0 and $135/month.

Tap's architecture makes this explicit: pay the MCP overhead once during forge, then run at zero overhead forever.

How Tap's 40 MCP Tools Don't Blow Up Your Context

The obvious concern: Tap ships 40 MCP tools. With 21 tools costing ~1,300 tokens of schema, 40 tools should cost ~2,500+. That's over 1% of a 200k context window before you even ask a question.

Tap uses deferred tool loading. Only 12 core tools load at session start (~600 tokens). The other 28 (forge, fix, trace, watch, explain, and the rest) load on demand, only when the agent actually needs them.

# What loads at session start (Tier 1 — always available)
tap.list   tap.run    tap.doctor   tap.nav
tap.click  tap.type   tap.eval     tap.find
tap.screenshot  tap.runtime  tap.pressKey  tap.upload

# What loads on demand (Tier 2 — disclosed via hints)
tap.fix    tap.explain   tap.trace    tap.watch
tap.refresh   tap.cookies   tap.wait   ...

# Forge tools — only load during forge sessions
forge.inspect   forge.draft   forge.verify   forge.save
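The tier split above can be sketched as a lazy tool registry. The class, method names, and tool factories here are invented for illustration (this is not Tap's implementation); only the tiering idea comes from the post: eager tools pay their schema cost at session start, deferred tools materialize on first request.

```javascript
// Sketch of deferred tool loading. Names are hypothetical.
const TIER1 = ["tap.list", "tap.run", "tap.doctor", "tap.nav"]; // always loaded

// Tier-2 tools are factories: no schema exists until someone asks.
const TIER2 = {
  "tap.fix": () => ({ name: "tap.fix", schema: { input: "programPath" } }),
  "tap.trace": () => ({ name: "tap.trace", schema: { input: "programPath" } }),
};

class ToolRegistry {
  constructor() {
    this.loaded = new Map();
    // Session start only pays for the small core set.
    for (const name of TIER1) this.loaded.set(name, { name, schema: {} });
  }

  // A tier-2 tool's schema is built (and its tokens spent) on first use.
  get(name) {
    if (!this.loaded.has(name) && TIER2[name]) {
      this.loaded.set(name, TIER2[name]());
    }
    return this.loaded.get(name);
  }

  loadedCount() {
    return this.loaded.size;
  }
}
```

The design choice this illustrates: the context-window cost tracks `loadedCount()`, not the total catalog size, so shipping 40 tools costs the same at session start as shipping 12.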

This is the same pattern the community arrived at independently:

"Splitting tools into a tiny default set and a second on-demand pack, because dumping every possible tool into session start was where the waste really showed up."
— Organic-Bid-8298, r/mcp

Why Not Just Use CLI for Everything?

Because authoring requires tool discovery. When AI is figuring out how to scrape a site it's never seen before, it needs typed parameters, rich descriptions, and structured responses. That's what MCP does well.

"The one thing MCP does well is when it's tightly integrated (like Claude Code's built-in tools) — that feels natural because they control both sides."
— SmartYogurtcloset715 (8 upvotes), r/ClaudeAI

Tap controls both sides. The MCP server and the CLI are the same binary. The MCP tools call the same functions the CLI calls. The difference is when each is used:

  • Forge (one-time): MCP tools, because AI needs to discover and iterate

  • Run (every time): CLI, because the program already exists

  • Doctor (periodic): either — MCP for interactive diagnosis, CLI for scheduled health checks

The Implication for Browser Automation

Most browser MCP tools are execution-layer tools. They run in the browser on every call. That's where the token cost comes from — not just schema overhead, but the entire page state (accessibility tree, screenshot bytes, console output) flowing into the context window on every interaction.

"Every browser_navigate + browser_snapshot call costs ~1,500 tokens in JSON schema framing — even though the actual useful output is just a few lines of text."
— BagNervous, r/ClaudeAI (Browser CLI author)

Tap's browser tools exist in MCP for authoring only. During forge, AI uses tap.nav, tap.eval, tap.screenshot to understand the page. After forge produces a .tap.js, execution calls the browser directly — no MCP framing, no token overhead, no context window pollution.

The 1,500-token-per-call problem doesn't exist for tap.run. It's not an MCP call. It's a function call.


Related

  • Health Contracts Catch What Pydantic Can't — semantic validation for scraper output

  • Programs Beat Prompts — why AI should write code, not run it

  • The Interface Protocol — 8 operations that replace every browser automation SDK
