Alechko
🧩 Runtime Snapshots #13 — The Scalpel and the Swiss Army Knife: 50k Tokens Before You Say Hello

TL;DR: MCP burns 45k+ tokens on tool descriptions before your first prompt. E2LLM = 0 tokens until you paste the UI snapshot. For CSS debugging — scalpel, not Swiss Army knife.

Runtime Snapshots is a series about what happens when you give LLMs the actual runtime state of a UI — not the HTML source, not a screenshot, not a description. Start from #1 or jump in here.

We covered the basic MCP cost argument back in September. This is the architectural explanation of why it happens.

Before your AI assistant reads a single line of your code, it has often already consumed 40,000–50,000 tokens.

That's not a bug in your setup. That's MCP working as designed.


What MCP Actually Loads

Model Context Protocol is a genuinely useful standard. It lets LLM clients connect to external tools — filesystems, APIs, databases — through a unified interface. But "unified" has a heavy tax.

When you connect a typical MCP server to your AI client (like Claude Desktop or Cursor), the protocol negotiates a session. During that negotiation, it sends the client every tool the server exposes:

  • names
  • descriptions
  • input schemas
  • output formats

All of it, upfront, for every new session.
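To see where those tokens go, here's a rough sketch (plain JavaScript, with invented tool names and schemas — real MCP servers expose dozens of tools, each with longer descriptions) of the kind of `tools/list` payload a server sends during initialization, plus a crude chars÷4 token estimate:

```javascript
// Hypothetical MCP tools/list response. Tool names, descriptions, and
// schemas here are invented for illustration, not from any real server.
const toolsListResponse = {
  tools: [
    {
      name: "read_file",
      description:
        "Read the complete contents of a file from the local filesystem. " +
        "Handles text encodings and returns the file as a UTF-8 string.",
      inputSchema: {
        type: "object",
        properties: {
          path: { type: "string", description: "Absolute path to the file" },
        },
        required: ["path"],
      },
    },
    {
      name: "search_code",
      description:
        "Search the codebase for a regex pattern and return matching " +
        "lines with surrounding context.",
      inputSchema: {
        type: "object",
        properties: {
          pattern: { type: "string", description: "Regex to search for" },
          maxResults: { type: "number", description: "Cap on returned matches" },
        },
        required: ["pattern"],
      },
    },
  ],
};

// Crude estimate: ~4 characters per token for English-ish JSON.
const chars = JSON.stringify(toolsListResponse).length;
const approxTokens = Math.ceil(chars / 4);
console.log(`~${approxTokens} tokens for just ${toolsListResponse.tools.length} tools`);
```

Two toy tools already cost a few hundred tokens. Multiply by a few servers with 10–20 tools each and you're in 45k territory before "hello."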

A modest MCP setup — filesystem access, a browser tool, and a code search tool — generates a system prompt between 40,000 and 50,000 tokens before you type a single character. Larger configurations go higher.

Since most LLM APIs charge per token regardless of whether those tokens were "useful," you're paying this tax on every single conversation.

Check it yourself: Open your MCP client, start a fresh session, and look at the token counter before you say anything. The number is real.


The Surgical Alternative

E2LLM was built to solve a specific, painful problem: explaining runtime DOM state to an AI assistant.

Not the HTML source. Not the static structure. The live state — computed styles, visibility flags, ARIA roles, z-index stacks, responsive quirks.

The gap between what the HTML says and what the browser renders is where the most annoying bugs live.

The standard workflow was brutal:

  • screenshot → 3 paragraphs description → hope AI understands
  • OR full page HTML → context window filled with nav/footer noise

E2LLM does one thing: click element → get structured JSON snapshot.

No server. No session. No overhead. Zero tokens until paste.

```json
{
  "tag": "button",
  "text": "Submit",
  "computedStyles": {
    "display": "none",
    "visibility": "hidden",
    "opacity": "0"
  },
  "ariaRole": "button",
  "ariaDisabled": "true",
  "boundingRect": { "width": 0, "height": 0 }
}
```

The AI sees the actual truth of the element. Runtime reality.
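A snapshot like this is cheap to build from runtime state. The sketch below is not E2LLM's actual implementation — it's just the core idea, using the standard `getComputedStyle` and `getBoundingClientRect` APIs, written so the style lookup is injectable:

```javascript
// Sketch of building a runtime snapshot for one element.
// NOT E2LLM's real code -- just the idea: read *computed* values,
// not the static HTML attributes.
function snapshotElement(el, getStyles) {
  const styles = getStyles(el); // in a browser: window.getComputedStyle(el)
  const rect = el.getBoundingClientRect();
  return {
    tag: el.tagName.toLowerCase(),
    text: (el.textContent || "").trim(),
    computedStyles: {
      display: styles.display,
      visibility: styles.visibility,
      opacity: styles.opacity,
    },
    ariaRole: el.getAttribute("role") || el.tagName.toLowerCase(),
    ariaDisabled: el.getAttribute("aria-disabled"),
    boundingRect: { width: rect.width, height: rect.height },
  };
}
```

In a browser console you'd call something like `snapshotElement(document.querySelector("button"), window.getComputedStyle)` and paste the resulting JSON into the chat.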

Two Different Philosophies

MCP = Swiss Army knife

→ broad persistent access to environment

→ agentic workflows ("fix entire codebase")

→ upfront cost OK when agent roams freely

E2LLM = scalpel

→ one precise cut: exact runtime context

→ no persistent connection, no session

→ pay only for what you send

Mistake: using Swiss Army knife for surgery

Button not clickable? You don't need filesystem, git, or DB access. You need to find the `pointer-events: none` or the parent with `z-index: -1`. That's 200 tokens, not 50k.
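For the button case, the whole "surgery" fits in a few console lines. A minimal sketch (the heuristics are simplified — real click-blocking involves more factors, like overlapping siblings):

```javascript
// Walk up from the element looking for the two usual click-blockers:
// pointer-events: none, or an ancestor pushed behind with z-index: -1.
// Simplified heuristic -- overlapping siblings etc. are not covered.
function findClickBlocker(el, getStyles) {
  for (let node = el; node; node = node.parentElement) {
    const s = getStyles(node); // in a browser: window.getComputedStyle(node)
    if (s.pointerEvents === "none") {
      return { node, reason: "pointer-events: none" };
    }
    if (s.zIndex === "-1") {
      return { node, reason: "z-index: -1" };
    }
  }
  return null; // nothing obvious in the ancestor chain
}
```

The point isn't this exact function — it's that the context an AI needs to answer "why isn't this clickable?" is a few hundred tokens of computed style, not a tool catalog.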

The Real Cost Comparison

| Scenario | Session overhead | Query tokens | Overhead that's useful |
| --- | --- | --- | --- |
| MCP (DOM debug) | ~45k tokens | 2k–10k | ~0% |
| E2LLM snapshot | 0 tokens | 150–800 | ~100% |
At $3 per million tokens, 20 MCP DOM-debugging sessions a day means 900k overhead tokens — $2.70 of pure overhead, daily.

Every day. Tokens that don't solve bugs.
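The arithmetic, for anyone who wants to plug in their own rates (the 45k and $3/M figures are from above; everything else is multiplication):

```javascript
// Daily cost of MCP session preambles when used only for DOM debugging.
const overheadTokensPerSession = 45_000; // tool descriptions loaded per session
const sessionsPerDay = 20;
const pricePerMillionTokens = 3; // USD, input-token rate

const dailyOverheadUSD =
  (overheadTokensPerSession * sessionsPerDay / 1_000_000) * pricePerMillionTokens;
console.log(dailyOverheadUSD.toFixed(2)); // → "2.70"
```

That's roughly $80/month per developer for tokens that never touch the bug.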

When to Use Each

Use MCP when:

  • agentic workflow (multi-file/API/systems)
  • dynamic tool discovery needed
  • building automation pipelines

Use E2LLM when:

  • debugging specific UI/CSS issue
  • showing computed element state
  • precise snapshot, no context burn

The Broader Point

MCP standardized agent-system connections. Important.

But standardization ≠ optimization.

Current MCP default: "load everything, always."

This = money + latency + distraction.

Scalpel doesn't replace Swiss Army knife.

For surgery, use scalpel.


Previous: #12 — Reflection in the Code

E2LLM — free Chrome/Firefox extension. Local only.

GitHub | Chrome
