DEV Community

Leon

Posted on • Originally published at taprun.dev

Playwright MCP burns 114k tokens for one workflow. Here's why, and what to do about it.

A recent r/ClaudeAI post measured a single Playwright MCP workflow at 114,000 tokens. Not a complex task — a 7-step navigation + form submission that ran in under a minute. Same workflow as a compiled tap.run: zero tokens.

This isn't "Playwright MCP is bad." It's a structural property of running an LLM at runtime versus compile time.

Where the tokens go

Each Playwright MCP call sends back to the model:

  • The current page's accessibility tree (~5-15K tokens for a typical SPA)
  • A screenshot encoded as base64 (~2-8K tokens depending on quality)
  • The console output since last call
  • The action result + any error context

The model needs all of that to decide the next action. So a 7-step workflow runs roughly 10-15K tokens per step, or ~70-105K across the run. Add the schema injection at session start (~1.3K per tool × ~28 tools loaded eagerly ≈ 36K) and you land squarely in the range of the 114K observed.

The optimisations help — DOM compression, accessibility-only modes, smaller screenshots — but the per-step cost is still proportional to page complexity. Add interaction depth and the cost goes up linearly.
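To make the arithmetic concrete, here's a back-of-the-envelope cost model. Every per-item figure is an assumed midpoint of the rough ranges above, not a measured value:

```python
# Rough token-cost model for an LLM-at-runtime browser workflow.
# All per-item figures below are illustrative assumptions.

def session_overhead(tools_loaded=28, tokens_per_schema=1300):
    """One-time schema injection when the MCP session starts."""
    return tools_loaded * tokens_per_schema

def per_step_cost(a11y_tree=8_000, screenshot=2_500, console=500, result=500):
    """Tokens sent back to the model after each tool call."""
    return a11y_tree + screenshot + console + result

def workflow_cost(steps):
    return session_overhead() + steps * per_step_cost()

total = workflow_cost(steps=7)
print(f"~{total / 1000:.0f}K tokens for a 7-step workflow")
```

Swap in your own site's accessibility-tree and screenshot sizes; the shape of the result (tens of K per step, plus a flat schema tax) is the point, not the exact numbers.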

The compiler alternative

The insight tap forge is built on: most browser automation is a known workflow. You're not exploring; you're executing the same task on the same site, repeatedly. The LLM is needed to figure the workflow out the first time. After that, it's overhead.

# First time — LLM authors the program
$ tap forge https://example.com/login → submit
✓ Inspected: form#login, 3 fields
✓ Verified: redirect to /dashboard, status 200
✓ Saved: example/login.tap.js   (47 lines of JavaScript)

# Forever after — no LLM, no tokens
$ tap example login user=alice pass=xxx   # 200ms, $0.00
$ tap example login user=alice pass=xxx   # 200ms, $0.00

For the workflow that cost 114K tokens with Playwright MCP, the equivalent .tap.js file is ~80 lines. It runs in 200ms. Token cost: 0 (after the one-time forge).

When each makes sense

Playwright MCP wins when:

  • The workflow is unique each time (agent exploration)
  • The site changes structure between runs (no stable program possible)
  • You're prototyping and don't yet know what you want to extract

Compiled taps win when:

  • You run the same workflow more than ~5 times
  • The site's structural pattern is stable (~95% of sites — most A/B tests don't change DOM, just CSS)
  • You need monitoring (deterministic output = row count is a signal)
  • You need offline execution

The break-even is low. Even at $5/MTok, one Playwright MCP run of 100K tokens = $0.50. At that rate, fewer than twenty runs cost more than the entire $9/mo Hacker tier of a compiler-based tool.
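A quick sanity check on that break-even, using the $5/MTok rate and $9/mo tier price from the paragraph above:

```python
# Break-even: per-run LLM token cost vs. a flat monthly subscription.
PRICE_PER_MTOK = 5.00      # $ per million tokens (from the text)
TOKENS_PER_RUN = 100_000   # observed order of magnitude
MONTHLY_TIER = 9.00        # $/mo Hacker tier (from the text)

cost_per_run = PRICE_PER_MTOK * TOKENS_PER_RUN / 1_000_000

# Smallest number of runs whose token bill exceeds the flat tier.
break_even_runs = int(MONTHLY_TIER // cost_per_run) + 1
print(f"${cost_per_run:.2f} per run; run #{break_even_runs} tips past the flat tier")
```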

Two structural differences worth understanding

1. Output consistency. When the same site/same prompt produces slightly different extractions across runs (the LLM is non-deterministic), monitoring is structurally hard: a fluctuating row count might just be the model, not the site. With a compiled tap, output is deterministic, so row count fluctuation IS a signal, and you can alert on it.
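A deterministic extractor makes the monitoring rule trivial to write. A minimal sketch — the `expected` baseline and the 10% tolerance are my assumptions, not tap features:

```python
def check_row_count(rows, expected, tolerance=0.1):
    """Alert when a deterministic run's row count drifts from baseline.

    With an LLM extractor this check is near-useless (the count wobbles
    on its own); with a compiled tap, drift means the site changed.
    """
    if expected == 0:
        return len(rows) == 0
    drift = abs(len(rows) - expected) / expected
    return drift <= tolerance

# Yesterday's compiled run returned 120 rows; today's returns 84.
assert check_row_count(list(range(120)), expected=120)       # healthy
assert not check_row_count(list(range(84)), expected=120)    # alert
```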

2. Failure detection. Playwright MCP detects failure reactively — the tool call returns an error, the LLM sees it, retries with a different approach. By the time you notice, tokens are spent and time is lost. Compiled taps detect failure proactively via fingerprint diffing — tap doctor checks if the page structure changed BEFORE the run fires. If drifted, the run doesn't even start.
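The fingerprint idea can be sketched with the standard library. This is an illustration of the concept, not tap doctor's actual algorithm: hash the page's tag-and-attribute skeleton while ignoring text and styling, and refuse to run if the hash drifts.

```python
import hashlib
from html.parser import HTMLParser

class SkeletonHasher(HTMLParser):
    """Fingerprint a page's structure: tags + ids/names, no text, no CSS."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Keep only structurally meaningful attributes.
        keep = sorted(f"{k}={v}" for k, v in attrs if k in ("id", "name", "type"))
        self.parts.append(tag + "|" + ",".join(keep))

def fingerprint(html):
    parser = SkeletonHasher()
    parser.feed(html)
    return hashlib.sha256("\n".join(parser.parts).encode()).hexdigest()[:16]

v1 = '<form id="login"><input name="user"><input name="pass"></form>'
v2 = '<form id="login" class="dark"><input name="user"><input name="pass"></form>'
v3 = '<form id="login"><input name="email"><input name="pass"></form>'

assert fingerprint(v1) == fingerprint(v2)  # CSS-only change: safe to run
assert fingerprint(v1) != fingerprint(v3)  # field renamed: refuse to run
```

This also shows why "most A/B tests don't change DOM, just CSS" matters: a class swap (`v2`) leaves the fingerprint intact, while a renamed field (`v3`) trips it before any run fires.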

The benchmark question

Honest comparison: Playwright MCP has been the most flexible browser-agent setup for the past year. That 114K-token bill is the price of the flexibility. If your workflow is varied enough that you need it, pay it. If your workflow is the same automation run 1,000 times, paying it is leaving money on the table.

The broader pattern: every browser-agent tool faces this LLM-at-runtime vs LLM-at-compile-time tradeoff. The question isn't "which tool is better" — it's "does my workload repeat enough to amortize a one-time compile?"

For most production scrapers, the answer is yes.
