Token cost in browser automation MCPs has become a real topic — articles like "Playwright MCP Burns 114K Tokens Per Test" have been making the rounds. Tools are approaching this from different angles: Playwright MCP's --output-mode file option saves snapshots to disk instead of returning them in LLM context, Vercel's agent-browser compresses DOM state to a fraction of the original, and some tools add vision-based fallbacks for layout understanding.
I've been working on WebClaw, an open-source Chrome extension-based browser MCP. It takes the accessibility tree approach like Playwright MCP, but with a more compact format. I wanted to measure the actual difference — not guess, but measure — so I set up a side-by-side test.
How I Measured
Versions tested:
- Playwright MCP: @playwright/mcp v0.0.68 (npx @playwright/mcp@0.0.68 --headless)
- WebClaw: webclaw-mcp v0.9.0 + Chrome extension v0.9.0
- Measured: February 26, 2026
I registered both Playwright MCP and WebClaw as MCP servers in the same Claude Code session, then ran the same steps on each:
- Navigate to the target URL
- Call the snapshot tool (browser_snapshot / page_snapshot)
- Measure the full response text length in characters
- Estimate tokens as characters / 4 (an approximation; actual tokenization varies by model)
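The characters / 4 estimate from the steps above can be sketched in a few lines of Python. The function name and the round-to-nearest behavior are assumptions for illustration (rounding to nearest does reproduce the token figures reported in this post), and len() counts Unicode code points, which may differ slightly from how the characters were originally counted.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate: character count / chars_per_token, rounded to nearest.

    This is a heuristic only; real tokenizers vary by model.
    """
    return (len(text) + chars_per_token // 2) // chars_per_token


# A stand-in for a snapshot response of 31,439 characters.
snapshot_text = "x" * 31_439
print(estimate_tokens(snapshot_text))
```

For exact numbers you would run the response through the target model's own tokenizer; the heuristic is only meant to make the two tools comparable on the same scale.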
Both tools return the complete accessibility tree with no truncation. WebClaw's default is unlimited output (no token budget), so this is a pure format efficiency comparison.
I picked three pages with different content patterns:
- Wikipedia — long article with many reference links and navigation templates
- GitHub — repository page with file listing, README, and sidebar
- Hacker News — list-style page with 30 items
Important caveat on fairness: Playwright MCP runs a headless Chromium (not logged in). WebClaw runs in the user's Chrome (logged in to GitHub in my case). This means WebClaw sees more UI on GitHub — authenticated menus, notifications, repo actions — which actually increases its output. The comparison is biased against WebClaw on that page.
Results: Format Efficiency
Both tools returning full, untruncated accessibility trees:
| Site | Playwright MCP | WebClaw | Difference |
|---|---|---|---|
| Wikipedia (MCP article) | 16,044 tokens (64,176 chars) | 7,860 tokens (31,439 chars) | 51% smaller |
| GitHub (anthropics/claude-cookbooks) | 19,409 tokens (77,637 chars) | 4,304 tokens (17,215 chars) | 78% smaller |
| Hacker News (front page) | 14,547 tokens (58,189 chars) | 3,052 tokens (12,207 chars) | 79% smaller |
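The "Difference" column can be recomputed from the token counts in the table. A minimal sketch, where the helper name is an assumption and the numbers are copied from the rows above:

```python
def percent_smaller(baseline_tokens: int, compact_tokens: int) -> int:
    """Percentage reduction of compact_tokens relative to baseline_tokens."""
    return round(100 * (1 - compact_tokens / baseline_tokens))


# (Playwright MCP tokens, WebClaw tokens) taken from the results table.
results = {
    "Wikipedia": (16_044, 7_860),
    "GitHub": (19_409, 4_304),
    "Hacker News": (14_547, 3_052),
}

for site, (playwright, webclaw) in results.items():
    print(f"{site}: {percent_smaller(playwright, webclaw)}% smaller")
```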
The range is 51% to 79% depending on the page. Let me dig into why.
What Creates the Difference
Comparing the actual output for the same Wikipedia page:
Playwright MCP (browser_snapshot):
- generic [active] [ref=e1]:
- link "Jump to content" [ref=e2] [cursor=pointer]:
- /url: "#bodyContent"
- banner [ref=e4]:
- navigation "Site" [ref=e6]:
- generic "Main menu" [ref=e7]:
- button "Main menu" [ref=e8] [cursor=pointer]
WebClaw (page_snapshot):
[page "Model Context Protocol - Wikipedia"]
[banner]
[nav "Site"]
[@e2 link]
[search]
[@e3 searchbox "Search Wikipedia"]
[@e4 button "Search"]
The difference comes down to design choices — each reasonable on its own, but they compound:
| Design choice | Playwright MCP | WebClaw |
|---|---|---|
| Which elements get refs | All elements (generic, rowgroup, cell...) | Only interactive elements (buttons, links, inputs) |
| Attribute output | [active], [cursor=pointer], /url: on all applicable | Minimal: only what's needed for action |
| Table representation | Full nested structure per cell | Compressed single-line rows |
| Ref count (GitHub) | 789 refs | 245 refs |
Playwright MCP's approach — labeling every element with a ref — gives maximum flexibility for targeting any element. WebClaw trades that completeness for compactness by only labeling things the AI can actually interact with.
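To make the ref-count effect concrete, here is a toy sketch that counts refs on a small tree under both policies. It is not either tool's actual implementation: the Node type, the INTERACTIVE_ROLES set, and the tree (loosely modeled on the Wikipedia excerpt above) are all assumptions for illustration.

```python
from dataclasses import dataclass, field

# Assumed set of "interactive" roles, for illustration only.
INTERACTIVE_ROLES = {"button", "link", "textbox", "searchbox", "checkbox", "combobox"}


@dataclass
class Node:
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def count_refs(node: Node, interactive_only: bool) -> int:
    """Count how many nodes would receive a ref under the given policy."""
    gets_ref = (not interactive_only) or node.role in INTERACTIVE_ROLES
    return int(gets_ref) + sum(count_refs(c, interactive_only) for c in node.children)


tree = Node("generic", children=[
    Node("link", "Jump to content"),
    Node("banner", children=[
        Node("navigation", "Site", children=[
            Node("generic", "Main menu", children=[Node("button", "Main menu")]),
        ]),
    ]),
])

print(count_refs(tree, interactive_only=False))  # every node gets a ref
print(count_refs(tree, interactive_only=True))   # only the link and the button
```

Even on this six-node tree the interactive-only policy emits a third of the refs; wrapper-heavy pages widen that gap.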
Why the range is so wide (51% to 79%)
The format savings vary by page structure:
- GitHub (78%): The file listing table is where the biggest difference shows. Playwright MCP assigns refs to every row, cell, and generic wrapper (789 total). WebClaw only labels links and buttons (245 total). Additionally, WebClaw follows the W3C Accessible Name specification, using textContent before the title attribute for buttons and links. On GitHub, many buttons have short display text ("X") but verbose title attributes ("Close dialog"), so using the spec-compliant order avoids the bloat.
- Hacker News (79%): Simple, repetitive table structure. WebClaw's table compression ([row] 1. | link | link) eliminates most of the verbosity. Playwright MCP outputs nested rowgroup > row > cell > generic > link for each of the 30 items.
- Wikipedia (51%): The article body has many inline links that both tools represent similarly. The savings come primarily from the navigation templates (Generative AI, Artificial Intelligence navboxes) where structural compression helps, but the text content itself is irreducible.
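Two of the mechanisms in the list above, label precedence and single-line rows, can be sketched as follows. Both functions are hypothetical simplifications written for this post, not WebClaw's code: the real accessible name computation has more steps than "text before title", and the real row format may differ in detail.

```python
def pick_label(text_content: str, title: str) -> str:
    """Prefer visible text content; fall back to the title attribute only if empty."""
    text = " ".join(text_content.split())  # collapse whitespace
    return text if text else title.strip()


def compress_row(cells: list[str]) -> str:
    """Render one table row as a single line, skipping empty cells."""
    parts = [cell.strip() for cell in cells if cell.strip()]
    return "[row] " + " | ".join(parts)


print(pick_label("X", "Close dialog"))           # short visible text wins
print(pick_label("   ", "Close dialog"))         # falls back to the title
print(compress_row(["1.", "", "link", "link"]))  # one line per row
```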
Controlling Output Size
WebClaw defaults to unlimited output — no truncation. But when you need to manage token costs, two options are available:
Interactive elements only — interactiveOnly
{ "interactiveOnly": true }
Strips all text content. A 2,000-line page becomes ~200 lines of buttons, links, and inputs.
Landmark region focus — focusRegion
{ "focusRegion": "main" }
Only returns the main, nav, header, or footer section. Useful when you know where the content you need is.
Playwright MCP doesn't have equivalents — it always returns the full tree.
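For reference, this is roughly what a snapshot call with one of these options could look like at the protocol level. The sketch builds a standard MCP tools/call JSON-RPC request; that the options are passed as arguments to page_snapshot is an assumption based on the snippets above, and the helper name is made up for illustration. In practice the MCP client (for example Claude Code) constructs this message for you.

```python
import json


def snapshot_request(request_id: int, arguments: dict) -> str:
    """Build an MCP tools/call request for the (assumed) page_snapshot tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "page_snapshot", "arguments": arguments},
    })


print(snapshot_request(1, {"interactiveOnly": True}))
print(snapshot_request(2, {"focusRegion": "main"}))
```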
The Broader Landscape
This comparison only covers in-context accessibility trees. The ecosystem is moving fast, and there are other approaches worth knowing about:
- Playwright MCP file output (--output-mode file): Saves snapshots to disk files instead of returning them in LLM context. Clients that support file references can read these without consuming context tokens. A fundamentally different approach to the same problem.
- DOM compression tools (Vercel's agent-browser, browser-use, etc.): These extract and compress DOM/accessibility tree state, filtering down thousands of nodes to the most relevant elements. Some also support optional vision models for layout understanding as a secondary input.
WebClaw's approach is narrower: same accessibility tree method as Playwright MCP's browser_snapshot, but with a more compact format. The numbers above show what format choices alone can do — but they don't capture the full picture of what's possible with file-based or DOM compression approaches.
Why Format Efficiency Still Matters
Even with file-based alternatives emerging, in-context snapshots remain the default for most MCP setups. A browser automation task rarely reads a page just once: navigate, read, click, read again, fill a form, check the result. That's easily 5-10 snapshot calls. A 51-79% format reduction applies to every one of those calls, so the savings add up over a session.
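A back-of-the-envelope version of that arithmetic, using the Hacker News numbers from the results table. It assumes every snapshot in the session is the same size as that one page, and it ignores the fact that earlier snapshots may be re-sent as context on later turns, so treat it as a lower bound rather than a measurement.

```python
def session_tokens(tokens_per_snapshot: int, snapshot_calls: int) -> int:
    """Total snapshot tokens for a session, assuming equally sized snapshots."""
    return tokens_per_snapshot * snapshot_calls


PLAYWRIGHT_HN = 14_547  # tokens per snapshot, from the results table
WEBCLAW_HN = 3_052

for calls in (5, 10):
    playwright = session_tokens(PLAYWRIGHT_HN, calls)
    webclaw = session_tokens(WEBCLAW_HN, calls)
    print(f"{calls} calls: {playwright:,} vs {webclaw:,} tokens "
          f"(saved {playwright - webclaw:,})")
```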
Tradeoffs
I'm biased — I built WebClaw — so let me be upfront about the tradeoffs.
Where Playwright MCP is the better choice:
- CI/headless environments (WebClaw needs a visible Chrome window)
- Cross-browser testing (Chromium, Firefox, WebKit)
- Zero-install setup (npx one-liner vs. Chrome extension)
- Complete output: every element gets a ref, nothing is omitted
- --output-mode file for file-based snapshots
Where WebClaw fits better:
- Token-sensitive workflows where format compactness matters
- Logged-in sessions (runs in your existing Chrome — no re-authentication)
- Bot-resistant sites (Chrome extension, no WebDriver flags)
- When you need output size controls (interactiveOnly, focusRegion)
WebClaw limitations:
- Requires Chrome + extension install
- No headless mode
- No test code generation
- Uses your real session (the AI operates with your credentials)
Setup
Claude Code:
claude mcp add webclaw -- npx -y webclaw-mcp
Claude Desktop — add to claude_desktop_config.json:
{
  "mcpServers": {
    "webclaw": {
      "command": "npx",
      "args": ["-y", "webclaw-mcp"]
    }
  }
}
Then install the Chrome extension: extract the zip, go to chrome://extensions/, enable Developer mode, and load the dist/ folder.
Wrapping Up
The takeaway isn't "use WebClaw instead of Playwright MCP" — it's that accessibility tree format choices matter more than you'd expect. Assigning refs to every element vs. only interactive ones, including [cursor=pointer] hints vs. omitting them, following the W3C accessible name spec vs. using title attributes — these small decisions compound into a 51-79% difference on real pages.
The browser MCP space is evolving quickly. File-based snapshots, DOM compression tools, and hybrid approaches are all worth watching. If you're hitting token limits with your current setup, the data here might help you understand why — and what to try next.
If you want to reproduce these measurements or try WebClaw, the repo is open. Issues and feedback welcome — this is a solo project and I'm still figuring out the right tradeoffs.
GitHub: github.com/kuroko1t/webclaw
npm: npx -y webclaw-mcp
WebClaw is MIT-licensed open source.