DEV Community

kuroko

How Accessibility Tree Formatting Affects Token Cost in Browser MCPs

Token cost in browser automation MCPs has become a real topic: articles like "Playwright MCP Burns 114K Tokens Per Test" have been making the rounds. Tools approach the problem from different angles. Playwright MCP's --output-mode file option saves snapshots to disk instead of returning them in the LLM context, Vercel's agent-browser compresses DOM state to a fraction of its original size, and some tools add vision-based fallbacks for layout understanding.

I've been working on WebClaw, an open-source Chrome extension-based browser MCP. Like Playwright MCP, it takes the accessibility tree approach, but with a more compact output format. I wanted to measure the actual difference rather than guess, so I set up a side-by-side test.

How I Measured

Versions tested:

  • Playwright MCP: @playwright/mcp v0.0.68 (npx @playwright/mcp@0.0.68 --headless)
  • WebClaw: webclaw-mcp v0.9.0 + Chrome extension v0.9.0
  • Measured: February 26, 2026

I registered both Playwright MCP and WebClaw as MCP servers in the same Claude Code session, then ran the same steps on each:

  1. Navigate to the target URL
  2. Call the snapshot tool (browser_snapshot / page_snapshot)
  3. Measure the full response text length in characters
  4. Estimate tokens as characters / 4 (approximation — actual tokenization varies by model)
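Step 4 can be written down as a one-line helper. This is a minimal sketch of the characters / 4 heuristic only; the function name `estimate_tokens` is made up for illustration, and a real tokenizer will give different counts depending on the model.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded."""
    return round(len(text) / 4)


# Stand-in for a snapshot response of 64,176 characters
# (the Playwright MCP figure reported for the Wikipedia page).
sample_response = "x" * 64_176
print(estimate_tokens(sample_response))  # 16044
```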

Both tools return the complete accessibility tree with no truncation. WebClaw's default is unlimited output (no token budget), so this is a pure format efficiency comparison.

I picked three pages with different content patterns:

  • Wikipedia — long article with many reference links and navigation templates
  • GitHub — repository page with file listing, README, and sidebar
  • Hacker News — list-style page with 30 items

Important caveat on fairness: Playwright MCP runs a headless Chromium (not logged in). WebClaw runs in the user's Chrome (logged in to GitHub in my case). This means WebClaw sees more UI on GitHub — authenticated menus, notifications, repo actions — which actually increases its output. The comparison is biased against WebClaw on that page.

Results: Format Efficiency

Both tools returning full, untruncated accessibility trees:

| Site | Playwright MCP | WebClaw | Difference |
|---|---|---|---|
| Wikipedia (MCP article) | 16,044 tokens (64,176 chars) | 7,860 tokens (31,439 chars) | 51% smaller |
| GitHub (anthropics/claude-cookbooks) | 19,409 tokens (77,637 chars) | 4,304 tokens (17,215 chars) | 78% smaller |
| Hacker News (front page) | 14,547 tokens (58,189 chars) | 3,052 tokens (12,207 chars) | 79% smaller |

The range is 51% to 79% depending on the page. Let me dig into why.
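For anyone checking the arithmetic, the percentages follow directly from the character counts in the results. A small sketch, where the `MEASURED` dictionary and `reduction_percent` are names introduced here for illustration:

```python
# (Playwright MCP chars, WebClaw chars) as reported in the results.
MEASURED = {
    "Wikipedia": (64_176, 31_439),
    "GitHub": (77_637, 17_215),
    "Hacker News": (58_189, 12_207),
}


def reduction_percent(baseline_chars: int, compact_chars: int) -> int:
    """Percentage by which the compact output is smaller than the baseline."""
    return round((1 - compact_chars / baseline_chars) * 100)


for site, (playwright_chars, webclaw_chars) in MEASURED.items():
    print(f"{site}: {reduction_percent(playwright_chars, webclaw_chars)}% smaller")
```

Because tokens are estimated as characters / 4 on both sides, the ratio is the same whether it is computed from characters or from estimated tokens.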

What Creates the Difference

Comparing the actual output for the same Wikipedia page:

Playwright MCP (browser_snapshot):

- generic [active] [ref=e1]:
  - link "Jump to content" [ref=e2] [cursor=pointer]:
    - /url: "#bodyContent"
  - banner [ref=e4]:
    - navigation "Site" [ref=e6]:
      - generic "Main menu" [ref=e7]:
        - button "Main menu" [ref=e8] [cursor=pointer]

WebClaw (page_snapshot):

[page "Model Context Protocol - Wikipedia"]
 [banner]
  [nav "Site"]
  [@e2 link]
 [search]
  [@e3 searchbox "Search Wikipedia"]
  [@e4 button "Search"]

The difference comes down to design choices — each reasonable on its own, but they compound:

| Design choice | Playwright MCP | WebClaw |
|---|---|---|
| Which elements get refs | All elements (generic, rowgroup, cell...) | Only interactive elements (buttons, links, inputs) |
| Attribute output | [active], [cursor=pointer], /url: on all applicable elements | Minimal: only what's needed for action |
| Table representation | Full nested structure per cell | Compressed single-line rows |
| Ref count (GitHub) | 789 refs | 245 refs |

Playwright MCP's approach — labeling every element with a ref — gives maximum flexibility for targeting any element. WebClaw trades that completeness for compactness by only labeling things the AI can actually interact with.
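The ref-assignment tradeoff can be illustrated with a toy serializer. This is a minimal sketch, not WebClaw's or Playwright MCP's actual code: the `Node` class, the `INTERACTIVE_ROLES` set, and the bracketed output syntax are assumptions loosely modeled on the WebClaw sample shown earlier.

```python
from dataclasses import dataclass, field

# Hypothetical set of roles treated as "interactive" for this sketch.
INTERACTIVE_ROLES = {"button", "link", "textbox", "searchbox", "checkbox"}


@dataclass
class Node:
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def serialize(root: Node, refs_for_all: bool) -> str:
    """Render the tree one node per line, assigning refs to every node
    or only to interactive ones."""
    lines: list[str] = []
    counter = 0

    def visit(node: Node, depth: int) -> None:
        nonlocal counter
        parts = []
        if refs_for_all or node.role in INTERACTIVE_ROLES:
            counter += 1
            parts.append(f"@e{counter}")
        parts.append(node.role)
        if node.name:
            parts.append(f'"{node.name}"')
        lines.append(" " * depth + "[" + " ".join(parts) + "]")
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return "\n".join(lines)


tree = Node("page", "Example", [
    Node("banner", children=[
        Node("nav", "Site", [Node("button", "Main menu")]),
    ]),
    Node("search", children=[
        Node("searchbox", "Search Wikipedia"),
        Node("button", "Search"),
    ]),
])

print(serialize(tree, refs_for_all=False))
print(serialize(tree, refs_for_all=True).count("@e"), "refs when every node is labeled")
```

On this seven-node tree, labeling everything produces seven refs while the interactive-only mode produces three. The per-ref overhead is small, but it is paid on every structural wrapper in the page.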

Why the range is so wide (51% to 79%)

The format savings vary by page structure:

  • GitHub (78%): The file listing table is where the biggest difference shows. Playwright MCP assigns refs to every row, cell, generic wrapper (789 total). WebClaw only labels links and buttons (245 total). Additionally, WebClaw follows the W3C Accessible Name specification, using textContent before the title attribute for buttons and links. On GitHub, many buttons have short display text ("X") but verbose title attributes ("Close dialog") — using the spec-compliant order avoids the bloat.
  • Hacker News (79%): Simple, repetitive table structure. WebClaw's table compression ([row] 1. | link | link) eliminates most of the verbosity. Playwright MCP outputs nested rowgroup > row > cell > generic > link for each of the 30 items.
  • Wikipedia (51%): The article body has many inline links that both tools represent similarly. The savings come primarily from the navigation templates (Generative AI, Artificial Intelligence navboxes) where structural compression helps, but the text content itself is irreducible.
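To see why repetitive tables benefit most, here is a rough sketch of single-line row compression next to a nested rendering. Both functions are illustrative assumptions: `compress_row` imitates the `[row] 1. | link | link` shape mentioned above, and `nested_row` is a simplified stand-in for a row > cell > generic hierarchy, not Playwright MCP's exact output.

```python
def compress_row(cells: list[str]) -> str:
    """Join the non-empty cell texts of one table row onto a single line."""
    non_empty = [cell.strip() for cell in cells if cell.strip()]
    return "[row] " + " | ".join(non_empty)


def nested_row(cells: list[str]) -> str:
    """Render the same row as one line per structural level (simplified)."""
    lines = ["- row:"]
    for cell in cells:
        lines.append("  - cell:")
        lines.append(f"    - generic: {cell}")
    return "\n".join(lines)


row = ["1.", "", "Example story title", "example.com"]
compact = compress_row(row)
nested = nested_row(row)

print(compact)
print(f"{len(compact)} chars compact vs {len(nested)} chars nested")
```

The saving per row is modest in absolute terms, but a list page repeats the same wrapper structure for every item, so it accumulates quickly.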

Controlling Output Size

WebClaw defaults to unlimited output — no truncation. But when you need to manage token costs, two options are available:

Interactive elements only: interactiveOnly

{ "interactiveOnly": true }

Strips all text content. A 2,000-line page becomes ~200 lines of buttons, links, and inputs.

Landmark region focus: focusRegion

{ "focusRegion": "main" }

Returns only the selected landmark region: main, nav, header, or footer. Useful when you already know which part of the page holds the content you need.
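In practice the MCP client builds the tool call for you, but it can help to see where these options sit in the request. The sketch below assembles an MCP `tools/call` JSON-RPC payload for the `page_snapshot` tool; the helper name `snapshot_request` and the request ids are made up for illustration, and how the two options behave when combined is not covered here.

```python
import json


def snapshot_request(request_id: int, **arguments) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for the page_snapshot tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "page_snapshot", "arguments": arguments},
    }


# Interactive elements only.
print(json.dumps(snapshot_request(1, interactiveOnly=True), indent=2))

# Restrict the snapshot to the main landmark region.
print(json.dumps(snapshot_request(2, focusRegion="main"), indent=2))
```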

Playwright MCP doesn't have equivalents — it always returns the full tree.

The Broader Landscape

This comparison only covers in-context accessibility trees. The ecosystem is moving fast, and there are other approaches worth knowing about:

  • Playwright MCP file output (--output-mode file): Saves snapshots to disk files instead of returning them in LLM context. Clients that support file references can read these without consuming context tokens. A fundamentally different approach to the same problem.
  • DOM compression tools (Vercel's agent-browser, browser-use, etc.): These extract and compress DOM/accessibility tree state, filtering down thousands of nodes to the most relevant elements. Some also support optional vision models for layout understanding as a secondary input.

WebClaw's approach is narrower: same accessibility tree method as Playwright MCP's browser_snapshot, but with a more compact format. The numbers above show what format choices alone can do — but they don't capture the full picture of what's possible with file-based or DOM compression approaches.

Why Format Efficiency Still Matters

Even with file-based alternatives emerging, in-context snapshots remain the default for most MCP setups. A browser automation task rarely reads a page just once — navigate, read, click, read again, fill a form, check the result — that's easily 5-10 snapshot calls. A 51-79% format reduction compounds across those calls.
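A back-of-the-envelope calculation shows the effect. This sketch reuses the GitHub figures from the results and assumes, as a simplification, that every snapshot in a task is the same size as the measured one; real sessions will vary page to page.

```python
# Estimated tokens per snapshot on the GitHub page, from the results.
PLAYWRIGHT_TOKENS = 19_409
WEBCLAW_TOKENS = 4_304


def cumulative_tokens(tokens_per_snapshot: int, calls: int) -> int:
    """Total snapshot tokens if every call returns a same-sized snapshot."""
    return tokens_per_snapshot * calls


for calls in (5, 10):
    playwright_total = cumulative_tokens(PLAYWRIGHT_TOKENS, calls)
    webclaw_total = cumulative_tokens(WEBCLAW_TOKENS, calls)
    print(
        f"{calls} calls: {playwright_total:,} vs {webclaw_total:,} tokens "
        f"({playwright_total - webclaw_total:,} fewer)"
    )
```

Under that assumption, five snapshots come to 97,045 versus 21,520 estimated tokens, and ten come to 194,090 versus 43,040.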

Tradeoffs

I'm biased — I built WebClaw — so let me be upfront about the tradeoffs.

Where Playwright MCP is the better choice:

  • CI/headless environments (WebClaw needs a visible Chrome window)
  • Cross-browser testing (Chromium, Firefox, WebKit)
  • Zero-install setup (npx one-liner vs. Chrome extension)
  • Complete output — every element gets a ref, nothing is omitted
  • --output-mode file for file-based snapshots

Where WebClaw fits better:

  • Token-sensitive workflows where format compactness matters
  • Logged-in sessions (runs in your existing Chrome — no re-authentication)
  • Bot-resistant sites (Chrome extension, no WebDriver flags)
  • When you need output size controls (interactiveOnly, focusRegion)

WebClaw limitations:

  • Requires Chrome + extension install
  • No headless mode
  • No test code generation
  • Uses your real session (the AI operates with your credentials)

Setup

Claude Code:

claude mcp add webclaw -- npx -y webclaw-mcp

Claude Desktop — add to claude_desktop_config.json:

{
  "mcpServers": {
    "webclaw": {
      "command": "npx",
      "args": ["-y", "webclaw-mcp"]
    }
  }
}

Then install the Chrome extension: extract the zip, go to chrome://extensions/, enable Developer mode, and load the dist/ folder.

Wrapping Up

The takeaway isn't "use WebClaw instead of Playwright MCP" — it's that accessibility tree format choices matter more than you'd expect. Assigning refs to every element vs. only interactive ones, including [cursor=pointer] hints vs. omitting them, following the W3C accessible name spec vs. using title attributes — these small decisions compound into a 51-79% difference on real pages.

The browser MCP space is evolving quickly. File-based snapshots, DOM compression tools, and hybrid approaches are all worth watching. If you're hitting token limits with your current setup, the data here might help you understand why — and what to try next.

If you want to reproduce these measurements or try WebClaw, the repo is open. Issues and feedback welcome — this is a solo project and I'm still figuring out the right tradeoffs.

GitHub: github.com/kuroko1t/webclaw
npm: npx -y webclaw-mcp


WebClaw is MIT-licensed open source.
