Why Playwright MCP Cost Us 5× More Tokens Than We Expected

Harshavardhan Kangala Dayanand — Mon, 29 Jun 2026 17:59:36 +0000

We built an open-source browser automation MCP because we wanted something simple:

Observe a webpage once, let an LLM interact with it, then export a Playwright script that actually works.

That sounds straightforward.

It wasn't.

While benchmarking against Playwright MCP, we discovered something we hadn't considered: observation quality isn't the biggest contributor to cost. Iteration is.

The expensive part isn't making the browser agent click buttons.

It's making the generated automation reusable.

The hidden problem with `browser_snapshot`

Playwright MCP represents elements using ephemeral references:

button "Login" [ref=e5]
textbox "Email" [ref=e10]
link "Forgot password" [ref=e23]

Within an MCP session, this is excellent.

The model simply responds:

browser_click("e5")

Fast.

Clean.

Minimal reasoning.

The problem appears later.

Every snapshot generates a new set of references.

e5 today is not e5 tomorrow.

Those identifiers only exist for the lifetime of that observation.
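A deliberately simplified model shows why a ref cannot outlive its snapshot. Purely for illustration, it assumes refs are numbered in tree order every time a snapshot is taken; Playwright MCP's real numbering scheme may differ. The `AxNode` type and the `snapshot` and `refFor` helpers are hypothetical. Rendering one extra element above the form is enough to give the Login button a different ref, even though its role and accessible name never change.

```typescript
// Hypothetical, stripped-down accessibility node: a role and an accessible name.
interface AxNode {
  role: string;
  name: string;
}

interface RefEntry {
  ref: string;
  node: AxNode;
}

// Assumption for illustration only: refs are handed out in tree order per snapshot.
function snapshot(nodes: AxNode[]): RefEntry[] {
  return nodes.map((node, i) => ({ ref: `e${i + 1}`, node }));
}

// Look up which ref a given snapshot assigned to the element with this role and name.
function refFor(entries: RefEntry[], role: string, name: string): string | undefined {
  const hit = entries.find((e) => e.node.role === role && e.node.name === name);
  return hit ? hit.ref : undefined;
}

const today: AxNode[] = [
  { role: "textbox", name: "Email" },
  { role: "button", name: "Login" },
];
// The same page on another visit, with a cookie banner rendered above the form.
const tomorrow: AxNode[] = [{ role: "button", name: "Accept cookies" }, ...today];

const refToday = refFor(snapshot(today), "button", "Login");
const refTomorrow = refFor(snapshot(tomorrow), "button", "Login");
console.log(refToday, refTomorrow); // e2 e3
```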

Why this matters

Many people aren't using browser agents just to click around.

They're using them to produce reusable automation.

For example:

Playwright tests
CI pipelines
Documentation examples
Internal automation scripts

Once the LLM tries to generate code like this:

await page.getByRole(...);

the references become useless.

They're meaningless outside the MCP session.

The model now has to reconstruct selectors from scratch.

That usually means:

Reading another snapshot
Parsing the accessibility tree
Identifying the correct element
Guessing a locator
Debugging when it fails

The browser interaction itself is cheap.

The retries are not.
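Reconstructing a selector means turning a snapshot line back into a locator that still works outside the session. A minimal sketch, assuming the model has already found the right node and that lines look exactly like the `role "name" [ref=eN]` examples above (real snapshots can carry extra attributes this regex does not handle): the hypothetical `parseSnapshotLine` and `toGetByRole` helpers drop the ref and emit Playwright `getByRole` source text. Finding the right node among dozens, and checking that the locator matches exactly one element on the live page, are the steps that tend to need retries; they are not shown.

```typescript
// One parsed snapshot line such as: button "Login" [ref=e5]
interface SnapshotLine {
  role: string;
  name: string;
  ref: string;
}

// Handles only the simple `role "name" [ref=eN]` form, optionally prefixed with "- ".
function parseSnapshotLine(line: string): SnapshotLine | null {
  const m = /^\s*-?\s*([\w-]+)\s+"([^"]*)"\s*\[ref=(e\d+)\]/.exec(line);
  return m ? { role: m[1], name: m[2], ref: m[3] } : null;
}

// Quote a value as a single-quoted JavaScript string literal.
function jsString(s: string): string {
  return `'${s.replace(/\\/g, "\\\\").replace(/'/g, "\\'")}'`;
}

// Drop the session-only ref and keep what is derived from the element itself.
function toGetByRole(line: SnapshotLine): string {
  return `page.getByRole(${jsString(line.role)}, { name: ${jsString(line.name)} })`;
}

const forgot = parseSnapshotLine('link "Forgot password" [ref=e23]');
if (forgot) {
  console.log(toGetByRole(forgot)); // page.getByRole('link', { name: 'Forgot password' })
}
```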

Measuring the Entire Workflow

Instead of measuring a single observation, we measured the full pipeline:

Observe
    ↓
Interact
    ↓
Generate Playwright Script
    ↓
Execute Script
    ↓
Fix Failures if Needed
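Measured this way, the cost of a generated script is the sum over every attempt until the exported script runs, not the cost of a single observation. A minimal sketch of that bookkeeping: the `Attempt` type and the `workflowCost` helper are hypothetical, and the sample uses the Playwright MCP per-attempt token counts reported below, assuming the third attempt is the one whose script passes.

```typescript
// Hypothetical record of one pass through Observe → Interact → Generate → Execute.
interface Attempt {
  label: string;
  tokens: number;        // tokens consumed by this attempt
  scriptPassed: boolean; // did the exported Playwright script run successfully?
}

interface WorkflowCost {
  attempts: number;
  totalTokens: number;
  succeeded: boolean;
}

// Sum tokens across attempts, stopping at the first attempt whose script passes.
function workflowCost(attempts: Attempt[]): WorkflowCost {
  let totalTokens = 0;
  let count = 0;
  for (const attempt of attempts) {
    count += 1;
    totalTokens += attempt.tokens;
    if (attempt.scriptPassed) {
      return { attempts: count, totalTokens, succeeded: true };
    }
  }
  return { attempts: count, totalTokens, succeeded: false };
}

const playwrightMcp: Attempt[] = [
  { label: "ref-based script", tokens: 1099, scriptPassed: false },
  { label: "getByRole() selectors", tokens: 1171, scriptPassed: false },
  { label: "fixed selectors", tokens: 941, scriptPassed: true },
];

const mcpRun = workflowCost(playwrightMcp);
console.log(mcpRun.attempts, mcpRun.totalTokens); // 3 3211
```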

The numbers were surprising.

Playwright MCP

Attempt 1

Observes page
Writes ref-based automation
Exported script cannot be reused

≈ 1,099 tokens

Attempt 2

Observes page again
Parses 62–93 accessibility nodes
Generates getByRole() selectors

≈ 1,171 tokens

Attempt 3

Fixes failed selectors

≈ 941 tokens

Total

3,211 tokens
≈ $0.04 per generated script

Brocogni

Instead of returning references, Brocogni computes selectors before the LLM ever sees the page.

The observation already contains:

Ranked selectors
Fallback selectors
Semantic purpose
Bounding boxes
Only actionable elements
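To make that list concrete, here is one way an observed element could be typed. This is a hypothetical shape for illustration only, not Brocogni's actual JSON schema; the field names (`purpose`, `selectors`, `bbox`) and the sample values are assumptions.

```typescript
// Hypothetical types for illustration; Brocogni's real JSON may be shaped differently.
type SelectorKind = "css" | "aria" | "xpath" | "text";

interface RankedSelector {
  kind: SelectorKind;
  value: string; // CSS/XPath/text, or the ARIA role when kind is "aria"
  name?: string; // accessible name, only for "aria"
}

interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ObservedElement {
  purpose: string;             // inferred semantic purpose
  selectors: RankedSelector[]; // best first; the rest form the fallback chain
  bbox: BoundingBox;           // position and size on the page
}

interface Observation {
  url: string;
  elements: ObservedElement[]; // actionable elements only
}

// Sample values are made up.
const example: Observation = {
  url: "https://example.com/login",
  elements: [
    {
      purpose: "submit login form",
      selectors: [
        { kind: "css", value: '[data-testid="login-button"]' },
        { kind: "aria", value: "button", name: "Login" },
        { kind: "text", value: "Login" },
        { kind: "xpath", value: "//form/button[1]" },
      ],
      bbox: { x: 120, y: 340, width: 96, height: 36 },
    },
  ],
};

// The primary selector and its fallbacks come straight out of the observation.
const [primary, ...fallbacks] = example.elements[0].selectors;
console.log(primary.value);    // [data-testid="login-button"]
console.log(fallbacks.length); // 3
```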

The model simply copies them into the Playwright script.

One observation.

One generation.

Done.

1,535 tokens
≈ $0.01 per generated script
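Copying pre-computed selectors into a script can be plain string templating. A minimal sketch, assuming a ranked list (best first) of CSS, ARIA (role plus accessible name), XPath and text selectors: the hypothetical `emitClick` helper turns the list into one Playwright statement, chaining the fallbacks with `Locator.or()` (available since Playwright 1.33). Whether Brocogni's exporter uses `or()` is not stated in this post; it is simply one way to express a fallback chain in Playwright code.

```typescript
// Hypothetical ranked selector, as it might appear in an observation (best first).
interface RankedSelector {
  kind: "css" | "aria" | "xpath" | "text";
  value: string; // CSS/XPath/text, or the ARIA role when kind is "aria"
  name?: string; // accessible name, only for "aria"
}

// Quote a value as a single-quoted JavaScript string literal.
function jsString(s: string): string {
  return `'${s.replace(/\\/g, "\\\\").replace(/'/g, "\\'")}'`;
}

// Translate one selector into Playwright locator source text.
function toLocator(sel: RankedSelector): string {
  if (sel.kind === "css") return `page.locator(${jsString(sel.value)})`;
  if (sel.kind === "xpath") return `page.locator(${jsString(`xpath=${sel.value}`)})`;
  if (sel.kind === "text") return `page.getByText(${jsString(sel.value)})`;
  // kind === "aria": role plus optional accessible name
  const opts = sel.name !== undefined ? `, { name: ${jsString(sel.name)} }` : "";
  return `page.getByRole(${jsString(sel.value)}${opts})`;
}

// Emit one click statement: the primary selector, with the rest chained via Locator.or().
function emitClick(chain: RankedSelector[]): string {
  if (chain.length === 0) throw new Error("no selectors to emit");
  const locator = chain.map(toLocator).reduce((acc, next) => `${acc}.or(${next})`);
  return `await ${locator}.click();`;
}

console.log(
  emitClick([
    { kind: "css", value: '[data-testid="login-button"]' },
    { kind: "aria", value: "button", name: "Login" }
  ])
);
// await page.locator('[data-testid="login-button"]').or(page.getByRole('button', { name: 'Login' })).click();
```

One consequence of this design: `or()` matches the union of all alternatives, so if the selectors ever resolve to two different elements, Playwright's strict mode makes `click()` throw instead of silently clicking the wrong one.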

Across roughly 200 generated scripts/month:

Solution	Monthly Cost
Playwright MCP	~$7.11
Brocogni	~$1.33

That's roughly an 81% reduction in token cost in our benchmark.
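The headline figures can be re-derived from the numbers above. Note which column the 81% comes from: the monthly cost figures differ by about 5.3×, while the raw token totals differ by about 2.1× (a reduction of roughly 52%). The constants below are copied from this post; nothing else is assumed.

```typescript
// Figures as reported in this post.
const playwrightAttempts = [1099, 1171, 941]; // tokens per attempt
const brocogniTokens = 1535;
const scriptsPerMonth = 200;
const monthlyCostUsd = { playwrightMcp: 7.11, brocogni: 1.33 };

// Total tokens across all Playwright MCP attempts.
const playwrightTokens = playwrightAttempts.reduce((sum, t) => sum + t, 0);

// Fraction saved when going from `before` to `after`.
function reduction(before: number, after: number): number {
  return 1 - after / before;
}

console.log(playwrightTokens);                                      // 3211
console.log((playwrightTokens / brocogniTokens).toFixed(2));        // 2.09
console.log(reduction(playwrightTokens, brocogniTokens).toFixed(3)); // 0.522
console.log((monthlyCostUsd.playwrightMcp / monthlyCostUsd.brocogni).toFixed(2)); // 5.35
console.log(reduction(monthlyCostUsd.playwrightMcp, monthlyCostUsd.brocogni).toFixed(3)); // 0.813
// Per-script cost implied by the monthly figures at 200 scripts per month.
console.log((monthlyCostUsd.playwrightMcp / scriptsPerMonth).toFixed(2)); // 0.04
console.log((monthlyCostUsd.brocogni / scriptsPerMonth).toFixed(2));      // 0.01
```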

More importantly, it means fewer failed iterations.

Signal Density Matters

Another finding was how much unnecessary information reaches the LLM.

In our benchmark, each Playwright MCP snapshot contained 62–93 accessibility nodes.

Only around 9 of them were actually interactive.

The model must determine:

Which nodes matter
Which are actionable
Which selector should be generated

That reasoning consumes tokens.

Brocogni instead returns only actionable elements.

Metric	Playwright MCP	Brocogni
Elements returned	62–93	9
Actionable	Mixed	9/9 (100%)
LLM filters nodes	Yes	No
Pre-computed selectors	No	Yes
Fallback selector chains	No	Yes

Each element in the observation is slightly richer.

The downstream reasoning becomes dramatically simpler.
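Moving the filter server-side means deciding actionability before the model sees anything. A minimal sketch, assuming actionability is judged by ARIA role plus a disabled flag (visibility and geometry checks are left out): the `INTERACTIVE_ROLES` set, the `AxNode` type and the `actionable` helper are illustrative assumptions, not Brocogni's actual rules, and the sample tree is made up.

```typescript
// Hypothetical accessibility node as it might come out of the tree walk.
interface AxNode {
  role: string;
  name: string;
  disabled?: boolean;
}

// Illustrative subset of ARIA widget roles treated as actionable.
const INTERACTIVE_ROLES = new Set<string>([
  "button", "link", "textbox", "searchbox", "checkbox", "radio",
  "combobox", "listbox", "option", "menuitem", "switch", "slider", "tab",
]);

// Keep only nodes a user (or a script) could act on; structural nodes are dropped.
function actionable(nodes: AxNode[]): AxNode[] {
  return nodes.filter((n) => INTERACTIVE_ROLES.has(n.role) && !n.disabled);
}

const tree: AxNode[] = [
  { role: "banner", name: "" },
  { role: "heading", name: "Sign in" },
  { role: "paragraph", name: "" },
  { role: "textbox", name: "Email" },
  { role: "textbox", name: "Password" },
  { role: "button", name: "Login" },
  { role: "link", name: "Forgot password" },
  { role: "contentinfo", name: "" },
];

const kept = actionable(tree);
console.log(`${kept.length} of ${tree.length} nodes kept`); // 4 of 8 nodes kept
```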

The Architectural Difference

The distinction comes from where the work happens.

Playwright MCP exposes browser state.

The LLM performs much of the interpretation.

Brocogni shifts that interpretation server-side.

Chrome DevTools Protocol
          │
          ▼
Accessibility Tree
          │
          ▼
DOM Snapshot
          │
          ▼
Geometry Extraction
          │
          ▼
Actionability Filtering
          │
          ▼
Purpose Inference
          │
          ▼
Ranked Selector Generation
          │
          ▼
Structured JSON Observation
          │
          ▼
LLM

By the time the model receives the observation:

Selectors already exist
Fallback chains already exist
Only actionable elements remain

The LLM doesn't need to reverse-engineer the DOM.
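The "Ranked Selector Generation" stage is where a selector's stability gets decided. A minimal sketch of one plausible ranking, assuming the server has already collected a test id, element id, role, accessible name, visible text and a positional XPath for each element: candidates are emitted from most to least stable, and everything after the first becomes the fallback chain. The `ElementInfo` fields and the ordering are assumptions for illustration; Brocogni's actual heuristics may differ.

```typescript
// Hypothetical per-element data gathered server-side before the LLM sees the page.
interface ElementInfo {
  testId?: string;
  id?: string;   // assumed to be a valid CSS identifier (a real version would escape it)
  role?: string;
  name?: string; // accessible name
  text?: string; // visible text
  xpath: string; // positional path: always available, but brittle
}

type SelectorKind = "css" | "aria" | "xpath" | "text";

interface RankedSelector {
  kind: SelectorKind;
  value: string; // CSS/XPath/text, or the ARIA role when kind is "aria"
  name?: string; // accessible name, only for "aria"
}

// Emit candidates from most to least stable; the first is primary, the rest are fallbacks.
function rankSelectors(el: ElementInfo): RankedSelector[] {
  const out: RankedSelector[] = [];
  if (el.testId) out.push({ kind: "css", value: `[data-testid="${el.testId}"]` });
  if (el.id) out.push({ kind: "css", value: `#${el.id}` });
  if (el.role && el.name) out.push({ kind: "aria", value: el.role, name: el.name });
  if (el.text) out.push({ kind: "text", value: el.text });
  out.push({ kind: "xpath", value: el.xpath });
  return out;
}

const login = rankSelectors({
  testId: "login-button",
  role: "button",
  name: "Login",
  text: "Login",
  xpath: "/html/body/main/form/button[1]",
});
console.log(login.map((s) => s.kind).join(" > ")); // css > aria > text > xpath
```

The positional XPath goes last because it is always available but breaks as soon as the surrounding markup changes.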

This Isn't a Criticism of Playwright MCP

Playwright MCP is extremely well designed for interactive browser agents.

If your goal is simply:

"Navigate a website."

it's an excellent choice.

Our benchmark looked at a different workflow:

"Generate Playwright code that someone will commit to a repository."

Those are different optimization problems.

For the second case, reusable selectors matter more than ephemeral references.

Brocogni

Brocogni is an open-source MCP server focused on browser automation for LLMs.

Instead of exposing raw browser state, it provides structured observations that are immediately usable for automation generation.

Features

Ranked CSS, ARIA, XPath and text selectors
Fallback selector chains
Actionable element filtering
Semantic purpose inference
Bounding box information
MIT licensed
Zero telemetry
Fully local execution

GitHub

https://github.com/hrshx3o5o6/brocogni

Website

https://brocogni.vercel.app/

I'd love feedback from people building browser agents or using Playwright MCP in production.

If you've measured similar token costs—or found different tradeoffs—I'd be interested to compare approaches.

DEV Community: Harshavardhan Kangala Dayanand
