Your AI Doesn't Need Screenshots. It Needs DevTools.
Most AI coding agents are still debugging web apps in the dumbest possible way.
They ask for a screenshot.
Meanwhile the real answer is usually sitting in the browser console, the Network tab, the request payload, or the response body.
That is not really an AI problem. It is a tooling problem.
I got tired of watching agents guess from screenshots while I had DevTools open right next to them, showing the exact reason something failed. So I built mare-browser-mcp, a browser MCP designed less like "remote control for a webpage" and more like "give the model the same debugging signals I actually use."
That changed the loop from this:
AI writes code -> I test it -> I describe the bug -> AI guesses a fix -> repeat
to this:
AI writes code -> AI tests it -> AI reads the failure -> AI fixes it
That difference matters more than people think.
The Screenshot Trap
Most browser MCPs start from the same assumption: if the model can see the page, it can debug the app.
That sounds reasonable until you try to debug anything real.
A screenshot does not tell the model:
- that login failed with a 401
- that the frontend sent `user_email` instead of `email`
- that a button click triggered a JS exception
- that the UI is fine but the API returned invalid JSON
- that the missing data is actually below the fold in a scrollable grid
- that the context menu only appears on right-click
A screenshot is expensive, unstructured, and often the least useful artifact in the whole debugging flow.
Language models are good at reasoning over structured text. We keep handing them pixels and asking them to guess.
What I Wanted Instead
I wanted a browser MCP that makes the model think more like a developer with DevTools open.
That means a few things:
- It should read console logs and page errors directly
- It should see network requests, payloads, response bodies, and timing
- It should batch actions instead of making one tiny tool call per click
- It should query the DOM as structured data
- It should handle hover, drag, right-click, and nested scroll containers
- It should only use screenshots when the problem is actually visual
That is the core idea behind mare-browser-mcp.
It is a lean MCP server built with Playwright and the MCP SDK, and the whole design is biased toward structured debugging data first.
The One Tool I Wish More Browser MCPs Had
The most important tool in the server is browser_debug.
One call returns:
- current URL
- page title
- console logs
- page errors
- dialogs
- network request history
And the network history is the useful part. The server captures things like:
- method
- URL
- parsed query params
- request headers
- request body
- status code
- JSON response body
- duration in milliseconds
So instead of the model seeing "submit button clicked and page still looks wrong," it can see:
{
"method": "POST",
"url": "https://app.example.com/api/session",
"requestBody": {
"user_email": "user@example.com",
"password": "secret"
},
"status": 401,
"responseBody": {
"error": "missing field: email"
},
"duration_ms": 23
}
That is a completely different debugging experience.
At that point the model is not guessing anymore. It can actually reason from evidence.
There are a few guardrails in the implementation too:
- static assets are skipped
- `OPTIONS` preflights are ignored
- bodies are capped at 4 KB
- auth headers are masked
- cookies are reduced to `"[present]"`
- failed requests are logged with the failure reason
So the data stays useful without turning into junk.
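Those guardrails are mostly small, pure helpers. Here is a hedged sketch of what they might look like; the names (`shouldRecord`, `maskHeaders`, `capBody`) are illustrative, not the server's actual identifiers:

```typescript
// Illustrative sketch of the capture guardrails described above.
const MAX_BODY_BYTES = 4096; // "bodies are capped at 4 KB"

// Skip noise: static assets and CORS preflights never enter the history.
function shouldRecord(url: string, method: string): boolean {
  if (method === "OPTIONS") return false;
  return !/\.(png|jpe?g|svg|gif|woff2?|css|js|ico)(\?|$)/.test(url);
}

// Mask secrets so captured headers stay safe to hand to the model.
function maskHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase();
    if (key === "authorization") out[name] = "[masked]";
    else if (key === "cookie" || key === "set-cookie") out[name] = "[present]";
    else out[name] = value;
  }
  return out;
}

// Truncate oversized bodies instead of flooding the model's context.
function capBody(body: string): string {
  if (body.length <= MAX_BODY_BYTES) return body;
  return body.slice(0, MAX_BODY_BYTES) + `… [truncated, ${body.length} bytes]`;
}
```

The point of helpers like these is that every record stays bounded and shareable, so the network history can be long without being heavy.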
Batching Matters More Than It Sounds
Another thing that slows agents down is one-action-per-call design.
Click. Wait. Screenshot. Fill. Wait. Screenshot.
That is not just annoying. It makes the whole debugging loop slower and dumber.
browser_act fixes that by accepting an ordered list of actions in one call:
[
{ "action": "fill", "selector": "#email", "value": "user@example.com" },
{ "action": "fill", "selector": "#password", "value": "secret" },
{ "action": "click", "selector": "button[type=submit]" }
]
The current action set covers a lot of what real apps actually need:
`click`, `hover`, `drag`, `clicklink`, `fill`, `select`, `keypress`, `waitfor`, `scrollto`, `wait`, `clearconsole`
That means the model is not stuck the moment the UI uses hover states, resize handles, sortable grids, or context menus.
And yes, right-click works through click with button: "right", which turns out to matter more often than you might expect.
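A batched runner does not need to be complicated. Here is a minimal sketch with a stub `Driver` interface standing in for Playwright; the type and field names are assumptions, not the server's real ones:

```typescript
// Illustrative sketch of a batched action runner, covering two of the
// action types from the list above.
type Action =
  | { action: "fill"; selector: string; value: string }
  | { action: "click"; selector: string; button?: "left" | "right" };

// Stand-in for the real Playwright page; only what this sketch needs.
interface Driver {
  fill(selector: string, value: string): Promise<void>;
  click(selector: string, button: "left" | "right"): Promise<void>;
}

// Run the list in order and stop at the first failure, so the result
// tells the model exactly which step broke.
async function runActions(driver: Driver, actions: Action[]): Promise<string[]> {
  const log: string[] = [];
  for (const [i, a] of actions.entries()) {
    try {
      if (a.action === "fill") await driver.fill(a.selector, a.value);
      else await driver.click(a.selector, a.button ?? "left");
      log.push(`${i}: ${a.action} ${a.selector} ok`);
    } catch (err) {
      log.push(`${i}: ${a.action} ${a.selector} failed: ${err}`);
      break;
    }
  }
  return log;
}
```

Stopping at the first failure matters: a per-step result list is itself debugging evidence, while blindly continuing after a failed fill just produces misleading downstream errors.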
Structured DOM Queries Beat Giant Dumps
Another thing I did not want was a browser tool that just vomits the whole DOM back at the model.
browser_query is useful because it lets the model ask narrower questions:
- How many rows are there?
- Is this button visible?
- What text do these badges have?
- Is this input disabled?
Examples:
{ "selector": ".ag-row", "count_only": true }
{ "selector": "button", "all": true, "visible_only": true, "limit": 10 }
{ "selector": ".badge", "fields": ["text", "className", "visible"] }
The supported fields are deliberately practical:
`text`, `value`, `visible`, `disabled`, `className`, `href`, `innerHTML`
That keeps the output small enough to reason about and avoids wasting context on useless noise.
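The filter-and-project logic behind a query like that is easy to picture. A sketch, assuming the server first collects candidate elements in the page and then narrows them; `ElementLike` is a plain stand-in for real DOM nodes so the idea is testable outside a browser:

```typescript
// Illustrative sketch of browser_query's narrowing step.
interface ElementLike {
  text: string;
  value?: string;
  visible: boolean;
  disabled?: boolean;
  className: string;
}

interface QueryOpts {
  visible_only?: boolean;
  limit?: number;
  count_only?: boolean;
  fields?: (keyof ElementLike)[];
}

function query(els: ElementLike[], opts: QueryOpts = {}) {
  let matched = opts.visible_only ? els.filter((e) => e.visible) : els;
  if (opts.count_only) return { count: matched.length };
  if (opts.limit !== undefined) matched = matched.slice(0, opts.limit);
  const fields = opts.fields ?? ["text", "visible"];
  // Project only the requested fields to keep the output small.
  return matched.map((el) => Object.fromEntries(fields.map((f) => [f, el[f]])));
}
```

Whatever the real implementation looks like, the shape is the point: the model asks a narrow question and gets back a few small objects, not a DOM dump.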
Screenshots Are Still There. They Just Aren't First.
I did keep browser_screenshot.
But I made the tool description explicitly say it should be the last resort, not the default.
That sounds minor, but it really changes how an agent behaves. If the tool surface itself tells the model:
- use `browser_debug` first
- use `browser_query` second
- use screenshots only for visual problems
then the model starts spending its attention on data instead of appearance.
That is the behavior I wanted.
Real Apps Need More Than Page Scroll
One of the easiest ways browser tools fall apart is scrolling.
Scrolling the page is not enough for modern apps. A lot of important UI is inside:
- data grid viewports
- chat panels
- sidebars
- overflow containers
- modals
browser_scroll can scroll the page, scroll an element into view, or scroll inside a specific container.
Example:
{
"container": ".ag-body-viewport",
"direction": "down",
"pixels": 500
}
And because it returns scroll position data, the model can tell where it is instead of just flailing around.
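The position math itself is just clamping to the container's scrollable range, the same thing the browser does natively with `scrollTop`. A tiny sketch with illustrative names:

```typescript
// Hedged sketch of container scroll clamping; the real server drives a
// live container via Playwright and reports the resulting position.
function nextScrollTop(
  current: number,
  direction: "up" | "down",
  pixels: number,
  maxScrollTop: number // scrollHeight - clientHeight
): number {
  const delta = direction === "down" ? pixels : -pixels;
  return Math.min(maxScrollTop, Math.max(0, current + delta));
}
```

Returning the clamped position is what lets the model notice "I asked for 500px but only moved 200px, so I've hit the bottom of the grid."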
There Is Also an Escape Hatch
Even with all of that, fixed tools never cover every case.
So there is browser_eval, which runs JavaScript in the page context.
That is useful for things like:
- reading computed styles
- inspecting app state on `window`
- appending input text without clearing it
- simulating more custom interactions
- calling `fetch()` directly
- checking CSS visibility details
I do not think the escape hatch should be the first thing the model reaches for. But it absolutely should exist.
What This Looks Like in Practice
A realistic flow is:
1. browser_navigate({ url: "https://myapp.com", clear_logs: true })
2. browser_act({
commands: [
{ action: "fill", selector: "#email", value: "user@example.com" },
{ action: "fill", selector: "#password", value: "secret" },
{ action: "click", selector: "button[type=submit]" }
]
})
3. browser_wait_for_network({ url_pattern: "/api/session", method: "POST" })
4. browser_debug({ console_types: ["error"], last_n: 20 })
From there the model can see:
- whether the request fired
- what payload got sent
- what response came back
- whether the browser threw an error
- whether the failure is frontend, backend, or both
That is the kind of loop where an agent becomes genuinely useful.
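The frontend/backend/both call at the end of that list can even be written down as a rule. This is not part of the server, just a hypothetical sketch of the kind of reasoning the structured output enables:

```typescript
// Hypothetical triage rule over browser_debug output: HTTP status of the
// watched request (null if it never fired) plus any console errors.
function triage(status: number | null, consoleErrors: string[]): string {
  if (status === null) return "request never fired (frontend)";
  const apiFailed = status >= 400;
  const jsFailed = consoleErrors.length > 0;
  if (apiFailed && jsFailed) return "both frontend and backend";
  if (apiFailed) return "backend (or the payload the frontend sent)";
  if (jsFailed) return "frontend";
  return "no obvious failure";
}
```

A model with only a screenshot cannot apply a rule like this; a model with the request record and the console log can.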
Why I Think This Matters
I think a lot of people are measuring browser MCPs by the wrong thing.
The question is not "can the AI click buttons?"
The real question is: when something breaks, can the AI see the same evidence a good developer would look at?
If the answer is yes, the quality of debugging jumps fast.
If the answer is no, you end up with a very expensive screenshot viewer.
That is why I built mare-browser-mcp the way I did. Not to make agents better at looking at pages, but to make them better at understanding what happened in the browser.
Not screenshots first. Telemetry first.
Try It
git clone https://github.com/emadklenka/mare_browser_mcp
cd mare_browser_mcp
pnpm install
npx playwright install chromium
pnpm run setup
It also supports global install:
pnpm add -g mare-browser-mcp
npx playwright install chromium
The repo has setup instructions for Claude Code and OpenCode, and the current server exposes:
`browser_navigate`, `browser_act`, `browser_debug`, `browser_query`, `browser_screenshot`, `browser_eval`, `browser_scroll`, `browser_restart`, `browser_upload`, `browser_wait_for_network`
If you are building web apps with AI, that combination is much closer to giving your agent DevTools than giving it a screenshot and hoping for the best.