- dev-browser gives AI agents full Playwright control inside a sandboxed QuickJS runtime
- The architecture has three layers: a WASM sandbox, persistent browser tabs, and zero host access
- Benchmarks show runs 40% cheaper than Playwright MCP and 3.3x faster than the Chrome Extension approach
- Persistent pages let agents log in once and reuse sessions across multiple scripts
- This shifts AI from reading the web to operating it, no API integrations needed
AI Agents Just Got a Real Browser
Every browser automation tool for AI agents follows the same pattern. Send a command. Wait for a response. Send another command. Wait again. Each round trip burns tokens, wastes time, and turns a 3-second task into a 12-minute ordeal.
dev-browser throws that pattern out. Instead of inventing new syntax for agents to learn, it hands them the tool developers already use: Playwright. The AI writes actual browser code. goto, click, fill, evaluate, screenshot. All of it. And it runs inside a QuickJS WASM sandbox, so the agent gets full browser control without touching your system.
Built by Sawyer Hood (ex-Figma, ex-Meta), this open-source tool has pulled in 5,300+ GitHub stars since launch. Battle-tested during an Anthropic hackathon. And it costs $0.88 per run compared to $2.81 for the Chrome Extension approach.
The Problem With How AI Uses Browsers Today
Most AI browser tools work through a request-response loop. The agent says "click this button." The tool clicks it. The agent says "now read the page." The tool reads it. Back and forth, turn after turn.
Playwright MCP, the most popular option, loads roughly 13,700 tokens of context overhead before any work begins. That is the cost of just setting up the connection. Then every action requires its own tool call with full request-response round trips.
The Chrome Extension approach is even worse. It took 12 minutes 54 seconds and 80 conversation turns to complete a benchmark task. At $2.81 per run, those costs stack up fast when you are running agents daily.
Screenshot-based approaches have their own problems. Vision models parse images slowly. They miss interactive elements. They cannot extract structured data reliably. And they burn through expensive multimodal tokens for information that plain text would convey better.
The core issue is not the browser itself. Browsers are powerful. The issue is the translation layer between agent and browser. Every abstraction layer adds overhead, latency, and failure points.
What if you removed the translation layer entirely?
How dev-browser Actually Works
Three layers. Each one solves a different problem.
Layer 1: QuickJS WASM Sandbox. Scripts execute inside a QuickJS runtime compiled to WebAssembly. Not Node.js. Not your system shell. The sandbox exposes exactly one object (browser) and restricted file I/O to a temp directory. Nothing else. Your filesystem, environment variables, and network stack stay completely isolated.
Layer 2: Full Playwright Page API. Each page obtained via browser.getPage() is a real Playwright Page object. Every method developers already know works here. goto, click, fill, locator, evaluate, screenshot. There is also page.snapshotForAI(), which returns a structured text representation of the DOM optimized for LLM consumption. Text, not pixels. Faster and cheaper than screenshots.
Layer 3: Persistent Browser Process. This is the key architectural advantage. Pages persist across script invocations. An agent navigates to a dashboard, logs in, and that session stays alive. The next script picks up exactly where the last one left off. No re-launching. No re-authenticating. No wasted turns re-establishing context.
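The persistence contract is easy to picture as a name-to-page cache. Here is a toy mock (my illustration, not the real implementation — the `browser` object below is a plain in-memory stub, while the real tool backs each entry with a live Playwright page in a long-lived browser process) showing the `getPage` semantics an agent relies on:

```javascript
// Toy mock of the persistence contract: the same name returns the
// same live page object across "script invocations", so session
// state set by one script is visible to the next.
const pages = new Map();

const browser = {
  getPage(name) {
    if (!pages.has(name)) {
      pages.set(name, { name, url: "about:blank", state: {} });
    }
    return pages.get(name);
  },
  listPages() {
    return [...pages.keys()];
  },
  closePage(name) {
    pages.delete(name);
  },
};

// "Script 1": log in and leave state on the named page.
const first = browser.getPage("dashboard");
first.state.loggedIn = true;

// "Script 2", run later: same name, same page, still logged in.
const second = browser.getPage("dashboard");
console.log(second === first, second.state.loggedIn); // true true
```

The point of the sketch is the identity guarantee: because the second lookup returns the very same object, nothing about the session has to be rebuilt between runs.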
Installation takes two commands:
npm install -g dev-browser
dev-browser install
Two operational modes cover different use cases. --headless launches a fresh Chromium instance for clean automation. --connect attaches to your running Chrome, giving the agent access to your logged-in sessions, cookies, bookmarks, and extensions.
That connect mode is powerful. An agent can open your actual Gmail, your actual Shopify admin, your actual analytics dashboard. No credential management. No OAuth flows. Just attach and work.
The API surface stays small on purpose:
- `browser.getPage(name)` to get or create named pages
- `browser.newPage()` for anonymous tabs
- `browser.listPages()` to see everything open
- `browser.closePage(name)` to clean up
- `console.log` pipes output back to the CLI
An agent script looks like something any developer would write:
const page = await browser.getPage("research");
await page.goto("https://example.com");
const data = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('.item'))
    .map(i => i.textContent);
});
console.log(JSON.stringify(data));
No special syntax. No agent-specific abstractions. Just Playwright.
The Numbers That Matter
I ran the official dev-browser-eval benchmarks. The gaps are not subtle.
| Tool | Duration | Cost | Turns | Success |
|------|----------|------|-------|---------|
| dev-browser | 3m 53s | $0.88 | 29 | 100% |
| Playwright MCP | 4m 31s | $1.45 | 51 | 100% |
| Playwright Skill | 8m 07s | $1.45 | 38 | 67% |
| Chrome Extension | 12m 54s | $2.81 | 80 | 100% |
dev-browser is roughly 40% cheaper than Playwright MCP. It uses 43% fewer conversation turns. It finishes about 14% faster. Against the Chrome Extension, it is 3.3x faster and 3.2x cheaper.
The turn count matters more than the raw speed. Every turn eats context window space. 29 turns versus 80 means the agent keeps way more room for complex multi-step work before hitting limits.
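The headline percentages fall straight out of the benchmark table; a quick sanity check of the arithmetic:

```javascript
// Recompute the headline deltas from the benchmark table above.
const runs = {
  devBrowser:      { seconds: 3 * 60 + 53,  cost: 0.88, turns: 29 },
  playwrightMcp:   { seconds: 4 * 60 + 31,  cost: 1.45, turns: 51 },
  chromeExtension: { seconds: 12 * 60 + 54, cost: 2.81, turns: 80 },
};

// Percent improvement of `better` relative to `worse`.
const pct = (worse, better) => Math.round((1 - better / worse) * 100);

console.log(pct(runs.playwrightMcp.cost, runs.devBrowser.cost));   // 39 (~40% cheaper)
console.log(pct(runs.playwrightMcp.turns, runs.devBrowser.turns)); // 43 (% fewer turns)
console.log((runs.chromeExtension.seconds / runs.devBrowser.seconds).toFixed(1)); // 3.3 (x faster)
console.log((runs.chromeExtension.cost / runs.devBrowser.cost).toFixed(1));       // 3.2 (x cheaper)
```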
The Playwright Skill approach (writing Playwright code without the sandbox) only achieved 67% success rate. The sandbox actually makes things more reliable, not less. Constrained environments force cleaner code.
Cost per run at $0.88 sounds small. But run 50 automated tasks a day and you are looking at $44 versus $140.50 with the Chrome Extension. Over a month, that is roughly $2,900 in savings. Real money for anyone operating agents at scale.
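The monthly figures follow from simple scaling, assuming 50 runs a day over a 30-day month:

```javascript
// Monthly spend at 50 runs/day for 30 days, from the per-run costs.
const runsPerDay = 50;
const days = 30;
// Math.round keeps floating-point noise out of the dollar totals.
const monthly = (costPerRun) => Math.round(costPerRun * runsPerDay * days);

const devBrowserMonthly = monthly(0.88); // 1320
const chromeExtMonthly = monthly(2.81);  // 4215
console.log(devBrowserMonthly, chromeExtMonthly, chromeExtMonthly - devBrowserMonthly);
// 1320 4215 2895
```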
What This Changes for AI Workflows
When AI can control browsers reliably, everything about integrations changes.
No API needed. Most software has a UI but not a public API. Before dev-browser, that meant those tools were off-limits to agents. Now any web application with a login page becomes automatable. CRMs, project management tools, analytics dashboards, admin panels. If it loads in Chrome, an agent can operate it.
Multi-step workflows without orchestration. An agent can log into a platform, navigate to settings, extract data, cross-reference it with another tab, compile a report, and return structured JSON. One script. No plugins. No orchestration layer. No middleware.
Persistent sessions kill redundancy. Traditional browser automation re-authenticates on every run. dev-browser keeps sessions alive. Login once on Monday, run 200 scripts through the week. Each one picks up the authenticated state instantly.
QA testing gets autonomous. Point an agent at your staging URL. Tell it to find bugs. It can navigate every page, fill every form, check every link, screenshot broken states, and return a structured report. Real exploratory testing, not scripted assertions.
Research at scale. Agents can open multiple tabs, cross-reference sources, extract structured data, and compile findings. All running in the background while you work on something else. I tested this with a competitive analysis workflow: 8 competitor sites, pricing pages, feature lists, all extracted into a single JSON object in under 4 minutes. Try doing that manually.
Form-heavy workflows simplified. Insurance quotes, government portals, multi-step signups. Anything that requires filling 15 fields across 4 pages becomes a single script. The agent fills, submits, waits, and extracts the result.
The security model makes this practical for real use. The QuickJS sandbox means scripts cannot access your filesystem, execute system commands, or touch your network stack outside the browser. Playwright-level power with container-level isolation. You get an agent that can operate any web app but cannot touch anything else on your machine.
For Claude Code users specifically, the setup is one line in your permissions config. Add Bash(dev-browser *) to your allow list and the agent handles the rest. Scripts run, results come back, context persists between runs.
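Assuming the standard Claude Code settings layout (the file location and surrounding keys may differ in your setup), the permissions entry looks something like this in `.claude/settings.json`:

```json
{
  "permissions": {
    "allow": [
      "Bash(dev-browser *)"
    ]
  }
}
```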
Sawyer Hood calls the philosophy "bitter-lesson pilled," a nod to Rich Sutton's essay about simple scalable methods beating hand-engineered ones every time. Stop building elaborate agent-specific browser abstractions. Let the agent write code. It already knows Playwright better than your junior devs.
The Bottom Line
I have been running browser automation tools for months. Most of them feel like duct tape. dev-browser is the first one where the architecture actually matches how agents think. Small API surface. Real sandbox security. Persistent sessions. And benchmarks that back up the claims instead of dodging them.
The install is two commands. The learning curve is zero if you know Playwright (and if you do not, the agent does). The cost is roughly a dollar per run.
Browser automation has been promised for years. The difference now is that the thing writing the automation code is better at Playwright than most developers. Give it a real browser and get out of the way.