Timothy Olaleke

Posted on Mar 19

Your Browser Has a Remote Control — And Nobody Told You

#webdev #ai #chrome #devtools

3 tools that let AI agents drive Chrome. I tested all three. Here's what actually happens.

Every Chrome browser ships with a hidden feature that almost nobody talks about: a remote control API called the Chrome DevTools Protocol (CDP). It's the same protocol that powers Chrome DevTools — the thing you open when you press F12. But here's the part that changes everything: any program can use it. Including AI agents.

I've been using CDP daily for over a year to let AI agents browse the web using my real browser — with all my logged-in sessions intact. No passwords shared. No API keys. No OAuth flows. The AI just uses my browser like I would.

Three major tools have emerged to give AI agents this superpower. I tested all three on the same task, with the same browser, and discovered something that most tutorials and docs don't tell you.

The Three Contenders

| | Playwright MCP | Chrome DevTools Protocol | agent-browser |
|---|---|---|---|
| Made by | Microsoft | Google (built into Chrome) | Vercel Labs |
| GitHub Stars | 29,000+ | Built-in (no repo needed) | 23,500+ |
| Language | TypeScript | Any (HTTP + WebSocket) | Rust |
| Latest Version | v0.0.68 | Ships with Chrome | v0.21.2 |
| Install | `npx @playwright/mcp` | Already in your browser | `npm i -g agent-browser` |

They all use CDP under the hood. But they use it very differently — and that difference matters more than you'd think.

Setting It Up (60 Seconds)

Before we compare the tools, let's enable Chrome's remote control. It takes one command.

Start Chrome with remote debugging:

# macOS
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222

# Linux
google-chrome --remote-debugging-port=9222

# Windows
chrome.exe --remote-debugging-port=9222

Verify it's working:

curl http://localhost:9222/json/version

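If you see a JSON response with your Chrome version, you're ready. That's the entire setup. If the connection is refused on Chrome 136 or later, the likely cause is that Chrome now ignores --remote-debugging-port for the default profile directory; add --user-data-dir pointing at a separate directory and sign in to that profile once.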
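If you'd rather do the same check from a script, here is a minimal Python sketch. It assumes Chrome is listening on port 9222 as above; the function name `chrome_version` is just for illustration, and the exact set of keys in the response may vary by Chrome version.

```python
import json
import urllib.request


def chrome_version(base_url: str = "http://localhost:9222", timeout: float = 2.0) -> dict:
    """Fetch /json/version from a Chrome started with --remote-debugging-port."""
    with urllib.request.urlopen(f"{base_url}/json/version", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `chrome_version()` returns a dict; the `Browser` key holds the product and version string, and `webSocketDebuggerUrl` is the browser-level WebSocket endpoint. A `URLError` means nothing is listening on that port.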

What just happened? You told Chrome to listen on port 9222 for remote control commands. Any program on your machine can now send instructions to your browser — open tabs, read pages, click buttons, fill forms, take screenshots.

Coming soon: Chrome 146+ is adding a native settings toggle for remote debugging — no command line needed. Go to DevTools (F12) → Settings → Experiments and search for "MCP". Once enabled, AI agents can connect without restarting Chrome. This is rolling out gradually in 2025/2026.

Seeing It in Action: agent-browser

Let's start with the most beginner-friendly tool. agent-browser by Vercel Labs gives you 108+ simple commands to control Chrome from your terminal.

Install it:

npm install -g agent-browser
agent-browser install  # Downloads a browser (first time only)

Fill a form in 4 commands:

Here's what happens at each step:

Step 1 — Open a page. Just like clicking a link, but from the command line.

Step 2 — Snapshot. This is the magic for AI agents. Instead of raw HTML, you get a clean list of interactive elements with reference IDs like [ref=e2]. An AI agent reads this and knows exactly what's on the page.

Step 3 — Interact. Use those ref IDs to fill fields, click buttons, check boxes. agent-browser fill @e2 "John Doe" fills the customer name field. Simple.

Step 4 — Screenshot. Take a picture of the result. With --annotate, every interactive element gets a numbered label — perfect for AI vision models.
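The four steps can be sketched as a small Python helper that an agent (or you) could run. This is an illustration only: the form URL is a placeholder, the ref `e2` stands for whatever the snapshot reports for your field, and the helper names are made up here. The subcommands are the ones named in this article.

```python
import re
import shlex
from typing import List

# Snapshot output marks interactive elements with markers like [ref=e2].
REF_PATTERN = re.compile(r"\[ref=(e\d+)\]")


def extract_refs(snapshot_text: str) -> List[str]:
    """Return the ref IDs (for example "e2") found in snapshot output, in order."""
    return REF_PATTERN.findall(snapshot_text)


def form_fill_commands(url: str, ref: str, value: str) -> List[List[str]]:
    """Build the open / snapshot / fill / screenshot sequence as argv lists."""
    return [
        ["agent-browser", "open", url],
        ["agent-browser", "snapshot", "-i"],
        ["agent-browser", "fill", f"@{ref}", value],
        ["agent-browser", "screenshot", "--annotate"],
    ]


def as_shell(commands: List[List[str]]) -> str:
    """Render the argv lists as one shell line chained with &&."""
    return " && ".join(shlex.join(argv) for argv in commands)
```

Each argv list can be handed to `subprocess.run(argv, check=True)`. In practice the agent would run the snapshot first, pass its output through `extract_refs`, and only then decide which ref to fill.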

[Screenshots: the form before and after filling]

The numbered red labels are --annotate mode. Each number maps to an element the AI can interact with. This is how vision-based AI agents understand web pages.

The Discovery: Session Sharing

Here's the thing nobody tells you. I tested all three tools against the same page — an authenticated dashboard behind a login. Same browser, same URL, same Chrome instance.

Two tools saw the full dashboard. One saw a login page.

What's happening?

When you connect via raw CDP or agent-browser --cdp 9222, you're using Chrome's default browser context. This means the AI agent sees exactly what you'd see — all your cookies, all your logged-in sessions, everything.

When you use Playwright MCP or agent-browser in standalone mode, they create an isolated browser context. Think of it like an incognito window. No cookies, no sessions, no logins. A clean slate.

The session sharing table:

| Tool | Mode | Sees Your Logins? | Why |
|---|---|---|---|
| Raw CDP | Default context | Yes | Uses Chrome's real cookie jar |
| agent-browser | `--cdp 9222` | Yes | Connects to Chrome's default context |
| agent-browser | Standalone | No | Launches its own browser |
| Playwright MCP | Default | No | Creates an isolated browserContext |

If you want your AI agent to use your existing logins — to read your email, check your dashboards, manage your accounts — you need raw CDP or agent-browser connected via --cdp.

If you want isolation — for testing, scraping, or running untrusted automations — Playwright MCP or agent-browser standalone gives you that by default.

Neither is "better." They're for different jobs. But most people don't know the difference exists.
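One way to check which side of this split a connection is on is to look at the cookies it can see. The sketch below assumes you already have the result object of CDP's `Network.getCookies` command (a dict with a `cookies` list whose entries carry `name` and `domain`). The helper name and the domains in the usage note are hypothetical.

```python
from typing import List


def visible_cookie_names(get_cookies_result: dict, domain: str) -> List[str]:
    """List cookie names in a Network.getCookies result that apply to a domain."""
    names = []
    for cookie in get_cookies_result.get("cookies", []):
        # Cookie domains may carry a leading dot (".example.com").
        cookie_domain = cookie.get("domain", "").lstrip(".")
        if not cookie_domain:
            continue
        if domain == cookie_domain or domain.endswith("." + cookie_domain):
            names.append(cookie.get("name", ""))
    return names
```

An empty list for a site you are logged in to (say `visible_cookie_names(result, "app.example.com")`) suggests the tool is running in an isolated context; a list containing your session cookie suggests it is using the default one.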

How CDP Actually Works (The 2-Minute Version)

Chrome's remote control comes down to three operations: two plain HTTP endpoints and one WebSocket channel. That's it.

1. List open tabs (HTTP GET)

curl http://localhost:9222/json

Returns a JSON list of every tab with its title, URL, and WebSocket address.

2. Open a new tab (HTTP PUT; plain GET has been rejected since Chrome 111)

curl -X PUT "http://localhost:9222/json/new?https://example.com"

3. Send commands (WebSocket)

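// Connect to a tab's WebSocket URL, then send one JSON message per command.
// Every message needs a unique integer "id"; Chrome echoes it back in the reply.
{ "id": 1, "method": "Page.navigate", "params": { "url": "https://example.com" } }
{ "id": 2, "method": "Runtime.evaluate", "params": { "expression": "document.title" } }
{ "id": 3, "method": "Page.captureScreenshot" }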

That's the whole transport: HTTP for tab management, WebSocket for commands. Everything else is a question of which methods you call. With them you can read, click, type, screenshot, and intercept network requests on any page.
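The WebSocket half is mostly bookkeeping: tag each command with an id, then read messages until the reply with that id arrives, setting aside any events that show up in between. Here is a minimal Python sketch of that loop. It is transport-agnostic on purpose: `send` and `recv` are assumed to be supplied by whatever WebSocket client you use, and the class and function names are illustrative, not part of any library.

```python
import base64
import itertools
import json
from typing import Callable, Optional


class CdpSession:
    """Pairs CDP commands with their replies over any text-message transport."""

    def __init__(self, send: Callable[[str], None], recv: Callable[[], str]):
        self._send = send
        self._recv = recv
        self._ids = itertools.count(1)  # unique integer id per command
        self.events = []  # messages that arrived while waiting for a reply

    def call(self, method: str, params: Optional[dict] = None) -> dict:
        """Send one command and block until the reply with the same id arrives."""
        msg_id = next(self._ids)
        self._send(json.dumps({"id": msg_id, "method": method, "params": params or {}}))
        while True:
            message = json.loads(self._recv())
            if message.get("id") == msg_id:
                if "error" in message:
                    raise RuntimeError(f"{method} failed: {message['error']}")
                return message.get("result", {})
            # No matching id: a protocol event (or an unrelated reply); keep it.
            self.events.append(message)


def screenshot_bytes(result: dict) -> bytes:
    """Decode the base64 "data" field of a Page.captureScreenshot result."""
    return base64.b64decode(result["data"])
```

With a client object `ws` that exposes `send(str)` and `recv()` (the third-party websocket-client package is one option, though that is an assumption about your setup), usage would look like `session = CdpSession(ws.send, ws.recv)` followed by `session.call("Page.navigate", {"url": "https://example.com"})`.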

Head-to-Head: Features Compared

Playwright MCP (Microsoft)

Best for: AI agents that need structured browser automation with safety guarantees.

# Add to Claude Code, Cursor, or any MCP-compatible AI tool
npx @playwright/mcp@latest

What it gives you:

Snapshot mode — Returns an accessibility tree. AI agents reference elements by ref IDs instead of fragile CSS selectors
Console + network — Capture console logs and network requests
Form filling — Dedicated tools for clicks, fills, selects
Screenshots — PNG/JPEG with element-level targeting
Session isolation — Each connection gets its own clean context
Extension mode — --extension flag creates a bridge that CAN share sessions (shipped recently)

29,000+ stars. Isolated by default (intentional). Very active development.

agent-browser (Vercel Labs)

Best for: Fast, native CLI automation. AI agents that need speed and flexibility.

npm install -g agent-browser
agent-browser install  # Downloads Chrome for Testing (first time)

What it gives you:

108+ commands — open, click, fill, snapshot, screenshot, eval, get text, find role, mouse, network, har, and more
Annotated screenshots — --annotate labels interactive elements with numbered boxes for vision models
Daemon architecture — Browser persists between commands, chain with &&
CDP connection — --cdp 9222 connects to your real browser with all sessions
Session persistence — --session-name myapp auto-saves and restores browser state
Auto-connect — --auto-connect finds your running Chrome automatically
iOS Simulator support — Test on iPhone simulators via Appium
HAR recording — Capture full HTTP archive of all requests

# Chain commands — browser stays alive between them
agent-browser open example.com && \
agent-browser wait --load networkidle && \
agent-browser snapshot -i

Built in Rust. 78 releases in ~3 months. Created by Malte Ubl (Vercel CTO) and team. 23,500+ stars.

Raw CDP (DIY)

Best for: Maximum control, authenticated workflows, custom integrations.

No install needed — just talk to Chrome's HTTP/WebSocket API directly:

# Check what's running
curl http://localhost:9222/json/version

# List your real open tabs
curl http://localhost:9222/json

# Open a new tab (preserves all cookies and sessions)
curl -X PUT "http://localhost:9222/json/new?https://example.com"

No framework. No dependencies. Just HTTP requests and WebSocket messages. You can build a full browser automation tool in a few hundred lines of code. This is the lowest-level option — maximum power, maximum flexibility.
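To give a feel for how small the HTTP half of such a tool can be, here is a Python sketch of the curl calls above, plus a helper that picks the WebSocket URL for a tab. It assumes Chrome on port 9222; the function names are made up for this example, and the URL is passed through unescaped to mirror the curl command.

```python
import json
import urllib.request
from typing import Optional


def list_tabs(base_url: str = "http://localhost:9222", timeout: float = 2.0) -> list:
    """GET /json: one dict per debuggable target (tabs, workers, and so on)."""
    with urllib.request.urlopen(f"{base_url}/json", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def open_tab(url: str, base_url: str = "http://localhost:9222", timeout: float = 2.0) -> dict:
    """PUT /json/new?<url>: open a tab and return its target description."""
    request = urllib.request.Request(f"{base_url}/json/new?{url}", method="PUT")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_page_socket(tabs: list, url_contains: str) -> Optional[str]:
    """Return the WebSocket URL of the first page whose URL contains the text."""
    for tab in tabs:
        if tab.get("type") == "page" and url_contains in tab.get("url", ""):
            # May be None if Chrome does not report a socket for this target.
            return tab.get("webSocketDebuggerUrl")
    return None
```

`find_page_socket(list_tabs(), "example.com")` gives you the address to open a WebSocket against; from there every command is a JSON message.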

Using These Tools with AI Coding Agents

These tools really shine when connected to AI coding agents like Claude Code, Cursor, OpenCode, or Windsurf. Here's how:

With Claude Code (MCP)

Add to .mcp.json in your project root:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

Now Claude Code can browse the web, fill forms, take screenshots, and read pages directly.

With agent-browser (CLI)

Any AI agent that can run shell commands can use agent-browser:

# AI agent runs these commands to research a topic
agent-browser open "en.wikipedia.org/wiki/Chrome_DevTools"
agent-browser snapshot -i      # Read the page content
agent-browser screenshot       # See what it looks like

With agent-browser --cdp (Authenticated)

# Connect to YOUR Chrome — AI reads pages as you
agent-browser --cdp 9222 snapshot -i
# Now the AI can see your dashboards, email, authenticated content

Connecting to Electron Apps

Since Electron apps (VS Code, Slack, Discord) are built on Chromium, you can control them too:

# Launch any Electron app with CDP debugging
"/Applications/Visual Studio Code.app/Contents/MacOS/Electron" --remote-debugging-port=9333

# Connect agent-browser to VS Code
agent-browser --cdp 9333 snapshot

The Landscape Is Moving Fast

This isn't a niche topic anymore. The browser automation for AI agents space is exploding:

browser-use (78,000 stars) — Originally built on Playwright, switched to raw CDP in 2025 for speed
chrome-devtools-mcp (30,000+ stars) — Google's Chrome team released an official MCP server for CDP
Chrome 146 — Google is adding a native settings toggle for AI agent access via MCP, built right into Chrome

The trend is clear: CDP is becoming the standard interface between AI agents and web browsers. Google endorses it. Microsoft builds on it. Vercel bets on it.

When to Use What

Use Playwright MCP when:

You need test isolation — each run starts clean
You're building automated testing pipelines
You want structured accessibility snapshots for AI
Security matters — you don't want the AI accessing your real sessions

Use agent-browser when:

You want speed — native Rust, daemon architecture
You need rich CLI commands — 108+ built-in operations
You want flexibility — standalone OR connected to your browser
You're working in cloud sandboxes with parallel agent sessions
You want the easiest path — just type commands and things happen

Use raw CDP when:

You need authenticated sessions — access your real logins
You want zero dependencies — just HTTP and WebSocket
You're building custom integrations specific to your workflow
You need to intercept network requests or capture auth tokens

Or combine them:

The tools aren't mutually exclusive. You can:

Use raw CDP or agent-browser --cdp for authenticated workflows (read your email, manage dashboards)
Use agent-browser standalone for fast scripted automation (fill forms, scrape data)
Use Playwright MCP for isolated testing (run tests in clean contexts)

All speaking the same protocol, whether they attach to your running Chrome or launch a browser of their own.

Get Started Now

Step 1: Start Chrome with debugging enabled:

"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222

Step 2: Verify it works:

curl http://localhost:9222/json/version

Step 3: Pick your tool and try it:

# Option A: Playwright MCP (structured; --cdp-endpoint attaches to the Chrome you just started, omit it to get an isolated browser)
npx @playwright/mcp@latest --cdp-endpoint http://localhost:9222

# Option B: agent-browser (fast, flexible)
npm i -g agent-browser
agent-browser open example.com && agent-browser snapshot -i

# Option C: agent-browser connected to YOUR browser (authenticated)
agent-browser --cdp 9222 snapshot -i

# Option D: Raw CDP (minimal, no dependencies)
curl http://localhost:9222/json  # List your real tabs

You now have AI-ready browser control. Your logged-in sessions, your tabs, your data — all accessible to AI agents through the protocol that was hiding in plain sight.

The Bottom Line

Every Chrome browser ships with a remote control. Three major tools let AI agents use it. They all speak the same protocol, but they make fundamentally different choices about session isolation — and that one choice determines whether your AI agent sees a login page or your actual dashboard.

Now you know the difference. Build accordingly.

Give Your AI Agent the Instructions

Want your AI agent to already know how to use these tools? Each project publishes skills — ready-made instruction files you can pass to Claude Code, Cursor, OpenCode, or any AI coding agent.

agent-browser skills

npx skills add github:vercel-labs/agent-browser

5 skills available: general browser automation, QA/dogfood testing, Electron app control (VS Code, Slack, Discord, Figma), Slack workspace automation, and Vercel Sandbox cloud sessions.

The Electron skill is especially powerful — it teaches your AI agent how to launch and control desktop apps like VS Code, Slack, or Figma through CDP. Your AI agent can read Slack messages, navigate VS Code, or interact with any Chromium-based desktop app.

chrome-devtools-mcp skills

npx skills add github:anthropics/anthropic-cookbook chrome-devtools-mcp

5 skills available: core Chrome DevTools automation, CLI scripting, accessibility debugging, LCP performance optimization, and connection troubleshooting.

Raw CDP skill

npx skills add github:anthropics/anthropic-cookbook chrome-cdp

Skill available: Lightweight CDP CLI for live Chrome session control — connects to your real tabs with all cookies preserved. 13 commands, per-tab daemon architecture.

What are skills? Skills are instruction files that teach AI agents how to use specific tools. Instead of explaining everything yourself, you install a skill and your agent instantly knows the commands, best practices, and common patterns. Think of them like a manual the AI reads before it starts working.

Find me at timtech4u.dev or @timtech4u.
