Your Browser Has a Remote Control — And Nobody Told You
3 tools that let AI agents drive Chrome. I tested all three. Here's what actually happens.
Every Chrome browser ships with a hidden feature that almost nobody talks about: a remote control API called the Chrome DevTools Protocol (CDP). It's the same protocol that powers Chrome DevTools — the thing you open when you press F12. But here's the part that changes everything: any program can use it. Including AI agents.
I've been using CDP daily for over a year to let AI agents browse the web using my real browser — with all my logged-in sessions intact. No passwords shared. No API keys. No OAuth flows. The AI just uses my browser like I would.
Three major tools have emerged to give AI agents this superpower. I tested all three on the same task, with the same browser, and discovered something that most tutorials and docs don't tell you.
The Three Contenders
| | Playwright MCP | Chrome DevTools Protocol | agent-browser |
|---|---|---|---|
| Made by | Microsoft | Google (built into Chrome) | Vercel Labs |
| GitHub Stars | 29,000+ | Built-in (no repo needed) | 23,500+ |
| Language | TypeScript | Any (HTTP + WebSocket) | Rust |
| Latest Version | v0.0.68 | Ships with Chrome | v0.21.2 |
| Install | npx @playwright/mcp | Already in your browser | npm i -g agent-browser |
They all use CDP under the hood. But they use it very differently — and that difference matters more than you'd think.
Setting It Up (60 Seconds)
Before we compare the tools, let's enable Chrome's remote control. It takes one command.
Start Chrome with remote debugging:
# macOS
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
# Linux
google-chrome --remote-debugging-port=9222
# Windows
chrome.exe --remote-debugging-port=9222
Verify it's working:
curl http://localhost:9222/json/version
If you see a JSON response with your Chrome version, you're ready. That's the entire setup.
What just happened? You told Chrome to listen on port 9222 for remote control commands. Any program on your machine can now send instructions to your browser — open tabs, read pages, click buttons, fill forms, take screenshots.
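If you would rather run that check from a script than from curl, here is a minimal Python sketch. It assumes the default port 9222 from the command above; the helper names `parse_version` and `fetch_version` are made up for illustration.

```python
import json
import urllib.request


def parse_version(payload: str) -> dict:
    """Pick the fields an agent usually needs out of a /json/version response."""
    info = json.loads(payload)
    return {
        "browser": info["Browser"],
        "protocol": info["Protocol-Version"],
        "websocket": info["webSocketDebuggerUrl"],
    }


def fetch_version(port: int = 9222, timeout: float = 2.0) -> dict:
    """Query Chrome's debugging endpoint; raises URLError if nothing is listening."""
    url = f"http://localhost:{port}/json/version"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_version(response.read().decode("utf-8"))
```

Calling `fetch_version()` while Chrome is running with the flag should return a small dict; a connection error usually means Chrome was started without `--remote-debugging-port`.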
Seeing It in Action: agent-browser
Let's start with the most beginner-friendly tool. agent-browser by Vercel Labs gives you 108+ simple commands to control Chrome from your terminal.
Install it:
npm install -g agent-browser
agent-browser install # Downloads a browser (first time only)
Fill a form in 4 commands:
Here's what happens at each step:
Step 1 — Open a page. Just like clicking a link, but from the command line.
Step 2 — Snapshot. This is the magic for AI agents. Instead of raw HTML, you get a clean list of interactive elements with reference IDs like [ref=e2]. An AI agent reads this and knows exactly what's on the page.
Step 3 — Interact. Use those ref IDs to fill fields, click buttons, check boxes. agent-browser fill @e2 "John Doe" fills the customer name field. Simple.
Step 4 — Screenshot. Take a picture of the result. With --annotate, every interactive element gets a numbered label — perfect for AI vision models.
Before and after:
The numbered red labels are --annotate mode. Each number maps to an element the AI can interact with. This is how vision-based AI agents understand web pages.
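To make the snapshot step concrete, here is a rough Python sketch of how an agent might pull ref IDs out of snapshot text. Only the `[ref=e2]` marker comes from the description above; the surrounding line layout in the sample is a guess, and `refs_by_line` is a hypothetical helper, not part of agent-browser.

```python
import re

# Matches markers such as [ref=e2]; the "e<number>" shape is taken from the example above.
REF_PATTERN = re.compile(r"\[ref=(e\d+)\]")


def refs_by_line(snapshot: str) -> dict[str, str]:
    """Map each ref ID to the rest of the snapshot line it appears on."""
    refs = {}
    for line in snapshot.splitlines():
        match = REF_PATTERN.search(line)
        if match:
            # Drop the marker and any leading list dash so only the label remains.
            label = REF_PATTERN.sub("", line).strip(" -")
            refs[match.group(1)] = label
    return refs


# Hypothetical snapshot text, for illustration only.
SAMPLE = '- textbox "Customer name" [ref=e2]\n- button "Submit" [ref=e5]'
```

With a mapping like this, the agent can choose the element whose label matches its goal and pass the ID back with an `@` prefix, as in `agent-browser fill @e2 "John Doe"`.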
The Discovery: Session Sharing
Here's the thing nobody tells you. I tested all three tools against the same page — an authenticated dashboard behind a login. Same browser, same URL, same Chrome instance.
Two tools saw the full dashboard. One saw a login page.
What's happening?
When you connect via raw CDP or agent-browser --cdp 9222, you're using Chrome's default browser context. This means the AI agent sees exactly what you'd see — all your cookies, all your logged-in sessions, everything.
When you use Playwright MCP or agent-browser in standalone mode, they create an isolated browser context. Think of it like an incognito window. No cookies, no sessions, no logins. A clean slate.
The session sharing table:
| Tool | Mode | Sees Your Logins? | Why |
|---|---|---|---|
| Raw CDP | Default context | Yes | Uses Chrome's real cookie jar |
| agent-browser | --cdp 9222 | Yes | Connects to Chrome's default context |
| agent-browser | Standalone | No | Launches its own browser |
| Playwright MCP | Default | No | Creates an isolated browserContext |
If you want your AI agent to use your existing logins — to read your email, check your dashboards, manage your accounts — you need raw CDP or agent-browser connected via --cdp.
If you want isolation — for testing, scraping, or running untrusted automations — Playwright MCP or agent-browser standalone gives you that by default.
Neither is "better." They're for different jobs. But most people don't know the difference exists.
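At the protocol level, the difference can be sketched with two CDP commands from the Target domain. This is an illustration of default versus isolated contexts, not a claim about what Playwright MCP or agent-browser send internally; the function names and the command IDs are assumptions.

```python
import json

# Asks Chrome for a fresh, incognito-like context; the reply carries a browserContextId.
CREATE_CONTEXT = json.dumps({"id": 0, "method": "Target.createBrowserContext"})


def open_in_default_context(url: str, command_id: int = 1) -> str:
    """Target.createTarget with no browserContextId opens the tab in the default context."""
    return json.dumps(
        {"id": command_id, "method": "Target.createTarget", "params": {"url": url}}
    )


def open_in_isolated_context(url: str, context_id: str, command_id: int = 2) -> str:
    """Same command, scoped to a context previously created with CREATE_CONTEXT."""
    return json.dumps(
        {
            "id": command_id,
            "method": "Target.createTarget",
            "params": {"url": url, "browserContextId": context_id},
        }
    )
```

A tab opened the first way shares the cookie jar of your normal windows; a tab opened the second way starts with none, which is why it lands on the login page.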
How CDP Actually Works (The 2-Minute Version)
Chrome's remote control comes down to three operations: two over HTTP and one over WebSocket.
1. List open tabs (HTTP GET)
curl http://localhost:9222/json
Returns a JSON list of every tab with its title, URL, and WebSocket address.
2. Open a new tab (HTTP PUT, required since Chrome 111; older guides that use GET no longer work)
curl -X PUT "http://localhost:9222/json/new?https://example.com"
3. Send commands (WebSocket)
// Connect to a tab's WebSocket URL, then send (each command needs a unique "id"):
{ "id": 1, "method": "Page.navigate", "params": { "url": "https://example.com" } }
{ "id": 2, "method": "Runtime.evaluate", "params": { "expression": "document.title" } }
{ "id": 3, "method": "Page.captureScreenshot" }
That's the entire protocol. HTTP for tab management, WebSocket for commands. You can read, click, type, screenshot, and intercept network requests on any page.
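Each command sent over the WebSocket carries a numeric `id`, which Chrome echoes in its reply so you can tell responses apart from unsolicited events. The Python sketch below shows that bookkeeping without committing to a WebSocket library; `command`, `result_for`, and `screenshot_bytes` are illustrative names, and the transport is left to you.

```python
import base64
import itertools
import json

_next_id = itertools.count(1)


def command(method: str, **params) -> str:
    """Serialize one CDP command with a fresh id."""
    message = {"id": next(_next_id), "method": method}
    if params:
        message["params"] = params
    return json.dumps(message)


def result_for(command_id: int, raw_messages: list[str]) -> dict:
    """Find the reply whose id matches, skipping events (which carry no id)."""
    for raw in raw_messages:
        message = json.loads(raw)
        if message.get("id") == command_id:
            if "error" in message:
                raise RuntimeError(message["error"].get("message", "CDP error"))
            return message.get("result", {})
    raise LookupError(f"no response for command {command_id}")


def screenshot_bytes(result: dict) -> bytes:
    """Page.captureScreenshot returns the image base64-encoded under 'data'."""
    return base64.b64decode(result["data"])
```

In practice you would send `command("Page.captureScreenshot")` over the tab's WebSocket, collect incoming messages, and hand them to `result_for` with the id you sent.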
Head-to-Head: Features Compared
Playwright MCP (Microsoft)
Best for: AI agents that need structured browser automation with safety guarantees.
# Add to Claude Code, Cursor, or any MCP-compatible AI tool
npx @playwright/mcp@latest
What it gives you:
- Snapshot mode: Returns an accessibility tree. AI agents reference elements by ref IDs instead of fragile CSS selectors
- Console + network: Capture console logs and network requests
- Form filling: Dedicated tools for clicks, fills, selects
- Screenshots: PNG/JPEG with element-level targeting
- Session isolation: Each connection gets its own clean context
- Extension mode: The --extension flag creates a bridge that CAN share sessions (shipped recently)
29,000+ stars. Isolated by default (intentional). Very active development.
agent-browser (Vercel Labs)
Best for: Fast, native CLI automation. AI agents that need speed and flexibility.
npm install -g agent-browser
agent-browser install # Downloads Chrome for Testing (first time)
What it gives you:
- 108+ commands: open, click, fill, snapshot, screenshot, eval, get text, find role, mouse, network, har, and more
- Annotated screenshots: --annotate labels interactive elements with numbered boxes for vision models
- Daemon architecture: Browser persists between commands, chain with &&
- CDP connection: --cdp 9222 connects to your real browser with all sessions
- Session persistence: --session-name myapp auto-saves and restores browser state
- Auto-connect: --auto-connect finds your running Chrome automatically
- iOS Simulator support: Test on iPhone simulators via Appium
- HAR recording: Capture full HTTP archive of all requests
# Chain commands — browser stays alive between them
agent-browser open example.com && \
agent-browser wait --load networkidle && \
agent-browser snapshot -i
Built in Rust. 78 releases in ~3 months. Created by Malte Ubl (Vercel CTO) and team. 23,500+ stars.
Raw CDP (DIY)
Best for: Maximum control, authenticated workflows, custom integrations.
No install needed — just talk to Chrome's HTTP/WebSocket API directly:
# Check what's running
curl http://localhost:9222/json/version
# List your real open tabs
curl http://localhost:9222/json
# Open a new tab (preserves all cookies and sessions)
curl -X PUT "http://localhost:9222/json/new?https://example.com"
No framework. No dependencies. Just HTTP requests and WebSocket messages. You can build a full browser automation tool in a few hundred lines of code. This is the lowest-level option — maximum power, maximum flexibility.
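As a starting point for that kind of tool, here is a small Python sketch of the HTTP side using only the standard library. It assumes port 9222 as above; `page_tabs` and `new_tab_request` are hypothetical helper names.

```python
import json
import urllib.request


def page_tabs(payload: str) -> list[dict]:
    """Return title, URL, and WebSocket address for each real tab in a /json response."""
    targets = json.loads(payload)
    return [
        {
            "title": target.get("title", ""),
            "url": target.get("url", ""),
            "websocket": target["webSocketDebuggerUrl"],
        }
        for target in targets
        # /json also lists non-tab targets such as service workers; keep pages only.
        if target.get("type") == "page" and "webSocketDebuggerUrl" in target
    ]


def new_tab_request(target_url: str, port: int = 9222) -> urllib.request.Request:
    """Build (but do not send) the PUT request that asks Chrome to open a tab."""
    # The target URL goes after the "?" unchanged, mirroring the curl example.
    endpoint = f"http://localhost:{port}/json/new?{target_url}"
    return urllib.request.Request(endpoint, method="PUT")
```

Passing the request to `urllib.request.urlopen` sends it; the response body is JSON describing the new tab, including the WebSocket URL you would connect to next.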
Using These Tools with AI Coding Agents
These tools really shine when connected to AI coding agents like Claude Code, Cursor, OpenCode, or Windsurf. Here's how:
With Claude Code (MCP)
// Add to .mcp.json in your project root (the project-scoped MCP config Claude Code reads)
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
}
}
}
Now Claude Code can browse the web, fill forms, take screenshots, and read pages directly.
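If the config file already lists other MCP servers, you will want to merge this entry rather than overwrite the file. A minimal Python sketch, assuming the `mcpServers` layout shown above; `add_mcp_server` is a hypothetical helper, and reading or writing the file itself is left out.

```python
import json


def add_mcp_server(config_text: str, name: str, command: str, args: list[str]) -> str:
    """Return config JSON with one server added, leaving existing servers intact."""
    # An empty or missing file is treated as an empty config.
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    return json.dumps(config, indent=2)
```

Note that the file must be strict JSON: the `//` line in the snippet above is a label for readers and should not be pasted into the file.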
With agent-browser (CLI)
Any AI agent that can run shell commands can use agent-browser:
# AI agent runs these commands to research a topic
agent-browser open "en.wikipedia.org/wiki/Chrome_DevTools"
agent-browser snapshot -i # Read the page content
agent-browser screenshot # See what it looks like
With agent-browser --cdp (Authenticated)
# Connect to YOUR Chrome — AI reads pages as you
agent-browser --cdp 9222 snapshot -i
# Now the AI can see your dashboards, email, authenticated content
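For agents that drive the CLI from code rather than a shell, the same commands can be wrapped in a few lines of Python. This sketch uses only the subcommands and the `--cdp` flag shown above; the wrapper names are hypothetical, and it assumes `agent-browser` is on your PATH.

```python
import subprocess
from typing import Optional


def agent_browser_argv(*command: str, cdp_port: Optional[int] = None) -> list[str]:
    """Build the argument list for one agent-browser invocation."""
    argv = ["agent-browser"]
    if cdp_port is not None:
        # --cdp attaches to an already-running Chrome instead of launching one.
        argv += ["--cdp", str(cdp_port)]
    return argv + list(command)


def run_steps(steps: list[list[str]]) -> list[str]:
    """Run commands in order, stopping at the first failure (like chaining with &&)."""
    outputs = []
    for argv in steps:
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
        outputs.append(done.stdout)
    return outputs
```

For example, `run_steps([agent_browser_argv("snapshot", "-i", cdp_port=9222)])` would return the snapshot text of your real browser's current page as a one-item list.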
Connecting to Electron Apps
Since Electron apps (VS Code, Slack, Discord) are built on Chromium, you can control them too:
# Launch any Electron app with CDP debugging
"/Applications/Visual Studio Code.app/Contents/MacOS/Electron" --remote-debugging-port=9333
# Connect agent-browser to VS Code
agent-browser --cdp 9333 snapshot
The Landscape Is Moving Fast
This isn't a niche topic anymore. Browser automation for AI agents is growing fast:
- browser-use (78,000 stars) — Originally built on Playwright, switched to raw CDP in 2025 for speed
- chrome-devtools-mcp (30,000+ stars) — Google's Chrome team released an official MCP server for CDP
- Chrome 146 — Google is adding a native settings toggle for AI agent access via MCP, built right into Chrome
The trend is clear: CDP is becoming the standard interface between AI agents and web browsers. Google endorses it. Microsoft builds on it. Vercel bets on it.
When to Use What
Use Playwright MCP when:
- You need test isolation — each run starts clean
- You're building automated testing pipelines
- You want structured accessibility snapshots for AI
- Security matters — you don't want the AI accessing your real sessions
Use agent-browser when:
- You want speed — native Rust, daemon architecture
- You need rich CLI commands — 108+ built-in operations
- You want flexibility — standalone OR connected to your browser
- You're working in cloud sandboxes with parallel agent sessions
- You want the easiest path — just type commands and things happen
Use raw CDP when:
- You need authenticated sessions — access your real logins
- You want zero dependencies — just HTTP and WebSocket
- You're building custom integrations specific to your workflow
- You need to intercept network requests or capture auth tokens
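As a sketch of the network case: after sending `Network.enable` over a tab's WebSocket, Chrome emits `Network.requestWillBeSent` events for each request. The Python below filters a batch of received messages for those events; `requested_urls` is an illustrative name, and the WebSocket transport is assumed to exist elsewhere.

```python
import json

# Send this once per tab to start receiving network events.
ENABLE_NETWORK = json.dumps({"id": 1, "method": "Network.enable"})


def requested_urls(raw_messages: list[str]) -> list[tuple[str, str]]:
    """Collect (HTTP method, URL) pairs from Network.requestWillBeSent events."""
    seen = []
    for raw in raw_messages:
        message = json.loads(raw)
        if message.get("method") == "Network.requestWillBeSent":
            request = message["params"]["request"]
            seen.append((request["method"], request["url"]))
    return seen
```

Because this runs against your real session, treat anything it captures with the same care as the session itself.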
Or combine them:
The tools aren't mutually exclusive. You can:
- Use raw CDP or agent-browser --cdp for authenticated workflows (read your email, manage dashboards)
- Use agent-browser standalone for fast scripted automation (fill forms, scrape data)
- Use Playwright MCP for isolated testing (run tests in clean contexts)
All talking to the same Chrome, via the same protocol.
Get Started Now
Step 1: Start Chrome with debugging enabled:
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
Step 2: Verify it works:
curl http://localhost:9222/json/version
Step 3: Pick your tool and try it:
# Option A: Playwright MCP (isolated, structured)
npx @playwright/mcp@latest --cdp-endpoint http://localhost:9222
# Option B: agent-browser (fast, flexible)
npm i -g agent-browser
agent-browser open example.com && agent-browser snapshot -i
# Option C: agent-browser connected to YOUR browser (authenticated)
agent-browser --cdp 9222 snapshot -i
# Option D: Raw CDP (minimal, no dependencies)
curl http://localhost:9222/json # List your real tabs
You now have AI-ready browser control. Your logged-in sessions, your tabs, your data — all accessible to AI agents through the protocol that was hiding in plain sight.
The Bottom Line
Every Chrome browser ships with a remote control. Three major tools let AI agents use it. They all speak the same protocol, but they make fundamentally different choices about session isolation — and that one choice determines whether your AI agent sees a login page or your actual dashboard.
Now you know the difference. Build accordingly.
Give Your AI Agent the Instructions
Want your AI agent to already know how to use these tools? Each project publishes skills — ready-made instruction files you can pass to Claude Code, Cursor, OpenCode, or any AI coding agent.
agent-browser skills
npx skills add github:vercel-labs/agent-browser
5 skills available: general browser automation, QA/dogfood testing, Electron app control (VS Code, Slack, Discord, Figma), Slack workspace automation, and Vercel Sandbox cloud sessions.
The Electron skill is especially powerful — it teaches your AI agent how to launch and control desktop apps like VS Code, Slack, or Figma through CDP. Your AI agent can read Slack messages, navigate VS Code, or interact with any Chromium-based desktop app.
chrome-devtools-mcp skills
npx skills add github:anthropics/anthropic-cookbook chrome-devtools-mcp
5 skills available: core Chrome DevTools automation, CLI scripting, accessibility debugging, LCP performance optimization, and connection troubleshooting.
Raw CDP skill
npx skills add github:anthropics/anthropic-cookbook chrome-cdp
Skill available: Lightweight CDP CLI for live Chrome session control — connects to your real tabs with all cookies preserved. 13 commands, per-tab daemon architecture.
What are skills? Skills are instruction files that teach AI agents how to use specific tools. Instead of explaining everything yourself, you install a skill and your agent instantly knows the commands, best practices, and common patterns. Think of them like a manual the AI reads before it starts working.
Find me at timtech4u.dev or @timtech4u.




