zmy

Posted on Jun 9

How to Give Your AI Agent Browser Superpowers with Browser-CLI

#ai #webdev #automation #cli

How to Give Your AI Agent Browser Superpowers with Browser-CLI

Your AI coding assistant just wrote perfect code. Now you need it to test the login flow on your staging site.

You ask Claude Code: "Go to staging.example.com, log in with test@example.com, and verify the dashboard loads."

Claude's response: "I can't interact with web browsers. I can write Playwright scripts for you, but I can't run them."

This is the missing piece. Your AI agent has all the intelligence to navigate websites, fill forms, and extract data — but no hands to actually do it.

Browser-CLI fixes this. It's a command-line tool that gives AI agents direct browser control through simple shell commands. No scripts. No complex setup. Just:

browser-cli navigate https://staging.example.com
browser-cli fill "#email" "test@example.com"
browser-cli fill "#password" "secret"
browser-cli click "button[type=submit]"
browser-cli wait ".dashboard"
browser-cli screenshot result.png

Every command returns structured JSON that AI agents can parse and act on. The result? Your AI agent can now browse the web, test your apps, and automate repetitive tasks — all by itself.

Why You Need This

If you use any AI coding assistant (Claude Code, Cursor, Codex, Windsurf, etc.), you've probably hit these walls:

Task	AI Agent Can Do It?	Why Not?
Log into a website and scrape data	❌	No browser access
Fill out and submit forms	❌	No browser access
Take screenshots of pages	❌	No browser access
Test web apps end-to-end	❌	No browser access
Automate repetitive web tasks	❌	No browser access

Existing solutions don't fit AI agents:

Puppeteer/Playwright: Require writing JavaScript/Python scripts. AI agents can generate these scripts, but running them requires a Node.js/Python environment, proper setup, and error handling. It's fragile and slow.
browser-use: A Python library that uses LLMs to control browsers. But it's designed for Python developers, not as a tool for AI coding assistants to call directly.
Actionbook: Similar concept, but requires specific integrations and doesn't have the AI-first design.

Browser-CLI is different. It's designed from the ground up for AI agents to call as a shell command:

✅ Zero-code: Just CLI commands, no scripts to write or maintain
✅ JSON output: Structured responses that AI agents can parse reliably
✅ Auto-managed server: No manual server start/stop, it just works
✅ Session isolation: Multiple agents can run in parallel without conflicts
✅ Login persistence: Log in once manually, reuse forever in automation
✅ Stealth mode: Bypasses bot detection on sites like Google, Cloudflare

Quick Start: 5 Minutes to Browser Superpowers

Step 1: Install

git clone https://github.com/zmysysz/browser-cli
cd browser-cli
make build
make setup-browsers  # Downloads Playwright browsers (first time only)
make install         # Adds to PATH

Requirements: Go 1.21+ (for building). The binary is fully static — no CGO, no runtime dependencies.

Step 2: Try It

# Navigate to a page (server auto-starts)
browser-cli navigate https://example.com

# Extract page content
browser-cli text

Output:

{
  "command": "text",
  "status": "success",
  "data": {
    "content": "Example Domain\nThis domain is for use in illustrative examples...",
    "url": "https://example.com/",
    "title": "Example Domain"
  }
}

Step 3: Integrate with Your AI Tool

Claude Code:

mkdir -p .claude/commands/
cp integrations/claude/browser.md .claude/commands/

OpenAI Codex:

mkdir -p ~/.codex/skills/
cp integrations/codex/browser-cli.md ~/.codex/skills/

Cursor / Windsurf / Others: The AGENTS.md file at the project root is automatically read by most AI coding tools.

Now just ask your AI: "Use browser-cli to check the top 5 posts on Hacker News"

Real-World Examples

Example 1: Automated Web Scraping

# Navigate to Hacker News
browser-cli navigate https://news.ycombinator.com

# Extract the top 5 post titles using JavaScript
browser-cli eval "JSON.stringify(
  Array.from(document.querySelectorAll('.titleline > a'))
    .slice(0, 5)
    .map(a => ({title: a.textContent, link: a.href}))
)"

Output:

{
  "command": "eval",
  "status": "success",
  "data": {
    "result": "[{\"title\":\"Show HN: I built a CLI tool for AI agents\",\"link\":\"https://github.com/...\"}, ...]"
  }
}

Example 2: Login and Scrape Protected Content

# First time: log in manually (opens a visible browser)
browser-cli login https://github.com
# ... complete login in the browser window ...
# Press Ctrl+C when done — state is saved to ./github-state.json

# Now automate using the saved login state
browser-cli --state ./github-state.json navigate https://github.com/settings
browser-cli text

The login state persists forever. You can reuse it across sessions, projects, and even different AI agents.

Example 3: Form Submission

browser-cli navigate https://httpbin.org/forms/post

# Fill multiple fields
browser-cli fill "input[name='custname']" "John Doe"
browser-cli fill "input[name='custtel']" "555-1234"
browser-cli fill "input[name='custemail']" "john@example.com"
browser-cli fill "textarea[name='comments']" "This is a test submission"

# Select radio button
browser-cli click "input[value='medium']"

# Submit
browser-cli click "button[type=submit]"

# Get the result
browser-cli text

Example 4: Multi-Step Workflow in One Command

browser-cli run "navigate https://example.com; click a; text; screenshot result.png"

The run command executes multiple actions in a single round-trip, reducing latency for complex workflows.

Example 5: Handle JavaScript Dialogs

browser-cli navigate https://example.com/delete
browser-cli click ".delete-button"

# Check if a dialog appeared
browser-cli dialog-status
# → {"dialog": {"type": "confirm", "message": "Are you sure?"}}

# Accept or dismiss
browser-cli dialog-accept
# or: browser-cli dialog-dismiss

Feature Deep Dive

🥷 Stealth Mode: Bypass Bot Detection

Browser-CLI includes built-in stealth mode that overrides automation fingerprints:

navigator.webdriver — Set to undefined instead of true
navigator.plugins — Faked with realistic entries
navigator.mimeTypes — Includes PDF viewer
window.chrome — Added to match real Chrome
Permissions API — Auto-grants notifications
Chromium flags — Disables AutomationControlled blink feature

This lets you automate sites that normally block bots, like Google sign-in, Cloudflare-protected pages, and banking sites.

# Use system Chrome for even better stealth
browser-cli --chrome navigate https://accounts.google.com

🍪 Login Persistence: Set and Forget

The login command opens a visible browser for you to log in manually. Once done, the entire browser state (cookies + localStorage) is saved to a JSON file:

# Log in once
browser-cli login https://twitter.com
# → saves to ./twitter-login.json

# Reuse forever
browser-cli --state ./twitter-login.json navigate https://twitter.com/home
browser-cli text  # You're already logged in!

🔗 Session Isolation: Parallel Agents

Multiple AI agents can run simultaneously without interfering with each other:

# Agent 1 works on site A
browser-cli --session agent-1 navigate https://site-a.com

# Agent 2 works on site B (completely isolated)
browser-cli --session agent-2 navigate https://site-b.com

# List all active sessions
browser-cli session-list

Each session gets its own browser instance, cookies, and storage.

🎯 Web Components Support

Modern web apps use custom elements that don't respond to standard clicks. Browser-CLI's smart-click detects and triggers internal handlers:

# Normal click might not work on custom elements
browser-cli click "my-custom-button"  # ❌ May fail

# Smart-click finds and calls internal methods
browser-cli smart-click "my-custom-button"  # ✅ Works!

Detection patterns: _on*, _handle*, handle*, _click, _submit, _action

📸 Screenshots, PDFs, and More

# Screenshot
browser-cli screenshot page.png

# Full page PDF (Chromium only)
browser-cli pdf report.pdf

# Screenshot specific element
browser-cli elements ".chart"
browser-cli eval "document.querySelector('.chart').scrollIntoView()"
browser-cli screenshot chart.png

Comparison with Alternatives

Feature	Browser-CLI	Puppeteer	Playwright	browser-use	Actionbook
AI-friendly output	JSON by default	JS API	JS/Python API	Natural language	JSON
Zero-code usage	✅ CLI commands	❌ Scripts	❌ Scripts	✅ LLM-driven	✅ Actions
Session isolation	✅ Built-in	❌ Manual	❌ Manual	❌	✅
Login persistence	✅ Built-in	❌ Manual	❌ Manual	❌	✅
Stealth mode	✅ Built-in	❌ Plugin	❌ Plugin	✅	❌
Web Components	✅ smart-click	❌	❌	✅	❌
Language	Go (single binary)	Node.js	Node/Python	Python	Go
Runtime deps	None	Node.js	Node/Python	Python + deps	None
Setup complexity	Low	Medium	Medium	High	Low

When to use Browser-CLI:

You want AI agents to control browsers directly
You need a simple, script-free automation solution
You want login state persistence
You need to bypass bot detection
You want parallel agent execution

When to use Puppeteer/Playwright:

You're writing traditional test scripts
You need fine-grained control over browser behavior
You're building a testing framework

When to use browser-use:

You want natural language browser control
You're building an LLM-powered application

All Commands Reference

Navigation

Command	Description
`navigate <url>`	Navigate to URL
`back`	Go back in history
`forward`	Go forward in history
`reload`	Reload current page

Interaction

Command	Description
`click <selector>`	Click element
`smart-click <selector>`	Click Web Components
`fill <selector> <value>`	Fill input field
`type <selector> <text>`	Type character by character
`select <selector> <value>`	Select dropdown option
`hover <selector>`	Hover over element
`keyboard <key>`	Press key (e.g., `Ctrl+A`, `Enter`)
`upload <selector> <file>`	Upload file
`eval <script>`	Execute JavaScript

Extraction

Command	Description
`text`	Extract page text
`screenshot [path]`	Take screenshot
`elements <selector>`	Find elements
`pdf [file]`	Save as PDF

Utility

Command	Description
`wait <selector>`	Wait for element
`scroll <up\	down>`
`pick <x> <y>`	Inspect element at coordinates

Tabs & Sessions

Command	Description
`tab-new`	Create new tab
`tab-list`	List all tabs
`tab-switch <id>`	Switch to tab
`session-list`	List active sessions
`stop`	Stop server

Call to Action

Star the repo: https://github.com/zmysysz/browser-cli ⭐

Try it now:

git clone https://github.com/zmysysz/browser-cli
cd browser-cli && make build && make setup-browsers
browser-cli navigate https://example.com
browser-cli text

Integrate with your AI tool:

Claude Code: Copy integrations/claude/browser.md to .claude/commands/
Codex: Copy integrations/codex/browser-cli.md to ~/.codex/skills/
Others: Check the integrations/ folder

Have questions or feature requests? Open an issue on GitHub or drop a comment below!

Browser-CLI is open source under Apache 2.0. Built with Go + Playwright. No CGO required — works on Linux, macOS, and Windows.

DEV Community

How to Give Your AI Agent Browser Superpowers with Browser-CLI

How to Give Your AI Agent Browser Superpowers with Browser-CLI

Why You Need This

Quick Start: 5 Minutes to Browser Superpowers

Step 1: Install

Step 2: Try It

Step 3: Integrate with Your AI Tool

Real-World Examples

Example 1: Automated Web Scraping

Example 2: Login and Scrape Protected Content

Example 3: Form Submission

Example 4: Multi-Step Workflow in One Command

Example 5: Handle JavaScript Dialogs

Feature Deep Dive

🥷 Stealth Mode: Bypass Bot Detection

🍪 Login Persistence: Set and Forget

🔗 Session Isolation: Parallel Agents

🎯 Web Components Support

📸 Screenshots, PDFs, and More

Comparison with Alternatives

All Commands Reference

Navigation

Interaction

Extraction

Utility

Tabs & Sessions

Call to Action

Top comments (0)