Double CHEN

We just shipped browser-act CLI — browser automation without writing code

We built BrowserAct because we kept running into the same wall: every time we needed to automate something in a browser, we had to start a whole project.

npm init -y
npm install playwright
npx playwright install chromium
# ... now write 25 lines of async/await just to load a page

That's fine when you're building a test suite. Most of the time, you just want to grab a page's content, click something, or take a screenshot — from the terminal, in 30 seconds.

So we built browser-act CLI. Browser automation as terminal commands. No code, no project setup, no framework.

What it looks like

This is a real run. Three commands against Hacker News:

browser-act --session s1 navigate "https://news.ycombinator.com"
browser-act --session s1 wait stable
browser-act --session s1 get markdown

Full page extracted as clean structured markdown — 3 commands, no code written.

Output: 15,547 characters of clean markdown from 78,320 chars of raw HTML. browser-act automatically strips ads, nav bars, and irrelevant noise.

Why not just use Playwright?

Playwright's getting started page asks you to run npm init, install the package, and then download ~400MB of browser binaries, all before you write a single line of automation.

|  | Playwright / Puppeteer | browser-act CLI |
| --- | --- | --- |
| First-time setup | npm init + install + ~400MB browser download | npx skills add browser-act/skills --skill browser-act (once, global) |
| Navigate + extract content | ~25 lines of async/await boilerplate | 3 commands |
| Session state | Manual context management in every script | --session persists automatically between commands |
| Shell integration | Requires Node.js or Python runtime | Pipe output directly to grep / jq / anything |

Playwright is still the right choice for full E2E test suites with parallel workers, trace viewers, and CI pipelines. browser-act CLI is for everything else.

Get started

Install once:

npx skills add browser-act/skills --skill browser-act

Core commands:

# Open a page and extract its content
browser-act --session s1 navigate "https://example.com"
browser-act --session s1 wait stable
browser-act --session s1 get markdown      # clean text output
browser-act --session s1 get html          # raw HTML

# Interact
browser-act --session s1 click 3           # click element by index
browser-act --session s1 input 2 "query"   # fill a field
browser-act --session s1 keys "Enter"

# Capture
browser-act --session s1 screenshot ./out.png

# Stealth mode (bypasses bot detection)
browser-act --session s1 browser list      # pick a stealth profile

Sessions persist between commands — build multi-step automations in shell scripts without managing state yourself.
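
As one way to picture this, here is a minimal sketch of a reusable shell function built only from the navigate, wait stable, and get markdown commands shown above. The function name fetch_markdown and the BROWSER_ACT variable are hypothetical helpers for illustration, not part of browser-act. The sketch also assumes the CLI exits non-zero when a step fails, which this post does not confirm.

```shell
#!/bin/sh
# Sketch only: wraps the three commands from this post in one function.
# BROWSER_ACT is a hypothetical override so the binary can be swapped out;
# it is not a browser-act feature.
BROWSER_ACT="${BROWSER_ACT:-browser-act}"

# fetch_markdown SESSION URL
# Loads URL in the named session, waits for the page to settle,
# and prints the page's markdown on stdout.
fetch_markdown() {
  # Discard any output from navigate/wait so only the markdown reaches stdout.
  # Assumption: a failed step returns a non-zero exit status.
  "$BROWSER_ACT" --session "$1" navigate "$2" >/dev/null || return 1
  "$BROWSER_ACT" --session "$1" wait stable >/dev/null || return 1
  "$BROWSER_ACT" --session "$1" get markdown
}

# Example (requires browser-act to be installed):
#   fetch_markdown s1 "https://news.ycombinator.com" | grep -i "show hn"
```

Because the session name is passed in as an argument, later commands such as click or screenshot can target the same session after the function returns.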

What people are using it for

  • Web scraping: no boilerplate, just commands and output
  • Shell pipelines: get markdown | grep | jq works with every Unix tool you already use
  • AI agents: give an LLM direct browser access via CLI commands
  • Deployment verification: navigate → get markdown → assert expected content
  • n8n / Make / Zapier integrations: use as a step in no-code workflows
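
The deployment verification flow could look roughly like the sketch below. It uses only commands shown earlier in this post plus grep. The verify_deploy name, the BROWSER_ACT variable, and the example URL and version string are hypothetical, and the sketch assumes browser-act exits non-zero when a step fails.

```shell
#!/bin/sh
# Sketch only: navigate -> get markdown -> assert expected content.
# BROWSER_ACT is a hypothetical override for the binary name.
BROWSER_ACT="${BROWSER_ACT:-browser-act}"

# verify_deploy SESSION URL EXPECTED
# Returns 0 if the page's markdown contains EXPECTED as a literal string,
# 1 if it does not, and 2 if loading the page failed.
verify_deploy() {
  "$BROWSER_ACT" --session "$1" navigate "$2" >/dev/null || return 2
  "$BROWSER_ACT" --session "$1" wait stable >/dev/null || return 2
  # -F matches EXPECTED literally; -q prints nothing and only sets the exit status.
  "$BROWSER_ACT" --session "$1" get markdown | grep -qF -- "$3"
}

# Example (requires browser-act to be installed; URL and text are placeholders):
#   verify_deploy deploy "https://example.com" "v2.1.0" || echo "deploy check failed"
```

Returning distinct exit codes lets a CI step tell "page did not load" apart from "page loaded but the expected text is missing".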

browser-act CLI is live today

browseract.com · Free to use · No credit card required

GitHub: github.com/browser-act/skills · AWS Marketplace available

Questions? Drop them in the comments — we read everything.
