I Gave My AI Agent a Real Browser - Here's What Actually Happened

👋 Hey there, Tech Enthusiasts!

I'm Sarvar, a Cloud Architect who loves turning complex tech problems into simple solutions. I've worked with AWS, Azure, DevOps, data, analytics, generative AI, and agentic AI, building real systems for real companies. In this article series, I'll share what I've learned in a way that's easy to follow, whether you're experienced or just getting started.

Let's get into it! 🚀


"Your agent can write Terraform, deploy infrastructure, and debug pipelines. But ask it to check a dashboard behind Cloudflare? It's useless."


If you read my previous article, "I Let an AI Agent Become My DevOps Engineer", you know I've been pushing AI agents into real operational workflows. Code generation, CI/CD pipelines, infrastructure provisioning: agents handle all of that now.

But there's one area where every agent I've used just... stops working. The browser.

Here's the reality: most of the tools I monitor daily (Grafana dashboards, the AWS Console, CI pipelines, internal wikis) live behind login walls, CAPTCHAs, and anti-bot protection. My agents can't touch them.

That's where BrowserAct comes in.

I found a tool that claims to give AI agents real browser control: not headless Puppeteer scripts that get blocked instantly, but actual anti-detection browsing with CAPTCHA handling and human handoff built in.

I spent a week testing it. Here's what I found.

By the end of this article, you'll:

  • Understand why AI agents fail at real browser tasks
  • Install BrowserAct and run your first extraction
  • See how anti-detection browsing actually works
  • Test human handoff for 2FA/login scenarios
  • Run parallel browser sessions without cross-contamination
  • Turn a repeated workflow into a reusable Skill

Time Required: 30 minutes (15 min read + 15 min hands-on)
Difficulty: Intermediate
Prerequisites: Python 3.12+, Node.js 18+, Google Chrome, terminal access


The Problem: Agents Can't Browse the Real Web

The Reality of AI Agent Automation in 2026

I manage infrastructure across multiple AWS accounts. Every morning I check:

  • CloudWatch dashboards
  • GitHub notifications and PR reviews
  • CI/CD pipeline status
  • Internal monitoring tools
  • Competitor product pages (for research)

That's easily 30-40 minutes of tab-switching, scrolling, and context-gathering before I even start real work.

I thought: my agent can write Terraform and deploy entire VPCs, so surely it can open a webpage and read some data?

Nope.

Attempt 1: Basic web fetch

curl https://monitoring-dashboard.internal.com

403 Forbidden - Access Denied

Attempt 2: Headless Puppeteer

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch(); // headless by default
const page = await browser.newPage();
await page.goto('https://dashboard.example.com');
// Blocked by Cloudflare challenge
await browser.close();

Error: Page stuck on verification challenge

Attempt 3: Selenium
Same story. Detected as bot within 2 seconds.

Problems:

  • Anti-bot systems detect headless browsers instantly
  • CAPTCHAs stop automation cold
  • 2FA/login flows need a human, but there's no way to hand off
  • Running multiple accounts in one browser gets all of them flagged
  • Every script breaks when the site changes layout

My daily routine before BrowserAct:

  1. Open 6 tabs manually
  2. Log into GitHub, the AWS Console, and Grafana, with MFA for each
  3. Scroll through notifications, check pipeline status
  4. Copy-paste data into Slack for the team
  5. Repeat next morning

40 minutes. Every. Day.

Sound familiar? I needed something built specifically for this: a browser that agents can actually use on the real web.


What is BrowserAct?

Simple Version

BrowserAct is a CLI that gives your AI agent a real Chrome browser with anti-detection, CAPTCHA solving, and human handoff built in.

Think of it like this:

  • Puppeteer/Playwright = giving your agent a browser that screams "I'M A BOT"
  • BrowserAct = giving your agent a browser that looks and behaves like a real person

The Five Things It Actually Does

  1. Gets past anti-bot walls: Real fingerprints, proxy rotation, stealth browsing. Sites don't know it's automated.

  2. Handles CAPTCHAs automatically: Cloudflare, DataDome, reCAPTCHA. Solves what it can, escalates what it can't.

  3. Hands off to humans when stuck: Hit a QR code login? SMS verification? It generates a URL. You (or a teammate) open it on your phone, do the human step, and the agent continues from where it stopped.

  4. Runs parallel tasks without interference: Three sessions checking three different things under the same account. They don't step on each other.

  5. Turns workflows into reusable Skills: Did something once? Package it. Run it again without the agent having to figure it out from scratch.


Prerequisites

Before starting, make sure you have:

  • Python 3.12+ (BrowserAct CLI is Python-based)
  • Node.js 18+ (for installing Skills with npx skills)
  • uv package manager (or pip)
  • Google Chrome installed
  • Terminal/CLI access
  • An AI agent that can run shell commands (Claude Code, Cursor, Kiro, Codex), or just run the commands yourself

Quick Check

python3 --version
# Python 3.12+

node --version
# v18+

google-chrome --version
# Google Chrome 149.x.x.x

If you don't have uv (a fast Python package manager):

curl -LsSf https://astral.sh/uv/install.sh | sh

If you don't have Chrome:

# Ubuntu/Debian
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb

# Amazon Linux
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum localinstall -y google-chrome-stable_current_x86_64.rpm


What We're Building

Here's what we're going to test today:

┌─────────────────────────────────────────────────┐
│                 AI Agent (You/Claude/Cursor)    │
└─────────────────────┬───────────────────────────┘
                      │ CLI commands
                      ▼
┌─────────────────────────────────────────────────┐
│              BrowserAct CLI                     │
│  ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│  │ Stealth  │ │ Sessions │ │  Human Handoff   │ │
│  │ Browser  │ │ Manager  │ │  (remote-assist) │ │
│  └──────────┘ └──────────┘ └──────────────────┘ │
└─────────────────────┬───────────────────────────┘
                      │ Real Chrome
                      ▼
┌─────────────────────────────────────────────────┐
│  Protected Websites (Cloudflare, Login Walls)   │
└─────────────────────────────────────────────────┘

Step 1: Install BrowserAct

There are two parts here: the Skill definition (for AI agents) and the actual CLI.

Install the Skill (for agent integration):

npx skills add browser-act/skills --skill browser-act --yes

Output:

◇  Installation complete

│  ✓ ~/.agents/skills/browser-act
│    universal: Amp, Antigravity, Antigravity CLI, Cline, Codex +12 more

Install the CLI itself:

uv tool install browser-act-cli --python 3.12

Output:

Resolved 90 packages in 437ms
Installed 90 packages in 2.09s
Installed 1 executable: browser-act

Verify it works (required on first install):

browser-act get-skills core --skill-version 2.0.2

This returns the environment state, available browsers, and command reference. Run this before anything else: it completes the version handshake.

How to Get Your BrowserAct API Key

  1. Go to BrowserAct and sign in to your account.

  2. Click on your profile email address in the top-right corner.

  3. From the dropdown menu, select API Keys.

  4. Click Manage Keys.

  5. Select Create Key.

  6. Enter a name for your API key (for example, Amazon-Q, MCP-Server, or Development).

  7. Click Create.

  8. Copy the generated API key and store it securely. You may not be able to view the full key again after leaving the page.

Note: Treat your API key like a password. Do not share it publicly or commit it to source code repositories.

Set your API key

browser-act auth set <your-api-key>


API key saved.

Step 2: Your First Extraction (Zero Config)

The simplest thing BrowserAct can do is extract content from a page without any setup:

browser-act stealth-extract https://httpbin.org/ip

Output from my test:

{
  "origin": "100.54.212.44"
}

Clean, rendered content. No HTML tags, no noise. Just the data.

Let's try something with actual content:

browser-act stealth-extract https://example.com

Output:

# Example Domain
This domain is for use in documentation examples without needing permission. Avoid use in operations.
[Learn more](https://iana.org/domains/example)

Already in markdown format. My agent can read this directly without any parsing.

What stealth-extract does under the hood:

  • Spins up a lightweight stealth browser
  • Visits the URL with anti-detection fingerprint
  • Renders JavaScript (unlike curl)
  • Returns content in markdown
  • Tears down the browser

Important note from testing: stealth-extract works great for quick reads on most sites. For heavily protected sites (like nowsecure.nl with aggressive Cloudflare), you'll need a full browser session (Step 3), which has stronger anti-detection because it maintains a persistent fingerprint and proxy.


Step 3: Full Browser Control

Now let's do something more interesting: interactive browser automation. First, you need a stealth browser:

browser-act browser create --name "test-stealth" --type stealth --desc "Testing for article"

Output:

id=99703194156616493 name="test-stealth" type=stealth
  desc="Testing for article"
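If a script rather than a person drives these commands, it needs that id for every later browser open. A minimal Python sketch, assuming the output keeps the whitespace-separated key=value shape shown above, with double quotes around values that contain spaces (that is the shape in this output, not a documented format). It's meant for status lines like these, not for free-text page titles, which can contain apostrophes that would confuse the quote handling.

```python
import shlex


def parse_kv_output(text: str) -> dict[str, str]:
    """Parse browser-act style key=value status lines into a dict.

    Assumes whitespace-separated pairs, with double quotes around values
    that contain spaces (as in the 'browser create' output above).
    """
    result: dict[str, str] = {}
    for line in text.splitlines():
        # shlex.split honours the quotes, so desc="Testing for article"
        # becomes the single token: desc=Testing for article
        for token in shlex.split(line):
            key, sep, value = token.partition("=")
            if sep:  # ignore anything that isn't a key=value pair
                result[key] = value
    return result


create_output = '''id=99703194156616493 name="test-stealth" type=stealth
  desc="Testing for article"'''

browser_id = parse_kv_output(create_output)["id"]
print(browser_id)  # 99703194156616493
```

The same helper works on the session open output (session_name=..., browser_type=...).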

Now open a session:

browser-act --session my-research browser open 99703194156616493 https://github.com/trending

Output:

session_name=my-research
browser_type=stealth
url=https://github.com/trending
title=github.com/trending

See what's on the page (indexed elements):

browser-act --session my-research state

Real output from my test:

url=https://github.com/trending
title=Trending repositories on GitHub today · GitHub

|SCROLL|<html class=js-focus-visible /> (0.0 pages above, 1.2 pages below)

  [1]<a aria-label=Homepage />
  [3]<button type=button aria-expanded=false />
      Platform
  [4]<button type=button aria-expanded=false />
      Solutions
  [8]<a class=NavLink-module__link__EG3d4 />
      Pricing
  [9]<qbsearch-input class=search-input />
      [10]<div class=search-input-container search-with-dialog />
  [12]<a />
      Sign in
  [15]<a class=js-selected-navigation-item />
      Explore
  [17]<a class=js-selected-navigation-item selected />
      Trending

See those numbers? That's how the agent interacts. No DOM parsing, no CSS selectors. Just:

# Click the search input (element 10)
browser-act --session my-research click 10


clicked=10

The page updates and the search box opens. Now type:

browser-act --session my-research input 10 "browser automation AI"


input="browser automation AI" element=10

This is what they mean by "designed for agent reasoning." The output is compact, indexed, and token-efficient. My agent doesn't waste tokens parsing HTML.

You can also grab the full page as markdown:

browser-act --session my-research get markdown

Returns the entire page content in clean markdown format. Useful when you want to extract data rather than interact.

When you're done:

browser-act session close my-research


session_name=my-research closed=true

Step 4: Anti-Bot in Action

Here's where it gets real. I pointed BrowserAct at nowsecure.nl, a site specifically designed to test anti-bot detection. It runs Cloudflare challenges.

# Full browser session on a Cloudflare-protected site
browser-act --session antibot browser open 99703194156616493 https://nowsecure.nl

Output:

session_name=antibot
browser_type=stealth
url=https://nowsecure.nl/
title=nowsecure.nl

It got through. Let me verify by pulling the content:

browser-act --session antibot get markdown


nowsecure.nl
NOWSECURE
---------
### by nodriver

Checking the network traffic shows what happened. Cloudflare's Turnstile challenge was loaded and handled automatically:

browser-act --session antibot network requests


# format: csv
...
GET,200,Script,application/javascript,...,https://challenges.cloudflare.com/turnstile/v0/g/8fc8ed1d8752/api.js
GET,200,Document,text/html,...,https://challenges.cloudflare.com/cdn-cgi/challenge-platform/h/g/turnstile/...

The stealth browser handled the Cloudflare verification without any manual intervention. No CAPTCHA prompt, no block.
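To check this from a script instead of by eye, the network log can be filtered for the challenge host. A sketch, assuming the CSV columns start with method and status and end with the URL, as in the truncated rows above. The elided middle columns and any header row aren't shown here, so the code relies only on the first two fields and the last one.

```python
import csv
from urllib.parse import urlparse


def challenge_requests(
    csv_text: str, host: str = "challenges.cloudflare.com"
) -> list[tuple[str, int, str]]:
    """Return (method, status, url) for logged requests that went to `host`.

    Assumes method and status are the first two columns and the URL is the
    last one, as in the rows above; other columns are ignored.
    """
    data_lines = [
        line for line in csv_text.splitlines()
        if line.strip() and not line.startswith("#") and line.strip() != "..."
    ]
    hits: list[tuple[str, int, str]] = []
    for row in csv.reader(data_lines):
        if len(row) < 3 or not row[1].isdigit():
            continue  # skip a header row or anything malformed
        method, status, url = row[0], int(row[1]), row[-1]
        if urlparse(url).hostname == host:
            hits.append((method, status, url))
    return hits


network_log = """# format: csv
...
GET,200,Script,application/javascript,...,https://challenges.cloudflare.com/turnstile/v0/g/8fc8ed1d8752/api.js
GET,200,Document,text/html,...,https://challenges.cloudflare.com/cdn-cgi/challenge-platform/h/g/turnstile/..."""

hits = challenge_requests(network_log)
passed = bool(hits) and all(status == 200 for _, status, _ in hits)
print(len(hits), passed)  # 2 True
```

A 200 on the challenge assets is a hint, not proof, that the challenge passed; the page content from get markdown is the stronger signal.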

BrowserAct uses three layers to get through:

  1. Environment layer - Stealth fingerprint, TLS config, proxy switching. Most blocks never trigger.
  2. Execution layer - If a CAPTCHA appears, solve-captcha handles it automatically.
  3. Human layer - If auto-solve fails, it can hand off to you (more on this next).

I'm not going to claim it works on every site. It doesn't. No tool does. But for the monitoring dashboards and research pages I tested, it got through where Puppeteer and Selenium couldn't.


Step 5: Human Handoff (This Is the Killer Feature)

This is the part that sold me. Here's the scenario:

Your agent is automating a workflow. It hits a login page that requires SMS verification or a QR code scan. In the Puppeteer world, the automation just dies. Game over.

BrowserAct does something different:

browser-act --session login-task remote-assist --objective "complete 2FA verification"

Real output from my test:

Remote assist session created.

Share this URL with the user:
  https://www.browseract.com/remote-cli/d83544c39e4a4e6ba1cc98f95050e615
expires in 1h 0m

Human assist is now active - the browser is under user control.
Do not send browser commands until the user finishes the assist session.

You open that URL on your phone or another device. You see the browser: the actual, live browser state. You complete the SMS verification, scan the QR code, whatever. Then you close it.

The agent gets notified and continues from the exact same browser state.

Why this matters for DevOps:

  • Internal tools with SSO that requires periodic re-auth
  • AWS Console with MFA
  • Third-party dashboards with 2FA
  • Any workflow where "fully automated" isn't realistic

The agent doesn't crash. It doesn't restart. It waits, you help, it continues. That's practical automation.


Step 6: Parallel Sessions (Multi-Task Without Conflicts)

Here's a real use case from my work: I want an agent to check multiple things simultaneously under the same account. Two sessions are enough to show how it works:

# Session 1: Check GitHub trending
browser-act --session check-trending browser open 99703194156616493 https://github.com/trending


session_name=check-trending
browser_type=stealth
url=https://github.com/trending
title=github.com/trending

# Session 2: Check GitHub topics (same browser, parallel session)
browser-act --session check-topics browser open 99703194156616493 https://github.com/topics


session_name=check-topics
browser_type=stealth
url=https://github.com/topics
title=github.com/topics

Two sessions. One browser. They don't interfere with each other.

# See all active sessions
browser-act session list


session_name: check-trending
browser_type: stealth
browser_id: 99703194156616493
title: Trending repositories on GitHub today · GitHub
url: https://github.com/trending

session_name: check-topics
browser_type: stealth
browser_id: 99703194156616493
title: Topics on GitHub · GitHub
url: https://github.com/topics

Each session has its own navigation state but shares the login cookies. So you log in once, and all tasks can work in parallel.
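Reading session list back into structured data makes it easy to confirm which sessions share a browser (and therefore its login state) or to spot leftovers that need closing. A sketch, assuming the blank-line-separated key: value records shown above; the sample omits the title lines for brevity.

```python
def parse_session_list(text: str) -> list[dict[str, str]]:
    """Split 'session list' output into one dict per blank-line-separated record."""
    sessions: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if current:  # a blank line ends the current record
                sessions.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:  # partition on the first colon keeps URLs intact
            current[key.strip()] = value.strip()
    if current:
        sessions.append(current)
    return sessions


list_output = """session_name: check-trending
browser_type: stealth
browser_id: 99703194156616493
url: https://github.com/trending

session_name: check-topics
browser_type: stealth
browser_id: 99703194156616493
url: https://github.com/topics"""

sessions = parse_session_list(list_output)
shared = [s["session_name"] for s in sessions if s["browser_id"] == "99703194156616493"]
print(shared)  # ['check-trending', 'check-topics']
```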

The three concurrency models:

| Model | What's shared | Use case |
|---|---|---|
| Cross-browser parallel | Nothing (independent identity) | Multi-account monitoring |
| Same-browser multi-session | Login state | Parallel tasks, one account |
| Privacy mode | Nothing (fresh each time) | One-off scraping, anonymity |

When done:

browser-act session close check-trending
browser-act session close check-topics

session_name=check-trending closed=true
session_name=check-topics closed=true

Sessions auto-reclaim after 8 hours if you forget. But clean up after yourself.
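In a script, the simplest way to honour that is try/finally. A minimal Python sketch of a context manager that opens a named session and always runs browser-act session close, even if the work inside fails. The run parameter is a hypothetical hook so the CLI call can be swapped for a fake in tests; the commands themselves are the ones used in this article.

```python
import subprocess
from collections.abc import Callable, Iterator
from contextlib import contextmanager


def run_cli(args: list[str]) -> int:
    """Run one browser-act command and return its exit code."""
    return subprocess.run(["browser-act", *args], check=False).returncode


@contextmanager
def browser_session(
    name: str, browser_id: str, url: str, run: Callable[[list[str]], int] = run_cli
) -> Iterator[str]:
    """Open a named session and close it again no matter how the block exits."""
    try:
        code = run(["--session", name, "browser", "open", browser_id, url])
        if code != 0:
            raise RuntimeError(f"browser open failed for {name!r} (exit code {code})")
        yield name
    finally:
        # Runs on success, on errors inside the with-block, and on a failed
        # open, in case a half-created session still holds the name
        run(["session", "close", name])


# Example (requires browser-act on PATH):
# with browser_session("check-trending", "99703194156616493",
#                      "https://github.com/trending") as session:
#     run_cli(["--session", session, "get", "markdown"])
```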


Step 7: Skill Forge (Make It Reusable)

Let's say Step 6 worked great. I want to run that "check my GitHub morning routine" every day without the agent figuring it out from scratch each time.

First, install Skill Forge:

npx skills add browser-act/skills --skill browser-act-skill-forge --yes

Output:

◇  Installation complete

│  ✓ ~/.agents/skills/browser-act-skill-forge
│    universal: Amp, Antigravity, Antigravity CLI, Cline, Codex +12 more

Then tell your agent:

"Forge a Skill that checks GitHub notifications, extracts unread count and top 5 notification titles."

Skill Forge will:

  1. Explore the page structure
  2. Discover the best extraction path
  3. Generate a reusable Skill file
  4. Test it

Next time, the agent just runs the Skill. No re-exploration, no token waste. Same stable path every time.

Where I'd use this:

  • Daily dashboard checks
  • Competitor price monitoring
  • Pull request summary generation
  • CI/CD status aggregation

Where I wouldn't:

  • Sites that change layout constantly
  • One-off tasks that'll never repeat
  • Anything that needs real-time interaction (chat, live support)

Bonus: How I'd Use This in Production

I ran a few extra tests to see how BrowserAct fits into a real cloud operations workflow. Here's what I found.

Monitoring AWS Health Dashboard

browser-act --session aws-health browser open 99703194156616493 https://status.aws.amazon.com


session_name=aws-health
browser_type=stealth
url=https://health.aws.amazon.com/health/status
title=health.aws.amazon.com/health/status

Then extract the status:

browser-act --session aws-health get markdown


AWS Health Dashboard
====================
Service health - Jun 09, 2026

Take a screenshot for your Slack channel:

browser-act --session aws-health screenshot /tmp/aws-health.png


saved: /tmp/aws-health.png

Real screenshot, 315KB, exactly what the dashboard looks like. Ship that to a Slack webhook and your team gets a visual status check every morning.
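A minimal sketch of the Slack side using an incoming webhook, which accepts a JSON body with a text field. A webhook can't carry a local PNG, so this sends text only; the screenshot would need to be hosted somewhere (S3, for example) and linked, or uploaded through Slack's file APIs with a bot token. The webhook and bucket URLs in the commented example are placeholders.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (without sending) a POST for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_slack(webhook_url: str, text: str) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical usage; the webhook URL and S3 link are placeholders:
# post_to_slack("https://hooks.slack.com/services/XXX/YYY/ZZZ",
#               "AWS Health: all systems operational\n"
#               "Screenshot: https://example-bucket.s3.amazonaws.com/aws-health.png")
```

Splitting build from send keeps the payload easy to inspect in a dry run before anything reaches the channel.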


Checking GitHub Status (CI Monitoring)

browser-act --session aws-health navigate https://www.githubstatus.com
browser-act --session aws-health get markdown


All Systems Operational
-----------------------
Git Operations   99.85% uptime  Normal
Webhooks         99.96% uptime  Normal
API Requests     99.99% uptime  Normal
Issues           99.97% uptime  Normal
Pull Requests    99.61% uptime  Normal

Now my agent has real uptime data it can act on. If Pull Requests drops below 99%, alert me.
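That check is a few lines of parsing. A sketch, assuming the markdown keeps the "Name  NN.NN% uptime  Status" row shape shown above; the 99% threshold is just the example from the sentence above.

```python
import re

# Matches rows like "Git Operations   99.85% uptime  Normal"
UPTIME_RE = re.compile(r"^(?P<name>.+?)\s+(?P<pct>\d+(?:\.\d+)?)% uptime")


def parse_uptime(markdown: str) -> dict[str, float]:
    """Map each component name to its uptime percentage."""
    uptimes: dict[str, float] = {}
    for line in markdown.splitlines():
        match = UPTIME_RE.match(line.strip())
        if match:
            uptimes[match.group("name")] = float(match.group("pct"))
    return uptimes


def below_threshold(uptimes: dict[str, float], threshold: float = 99.0) -> list[str]:
    """Components whose uptime is under the threshold, in page order."""
    return [name for name, pct in uptimes.items() if pct < threshold]


status_md = """All Systems Operational
-----------------------
Git Operations   99.85% uptime  Normal
Webhooks         99.96% uptime  Normal
API Requests     99.99% uptime  Normal
Issues           99.97% uptime  Normal
Pull Requests    99.61% uptime  Normal"""

uptimes = parse_uptime(status_md)
print(below_threshold(uptimes))        # []
print(below_threshold(uptimes, 99.9))  # ['Git Operations', 'Pull Requests']
```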


Browsing Cloudflare's Own Product Page (Their Anti-Bot)

The ultimate test can BrowserAct get through Cloudflare's own website?

browser-act --session aws-health navigate https://www.cloudflare.com/products/
browser-act --session aws-health get markdown


Cloudflare Products
-------------------
Everything you need to build, deploy, and scale applications...
Workers Global serverless functions
D1 - Serverless SQL database
R2 - Object storage...

Yes. It extracted their full product catalog through their own protection. That's something.


JavaScript Evaluation (Custom Data Extraction)

Need something specific that the markdown extraction doesn't give you? Run JavaScript directly:

browser-act --session aws-health eval "document.title"


Products | Cloudflare

browser-act --session aws-health eval "document.querySelectorAll('a').length"


122

This lets you write precise extraction logic without relying on the markdown parser.


My Production Architecture (What I'd Actually Build)

Here's the setup I'm planning for my AWS monitoring workflow:

┌────────────────────────────────────────────────────────────────┐
│                  Cron Job (Every Morning 8 AM)                 │
└──────────────────────────┬─────────────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────────────┐
│                  AI Agent (Kiro / Claude Code)                 │
│                                                                │
│  1. browser-act --session grafana browser open <id> <url>      │
│  2. browser-act --session grafana get markdown                 │
│  3. browser-act --session grafana screenshot ./grafana.png     │
│  4. browser-act --session cloudwatch browser open <id> <url>   │
│  5. browser-act --session cloudwatch get markdown              │
│  6. Parse data → Generate summary                              │
│  7. Post to Slack with screenshots                             │
│  8. Close all sessions                                         │
└──────────────────────────┬─────────────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────────────┐
│                 Slack Channel: #morning-status                 │
│                                                                │
│  Morning Infra Report - Jun 04, 2026                           │
│  All AWS services operational                                  │
│  GitHub: 99.85% uptime                                         │
│  Grafana: CPU alert on prod-api-3                              │
│  [dashboard-screenshot.png]                                    │
└────────────────────────────────────────────────────────────────┘
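Steps 1 to 5 and 8 of that diagram are just a list of CLI calls, so they can be generated rather than hand-written. A minimal sketch that builds the command plan for any set of dashboards and prints it as a dry run. The session names and URLs in the example are hypothetical, and unlike the diagram it takes a screenshot of every dashboard, not only Grafana. Parsing and posting (steps 6 and 7) are left out.

```python
import shlex
from pathlib import Path


def morning_plan(browser_id: str, targets: dict[str, str], shot_dir: str = ".") -> list[list[str]]:
    """Build browser-act commands for steps 1-5 and 8 of the diagram.

    targets maps a session name to the dashboard URL to open in it.
    """
    plan: list[list[str]] = []
    for session, url in targets.items():
        plan.append(["browser-act", "--session", session, "browser", "open", browser_id, url])
        plan.append(["browser-act", "--session", session, "get", "markdown"])
        screenshot = str(Path(shot_dir) / f"{session}.png")
        plan.append(["browser-act", "--session", session, "screenshot", screenshot])
    # Step 8: close every session once everything has been collected
    for session in targets:
        plan.append(["browser-act", "session", "close", session])
    return plan


# Hypothetical dashboards; print the plan instead of running it (a dry run)
example = morning_plan(
    "99703194156616493",
    {
        "grafana": "https://grafana.example.internal/d/prod",
        "cloudwatch": "https://console.aws.amazon.com/cloudwatch/home",
    },
)
for command in example:
    print(shlex.join(command))
```

Keeping the plan as data means the same list can be printed for review, logged, or handed to subprocess.run one command at a time.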

Why this works better than API-only monitoring:

  • Some dashboards don't expose APIs (internal Grafana, vendor portals)
  • Screenshots give visual context that JSON data can't
  • Human handoff means SSO re-auth doesn't break the whole pipeline
  • One browser session stays logged in, so there's no re-auth every day

Production considerations:

  • Run on a dedicated EC2 instance (a t3.medium is enough; that's what I tested on)
  • Use static proxy for stable identity (avoids triggering "new device" alerts)
  • Set --desc on browsers with site/account info so future sessions know what's what
  • Monitor credit usage: each session costs credits, so budget accordingly
  • Store screenshots in S3, link in Slack messages
  • If login expires, remote-assist lets you re-auth from your phone without SSH-ing in

Before vs After: How This Changed My Morning

Let me be real about what my daily routine looked like before and after testing BrowserAct.

Before BrowserAct (My Old Morning)

8:00 AM - Open laptop
8:02 AM - Open GitHub, hit MFA prompt on phone, wait
8:04 AM - Check notifications (12 unread), scan PRs (3 need review)
8:08 AM - Open AWS Console, MFA again, navigate to CloudWatch
8:12 AM - Check 3 dashboards across 2 accounts, screenshot for team
8:18 AM - Open Grafana, login, scroll through panels looking for alerts
8:23 AM - Check GitHub Actions: 2 repos have failing builds
8:27 AM - Open competitor's changelog page (Cloudflare protected)
8:30 AM - Copy-paste a summary into Slack for the team
8:35 AM - Realize I missed one account, go back to AWS...
8:40 AM - Actually start working

40 minutes of monkey work. Every morning. On good days.

On bad days, a session expires mid-check, or MFA doesn't arrive, or Cloudflare blocks my automation script. Then it's 50+ minutes.

After BrowserAct (What I'm Building Now)

8:00 AM - Agent runs automatically (cron trigger)

Agent:
  → Opens stealth session on GitHub
  → Extracts: 12 notifications, 3 PRs pending review
  → Navigates to GitHub Actions: 2 failing builds (api-server, docs-deploy)
  → Opens parallel session: AWS Health Dashboard
  → Extracts: "All Systems Operational"
  → Screenshots CloudWatch dashboard → saves to S3
  → Navigates to competitor changelog: extracts new features list
  → Posts summary to Slack #morning-status

8:01 AM - Slack notification pops up with the full morning report.

8:02 AM - I read the summary with my coffee. Actually start working.

Projected time saved: ~38 minutes/day. Over a five-day work week, that's about 190 minutes, or 3+ hours.

And the thing is, it's not just the time. It's the mental context switching. Opening six different login-protected dashboards pulls you out of focus. Having a one-page summary waiting in Slack means I start my day knowing exactly what needs attention.


The Honest Part

It's not fully there yet. Here's my current reality:

  • What works today: The extraction, screenshots, and parallel sessions all work as shown in this article. I tested every command above on a real EC2 instance.
  • What I'm still setting up: The cron automation + Slack integration pipeline. That's a week of wiring. I'll write a follow-up when it's running.
  • The one catch: the first login to each site needs me to remote-assist from my phone (handling MFA manually). After that, the session stays authenticated for days. So it's "almost" fully automated: I handle MFA once a week, and the agent handles the other four workdays on its own.

Where I Wouldn't Use This

Being honest: not everything needs browser automation:

  • AWS resource monitoring - Use CloudWatch alarms + SNS. APIs exist. Don't browser-scrape what you can API-call.
  • Simple uptime checks - Use UptimeRobot or similar. Don't overkill it.
  • Data that changes every second - BrowserAct isn't real-time. It's periodic checks.
  • Banking or highly sensitive portals - Too risky. Keep that manual.

Where It Shines

  • Dashboards without APIs - Grafana (free tier), internal tools, vendor portals
  • Multi-account visual checks - AWS Console across 3 accounts, screenshot each
  • Competitor monitoring - Product pages behind Cloudflare, pricing pages that block scrapers
  • CI/CD status aggregation - Pull data from GitHub Actions, CircleCI, Jenkins (web UI) into one summary
  • Any "check and report" workflow that you do manually more than twice a week

What Worked and What Didn't

Worked Well

  • Stealth browser sessions - get through Cloudflare where Puppeteer/Selenium can't. Tested on nowsecure.nl and cloudflare.com itself; both passed.
  • Real dashboard extraction - pulled structured data from AWS Health Dashboard, GitHub Status, and Cloudflare product pages. Real production use cases, all worked.
  • Screenshots - screenshot command saves PNGs directly. 315KB for a full-page capture. Great for visual monitoring and Slack reports.
  • Indexed interaction - the state/click/input model is clean. Way better than parsing DOM. Real element indices, real clicks.
  • Human handoff - generates a live URL instantly. This solves a real problem I've had for years.
  • Session isolation - ran two parallel sessions on the same browser, completely independent. No conflicts.
  • Navigation within sessions - navigate command lets you move across sites within one session. Open AWS Health, then navigate to GitHub Status, same session.
  • JavaScript eval - eval lets you run custom extraction when markdown isn't precise enough.
  • Network capture - network requests shows exactly what's happening under the hood. Great for debugging.
  • Install was clean - uv tool install pulled 90 packages, had it running in under a minute on a t3.medium.

Mixed

  • stealth-extract vs full sessions - stealth-extract works for quick reads on normal sites, but for heavily protected sites you need a full stealth browser session. Not obvious from the docs.
  • CAPTCHA solving - solve-captcha returned "no compatible captcha found" when I tested (page had already loaded past it). Couldn't trigger a scenario where it was needed during my testing.
  • Speed - stealth browsing is slower than raw Puppeteer. Takes a few seconds to open sessions. Makes sense (it's doing more work) but worth noting.

Could Be Better

  • Documentation - the get-skills command dumps a massive guide. Useful but overwhelming on first read.
  • Error messages - "Browser launch failed: Connection closed" doesn't tell you much. Took trial and error to figure out I needed a full session for protected sites.
  • Requires API key for stealth - can't use the anti-detection features without signing up first. Fair, but the free tier is limited.

Pricing & Credits

BrowserAct runs on a credit system:

  • Free credits: 100 credits on signup, 500 for starring their GitHub repo
  • Free trial: 7 days with credits on any subscription
  • Paid plans: Check browseract.com/pricing for current rates

For testing and writing this article, the free credits were enough. For production monitoring workflows running daily, you'd need a paid plan.


Troubleshooting

Issue 1: "browser-act: command not found"

bash: browser-act: command not found

Fix: Install via uv (not npm):

curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install browser-act-cli --python 3.12

The binary installs to ~/.local/bin/, so make sure that directory is in your PATH.


Issue 2: "Browser launch failed: Connection closed"

Error: Browser launch failed: Browser.close: Connection closed while reading from the driver

Fix: This happens when stealth-extract can't handle a heavily protected site. Use a full browser session instead:

browser-act browser create --name "my-stealth" --type stealth --desc "browsing"
browser-act --session task1 browser open <browser-id> <url>
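An agent can apply this fix on its own by trying stealth-extract first and falling back to a full session when the launch fails. A sketch under two assumptions: a failed launch either returns a non-zero exit code or prints "Browser launch failed", and you already have a stealth browser id. The session name "fallback" and the run hook are hypothetical.

```python
import subprocess
from collections.abc import Callable

Runner = Callable[[list[str]], subprocess.CompletedProcess]


def run_cli(args: list[str]) -> subprocess.CompletedProcess:
    """Run one browser-act command, capturing its output as text."""
    return subprocess.run(["browser-act", *args], capture_output=True, text=True, check=False)


def read_page(url: str, browser_id: str, session: str = "fallback", run: Runner = run_cli) -> str:
    """Try stealth-extract first; fall back to a full stealth session if it fails."""
    quick = run(["stealth-extract", url])
    failed = quick.returncode != 0 or "Browser launch failed" in (quick.stdout + quick.stderr)
    if not failed:
        return quick.stdout

    try:
        opened = run(["--session", session, "browser", "open", browser_id, url])
        if opened.returncode != 0:
            raise RuntimeError(f"full session failed too: {opened.stderr.strip()}")
        return run(["--session", session, "get", "markdown"]).stdout
    finally:
        run(["session", "close", session])  # clean up even if the fallback fails
```

Called as read_page("https://nowsecure.nl", "99703194156616493"), it returns markdown either way, and only opens a full session when the quick path fails.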

Issue 3: "api_key: not configured"

CLI:
  api_key: not configured

Fix: You need an API key for stealth browser features:

browser-act auth set <your-api-key>

Get one at browseract.com.
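For a scheduled job, it's nicer to fail fast on this than halfway through a run. I don't know which command prints the status block above, so this sketch just reads whatever text you pipe into it and looks for the `api_key: not configured` line.

```bash
#!/usr/bin/env bash
# Fail fast when CLI status output reports a missing API key.
# ASSUMPTION: the status text contains a line like
#   "api_key: not configured"
# as shown above; the command that prints it is left to you.

# api_key_missing (reads status text on stdin) -> exit 0 if missing
api_key_missing() {
  grep -Eq '^[[:space:]]*api_key:[[:space:]]*not configured'
}

sample_status='CLI:
  api_key: not configured'

if printf '%s\n' "$sample_status" | api_key_missing; then
  echo "No API key: run 'browser-act auth set <your-api-key>' first" >&2
fi
```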


Issue 4: Chrome not found

Error: Chrome executable not found

Fix: Install Chrome:

# Ubuntu
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb

# Amazon Linux
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum localinstall -y google-chrome-stable_current_x86_64.rpm
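If you provision instances with a script, you can pick the right pair of commands from the `ID` field in `/etc/os-release` (`ubuntu` on Ubuntu, `amzn` on Amazon Linux). This sketch only prints the commands instead of running them, and it deliberately handles just the two distros above.

```bash
#!/usr/bin/env bash
# Print the Chrome install commands that match this Linux distribution.
# Only Ubuntu and Amazon Linux are handled; anything else gets a message
# instead of a guess.

# chrome_install_cmd OS_ID -> prints the commands for that distro
chrome_install_cmd() {
  case "$1" in
    ubuntu)
      echo "wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb"
      echo "sudo apt install -y ./google-chrome-stable_current_amd64.deb"
      ;;
    amzn)
      echo "wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm"
      echo "sudo yum localinstall -y google-chrome-stable_current_x86_64.rpm"
      ;;
    *)
      echo "unsupported distro: $1" >&2
      return 1
      ;;
  esac
}

# /etc/os-release is a shell-compatible KEY=value file; read ID in a subshell
os_id=$( . /etc/os-release 2>/dev/null && echo "$ID" )
chrome_install_cmd "${os_id:-unknown}" || true
```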

Issue 5: Session already exists

Error: Session name 'my-task' already in use

Fix: Close it first:

browser-act session close my-task

Real-World Use Cases

As a cloud architect, here's where I see this fitting:

1. Infrastructure Monitoring

  • Check multiple dashboards (Grafana, CloudWatch, Datadog) through a single agent
  • Generate daily summaries from web-based monitoring tools
  • Alert on visual changes in dashboards that don't have API access

2. DevOps Workflows

  • Automate PR review summaries across repos
  • Check CI/CD status across multiple platforms
  • Monitor deployment pipelines with login-protected UIs

3. Multi-Account Operations

  • Manage multiple AWS accounts through Console (when CLI isn't enough)
  • Monitor multiple SaaS dashboards
  • Cross-account compliance checks on web portals

4. Research & Data Collection

  • Track competitor features and pricing pages
  • Aggregate release notes from multiple vendor sites
  • Collect public data from protected listing pages
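For the "alert on visual changes" idea under Infrastructure Monitoring, the cheapest check is a byte-for-byte comparison of the latest screenshot against the previous one. This is a sketch with hypothetical file names. Any live clock or timestamp on a dashboard will make every screenshot differ, so in practice you'd crop or mask those areas first, or use a proper image-diff tool.

```bash
#!/usr/bin/env bash
# Compare a new dashboard screenshot with the previous one and report
# whether anything changed. Byte-for-byte only: no tolerance for noise.

# screenshot_changed OLD NEW
#   exit 0 = changed (or no baseline yet), 1 = identical, 2 = NEW missing
screenshot_changed() {
  local old=$1 new=$2
  if [ ! -f "$new" ]; then
    echo "missing screenshot: $new" >&2
    return 2
  fi
  [ -f "$old" ] || return 0     # first run: nothing to compare against
  ! cmp -s "$old" "$new"        # cmp -s exits 0 when the files are identical
}

prev="grafana-prev.png"   # hypothetical file names
curr="grafana-curr.png"
if screenshot_changed "$prev" "$curr"; then
  echo "dashboard changed: send the alert, then keep the new one as baseline"
  # mv "$curr" "$prev"
fi
```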

Key Concepts Learned

1. Sessions = Task Workspaces

browser-act --session <name> <command>

Every task gets its own session, and sessions don't interfere with each other. Name them descriptively.
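One way to keep names descriptive and also avoid the "Session name already in use" error from the troubleshooting section is to build them from the task plus a timestamp. The naming scheme is my own convention, not something BrowserAct requires. I also haven't checked which characters session names accept, so this sticks to lowercase letters, digits, and hyphens.

```bash
#!/usr/bin/env bash
# Build a descriptive, collision-resistant session name such as
#   grafana-prod-check-20250101-070000
# Convention only: lowercase letters, digits and hyphens.

# session_name TASK [TIMESTAMP] -> prints a sanitized session name
session_name() {
  local task=$1 stamp=${2:-$(date +%Y%m%d-%H%M%S)}
  local slug
  # Lowercase, turn every run of other characters into one hyphen,
  # then trim hyphens from both ends.
  slug=$(printf '%s' "$task" | tr '[:upper:]' '[:lower:]' \
           | tr -cs 'a-z0-9' '-' | sed 's/^-*//; s/-*$//')
  printf '%s-%s\n' "$slug" "$stamp"
}

name=$(session_name "Grafana prod check")
echo "$name"
# Then: browser-act --session "$name" <command>
```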

2. Browsers = Identities

Different browsers mean different fingerprints, proxies, and cookies, so use separate browsers for separate accounts. There are three types:

  • chrome - imports your local Chrome login state
  • chrome-direct - controls your running Chrome directly
  • stealth - anti-detection browser with fingerprint masking (needs API key)

3. Skills = Reusable Workflows

Once something works, package it as a Skill. The next time, it runs without re-exploring the page.

4. Three-Layer Anti-Blocking

Environment (fingerprint masking) → Execution (automatic CAPTCHA solving) → Human (handoff). It's progressive escalation: when one layer can't get through, the next one takes over.
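The escalation idea maps neatly onto a generic "try each layer in order" runner. This is my own sketch of the pattern, not BrowserAct's internal logic. The three layer functions are hypothetical stand-ins; in practice their bodies would be your stealth session, a solve-captcha attempt, and a human handoff.

```bash
#!/usr/bin/env bash
# Run each escalation layer in order and stop at the first one that
# succeeds. Each argument is the name of a function or command.

# escalate LAYER1 LAYER2 ... -> exit 0 on first success, 1 if all fail
escalate() {
  local step
  for step in "$@"; do
    echo "trying: $step" >&2
    if "$step"; then
      echo "succeeded at: $step" >&2
      return 0
    fi
  done
  echo "all layers failed" >&2
  return 1
}

# Hypothetical stand-ins for the three layers; replace the bodies with
# your real commands.
layer_environment() { return 1; }   # pretend fingerprinting wasn't enough
layer_execution()   { return 1; }   # pretend auto-solve failed
layer_human()       { echo "waiting for a human to finish the login"; }

escalate layer_environment layer_execution layer_human
```

Keeping the human layer last matters: it's the slowest and most expensive step, so it should only run when the automated layers have already failed.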

5. Two Extraction Modes

  • stealth-extract - lightweight and zero-config; good for quick, simple reads.
  • Full browser session - persistent, stronger anti-detection, needed for heavily protected sites.
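A simple way to choose between the two modes is to keep your own list of hosts where stealth-extract has already failed and default to it everywhere else. Everything here is hypothetical: `PROTECTED_HOSTS` is a list you'd maintain yourself, and the example.com hosts are placeholders.

```bash
#!/usr/bin/env bash
# Pick an extraction mode per URL: stealth-extract by default, a full
# browser session for hosts that previously failed with stealth-extract.
# ASSUMPTION: PROTECTED_HOSTS is a hypothetical list you maintain yourself.

PROTECTED_HOSTS="dashboard.example.com status.example.org"

# url_host URL -> prints the host part (no scheme, path, query, or port)
url_host() {
  local rest=${1#*://}          # drop "https://"
  rest=${rest%%[/?#]*}          # drop path, query, fragment
  printf '%s\n' "${rest%%:*}"   # drop port
}

# pick_mode URL -> prints "full-session" or "stealth-extract"
pick_mode() {
  local host h
  host=$(url_host "$1")
  for h in $PROTECTED_HOSTS; do
    if [ "$host" = "$h" ]; then
      echo "full-session"
      return 0
    fi
  done
  echo "stealth-extract"
}

pick_mode "https://dashboard.example.com/login"
pick_mode "https://www.githubstatus.com/"
```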

What's Next?

I'm building the full pipeline I described above. Here's my roadmap:

Week 1 (done): Test BrowserAct on real sites (this article)
Week 2: Wire up the morning monitoring agent (cron + BrowserAct + Slack webhook)
Week 3: Add Skill Forge and package the workflow so it stays stable across page-layout changes
Week 4: Run for 30 days and track time saved, credits consumed, failures, and manual interventions
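For the Week 2 notification step, Slack incoming webhooks accept a JSON body with a `text` field POSTed to the webhook URL. Here's a minimal sketch. `SLACK_WEBHOOK_URL`, the summary text, and the cron path are placeholders, and the JSON escaping only covers backslashes, double quotes, and newlines, so reach for `jq` if your summaries can contain other control characters. It only posts when the variable is set, so it's safe to dry-run.

```bash
#!/usr/bin/env bash
# Build a Slack incoming-webhook payload and (optionally) post it.
# SLACK_WEBHOOK_URL is a placeholder; set it to your own webhook.

# json_escape TEXT -> prints TEXT escaped for use inside a JSON string
# (handles backslash, double quote and newline only)
json_escape() {
  printf '%s\n' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# slack_payload TEXT -> prints {"text":"..."}
slack_payload() {
  printf '{"text":"%s"}\n' "$(json_escape "$1")"
}

summary=$'Morning check\nGrafana: OK\nGitHub Status: "all systems operational"'
payload=$(slack_payload "$summary")

if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
  curl -fsS -X POST -H 'Content-Type: application/json' \
       --data "$payload" "$SLACK_WEBHOOK_URL"
else
  echo "dry run, would send: $payload"
fi

# Example cron entry (07:00 daily); the script path is hypothetical:
# 0 7 * * * /home/ec2-user/bin/morning-check.sh
```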

If there's interest, I'll write a follow-up with:

  • Actual production usage data (cost per month, reliability %)
  • The Skill file I built for GitHub monitoring
  • How many times remote-assist saved me from a broken pipeline
  • Token usage comparison vs doing the same thing with raw Playwright

The goal is simple: I want my mornings back. 40 minutes of tab-switching replaced by a 2-minute Slack read. BrowserAct is the missing piece between "my agent can code" and "my agent can actually see what's happening in production."


Resources

BrowserAct:

My Previous Article:


Summary

Here's what I found after testing BrowserAct on an EC2 instance (Amazon Linux, t3.medium):

  • Stealth browser sessions get through Cloudflare (tested on nowsecure.nl and cloudflare.com itself)
  • Extracted real data from the AWS Health Dashboard and GitHub Status: actual production monitoring, working
  • Screenshots save as PNG, ready to pipe to Slack/S3 for visual dashboards
  • Human handoff generates a live remote URL in seconds, and it actually works
  • Parallel sessions run independently on the same browser with no conflicts
  • Navigation + eval let you build complex multi-step extraction workflows
  • Indexed interaction (state, click, input) is agent-friendly and token-efficient
  • Skill Forge makes repeated workflows reusable
  • stealth-extract has limits; use full sessions for protected sites
  • Needs an API key for the anti-detection features
  • Error messages could be clearer when things fail

Bottom line: If you're running AI agents and they need to interact with real websites, not just APIs, this is worth testing. I'm already planning a daily monitoring pipeline with it. It's not magic, and it won't bypass everything. But it solves real problems, and I haven't seen other tools handle them this cleanly.


Connect & Share

If this was useful:

  • Star the BrowserAct GitHub repo
  • Drop a comment: what workflows would you automate?
  • Share with your team
  • Follow me for the follow-up article

📌 Wrapping Up

Thanks for reading! If this was helpful:

  • ❤️ Like if it added value
  • 💾 Save for later
  • 🔄 Share with your team

Follow me for more on: AWS architecture, FinOps, DevOps, and AI Infrastructure.

👉 Visit my website | Connect on LinkedIn | Email: simplynadaf@gmail.com

Happy Learning 🚀
