👋 Hey there, Tech Enthusiasts!
I'm Sarvar, a Cloud Architect who loves turning complex tech problems into simple solutions. I've worked with AWS, Azure, DevOps, Data, Analytics, Generative-AI and Agentic-AI building real systems for real companies. In this article series, I'll share what I've learned in a way that's easy to follow, whether you're experienced or just getting started.
Let's get into it! 🚀
"Your agent can write Terraform, deploy infrastructure, and debug pipelines. But ask it to check a dashboard behind Cloudflare? It's useless."
If you read my previous article, "I Let an AI Agent Become My DevOps Engineer," you know I've been pushing AI agents into real operational workflows. Code generation, CI/CD pipelines, infrastructure provisioning: agents handle all of that now.
But there's one area where every agent I've used just... stops working. The browser.
Here's the reality: Most of the tools I monitor daily (Grafana dashboards, AWS Console, CI pipelines, internal wikis) live behind login walls, CAPTCHAs, and anti-bot protection. My agents can't touch them.
That's where BrowserAct comes in.
I found this tool that claims to give AI agents real browser control: not headless Puppeteer scripts that get blocked instantly, but actual anti-detection browsing with CAPTCHA handling and human handoff built in.
I spent a week testing it. Here's what I found.
By the end of this article, you'll:
- Understand why AI agents fail at real browser tasks
- Install BrowserAct and run your first extraction
- See how anti-detection browsing actually works
- Test human handoff for 2FA/login scenarios
- Run parallel browser sessions without cross-contamination
- Turn a repeated workflow into a reusable Skill
Time Required: 30 minutes (15 min read + 15 min hands-on)
Difficulty: Intermediate
Prerequisites: Python 3.12+, Node.js 18+, Google Chrome, terminal access
The Problem: Agents Can't Browse the Real Web
The Reality of AI Agent Automation in 2026
I manage infrastructure across multiple AWS accounts. Every morning I check:
- CloudWatch dashboards
- GitHub notifications and PR reviews
- CI/CD pipeline status
- Internal monitoring tools
- Competitor product pages (for research)
That's easily 30-40 minutes of tab-switching, scrolling, and context-gathering before I even start real work.
I thought: my agent can write Terraform and deploy entire VPCs. Surely it can open a webpage and read some data?
Nope.
Attempt 1: Basic web fetch
curl https://monitoring-dashboard.internal.com
403 Forbidden - Access Denied
Attempt 2: Headless Puppeteer
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://dashboard.example.com');
  // Blocked by Cloudflare challenge
  await browser.close();
})();
Error: Page stuck on verification challenge
Attempt 3: Selenium
Same story. Detected as bot within 2 seconds.
Problems:
- Anti-bot systems detect headless browsers instantly
- CAPTCHAs stop automation cold
- 2FA/login flows need a human but there's no way to hand off
- Running multiple accounts in one browser gets all of them flagged
- Every script breaks when the site changes layout
My daily routine before BrowserAct:
- Open 6 tabs manually
- Log into GitHub, AWS Console, Grafana (MFA for each)
- Scroll through notifications, check pipeline status
- Copy-paste data into Slack for the team
- Repeat next morning
40 minutes. Every. Day.
Sound familiar? I needed something built specifically for this: a browser that agents can actually use on the real web.
What is BrowserAct?
Simple Version
BrowserAct is a CLI that gives your AI agent a real Chrome browser with anti-detection, CAPTCHA solving, and human handoff built in.
Think of it like this:
- Puppeteer/Playwright = giving your agent a browser that screams "I'M A BOT"
- BrowserAct = giving your agent a browser that looks and behaves like a real person
The Five Things It Actually Does
Gets past anti-bot walls: Real fingerprints, proxy rotation, stealth browsing. Sites don't know it's automated.
Handles CAPTCHAs automatically: Cloudflare, DataDome, reCAPTCHA. Solves what it can, escalates what it can't.
Hands off to humans when stuck: Hit a QR code login? SMS verification? It generates a URL. You (or a teammate) open it on your phone, do the human step, and the agent continues from where it stopped.
Runs parallel tasks without interference: Three sessions checking three different things under the same account. They don't step on each other.
Turns workflows into reusable Skills: Did something once? Package it. Run it again without the agent having to figure it out from scratch.
Prerequisites
Before starting, make sure you have:
- Python 3.12+ (BrowserAct CLI is Python-based)
- Node.js 18+ (for npx skills Skill installation)
- uv package manager (or pip)
- Google Chrome installed
- Terminal/CLI access
- An AI agent that can run shell commands (Claude Code, Cursor, Kiro, Codex or just run commands yourself)
Quick Check
python3 --version
# Python 3.12+
node --version
# v18+
google-chrome --version
# Google Chrome 149.x.x.x
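If you'd rather have a script do the Quick Check, here's a minimal sketch. It assumes each tool prints a dotted version number when called with --version, and the minimums are just the ones from the prerequisites list (I only check Python and Node here, since no minimum Chrome version is stated).

```python
import re
import shutil
import subprocess

# Minimum versions taken from the prerequisites list above.
MINIMUMS = {"python3": (3, 12), "node": (18,)}


def parse_version(text: str) -> tuple[int, ...]:
    """Pull the first dotted version number out of a `--version` string."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version_text: str, minimum: tuple[int, ...]) -> bool:
    """True when the reported version is at least `minimum`."""
    return parse_version(version_text) >= minimum


def check_tool(name: str, minimum: tuple[int, ...]) -> str:
    """Run `<name> --version` and describe whether it satisfies the minimum."""
    if shutil.which(name) is None:
        return f"{name}: not found on PATH"
    result = subprocess.run([name, "--version"], capture_output=True, text=True)
    reported = (result.stdout or result.stderr).strip()
    status = "ok" if meets_minimum(reported, minimum) else "too old"
    return f"{name}: {reported} ({status})"
```

To use it, loop over MINIMUMS and print check_tool(name, minimum) for each entry.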
If you don't have uv (fast Python package manager):
curl -LsSf https://astral.sh/uv/install.sh | sh
If you don't have Chrome:
# Ubuntu/Debian
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb
# Amazon Linux
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum localinstall -y google-chrome-stable_current_x86_64.rpm
What We're Building
Here's what we're going to test today:
┌─────────────────────────────────────────────────┐
│ AI Agent (You/Claude/Cursor) │
└─────────────────────┬───────────────────────────┘
│ CLI commands
▼
┌─────────────────────────────────────────────────┐
│ BrowserAct CLI │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Stealth │ │ Sessions │ │ Human Handoff │ │
│ │ Browser │ │ Manager │ │ (remote-assist) │ │
│ └──────────┘ └──────────┘ └──────────────────┘ │
└─────────────────────┬───────────────────────────┘
│ Real Chrome
▼
┌─────────────────────────────────────────────────┐
│ Protected Websites (Cloudflare, Login Walls) │
└─────────────────────────────────────────────────┘
Step 1: Install BrowserAct
Two parts here: the Skill definition (for AI agents) and the actual CLI.
Install the Skill (for agent integration):
npx skills add browser-act/skills --skill browser-act --yes
Output:
◇ Installation complete
│ ✓ ~/.agents/skills/browser-act
│ universal: Amp, Antigravity, Antigravity CLI, Cline, Codex +12 more
Install the CLI itself:
uv tool install browser-act-cli --python 3.12
Output:
Resolved 90 packages in 437ms
Installed 90 packages in 2.09s
Installed 1 executable: browser-act
Verify it works (required on first install):
browser-act get-skills core --skill-version 2.0.2
This returns the environment state, available browsers, and command reference. Run this before anything else; it completes the version handshake.
How to Get Your BrowserAct API Key
Go to BrowserAct and sign in to your account.
Click on your profile email address in the top-right corner.
From the dropdown menu, select API Keys.
Click Manage Keys.
Select Create Key.
Enter a name for your API key (for example, Amazon-Q, MCP-Server, or Development).
Click Create.
Copy the generated API key and store it securely. You may not be able to view the full key again after leaving the page.
Note: Treat your API key like a password. Do not share it publicly or commit it to source code repositories.
Set your API key
browser-act auth set <your-api-key>
API key saved.
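To keep the key out of scripts and repos, one option is to read it from an environment variable and pass it to the same auth set command. This is only a sketch: BROWSERACT_API_KEY is a name I made up for illustration, not something BrowserAct defines.

```python
import os
import subprocess

# Hypothetical variable name chosen for this sketch; BrowserAct does not define it.
ENV_VAR = "BROWSERACT_API_KEY"


def auth_command(env: dict[str, str]) -> list[str]:
    """Build the `browser-act auth set <key>` call from an environment mapping."""
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set")
    return ["browser-act", "auth", "set", key]


def configure_auth() -> None:
    """Run the auth command with the key from the real process environment."""
    subprocess.run(auth_command(dict(os.environ)), check=True)
```

One caveat: the key still shows up as a command-line argument while the command runs, so this is for a machine you control, not a shared box.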
Step 2: Your First Extraction (Zero Config)
The simplest thing BrowserAct can do: extract content from a page without any setup:
browser-act stealth-extract https://httpbin.org/ip
Output from my test:
{
"origin": "100.54.212.44"
}
Clean, rendered content. No HTML tags, no noise. Just the data.
Let's try something with actual content:
browser-act stealth-extract https://example.com
Output:
# Example Domain
This domain is for use in documentation examples without needing permission. Avoid use in operations.
[Learn more](https://iana.org/domains/example)
Already in markdown format. My agent can read this directly without any parsing.
What stealth-extract does under the hood:
- Spins up a lightweight stealth browser
- Visits the URL with anti-detection fingerprint
- Renders JavaScript (unlike curl)
- Returns content in markdown
- Tears down the browser
Important note from testing: stealth-extract works great for quick reads on most sites. For heavily protected sites (like nowsecure.nl with aggressive Cloudflare), you'll need a full browser session (Step 3), which has stronger anti-detection because it maintains a persistent fingerprint and proxy.
Step 3: Full Browser Control
Now let's do something more interesting: interactive browser automation. First, you need a stealth browser:
browser-act browser create --name "test-stealth" --type stealth --desc "Testing for article"
Output:
id=99703194156616493 name="test-stealth" type=stealth
desc="Testing for article"
Now open a session:
browser-act --session my-research browser open 99703194156616493 https://github.com/trending
Output:
session_name=my-research
browser_type=stealth
url=https://github.com/trending
title=github.com/trending
See what's on the page (indexed elements):
browser-act --session my-research state
Real output from my test:
url=https://github.com/trending
title=Trending repositories on GitHub today · GitHub
|SCROLL|<html class=js-focus-visible /> (0.0 pages above, 1.2 pages below)
[1]<a aria-label=Homepage />
[3]<button type=button aria-expanded=false />
Platform
[4]<button type=button aria-expanded=false />
Solutions
[8]<a class=NavLink-module__link__EG3d4 />
Pricing
[9]<qbsearch-input class=search-input />
[10]<div class=search-input-container search-with-dialog />
[12]<a />
Sign in
[15]<a class=js-selected-navigation-item />
Explore
[17]<a class=js-selected-navigation-item selected />
Trending
See those numbers? That's how the agent interacts. No DOM parsing, no CSS selectors. Just:
# Click the search input (element 10)
browser-act --session my-research click 10
clicked=10
The page updates and the search box opens. Now type:
browser-act --session my-research input 10 "browser automation AI"
input="browser automation AI" element=10
This is what they mean by "designed for agent reasoning." The output is compact, indexed, and token-efficient. My agent doesn't waste tokens parsing HTML.
You can also grab the full page as markdown:
browser-act --session my-research get markdown
Returns the entire page content in clean markdown format. Useful when you want to extract data rather than interact.
When you're done:
browser-act session close my-research
session_name=my-research closed=true
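Forgetting to close sessions is easy, so in scripts I'd wrap open and close in a context manager. A minimal sketch, using only the open, per-session, and close commands shown above. The run parameter is there so the wrapper can be exercised without the CLI installed.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator

Runner = Callable[[list[str]], str]


def run_cli(args: list[str]) -> str:
    """Default runner: execute browser-act with `args` and return stdout."""
    result = subprocess.run(
        ["browser-act", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


@contextmanager
def browser_session(
    name: str, browser_id: str, url: str, run: Runner = run_cli
) -> Iterator[Callable[..., str]]:
    """Open a named session, yield a command helper, and always close it."""
    run(["--session", name, "browser", "open", browser_id, url])
    try:
        # The helper prefixes every command with the session name.
        yield lambda *command: run(["--session", name, *command])
    finally:
        run(["session", "close", name])
```

Inside the with block, session("get", "markdown") runs the same command as in the step above, and the session is closed even if something raises.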
Step 4: Anti-Bot in Action
Here's where it gets real. I pointed BrowserAct at nowsecure.nl, a site specifically designed to test anti-bot detection. It runs Cloudflare challenges.
# Full browser session on a Cloudflare-protected site
browser-act --session antibot browser open 99703194156616493 https://nowsecure.nl
Output:
session_name=antibot
browser_type=stealth
url=https://nowsecure.nl/
title=nowsecure.nl
It got through. Let me verify by pulling the content:
browser-act --session antibot get markdown
nowsecure.nl
NOWSECURE
---------
### by nodriver
Checking the network traffic shows what happened. Cloudflare's Turnstile challenge was handled automatically:
browser-act --session antibot network requests
# format: csv
...
GET,200,Script,application/javascript,...,https://challenges.cloudflare.com/turnstile/v0/g/8fc8ed1d8752/api.js
GET,200,Document,text/html,...,https://challenges.cloudflare.com/cdn-cgi/challenge-platform/h/g/turnstile/...
The stealth browser handled the Cloudflare verification without any manual intervention. No CAPTCHA prompt, no block.
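If you want the agent to notice this on its own, you can scan the CSV for requests to Cloudflare's challenge host. This sketch assumes what the excerpt suggests: method in the first column, status in the second, URL in the last. I haven't confirmed the full column layout.

```python
import csv
import io
from urllib.parse import urlparse

CHALLENGE_HOST = "challenges.cloudflare.com"


def challenge_requests(csv_text: str) -> list[tuple[str, str, str]]:
    """Return (method, status, url) for rows that hit the challenge host."""
    hits = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip comment lines, blanks, and anything too short to be a request.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        method, status, url = row[0].strip(), row[1].strip(), row[-1].strip()
        if urlparse(url).hostname == CHALLENGE_HOST:
            hits.append((method, status, url))
    return hits
```

A non-empty result with 200 statuses, plus real page content from get markdown, is a reasonable signal that a challenge was served and passed.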
BrowserAct uses three layers to get through:
- Environment layer - Stealth fingerprint, TLS config, proxy switching. Most blocks never trigger.
- Execution layer - If a CAPTCHA appears, solve-captcha handles it automatically.
- Human layer - If auto-solve fails, it can hand off to you (more on this next).
I'm not going to claim it works on every site. It doesn't. No tool does. But for the monitoring dashboards and research pages I tested, it got through where Puppeteer and Selenium couldn't.
Step 5: Human Handoff (This Is the Killer Feature)
This is the part that sold me. Here's the scenario:
Your agent is automating a workflow. It hits a login page that requires SMS verification or QR code scan. In Puppeteer world, the automation just dies. Game over.
BrowserAct does something different:
browser-act --session login-task remote-assist --objective "complete 2FA verification"
Real output from my test:
Remote assist session created.
Share this URL with the user:
https://www.browseract.com/remote-cli/d83544c39e4a4e6ba1cc98f95050e615
expires in 1h 0m
Human assist is now active - the browser is under user control.
Do not send browser commands until the user finishes the assist session.
You open that URL on your phone or another device. You see the browser, the actual live browser state. You complete the SMS verification, scan the QR code, whatever. Then you close it.
The agent gets notified and continues from the exact same browser state.
Why this matters for DevOps:
- Internal tools with SSO that requires periodic re-auth
- AWS Console with MFA
- Third-party dashboards with 2FA
- Any workflow where "fully automated" isn't realistic
The agent doesn't crash. It doesn't restart. It waits, you help, it continues. That's practical automation.
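To forward the assist link somewhere useful (Slack, SMS, whatever), the agent needs the URL and the expiry from that output. A small parsing sketch, assuming the wording matches what I got in my test:

```python
import re
from dataclasses import dataclass
from datetime import timedelta

URL_PATTERN = re.compile(r"https://\S+")
# Handles "expires in 1h 0m" as well as a minutes-only "expires in 45m".
EXPIRY_PATTERN = re.compile(r"expires in (?:(\d+)h)?\s*(?:(\d+)m)?")


@dataclass
class AssistLink:
    url: str
    expires_in: timedelta


def parse_remote_assist(output: str) -> AssistLink:
    """Extract the share URL and time-to-expiry from remote-assist output."""
    url = URL_PATTERN.search(output)
    if url is None:
        raise ValueError("no assist URL in output")
    expiry = EXPIRY_PATTERN.search(output)
    hours = int(expiry.group(1) or 0) if expiry else 0
    minutes = int(expiry.group(2) or 0) if expiry else 0
    return AssistLink(url.group(0), timedelta(hours=hours, minutes=minutes))
```

The expiry matters in practice: if nobody opens the link within the hour, the agent should request a new one rather than wait forever.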
Step 6: Parallel Sessions (Multi-Task Without Conflicts)
Here's a real use case from my work: I want an agent to check three things simultaneously under the same account.
# Session 1: Check GitHub trending
browser-act --session check-trending browser open 99703194156616493 https://github.com/trending
session_name=check-trending
browser_type=stealth
url=https://github.com/trending
title=github.com/trending
# Session 2: Check GitHub topics (same browser, parallel session)
browser-act --session check-topics browser open 99703194156616493 https://github.com/topics
session_name=check-topics
browser_type=stealth
url=https://github.com/topics
title=github.com/topics
Two sessions. One browser. They don't interfere with each other.
# See all active sessions
browser-act session list
session_name: check-trending
browser_type: stealth
browser_id: 99703194156616493
title: Trending repositories on GitHub today · GitHub
url: https://github.com/trending
session_name: check-topics
browser_type: stealth
browser_id: 99703194156616493
title: Topics on GitHub · GitHub
url: https://github.com/topics
Each session has its own navigation state but shares the login cookies. So you log in once, and all tasks can work in parallel.
The three concurrency models:
| Model | What's shared | Use case |
|---|---|---|
| Cross-browser parallel | Nothing (independent identity) | Multi-account monitoring |
| Same-browser multi-session | Login state | Parallel tasks, one account |
| Privacy mode | Nothing (fresh each time) | One-off scraping, anonymity |
When done:
browser-act session close check-trending
browser-act session close check-topics
session_name=check-trending closed=true
session_name=check-topics closed=true
Sessions auto-reclaim after 8 hours if you forget. But clean up after yourself.
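For the cleanup part, here's a sketch that reads the session list output and builds one close command per session. It assumes the listing is one "key: value" pair per line with each record starting at session_name, as in my output above.

```python
def parse_session_list(output: str) -> list[dict[str, str]]:
    """Split `session list` output into one dict per session."""
    sessions: list[dict[str, str]] = []
    for raw in output.splitlines():
        key, sep, value = raw.strip().partition(": ")
        if not sep:
            continue  # not a "key: value" line
        if key == "session_name":
            sessions.append({})  # a new record starts here
        if sessions:
            sessions[-1][key] = value.strip()
    return sessions


def close_commands(sessions: list[dict[str, str]]) -> list[list[str]]:
    """Build a `browser-act session close <name>` call for every session."""
    return [
        ["browser-act", "session", "close", s["session_name"]] for s in sessions
    ]


def sessions_by_browser(sessions: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group session names by the browser (identity) they run on."""
    grouped: dict[str, list[str]] = {}
    for s in sessions:
        grouped.setdefault(s.get("browser_id", ""), []).append(s["session_name"])
    return grouped
```

Grouping by browser_id is a quick way to confirm which sessions share login state before you close anything.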
Step 7: Skill Forge (Make It Reusable)
Let's say Step 6 worked great. I want to run that "check my GitHub morning routine" every day without the agent figuring it out from scratch each time.
First, install Skill Forge:
npx skills add browser-act/skills --skill browser-act-skill-forge --yes
Output:
◇ Installation complete
│ ✓ ~/.agents/skills/browser-act-skill-forge
│ universal: Amp, Antigravity, Antigravity CLI, Cline, Codex +12 more
Then tell your agent:
"Forge a Skill that checks GitHub notifications, extracts unread count and top 5 notification titles."
Skill Forge will:
- Explore the page structure
- Discover the best extraction path
- Generate a reusable Skill file
- Test it
Next time, the agent just runs the Skill. No re-exploration, no token waste. Same stable path every time.
Where I'd use this:
- Daily dashboard checks
- Competitor price monitoring
- Pull request summary generation
- CI/CD status aggregation
Where I wouldn't:
- Sites that change layout constantly
- One-off tasks that'll never repeat
- Anything that needs real-time interaction (chat, live support)
Bonus: How I'd Use This in Production
I ran a few extra tests to see how BrowserAct fits into a real cloud operations workflow. Here's what I found.
Monitoring AWS Health Dashboard
browser-act --session aws-health browser open 99703194156616493 https://status.aws.amazon.com
session_name=aws-health
browser_type=stealth
url=https://health.aws.amazon.com/health/status
title=health.aws.amazon.com/health/status
Then extract the status:
browser-act --session aws-health get markdown
AWS Health Dashboard
====================
Service health - Jun 09, 2026
Take a screenshot for your Slack channel:
browser-act --session aws-health screenshot /tmp/aws-health.png
saved: /tmp/aws-health.png
Real screenshot, 315KB, exactly what the dashboard looks like. Ship that to a Slack webhook and your team gets a visual status check every morning.
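One detail on the Slack side: incoming webhooks take a JSON payload and can't upload a local PNG, so the screenshot has to be hosted at a URL Slack can reach first. Here's a sketch of the payload using a Block Kit image block. The webhook URL and image URL are values you'd supply, and the function names are mine.

```python
import json
import urllib.request


def build_status_message(title: str, summary: str, image_url: str) -> dict:
    """Build an incoming-webhook payload with a text section and an image."""
    return {
        "text": f"{title}: {summary}",  # fallback text for notifications
        "blocks": [
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": f"*{title}*\n{summary}"},
            },
            {"type": "image", "image_url": image_url, "alt_text": title},
        ],
    }


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```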
Checking GitHub Status (CI Monitoring)
browser-act --session aws-health navigate https://www.githubstatus.com
browser-act --session aws-health get markdown
All Systems Operational
-----------------------
Git Operations 99.85% uptime Normal
Webhooks 99.96% uptime Normal
API Requests 99.99% uptime Normal
Issues 99.97% uptime Normal
Pull Requests 99.61% uptime Normal
Now my agent has real uptime data it can act on. If Pull Requests drops below 99%, alert me.
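The "alert me below 99%" rule is a few lines once the markdown is parsed. A sketch, assuming each component line looks like the ones above (name, percentage, then the word uptime):

```python
import re

# Matches lines such as "Pull Requests 99.61% uptime Normal".
UPTIME_LINE = re.compile(r"^(.+?)[ \t]+(\d+(?:\.\d+)?)% uptime", re.MULTILINE)


def parse_uptime(markdown: str) -> dict[str, float]:
    """Map each component name to its uptime percentage."""
    return {
        name.strip(): float(value) for name, value in UPTIME_LINE.findall(markdown)
    }


def below_threshold(uptimes: dict[str, float], threshold: float = 99.0) -> list[str]:
    """Return the components whose uptime is under the threshold, sorted."""
    return sorted(name for name, value in uptimes.items() if value < threshold)
```

With the numbers above, nothing is under 99.0, so no alert fires. Tighten the threshold to 99.9 and both Git Operations and Pull Requests get flagged.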
Browsing Cloudflare's Own Product Page (Their Anti-Bot)
The ultimate test: can BrowserAct get through Cloudflare's own website?
browser-act --session aws-health navigate https://www.cloudflare.com/products/
browser-act --session aws-health get markdown
Cloudflare Products
-------------------
Everything you need to build, deploy, and scale applications...
Workers Global serverless functions
D1 - Serverless SQL database
R2 - Object storage...
Yes. It extracted their full product catalog through their own protection. That's something.
JavaScript Evaluation (Custom Data Extraction)
Need something specific that the markdown extraction doesn't give you? Run JavaScript directly:
browser-act --session aws-health eval "document.title"
Products | Cloudflare
browser-act --session aws-health eval "document.querySelectorAll('a').length"
122
This lets you write precise extraction logic without relying on the markdown parser.
My Production Architecture (What I'd Actually Build)
Here's the setup I'm planning for my AWS monitoring workflow:
┌────────────────────────────────────────────────────────────────┐
│ Cron Job (Every Morning 8 AM) │
└──────────────────────────┬─────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────┐
│ AI Agent (Kiro / Claude Code) │
│ │
│ 1. browser-act --session grafana browser open <id> <url> │
│ 2. browser-act --session grafana get markdown │
│ 3. browser-act --session grafana screenshot ./grafana.png │
│ 4. browser-act --session cloudwatch browser open <id> <url> │
│ 5. browser-act --session cloudwatch get markdown │
│ 6. Parse data → Generate summary │
│ 7. Post to Slack with screenshots │
│ 8. Close all sessions │
└──────────────────────────┬─────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────┐
│ Slack Channel: #morning-status │
│ │
│ Morning Infra Report - Jun 04, 2026 │
│ All AWS services operational │
│ GitHub: 99.85% uptime │
│ Grafana: CPU alert on prod-api-3 │
│ [dashboard-screenshot.png] │
└────────────────────────────────────────────────────────────────┘
Why this works better than API-only monitoring:
- Some dashboards don't expose APIs (internal Grafana, vendor portals)
- Screenshots give visual context that JSON data can't
- Human handoff means SSO re-auth doesn't break the whole pipeline
- One browser session stays logged in, so no re-auth every day
Production considerations:
- Run on a dedicated EC2 instance (t3.medium is enough; tested)
- Use static proxy for stable identity (avoids triggering "new device" alerts)
- Set --desc on browsers with site/account info so future sessions know what's what
- Monitor credit usage: each session costs credits, so budget accordingly
- Store screenshots in S3, link in Slack messages
- If login expires, remote-assist lets you re-auth from your phone without SSH-ing in
Before vs After: How This Changed My Morning
Let me be real about what my daily routine looked like before and after testing BrowserAct.
Before BrowserAct (My Old Morning)
8:00 AM - Open laptop
8:02 AM - Open GitHub, hit MFA prompt on phone, wait
8:04 AM - Check notifications (12 unread), scan PRs (3 need review)
8:08 AM - Open AWS Console, MFA again, navigate to CloudWatch
8:12 AM - Check 3 dashboards across 2 accounts, screenshot for team
8:18 AM - Open Grafana, login, scroll through panels looking for alerts
8:23 AM - Check GitHub Actions: 2 repos have failing builds
8:27 AM - Open competitor's changelog page (Cloudflare protected)
8:30 AM - Copy-paste a summary into Slack for the team
8:35 AM - Realize I missed one account, go back to AWS...
8:40 AM - Actually start working
40 minutes of monkey work. Every morning. On good days.
On bad days, a session expires mid-check, or MFA doesn't arrive, or Cloudflare blocks my automation script. Then it's 50+ minutes.
After BrowserAct (What I'm Building Now)
8:00 AM - Agent runs automatically (cron trigger)
Agent:
→ Opens stealth session on GitHub
→ Extracts: 12 notifications, 3 PRs pending review
→ Navigates to GitHub Actions: 2 failing builds (api-server, docs-deploy)
→ Opens parallel session: AWS Health Dashboard
→ Extracts: "All Systems Operational"
→ Screenshots CloudWatch dashboard → saves to S3
→ Navigates to competitor changelog: extracts new features list
→ Posts summary to Slack #morning-status
8:01 AM - Slack notification pops up with the full morning report.
8:02 AM - I read the summary with my coffee. Actually start working.
Time saved: ~38 minutes/day. That's 3+ hours/week.
And the thing is, it's not just the time. It's the mental context switching. Opening six different login-protected dashboards pulls you out of focus. Having a one-page summary waiting in Slack means I start my day knowing exactly what needs attention.
The Honest Part
It's not fully there yet. Here's my current reality:
- What works today: The extraction, screenshots, and parallel sessions all work as shown in this article. I tested every command above on a real EC2 instance.
- What I'm still setting up: The cron automation + Slack integration pipeline. That's a week of wiring. I'll write a follow-up when it's running.
- The one catch: First login to each site needs me to remote-assist from my phone (handle MFA manually). After that, the session stays authenticated for days. So it's "almost" fully automated: I handle MFA once a week, the agent handles the other 4 days.
Where I Wouldn't Use This
Being honest, not everything needs browser automation:
- AWS resource monitoring - Use CloudWatch alarms + SNS. APIs exist. Don't browser-scrape what you can API-call.
- Simple uptime checks - Use UptimeRobot or similar. Don't overkill it.
- Data that changes every second - BrowserAct isn't real-time. It's periodic checks.
- Banking or highly sensitive portals - Too risky. Keep that manual.
Where It Shines
- Dashboards without APIs - Grafana (free tier), internal tools, vendor portals
- Multi-account visual checks - AWS Console across 3 accounts, screenshot each
- Competitor monitoring - Product pages behind Cloudflare, pricing pages that block scrapers
- CI/CD status aggregation - Pull data from GitHub Actions, CircleCI, Jenkins (web UI) into one summary
- Any "check and report" workflow that you do manually more than twice a week
What Worked and What Didn't
Worked Well
- Stealth browser sessions - get through Cloudflare where Puppeteer/Selenium can't. Tested on nowsecure.nl and cloudflare.com itself; both passed.
- Real dashboard extraction - pulled structured data from AWS Health Dashboard, GitHub Status, and Cloudflare product pages. Real production use cases, all worked.
- Screenshots - screenshot command saves PNGs directly. 315KB for a full-page capture. Great for visual monitoring and Slack reports.
- Indexed interaction - the state/click/input model is clean. Way better than parsing DOM. Real element indices, real clicks.
- Human handoff - generates a live URL instantly. This solves a real problem I've had for years.
- Session isolation - ran two parallel sessions on the same browser, completely independent. No conflicts.
- Navigation within sessions - navigate command lets you move across sites within one session. Open AWS Health, then navigate to GitHub Status, same session.
- JavaScript eval - eval lets you run custom extraction when markdown isn't precise enough.
- Network capture - network requests shows exactly what's happening under the hood. Great for debugging.
- Install was clean - uv tool install pulled 90 packages, had it running in under a minute on a t3.medium.
Mixed
- stealth-extract vs full sessions - stealth-extract works for quick reads on normal sites, but for heavily protected sites you need a full stealth browser session. Not obvious from the docs.
- CAPTCHA solving - solve-captcha returned "no compatible captcha found" when I tested (page had already loaded past it). Couldn't trigger a scenario where it was needed during my testing.
- Speed - stealth browsing is slower than raw Puppeteer. Takes a few seconds to open sessions. Makes sense (it's doing more work) but worth noting.
Could Be Better
- Documentation - the get-skills command dumps a massive guide. Useful but overwhelming on first read.
- Error messages - "Browser launch failed: Connection closed" doesn't tell you much. Took trial and error to figure out I needed a full session for protected sites.
- Requires API key for stealth - can't use the anti-detection features without signing up first. Fair, but the free tier is limited.
Pricing & Credits
BrowserAct runs on a credit system:
- Free credits: 100 credits on signup, 500 for starring their GitHub repo
- Free trial: 7 days with credits on any subscription
- Paid plans: Check browseract.com/pricing for current rates
For testing and writing this article, the free credits were enough. For production monitoring workflows running daily, you'd need a paid plan.
Troubleshooting
Issue 1: "browser-act: command not found"
bash: browser-act: command not found
Fix: Install via uv (not npm):
curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install browser-act-cli --python 3.12
The binary installs to ~/.local/bin/; make sure it's in your PATH.
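If you're not sure whether that directory is on your PATH, here's a quick check. It's a sketch that compares normalised PATH entries against ~/.local/bin, with the home directory passed in explicitly so it's easy to test.

```python
import os


def dir_on_path(directory: str, path_value: str, home: str) -> bool:
    """True when `directory` (with ~ expanded to `home`) is an entry in PATH."""

    def normalise(entry: str) -> str:
        # Expand a leading ~ and drop trailing slashes before comparing.
        if entry == "~" or entry.startswith("~/"):
            entry = home + entry[1:]
        return os.path.normpath(entry)

    wanted = normalise(directory)
    entries = [e for e in path_value.split(os.pathsep) if e]
    return any(normalise(entry) == wanted for entry in entries)
```

Call it as dir_on_path("~/.local/bin", os.environ.get("PATH", ""), os.path.expanduser("~")). If it returns False, add the directory to PATH in your shell profile and open a new terminal.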
Issue 2: "Browser launch failed: Connection closed"
Error: Browser launch failed: Browser.close: Connection closed while reading from the driver
Fix: This happens when stealth-extract can't handle a heavily protected site. Use a full browser session instead:
browser-act browser create --name "my-stealth" --type stealth --desc "browsing"
browser-act --session task1 browser open <browser-id> <url>
Issue 3: "api_key: not configured"
CLI:
api_key: not configured
Fix: You need an API key for stealth browser features:
browser-act auth set <your-api-key>
Get one at browseract.com.
Issue 4: Chrome not found
Error: Chrome executable not found
Fix: Install Chrome:
# Ubuntu
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb
# Amazon Linux
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum localinstall -y google-chrome-stable_current_x86_64.rpm
Issue 5: Session already exists
Error: Session name 'my-task' already in use
Fix: Close it first:
browser-act session close my-task
Real-World Use Cases
As a cloud architect, here's where I see this fitting:
1. Infrastructure Monitoring
- Check multiple dashboards (Grafana, CloudWatch, Datadog) through a single agent
- Generate daily summaries from web-based monitoring tools
- Alert on visual changes in dashboards that don't have API access
2. DevOps Workflows
- Automate PR review summaries across repos
- Check CI/CD status across multiple platforms
- Monitor deployment pipelines with login-protected UIs
3. Multi-Account Operations
- Manage multiple AWS accounts through Console (when CLI isn't enough)
- Monitor multiple SaaS dashboards
- Cross-account compliance checks on web portals
4. Research & Data Collection
- Track competitor features and pricing pages
- Aggregate release notes from multiple vendor sites
- Collect public data from protected listing pages
Key Concepts Learned
1. Sessions = Task Workspaces
browser-act --session <name> <command>
Every task gets its own session. Sessions don't interfere. Name them descriptively.
2. Browsers = Identities
Different browsers = different fingerprints, proxies, cookies. Use separate browsers for separate accounts. Three types:
- chrome - imports your local Chrome login state
- chrome-direct - controls your running Chrome directly
- stealth - anti-detection browser with fingerprint masking (needs API key)
3. Skills = Reusable Workflows
Once something works, package it as a Skill. Next time it runs without re-exploration.
4. Three-Layer Anti-Blocking
Environment (fingerprint) → Execution (auto-solve) → Human (handoff). Progressive escalation.
5. Two Extraction Modes
- stealth-extract - quick, zero-config, good for simple reads. Lightweight.
- Full browser session - persistent, stronger anti-detection, needed for heavily protected sites.
What's Next?
I'm building the full pipeline I described above. Here's my roadmap:
Week 1 (done): Test BrowserAct on real sites (this article)
Week 2: Wire up the morning monitoring agent (cron + BrowserAct + Slack webhook)
Week 3: Add Skill Forge and package the workflow so it's stable across page layout changes
Week 4: Run for 30 days, track: time saved, credits consumed, failures, manual interventions
If there's interest, I'll write a follow-up with:
- Actual production usage data (cost per month, reliability %)
- The Skill file I built for GitHub monitoring
- How many times remote-assist saved me from a broken pipeline
- Token usage comparison vs doing the same thing with raw Playwright
The goal is simple: I want my mornings back. 40 minutes of tab-switching replaced by a 2-minute Slack read. BrowserAct is the missing piece between "my agent can code" and "my agent can actually see what's happening in production."
Resources
BrowserAct: browseract.com
My Previous Article: I Let an AI Agent Become My DevOps Engineer
Summary
Here's what I found after testing BrowserAct on an EC2 instance (Amazon Linux, t3.medium):
- Stealth browser sessions get through Cloudflare: tested on nowsecure.nl and cloudflare.com itself
- Extracted real data from AWS Health Dashboard and GitHub Status: actual production monitoring working
- Screenshots save as PNG, ready to pipe to Slack/S3 for visual dashboards
- Human handoff generates a live remote URL in seconds, and it actually works
- Parallel sessions run independently on the same browser with no conflicts
- Navigation + eval let you build complex multi-step extraction workflows
- Indexed interaction (state, click, input) is agent-friendly and token-efficient
- Skill Forge makes repeated workflows reusable
- stealth-extract has limits: use full sessions for protected sites
- Needs API key for anti-detection features
- Error messages could be clearer when things fail
Bottom line: If you're running AI agents and they need to interact with real websites, not just APIs, this is worth testing. I'm already planning a daily monitoring pipeline with it. It's not magic. It won't bypass everything. But it solves real problems that I haven't seen other tools handle this cleanly.
Connect & Share
If this was useful:
- Star the BrowserAct GitHub repo
- Drop a comment: what workflows would you automate?
- Share with your team
- Follow me for the follow-up article
📌 Wrapping Up
Thanks for reading! If this was helpful:
- ❤️ Like if it added value
- 💾 Save for later
- 🔄 Share with your team
Follow me for more on: AWS architecture, FinOps, DevOps, and AI Infrastructure.
👉 Visit my website | Connect on LinkedIn | Email: simplynadaf@gmail.com
Happy Learning 🚀





























