Sarvar Nadaf for AWS Community Builders

Posted on Jun 9

Can AI Agents Browse Login-Protected Dashboards? I Tested It

#ai #agents #automation #discuss

👋 Hey there, Tech Enthusiasts!

I'm Sarvar, a Cloud Architect who loves turning complex tech problems into simple solutions. I've worked with AWS, Azure, DevOps, Data, Analytics, Generative-AI and Agentic-AI building real systems for real companies. In this article series, I'll share what I've learned in a way that's easy to follow, whether you're experienced or just getting started.

Let's get into it! 🚀

"Your agent can write Terraform, deploy infrastructure, and debug pipelines. But ask it to check a dashboard behind Cloudflare? It's useless."

If you read my previous article, "I Let an AI Agent Become My DevOps Engineer", you know I've been pushing AI agents into real operational workflows. Code generation, CI/CD pipelines, infrastructure provisioning: agents handle all of that now.

But there's one area where every agent I've used just... stops working. The browser.

Here's the reality: Most of the tools I monitor daily (Grafana dashboards, AWS Console, CI pipelines, internal wikis) live behind login walls, CAPTCHAs, and anti-bot protection. My agents can't touch them.

That's where BrowserAct comes in.

I found this tool that claims to give AI agents real browser control: not headless Puppeteer scripts that get blocked instantly, but actual anti-detection browsing with CAPTCHA handling and human handoff built in.

I spent a week testing it. Here's what I found.

By the end of this article, you'll:

Understand why AI agents fail at real browser tasks
Install BrowserAct and run your first extraction
See how anti-detection browsing actually works
Test human handoff for 2FA/login scenarios
Run parallel browser sessions without cross-contamination
Turn a repeated workflow into a reusable Skill

Time Required: 30 minutes (15 min read + 15 min hands-on)
Difficulty: Intermediate
Prerequisites: Python 3.12+, Node.js 18+, Google Chrome, terminal access

The Problem: Agents Can't Browse the Real Web

The Reality of AI Agent Automation in 2026

I manage infrastructure across multiple AWS accounts. Every morning I check:

CloudWatch dashboards
GitHub notifications and PR reviews
CI/CD pipeline status
Internal monitoring tools
Competitor product pages (for research)

That's easily 30-40 minutes of tab-switching, scrolling, and context-gathering before I even start real work.

I thought: my agent can write Terraform and deploy entire VPCs. Surely it can open a webpage and read some data?

Nope.

Attempt 1: Basic web fetch

curl https://monitoring-dashboard.internal.com

403 Forbidden - Access Denied

Attempt 2: Headless Puppeteer

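import puppeteer from 'puppeteer';

// Run as an ES module (e.g. a .mjs file) so top-level await works.
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://dashboard.example.com');
// Blocked by Cloudflare challenge
await browser.close();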

Error: Page stuck on verification challenge

Attempt 3: Selenium
Same story. Detected as bot within 2 seconds.

Problems:

Anti-bot systems detect headless browsers instantly
CAPTCHAs stop automation cold
2FA/login flows need a human but there's no way to hand off
Running multiple accounts in one browser gets all of them flagged
Every script breaks when the site changes layout

My daily routine before BrowserAct:

Open 6 tabs manually
Log into GitHub, AWS Console, and Grafana, with MFA for each
Scroll through notifications, check pipeline status
Copy-paste data into Slack for the team
Repeat next morning

40 minutes. Every. Day.

Sound familiar? I needed something built specifically for this: a browser that agents can actually use on the real web.

What is BrowserAct?

Simple Version

BrowserAct is a CLI that gives your AI agent a real Chrome browser with anti-detection, CAPTCHA solving, and human handoff built in.

Think of it like this:

Puppeteer/Playwright = giving your agent a browser that screams "I'M A BOT"
BrowserAct = giving your agent a browser that looks and behaves like a real person

The Five Things It Actually Does

Gets past anti-bot walls: Real fingerprints, proxy rotation, stealth browsing. Sites don't know it's automated.
Handles CAPTCHAs automatically: Cloudflare, DataDome, reCAPTCHA. Solves what it can, escalates what it can't.
Hands off to humans when stuck: Hit a QR code login? SMS verification? It generates a URL. You (or a teammate) open it on your phone, do the human step, and the agent continues from where it stopped.
Runs parallel tasks without interference: Three sessions checking three different things under the same account. They don't step on each other.
Turns workflows into reusable Skills: Did something once? Package it. Run it again without the agent having to figure it out from scratch.

Prerequisites

Before starting, make sure you have:

Python 3.12+ (BrowserAct CLI is Python-based)
Node.js 18+ (for installing the Skill with npx skills)
uv package manager (or pip)
Google Chrome installed
Terminal/CLI access
An AI agent that can run shell commands (Claude Code, Cursor, Kiro, Codex), or just run the commands yourself

Quick Check

python3 --version
# Python 3.12+

node --version
# v18+

google-chrome --version
# Google Chrome 149.x.x.x
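
If you'd rather have a script do this check, here is a minimal sketch in Python. It assumes the executables are named node, google-chrome, and uv on your machine (Chrome's binary name varies by platform), and it only checks that they exist on PATH, not their versions.

```python
import shutil
import sys

# Minimum Python version taken from the prerequisites list above.
MIN_PYTHON = (3, 12)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_tools(names, which=shutil.which):
    """Return the executable names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    print("python ok:", python_ok())
    # Executable names are assumptions; adjust them for your platform.
    print("missing:", missing_tools(["node", "google-chrome", "uv"]))
```

An empty "missing" list means all three executables were found.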

If you don't have uv (fast Python package manager):

curl -LsSf https://astral.sh/uv/install.sh | sh

If you don't have Chrome:

# Ubuntu/Debian
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb

# Amazon Linux
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum localinstall -y google-chrome-stable_current_x86_64.rpm

What We're Building

Here's what we're going to test today:

┌─────────────────────────────────────────────────┐
│                 AI Agent (You/Claude/Cursor)    │
└─────────────────────┬───────────────────────────┘
                      │ CLI commands
                      ▼
┌─────────────────────────────────────────────────┐
│              BrowserAct CLI                     │
│  ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│  │ Stealth  │ │ Sessions │ │  Human Handoff   │ │
│  │ Browser  │ │ Manager  │ │  (remote-assist) │ │
│  └──────────┘ └──────────┘ └──────────────────┘ │
└─────────────────────┬───────────────────────────┘
                      │ Real Chrome
                      ▼
┌─────────────────────────────────────────────────┐
│  Protected Websites (Cloudflare, Login Walls)   │
└─────────────────────────────────────────────────┘

Step 1: Install BrowserAct

Two parts here: the Skill definition (for AI agents) and the actual CLI.

Install the Skill (for agent integration):

npx skills add browser-act/skills --skill browser-act --yes

Output:

◇  Installation complete

│  ✓ ~/.agents/skills/browser-act
│    universal: Amp, Antigravity, Antigravity CLI, Cline, Codex +12 more

Install the CLI itself:

uv tool install browser-act-cli --python 3.12

Output:

Resolved 90 packages in 437ms
Installed 90 packages in 2.09s
Installed 1 executable: browser-act

Verify it works (required on first install):

browser-act get-skills core --skill-version 2.0.2

This returns the environment state, available browsers, and command reference. Run this before anything else; it completes the version handshake.

How to Get Your BrowserAct API Key

Go to BrowserAct and sign in to your account.
Click on your profile email address in the top-right corner.
From the dropdown menu, select API Keys.
Click Manage Keys.
Select Create Key.
Enter a name for your API key (for example, Amazon-Q, MCP-Server, or Development).
Click Create.
Copy the generated API key and store it securely. You may not be able to view the full key again after leaving the page.

Note: Treat your API key like a password. Do not share it publicly or commit it to source code repositories.

Set your API key

browser-act auth set <your-api-key>

API key saved.

Step 2: Your First Extraction (Zero Config)

The simplest thing BrowserAct can do is extract content from a page without any setup:

browser-act stealth-extract https://httpbin.org/ip

Output from my test:

{
  "origin": "100.54.212.44"
}

Clean, rendered content. No HTML tags, no noise. Just the data.
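
To use that result from a script, a rough sketch could look like this. It assumes the command prints only the JSON body to stdout, which matched what I saw but is not something I have confirmed for every case. The helper names stealth_extract and parse_origin are my own, not part of BrowserAct.

```python
import json
import subprocess


def stealth_extract(url):
    """Run `browser-act stealth-extract <url>` and return its stdout as text."""
    result = subprocess.run(
        ["browser-act", "stealth-extract", url],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_origin(output):
    """Pull the "origin" field out of the httpbin.org/ip JSON body."""
    return json.loads(output)["origin"]


# Sample copied from the output above, so this runs without the CLI installed.
sample = '{\n  "origin": "100.54.212.44"\n}'
print(parse_origin(sample))
```

With the CLI installed, parse_origin(stealth_extract("https://httpbin.org/ip")) would chain the two steps.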

Let's try something with actual content:

browser-act stealth-extract https://example.com

Output:

# Example Domain
This domain is for use in documentation examples without needing permission. Avoid use in operations.
[Learn more](https://iana.org/domains/example)

Already in markdown format. My agent can read this directly without any parsing.

What stealth-extract does under the hood:

Spins up a lightweight stealth browser
Visits the URL with anti-detection fingerprint
Renders JavaScript (unlike curl)
Returns content in markdown
Tears down the browser

Important note from testing: stealth-extract works great for quick reads on most sites. For heavily protected sites (like nowsecure.nl with aggressive Cloudflare), you'll need a full browser session (Step 3), which has stronger anti-detection because it maintains a persistent fingerprint and proxy.

Step 3: Full Browser Control

Now let's do something more interesting: interactive browser automation. First, you need a stealth browser:

browser-act browser create --name "test-stealth" --type stealth --desc "Testing for article"

Output:

id=99703194156616493 name="test-stealth" type=stealth
  desc="Testing for article"
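
You need that id for every later command, so it helps to capture it. Here is a small sketch that parses the key=value output shown above. It assumes the format stays as printed here, and parse_fields is a hypothetical helper name.

```python
import shlex


def parse_fields(output):
    """Split key=value tokens (quoted values allowed) into a dict."""
    fields = {}
    for token in shlex.split(output):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


# Sample copied from the `browser create` output above.
sample = 'id=99703194156616493 name="test-stealth" type=stealth\n  desc="Testing for article"'
browser = parse_fields(sample)
print(browser["id"], browser["desc"])
```

shlex handles the quoted values, so a description containing spaces stays in one piece.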

Now open a session:

browser-act --session my-research browser open 99703194156616493 https://github.com/trending

Output:

session_name=my-research
browser_type=stealth
url=https://github.com/trending
title=github.com/trending

See what's on the page (indexed elements):

browser-act --session my-research state

Real output from my test:

url=https://github.com/trending
title=Trending repositories on GitHub today · GitHub

|SCROLL|<html class=js-focus-visible /> (0.0 pages above, 1.2 pages below)

  [1]<a aria-label=Homepage />
  [3]<button type=button aria-expanded=false />
      Platform
  [4]<button type=button aria-expanded=false />
      Solutions
  [8]<a class=NavLink-module__link__EG3d4 />
      Pricing
  [9]<qbsearch-input class=search-input />
      [10]<div class=search-input-container search-with-dialog />
  [12]<a />
      Sign in
  [15]<a class=js-selected-navigation-item />
      Explore
  [17]<a class=js-selected-navigation-item selected />
      Trending

See those numbers? That's how the agent interacts. No DOM parsing, no CSS selectors. Just:

# Click the search input (element 10)
browser-act --session my-research click 10

clicked=10

The page updates and the search box opens. Now type:

browser-act --session my-research input 10 "browser automation AI"

input="browser automation AI" element=10

This is what they mean by "designed for agent reasoning." The output is compact, indexed, and token-efficient. My agent doesn't waste tokens parsing HTML.
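
To make the indexed format concrete, here is a sketch of how a script could turn state output into a lookup from label to element index. It assumes the layout shown above, where an [N]<tag ...> line may be followed by indented label text. parse_state and find_index are names I made up for illustration.

```python
import re

# Matches lines such as "  [12]<a />" and captures the index and tag name.
ELEMENT = re.compile(r"^\s*\[(\d+)\]<([\w-]+)")


def parse_state(state_text):
    """Map each element index to its tag and the label text that follows it."""
    elements = {}
    current = None
    for line in state_text.splitlines():
        match = ELEMENT.match(line)
        if match:
            current = int(match.group(1))
            elements[current] = {"tag": match.group(2), "text": ""}
        elif current is not None and line.strip():
            # Plain-text lines label the most recently seen element.
            label = elements[current]["text"]
            elements[current]["text"] = (label + " " + line.strip()).strip()
    return elements


def find_index(elements, label):
    """Return the first index whose label equals `label`, or None."""
    for index, info in elements.items():
        if info["text"] == label:
            return index
    return None


# Shortened sample based on the state output above.
sample = """\
url=https://github.com/trending
  [9]<qbsearch-input class=search-input />
      [10]<div class=search-input-container search-with-dialog />
  [12]<a />
      Sign in
  [17]<a class=js-selected-navigation-item selected />
      Trending
"""
elements = parse_state(sample)
print(find_index(elements, "Sign in"))
```

The returned index is what you would pass to the click command.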

You can also grab the full page as markdown:

browser-act --session my-research get markdown

Returns the entire page content in clean markdown format. Useful when you want to extract data rather than interact.

When you're done:

browser-act session close my-research

session_name=my-research closed=true

Step 4: Anti-Bot in Action

Here's where it gets real. I pointed BrowserAct at nowsecure.nl, a site specifically designed to test anti-bot detection. It runs Cloudflare challenges.

# Full browser session on a Cloudflare-protected site
browser-act --session antibot browser open 99703194156616493 https://nowsecure.nl

Output:

session_name=antibot
browser_type=stealth
url=https://nowsecure.nl/
title=nowsecure.nl

It got through. Let me verify by pulling the content:

browser-act --session antibot get markdown

nowsecure.nl
NOWSECURE
---------
### by nodriver

Checking the network traffic shows what happened. Cloudflare's Turnstile challenge was handled automatically:

browser-act --session antibot network requests

# format: csv
...
GET,200,Script,application/javascript,...,https://challenges.cloudflare.com/turnstile/v0/g/8fc8ed1d8752/api.js
GET,200,Document,text/html,...,https://challenges.cloudflare.com/cdn-cgi/challenge-platform/h/g/turnstile/...

The stealth browser handled the Cloudflare verification without any manual intervention. No CAPTCHA prompt, no block.
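
If you want a script to confirm that a challenge was served, a rough sketch could scan the captured requests. The middle columns are elided in my output above, so this assumes only that the URL is the last comma-separated field of each row. URLs that contain commas would break it.

```python
from urllib.parse import urlparse


def challenge_requests(csv_text, host="challenges.cloudflare.com"):
    """Return request URLs whose hostname matches the challenge host."""
    matches = []
    for line in csv_text.splitlines():
        if not line or line.startswith("#") or "," not in line:
            continue
        # Assumption: the URL is the last comma-separated field of each row.
        url = line.rsplit(",", 1)[-1].strip()
        if urlparse(url).hostname == host:
            matches.append(url)
    return matches


# Rows copied from the network output above, elisions included.
sample = """\
# format: csv
GET,200,Script,application/javascript,...,https://challenges.cloudflare.com/turnstile/v0/g/8fc8ed1d8752/api.js
GET,200,Document,text/html,...,https://challenges.cloudflare.com/cdn-cgi/challenge-platform/h/g/turnstile/...
"""
print(len(challenge_requests(sample)))
```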

BrowserAct uses three layers to get through:

Environment layer - Stealth fingerprint, TLS config, proxy switching. Most blocks never trigger.
Execution layer - If a CAPTCHA appears, solve-captcha handles it automatically.
Human layer - If auto-solve fails, it can hand off to you (more on this next).
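
The progressive escalation across those three layers can be sketched in a few lines. This is an illustration of the idea, not BrowserAct's implementation. The lambdas are stand-ins for whatever checks you wire up yourself.

```python
def escalate(layers):
    """Try each (name, attempt) pair in order; return the first name that succeeds."""
    for name, attempt in layers:
        if attempt():
            return name
    return None


# Stand-in callables: replace each lambda with a real check in your own setup.
layers = [
    ("environment", lambda: False),  # stealth fingerprint alone was not enough
    ("execution", lambda: False),    # automatic CAPTCHA solving did not apply
    ("human", lambda: True),         # a person completed the step via handoff
]
print(escalate(layers))
```

Later layers are only tried when earlier ones fail, so the human step stays the last resort.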

I'm not going to claim it works on every site. It doesn't. No tool does. But for the monitoring dashboards and research pages I tested, it got through where Puppeteer and Selenium couldn't.

Step 5: Human Handoff (This Is the Killer Feature)

This is the part that sold me. Here's the scenario:

Your agent is automating a workflow. It hits a login page that requires SMS verification or QR code scan. In Puppeteer world, the automation just dies. Game over.

BrowserAct does something different:

browser-act --session login-task remote-assist --objective "complete 2FA verification"

Real output from my test:

Remote assist session created.

Share this URL with the user:
  https://www.browseract.com/remote-cli/d83544c39e4a4e6ba1cc98f95050e615
expires in 1h 0m

Human assist is now active - the browser is under user control.
Do not send browser commands until the user finishes the assist session.

You open that URL on your phone or another device. You see the browser, the actual live browser state. You complete the SMS verification, scan the QR code, whatever. Then you close it.

The agent gets notified and continues from the exact same browser state.
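
For an unattended pipeline, the agent needs to pull the handoff URL out of that output and forward it to you. Here is a sketch that parses the text shown above. It assumes the wording and the "expires in Xh Ym" format stay as printed in my test. parse_remote_assist is a hypothetical helper.

```python
import re
from datetime import timedelta


def parse_remote_assist(output):
    """Return (url, lifetime) from the remote-assist text shown above."""
    url = re.search(r"https://\S+", output)
    expiry = re.search(r"expires in (\d+)h (\d+)m", output)
    if not url or not expiry:
        raise ValueError("unrecognized remote-assist output")
    hours, minutes = int(expiry.group(1)), int(expiry.group(2))
    return url.group(0), timedelta(hours=hours, minutes=minutes)


# Sample copied from my test output.
sample = """\
Remote assist session created.

Share this URL with the user:
  https://www.browseract.com/remote-cli/d83544c39e4a4e6ba1cc98f95050e615
expires in 1h 0m
"""
url, lifetime = parse_remote_assist(sample)
print(url, lifetime)
```

The lifetime tells you how long you have to act before the link expires.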

Why this matters for DevOps:

Internal tools with SSO that requires periodic re-auth
AWS Console with MFA
Third-party dashboards with 2FA
Any workflow where "fully automated" isn't realistic

The agent doesn't crash. It doesn't restart. It waits, you help, it continues. That's practical automation.

Step 6: Parallel Sessions (Multi-Task Without Conflicts)

Here's a real use case from my work: I want an agent to check three things simultaneously under the same account.

# Session 1: Check GitHub trending
browser-act --session check-trending browser open 99703194156616493 https://github.com/trending

session_name=check-trending
browser_type=stealth
url=https://github.com/trending
title=github.com/trending

# Session 2: Check GitHub topics (same browser, parallel session)
browser-act --session check-topics browser open 99703194156616493 https://github.com/topics

session_name=check-topics
browser_type=stealth
url=https://github.com/topics
title=github.com/topics

Two sessions. One browser. They don't interfere with each other.

# See all active sessions
browser-act session list

session_name: check-trending
browser_type: stealth
browser_id: 99703194156616493
title: Trending repositories on GitHub today · GitHub
url: https://github.com/trending

session_name: check-topics
browser_type: stealth
browser_id: 99703194156616493
title: Topics on GitHub · GitHub
url: https://github.com/topics

Each session has its own navigation state but shares the login cookies. So you log in once, and all tasks can work in parallel.
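
To see which sessions share a login, you can group the session list output by browser_id. This sketch assumes the blank-line-separated "key: value" layout shown above. Both function names are my own.

```python
from collections import defaultdict


def parse_sessions(output):
    """Split blank-line-separated `key: value` blocks into a list of dicts."""
    sessions = []
    for block in output.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                record[key.strip()] = value.strip()
        if record:
            sessions.append(record)
    return sessions


def sessions_by_browser(sessions):
    """Group session names by the browser (and so the login state) they share."""
    grouped = defaultdict(list)
    for record in sessions:
        grouped[record["browser_id"]].append(record["session_name"])
    return dict(grouped)


# Sample copied from the `session list` output above.
sample = """\
session_name: check-trending
browser_type: stealth
browser_id: 99703194156616493
title: Trending repositories on GitHub today · GitHub
url: https://github.com/trending

session_name: check-topics
browser_type: stealth
browser_id: 99703194156616493
title: Topics on GitHub · GitHub
url: https://github.com/topics
"""
print(sessions_by_browser(parse_sessions(sample)))
```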

The three concurrency models:

Model	What's shared	Use case
Cross-browser parallel	Nothing (independent identity)	Multi-account monitoring
Same-browser multi-session	Login state	Parallel tasks, one account
Privacy mode	Nothing (fresh each time)	One-off scraping, anonymity

When done:

browser-act session close check-trending
browser-act session close check-topics

session_name=check-trending closed=true
session_name=check-topics closed=true

Sessions auto-reclaim after 8 hours if you forget. But clean up after yourself.

Step 7: Make It Reusable with Skill Forge

Let's say Step 6 worked great. I want to run that "check my GitHub morning routine" every day without the agent figuring it out from scratch each time.

First, install Skill Forge:

npx skills add browser-act/skills --skill browser-act-skill-forge --yes

Output:

◇  Installation complete

│  ✓ ~/.agents/skills/browser-act-skill-forge
│    universal: Amp, Antigravity, Antigravity CLI, Cline, Codex +12 more

Then tell your agent:

"Forge a Skill that checks GitHub notifications, extracts unread count and top 5 notification titles."

Skill Forge will:

Explore the page structure
Discover the best extraction path
Generate a reusable Skill file
Test it

Next time, the agent just runs the Skill. No re-exploration, no token waste. Same stable path every time.

Where I'd use this:

Daily dashboard checks
Competitor price monitoring
Pull request summary generation
CI/CD status aggregation

Where I wouldn't:

Sites that change layout constantly
One-off tasks that'll never repeat
Anything that needs real-time interaction (chat, live support)

Bonus: How I'd Use This in Production

I ran a few extra tests to see how BrowserAct fits into a real cloud operations workflow. Here's what I found.

Monitoring AWS Health Dashboard

browser-act --session aws-health browser open 99703194156616493 https://status.aws.amazon.com

session_name=aws-health
browser_type=stealth
url=https://health.aws.amazon.com/health/status
title=health.aws.amazon.com/health/status

Then extract the status:

browser-act --session aws-health get markdown

AWS Health Dashboard
====================
Service health - Jun 09, 2026

Take a screenshot for your Slack channel:

browser-act --session aws-health screenshot /tmp/aws-health.png

saved: /tmp/aws-health.png

Real screenshot, 315KB, exactly what the dashboard looks like. Ship that to a Slack webhook and your team gets a visual status check every morning.

Checking GitHub Status (CI Monitoring)

browser-act --session aws-health navigate https://www.githubstatus.com
browser-act --session aws-health get markdown

All Systems Operational
-----------------------
Git Operations   99.85% uptime  Normal
Webhooks         99.96% uptime  Normal
API Requests     99.99% uptime  Normal
Issues           99.97% uptime  Normal
Pull Requests    99.61% uptime  Normal

Now my agent has real uptime data it can act on. If Pull Requests drops below 99%, alert me.
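
That alert rule is easy to sketch. The code below parses the uptime lines from the markdown above and lists any component under a threshold. It assumes the "NN.NN% uptime" wording stays the same, and the 99% default comes from the rule I just described.

```python
import re

# Captures the component name and its uptime percentage from one status line.
UPTIME = re.compile(r"^(?P<name>.+?)\s+(?P<pct>\d+(?:\.\d+)?)% uptime")


def parse_uptime(markdown):
    """Return {component: uptime percent} for every 'NN.NN% uptime' line."""
    uptimes = {}
    for line in markdown.splitlines():
        match = UPTIME.match(line.strip())
        if match:
            uptimes[match.group("name")] = float(match.group("pct"))
    return uptimes


def below_threshold(uptimes, threshold=99.0):
    """List the components whose uptime is under the threshold."""
    return sorted(name for name, pct in uptimes.items() if pct < threshold)


# Sample copied from the GitHub Status output above.
sample = """\
Git Operations   99.85% uptime  Normal
Webhooks         99.96% uptime  Normal
API Requests     99.99% uptime  Normal
Issues           99.97% uptime  Normal
Pull Requests    99.61% uptime  Normal
"""
uptimes = parse_uptime(sample)
print(below_threshold(uptimes))        # nothing is under 99% in this sample
print(below_threshold(uptimes, 99.7))  # a stricter threshold flags Pull Requests
```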

Browsing Cloudflare's Own Product Page (Their Anti-Bot)

The ultimate test: can BrowserAct get through Cloudflare's own website?

browser-act --session aws-health navigate https://www.cloudflare.com/products/
browser-act --session aws-health get markdown

Cloudflare Products
-------------------
Everything you need to build, deploy, and scale applications...
Workers - Global serverless functions
D1 - Serverless SQL database
R2 - Object storage...

Yes. It extracted their full product catalog through their own protection. That's something.

JavaScript Evaluation (Custom Data Extraction)

Need something specific that the markdown extraction doesn't give you? Run JavaScript directly:

browser-act --session aws-health eval "document.title"

Products | Cloudflare

browser-act --session aws-health eval "document.querySelectorAll('a').length"

This lets you write precise extraction logic without relying on the markdown parser.

My Production Architecture (What I'd Actually Build)

Here's the setup I'm planning for my AWS monitoring workflow:

┌────────────────────────────────────────────────────────────────┐
│                    Cron Job (Every Morning 8 AM)                │
└──────────────────────────┬─────────────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────────────┐
│                   AI Agent (Kiro / Claude Code)                 │
│                                                                │
│  1. browser-act --session grafana browser open <id> <url>      │
│  2. browser-act --session grafana get markdown                 │
│  3. browser-act --session grafana screenshot ./grafana.png     │
│  4. browser-act --session cloudwatch browser open <id> <url>   │
│  5. browser-act --session cloudwatch get markdown              │
│  6. Parse data → Generate summary                             │
│  7. Post to Slack with screenshots                            │
│  8. Close all sessions                                         │
└──────────────────────────┬─────────────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────────────┐
│                 Slack Channel: #morning-status                  │
│                                                                │
│  Morning Infra Report - Jun 04, 2026                           │
│  All AWS services operational                                  │
│  GitHub: 99.85% uptime                                         │
│  Grafana: CPU alert on prod-api-3                              │
│  [dashboard-screenshot.png]                                    │
└────────────────────────────────────────────────────────────────┘
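
For step 7, the Slack post itself is the simplest piece. Here is a sketch that builds the request with the standard library but does not send it. The webhook URL is a placeholder, and the summary lines are the sample values from the diagram. Attaching screenshots would need more than this plain-text message.

```python
import json
import urllib.request


def build_summary(lines, title="Morning Infra Report"):
    """Join a title and status lines into one plain-text message."""
    return "\n".join([title, *lines])


def build_slack_request(webhook_url, text):
    """Prepare (but do not send) a JSON POST for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical webhook URL; substitute the one Slack issues for your channel.
request = build_slack_request(
    "https://hooks.slack.com/services/EXAMPLE",
    build_summary(["All AWS services operational", "GitHub: 99.85% uptime"]),
)
print(request.get_method(), json.loads(request.data)["text"])
# To actually send it: urllib.request.urlopen(request)
```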

Why this works better than API-only monitoring:

Some dashboards don't expose APIs (internal Grafana, vendor portals)
Screenshots give visual context that JSON data can't
Human handoff means SSO re-auth doesn't break the whole pipeline
One browser session stays logged in, so there's no re-auth every day

Production considerations:

Run on a dedicated EC2 instance (a t3.medium was enough in my tests)
Use static proxy for stable identity (avoids triggering "new device" alerts)
Set --desc on browsers with site/account info so future sessions know what's what
Monitor credit usage: each session costs credits, so budget accordingly
Store screenshots in S3, link in Slack messages
If login expires, remote-assist lets you re-auth from your phone without SSH-ing in

Before vs After: How This Changed My Morning

Let me be real about what my daily routine looked like before and after testing BrowserAct.

Before BrowserAct (My Old Morning)

8:00 AM - Open laptop
8:02 AM - Open GitHub, hit MFA prompt on phone, wait
8:04 AM - Check notifications (12 unread), scan PRs (3 need review)
8:08 AM - Open AWS Console, MFA again, navigate to CloudWatch
8:12 AM - Check 3 dashboards across 2 accounts, screenshot for team
8:18 AM - Open Grafana, login, scroll through panels looking for alerts
8:23 AM - Check GitHub Actions: 2 repos have failing builds
8:27 AM - Open competitor's changelog page (Cloudflare protected)
8:30 AM - Copy-paste a summary into Slack for the team
8:35 AM - Realize I missed one account, go back to AWS...
8:40 AM - Actually start working

40 minutes of monkey work. Every morning. On good days.

On bad days, a session expires mid-check, or MFA doesn't arrive, or Cloudflare blocks my automation script. Then it's 50+ minutes.

After BrowserAct (What I'm Building Now)

8:00 AM - Agent runs automatically (cron trigger)

Agent:
  → Opens stealth session on GitHub
  → Extracts: 12 notifications, 3 PRs pending review
  → Navigates to GitHub Actions: 2 failing builds (api-server, docs-deploy)
  → Opens parallel session: AWS Health Dashboard
  → Extracts: "All Systems Operational"
  → Screenshots CloudWatch dashboard → saves to S3
  → Navigates to competitor changelog: extracts new features list
  → Posts summary to Slack #morning-status

8:01 AM - Slack notification pops up with the full morning report.

8:02 AM - I read the summary with my coffee. Actually start working.

Time saved: ~38 minutes/day. That's 3+ hours/week.

And the thing is, it's not just the time. It's the mental context switching. Opening six different login-protected dashboards pulls you out of focus. Having a one-page summary waiting in Slack means I start my day knowing exactly what needs attention.

The Honest Part

It's not fully there yet. Here's my current reality:

What works today: The extraction, screenshots, and parallel sessions all work as shown in this article. I tested every command above on a real EC2 instance.
What I'm still setting up: The cron automation + Slack integration pipeline. That's a week of wiring. I'll write a follow-up when it's running.
The one catch: First login to each site needs me to remote-assist from my phone (handle MFA manually). After that, the session stays authenticated for days. So it's "almost" fully automated: I handle MFA once a week, and the agent handles the other 4 days.

Where I Wouldn't Use This

Being honest, not everything needs browser automation:

AWS resource monitoring - Use CloudWatch alarms + SNS. APIs exist. Don't browser-scrape what you can API-call.
Simple uptime checks - Use UptimeRobot or similar. Don't overkill it.
Data that changes every second - BrowserAct isn't real-time. It's periodic checks.
Banking or highly sensitive portals - Too risky. Keep that manual.

Where It Shines

Dashboards without APIs - Grafana (free tier), internal tools, vendor portals
Multi-account visual checks - AWS Console across 3 accounts, screenshot each
Competitor monitoring - Product pages behind Cloudflare, pricing pages that block scrapers
CI/CD status aggregation - Pull data from GitHub Actions, CircleCI, Jenkins (web UI) into one summary
Any "check and report" workflow that you do manually more than twice a week

What Worked and What Didn't

Worked Well

Stealth browser sessions - get through Cloudflare where Puppeteer/Selenium can't. Tested on nowsecure.nl and cloudflare.com itself; both passed.
Real dashboard extraction - pulled structured data from AWS Health Dashboard, GitHub Status, and Cloudflare product pages. Real production use cases, all worked.
Screenshots - screenshot command saves PNGs directly. 315KB for a full-page capture. Great for visual monitoring and Slack reports.
Indexed interaction - the state/click/input model is clean. Way better than parsing DOM. Real element indices, real clicks.
Human handoff - generates a live URL instantly. This solves a real problem I've had for years.
Session isolation - ran two parallel sessions on the same browser, completely independent. No conflicts.
Navigation within sessions - navigate command lets you move across sites within one session. Open AWS Health, then navigate to GitHub Status, same session.
JavaScript eval - eval lets you run custom extraction when markdown isn't precise enough.
Network capture - network requests shows exactly what's happening under the hood. Great for debugging.
Install was clean - uv tool install pulled 90 packages, had it running in under a minute on a t3.medium.

Mixed

stealth-extract vs full sessions - stealth-extract works for quick reads on normal sites, but for heavily protected sites you need a full stealth browser session. Not obvious from the docs.
CAPTCHA solving - solve-captcha returned "no compatible captcha found" when I tested (page had already loaded past it). Couldn't trigger a scenario where it was needed during my testing.
Speed - stealth browsing is slower than raw Puppeteer. Takes a few seconds to open sessions. Makes sense (it's doing more work) but worth noting.

Could Be Better

Documentation - the get-skills command dumps a massive guide. Useful but overwhelming on first read.
Error messages - "Browser launch failed: Connection closed" doesn't tell you much. Took trial and error to figure out I needed a full session for protected sites.
Requires API key for stealth - can't use the anti-detection features without signing up first. Fair, but the free tier is limited.

Pricing & Credits

BrowserAct runs on a credit system:

Free credits: 100 credits on signup.
Paid plans: Check browseract.com/pricing for current rates

For testing and writing this article, the free credits were enough. For production monitoring workflows running daily, you'd need a paid plan.

Troubleshooting

Issue 1: "browser-act: command not found"

bash: browser-act: command not found

Fix: Install via uv (not npm):

curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install browser-act-cli --python 3.12

The binary installs to ~/.local/bin/, so make sure that directory is in your PATH.
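
A quick way to check that from Python, as a sketch: it compares normalized directory entries in a PATH string and doesn't look for the binary itself. dir_on_path is a name I made up.

```python
import os


def dir_on_path(directory, path_value):
    """Return True if `directory` is one of the entries in a PATH string."""
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return target in entries


print(dir_on_path("~/.local/bin", os.environ.get("PATH", "")))
```

If this prints False, add the directory to PATH in your shell profile and open a new terminal.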

Issue 2: "Browser launch failed: Connection closed"

Error: Browser launch failed: Browser.close: Connection closed while reading from the driver

Fix: This happens when stealth-extract can't handle a heavily protected site. Use a full browser session instead:

browser-act browser create --name "my-stealth" --type stealth --desc "browsing"
browser-act --session task1 browser open <browser-id> <url>

Issue 3: "api_key: not configured"

CLI:
  api_key: not configured

Fix: You need an API key for stealth browser features:

browser-act auth set <your-api-key>

Get one at browseract.com.

Issue 4: Chrome not found

Error: Chrome executable not found

Fix: Install Chrome:

# Ubuntu
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb

# Amazon Linux
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum localinstall -y google-chrome-stable_current_x86_64.rpm

Issue 5: Session already exists

Error: Session name 'my-task' already in use

Fix: Close it first:

browser-act session close my-task

Real-World Use Cases

As a cloud architect, here's where I see this fitting:

1. Infrastructure Monitoring

Check multiple dashboards (Grafana, CloudWatch, Datadog) through a single agent
Generate daily summaries from web-based monitoring tools
Alert on visual changes in dashboards that don't have API access

2. DevOps Workflows

Automate PR review summaries across repos
Check CI/CD status across multiple platforms
Monitor deployment pipelines with login-protected UIs

3. Multi-Account Operations

Manage multiple AWS accounts through Console (when CLI isn't enough)
Monitor multiple SaaS dashboards
Cross-account compliance checks on web portals

4. Research & Data Collection

Track competitor features and pricing pages
Aggregate release notes from multiple vendor sites
Collect public data from protected listing pages

Key Concepts Learned

1. Sessions = Task Workspaces

browser-act --session <name> <command>

Every task gets its own session. Sessions don't interfere. Name them descriptively.

2. Browsers = Identities

Different browsers = different fingerprints, proxies, cookies. Use separate browsers for separate accounts. Three types:

chrome - imports your local Chrome login state
chrome-direct - controls your running Chrome directly
stealth - anti-detection browser with fingerprint masking (needs API key)

3. Skills = Reusable Workflows

Once something works, package it as a Skill. Next time it runs without re-exploration.

4. Three-Layer Anti-Blocking

Environment (fingerprint) → Execution (auto-solve) → Human (handoff). Progressive escalation.

5. Two Extraction Modes

stealth-extract - quick, zero-config, good for simple reads. Lightweight.
Full browser session - persistent, stronger anti-detection, needed for heavily protected sites.

What's Next?

I'm building the full pipeline I described above. Here's my roadmap:

Week 1 (done): Test BrowserAct on real sites (this article)
Week 2: Wire up the morning monitoring agent (cron + BrowserAct + Slack webhook)
Week 3: Add Skill Forge and package the workflow so it's stable across page layout changes
Week 4: Run for 30 days, track: time saved, credits consumed, failures, manual interventions

If there's interest, I'll write a follow-up with:

Actual production usage data (cost per month, reliability %)
The Skill file I built for GitHub monitoring
How many times remote-assist saved me from a broken pipeline
Token usage comparison vs doing the same thing with raw Playwright

The goal is simple: I want my mornings back. 40 minutes of tab-switching replaced by a 2-minute Slack read. BrowserAct is the missing piece between "my agent can code" and "my agent can actually see what's happening in production."

Resources

BrowserAct: browseract.com

My Previous Article:

I Let an AI Agent Become My DevOps Engineer

Summary

Here's what I found after testing BrowserAct on an EC2 instance (Amazon Linux, t3.medium):

Stealth browser sessions get through Cloudflare: tested on nowsecure.nl and cloudflare.com itself
Extracted real data from AWS Health Dashboard and GitHub Status: actual production monitoring, working
Screenshots save as PNG, ready to pipe to Slack/S3 for visual dashboards
Human handoff generates a live remote URL in seconds, and it actually works
Parallel sessions run independently on the same browser with no conflicts
Navigation + eval let you build complex multi-step extraction workflows
Indexed interaction (state, click, input) is agent-friendly and token-efficient
Skill Forge makes repeated workflows reusable
stealth-extract has limits; use full sessions for protected sites
Needs API key for anti-detection features
Error messages could be clearer when things fail

Bottom line: If you're running AI agents and they need to interact with real websites, not just APIs, this is worth testing. I'm already planning a daily monitoring pipeline with it. It's not magic. It won't bypass everything. But it solves real problems that I haven't seen other tools handle this cleanly.

Connect & Share

If this was useful:

Star the BrowserAct GitHub repo
Drop a comment: what workflows would you automate?
Share with your team
Follow me for the follow-up article

