DEV Community

My AI Agent Hit a Login Wall: BrowserAct Let It Ask for Help and Resume

👋 Hey there, Tech Enthusiasts!

I'm Sarvar, a Cloud Architect who loves turning complex tech problems into simple solutions. I've worked with AWS, Azure, DevOps, Data, Analytics, Generative AI, and Agentic AI, building real systems for real companies. In this article series, I'll share what I've learned in a way that's easy to follow, whether you're experienced or just getting started.

Let's get into it! 🚀


I'm a cloud architect. I manage infrastructure across multiple AWS accounts, run CI/CD pipelines, and keep monitoring dashboards healthy for clients. A lot of my day involves checking web-based tools (Grafana, GitHub, vendor portals, internal dashboards), most of which sit behind login walls and anti-bot protection.

But there was always a gap: my AI agent couldn't browse the web. It couldn't check a dashboard, read a protected page, or handle a login flow.

That changed when I integrated BrowserAct into my workflow. It's a browser layer that gives AI agents the ability to browse real websites with anti-detection, session management, and human handoff built in.

If you missed the first article where I covered the full setup, start there: I Gave My AI Agent a Real Browser - Here's What Actually Happened. This article focuses on the headless + human handoff pattern I've been running in production.


A Note on Tooling

I'm using Kiro as my AI agent; it's free during preview and can execute CLI commands directly. But BrowserAct works with anything that can run shell commands: Claude Code, Cursor, Codex, CrewAI, LangChain, or even a simple bash script. The pattern is the same regardless of agent.


The Setup

I run BrowserAct on a Linux server: no desktop, no display, just a terminal. This is how it runs in production for my client: headless on a server, triggered by cron or the agent.

Prerequisites

Before getting started, make sure Python, Node.js, and Google Chrome are installed on your system.

Verify Installed Versions

Run the following commands:

python3 --version
# Python 3.12+

node --version
# v18+

google-chrome --version
# Google Chrome 149.x.x.x
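
If you would rather have a script check those minimums than eyeball them, here is a minimal sketch. The helper names `version_ge` and `first_version` are hypothetical (they are not part of BrowserAct), and the comparison assumes GNU `sort -V`, which most Linux servers ship with.

```shell
# version_ge ACTUAL MINIMUM: succeeds when ACTUAL >= MINIMUM.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# first_version: print the first dotted version number found on stdin,
# e.g. "Python 3.12.4" -> "3.12.4" and "v18.19.0" -> "18.19.0".
first_version() {
    grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Usage (not run here):
#   version_ge "$(python3 --version | first_version)" 3.12 || echo "Python is older than 3.12"
#   version_ge "$(node --version | first_version)" 18.0 || echo "Node is older than v18"
```

The two helpers only parse and compare text, so they are safe to run before anything else is installed.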

Install UV (If Not Already Installed)

BrowserAct uses Python tooling, and uv is the recommended package manager.

curl -LsSf https://astral.sh/uv/install.sh | sh

Install Google Chrome (If Not Already Installed)

Ubuntu / Debian

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb

Amazon Linux

wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum localinstall -y google-chrome-stable_current_x86_64.rpm

Creating a BrowserAct API Key

To allow your AI agent to control a real browser, you'll need a BrowserAct API key.

Step 1: Sign In

Log in to your BrowserAct account.

Step 2: Open API Key Management

  1. Click your profile email address in the top-right corner.
  2. Select API Keys from the dropdown menu.
  3. Click Manage Keys.

Step 3: Create a New API Key

  1. Click Create Key.
  2. Enter a descriptive name such as:
     • Amazon-Q
     • MCP-Server
     • Development
  3. Click Create.

Step 4: Save the API Key

Copy the generated API key and store it securely. For security reasons, you may not be able to view the complete key again after leaving the page.

Treat your API key like a password. Never share it publicly or commit it to source code repositories.

Configure BrowserAct Authentication

Once you have your API key (and the browser-act CLI installed; the install command is listed under Getting Started below), authenticate BrowserAct using the following command:

browser-act auth set <your-api-key>

Successful authentication will return:

API key saved.

At this point, BrowserAct is authenticated and ready to provide browser access to your AI agent. The step takes less than a minute and requires no additional configuration.

One more step remains. Create a stealth browser instance:

browser-act browser create --type stealth --name "research"

Output:

id=101758963005571124 name="research" type=stealth

That id is your browser ID; you'll use it every time you open a session. Think of it like a browser profile: it keeps its own fingerprint, cookies, and anti-detection settings. You create it once and reuse it across sessions.

Note: The browser ID shown in this article (101758963005571124) is from my account. When you run browser create, you'll get your own unique ID. Use that in place of mine throughout the examples.
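
To avoid copying the ID by hand, a script can pull it out of the `browser create` output. This is a minimal sketch that assumes the `id=<digits> name=... type=...` layout shown above; `extract_browser_id` is a hypothetical helper name, and the pattern may need adjusting if your CLI version prints a different format.

```shell
# extract_browser_id: read `browser create` output on stdin and print the
# value of the id= field (first match only).
extract_browser_id() {
    sed -n 's/^id=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (not run here):
#   BROWSER_ID="$(browser-act browser create --type stealth --name "research" | extract_browser_id)"
#   browser-act --session research-hn browser open "$BROWSER_ID" https://news.ycombinator.com
```

Storing the ID in a variable keeps the later commands free of hard-coded account-specific values.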


Managing Sessions

Before starting new sessions, check if any are already running:

browser-act session list

Output:

session_name: research-gh
browser_type: stealth
browser_id: 101758963005571124
title: Trending repositories on GitHub today · GitHub
url: https://github.com/trending

session_name: research-hn
browser_type: stealth
browser_id: 101758963005571124
title: news.ycombinator.com
url: https://news.ycombinator.com/

session_name: research-ph
browser_type: stealth
browser_id: 101758963005571124
title: Product Hunt – The best new products in tech.
url: https://www.producthunt.com/

To close a specific session:

browser-act --session research-hn session close

Output:

session_name=research-hn closed=true

Tip: Always close sessions when you're done. Open sessions keep the browser running and consume resources. If you hit a "session already in use" error, that session name is still active: either close it or use a different name.
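
Closing sessions one at a time gets tedious, so here is a minimal cleanup sketch. It uses only the `session list` and `session close` commands shown above and assumes the `session_name: <name>` output layout; the function names `session_names` and `close_all_sessions` are hypothetical.

```shell
# session_names: read `session list` output on stdin and print one
# session name per line.
session_names() {
    sed -n 's/^session_name:[[:space:]]*//p'
}

# close_all_sessions: close every session that is currently listed.
# stdin is redirected for the inner command so it cannot swallow the
# list of names the loop is reading.
close_all_sessions() {
    browser-act session list | session_names | while IFS= read -r name; do
        browser-act --session "$name" session close < /dev/null
    done
}
```

Calling `close_all_sessions` at the end of a cron run is one way to make sure nothing is left open overnight.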


Real Scenario: Morning Tech Research

One of the things I do for a client is compile a daily tech digest: what's trending, what's launching, what competitors are shipping. It used to take me 30 minutes of tab-switching every morning.

Now my agent does it. Here's what that looks like.

Quick Extract: One Session, One Page

# Open a stealth browser session on the target page
browser-act --session research-hn browser open 101758963005571124 https://news.ycombinator.com

# Get the page state
browser-act --session research-hn state

The agent got back clean, structured content: page title, URL, and all interactive elements. From there it can extract exactly what it needs using JS eval:

browser-act --session research-hn eval 'JSON.stringify(Array.from(document.querySelectorAll(".athing")).slice(0,4).map(el => ({title: el.querySelector(".titleline a")?.textContent, points: el.nextElementSibling?.querySelector(".score")?.textContent})))'

Output:

[
  {"title": "AI agent bankrupted their operator while trying to scan DN42", "points": "171 points"},
  {"title": "Nobody ever gets credit for fixing problems that never happened", "points": "348 points"},
  {"title": "If you are asking for human attention, demonstrate human effort", "points": "537 points"},
  {"title": "Show HN: Homebrew 6.0.0", "points": "1145 points"}
]

Two commands to open, one to extract. The agent can summarize this, filter by topic, or flag anything relevant to the client.

Where this fits: Any team that needs a daily briefing on tech trends, industry news, or competitor launches. The agent grabs it, and the team reads a summary instead of spending 30 minutes browsing.


Parallel Research - Three Sites at Once

For the full morning digest, the agent opens three parallel sessions on the same browser.

You can use the browser you already created, or create a separate one to keep research isolated from other workflows:

browser-act browser create
# Returns: id=101764340218654773

Then open sessions on it:

# Session 1: GitHub Trending
browser-act --session research-gh browser open 101764340218654773 https://github.com/trending

# Session 2: Hacker News
browser-act --session research-hn browser open 101764340218654773 https://news.ycombinator.com

# Session 3: Product Hunt
browser-act --session research-ph browser open 101764340218654773 https://www.producthunt.com

All three run independently. No conflicts. The agent works through each one:

browser-act session list

Output:

session_name: research-gh
browser_type: stealth
browser_id: 101764340218654773
title: Trending repositories on GitHub today · GitHub
url: https://github.com/trending

session_name: research-hn
browser_type: stealth
browser_id: 101764340218654773
title: news.ycombinator.com
url: https://news.ycombinator.com/

session_name: research-ph
browser_type: stealth
browser_id: 101764340218654773
title: Product Hunt – The best new products in tech.
url: https://www.producthunt.com/
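
The three `browser open` calls above can be driven from a small table of session names and URLs. This is a sketch under assumptions: `open_research_sessions` is a hypothetical function name, and `BROWSER_ACT` is a hypothetical variable added only so the loop can be dry-run with `echo` instead of the real CLI. The sessions are opened one after another and then stay open side by side.

```shell
# Command to run; override for a dry run, e.g. BROWSER_ACT=echo
BROWSER_ACT="${BROWSER_ACT:-browser-act}"

# open_research_sessions BROWSER_ID: open one named session per site
# on the given browser.
open_research_sessions() {
    browser_id="$1"
    while read -r name url; do
        # stdin is redirected so the command cannot consume the table below
        "$BROWSER_ACT" --session "$name" browser open "$browser_id" "$url" < /dev/null
    done <<'EOF'
research-gh https://github.com/trending
research-hn https://news.ycombinator.com
research-ph https://www.producthunt.com
EOF
}

# Usage (not run here):
#   open_research_sessions 101764340218654773
```

Adding a fourth source is then a one-line change to the table rather than another copied command.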

Where this fits: Product teams that need multi-source intelligence before standup. Marketing teams tracking launches. DevOps engineers checking status pages across providers. Anything where you'd normally open 5+ tabs.


Structured Data Extraction

Instead of parsing full page HTML, the agent runs targeted JavaScript and gets clean JSON:

browser-act --session research-gh eval "JSON.stringify(Array.from(document.querySelectorAll('article.Box-row')).slice(0,3).map(r => ({repo: r.querySelector('h2 a')?.textContent.trim(), stars: r.querySelector('span.d-inline-block.float-sm-right')?.textContent.trim()})))"

Output:

[
  {"repo":"iptv-org /\n\n      iptv","stars":"2,650 stars today"},
  {"repo":"teslamate-org /\n\n      teslamate","stars":"35 stars today"},
  {"repo":"Panniantong /\n\n      Agent-Reach","stars":"1,045 stars today"}
]

The agent then navigated within the same session (Python trending, then TypeScript) without opening a new browser, and took a screenshot for the report. I covered extraction patterns in depth in the previous article.

Where this fits: Competitor monitoring of prices, features, and reviews. The agent extracts exactly the data points you need as structured JSON. There is no scraping framework to maintain, and when the page layout changes, the agent can re-read the page state and adjust its selectors. BrowserAct isn't a standalone scraping tool; it's a browser layer. Your AI agent is the brain that decides what to do. BrowserAct is the eyes and hands that execute on the web.


Then the Agent Hits a Wall

Everything was going smoothly. The agent had data from three sources, screenshots saved, research compiling nicely. Then it tried to check my GitHub profile settings:

browser-act --session research-gh navigate https://github.com/settings/profile

Response:

url=https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fsettings%2Fprofile
title=Sign in to GitHub · GitHub

Redirected to login. The agent checked the page state:

[2]<label />
    Username or email address
[3]<input type=text name=login />
[5]<input type=password />
[7]<input type=submit value=Sign in />
[8]<button />
    Continue with Google
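
A redirect like the one above is easy to spot mechanically from the `url=` line. The sketch below is an assumption-laden example: `is_login_redirect` is a hypothetical helper, and the `/login` and `/signin` path check fits GitHub but will need extending for sites that use other login paths.

```shell
# is_login_redirect: read `navigate` or `state` output on stdin and
# succeed when the url= line points at a login page.
is_login_redirect() {
    grep -Eq '^url=https?://[^/]+/(login|signin)([/?]|$)'
}

# Usage (not run here):
#   if browser-act --session research-gh navigate https://github.com/settings/profile | is_login_redirect; then
#       echo "login wall detected"
#   fi
```

Checking the URL rather than the page title keeps the test independent of the site's wording or language.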

With any other automation setup, this is where the workflow dies. Script crashes. Logs an error. Someone restarts it manually tomorrow.

Here's what my agent did instead.


The Agent Asks for Help

browser-act --session research-gh remote-assist --objective "Sign in to GitHub to access profile settings"

Output:

Remote assist session created.

Share this URL with the user:
  https://www.browseract.com/remote-cli/1c08b0f3e0cb46168c9dd836ead748d2
expires in 1h 0m

Human assist is now active - the browser is under user control.
Do not send browser commands until the user finishes the assist session.

The agent recognized it couldn't solve this. It generated a live URL and asked me for help.
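
Getting that link to a human is the part you wire up yourself. Here is a minimal sketch that pulls the URL out of the `remote-assist` output and posts it to a Slack incoming webhook. Both function names are hypothetical, and `SLACK_WEBHOOK_URL` is an environment variable you would set yourself; it is not something BrowserAct provides.

```shell
# assist_url: read `remote-assist` output on stdin and print the first
# https:// URL found. Assumes the link appears as shown above.
assist_url() {
    grep -oE 'https://[^[:space:]]+' | head -n 1
}

# notify_slack URL: post the link to a Slack incoming webhook.
# SLACK_WEBHOOK_URL is a hypothetical variable holding your webhook address.
notify_slack() {
    curl -sS -X POST -H 'Content-Type: application/json' \
        --data "{\"text\": \"Agent needs a login: $1\"}" \
        "$SLACK_WEBHOOK_URL"
}

# Usage (not run here):
#   link="$(browser-act --session research-gh remote-assist \
#             --objective "Sign in to GitHub to access profile settings" | assist_url)"
#   notify_slack "$link"
```

Any channel that reaches the on-call person works here; Slack is simply what this article's workflow uses.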

Remote assist session created with shareable URL


Step 1: Open the Remote Assist URL

I opened that URL on my phone. I saw the actual browser the GitHub login page, exactly as the agent left it.

After opening the link in your browser, you'll see the remote session interface. Click "Take Control" to interact with the browser directly. The session remains active for up to 1 hour.

Remote assist interface showing Take Control button with 1 hour timer


Step 2: Complete the Login

Once you click Take Control, you'll see the GitHub login UI. Enter your credentials and complete the OTP/2FA to sign in.

GitHub login page rendered inside the remote assist browser


Step 3: Confirm Access

Once you log in successfully, you'll see the GitHub profile page confirming the session is now authenticated.

GitHub profile page loaded after successful login


Step 4: Hand Control Back to the Agent

Click "Complete" in the top-right corner to end the human assist session and return control to the agent.

Complete button in top-right corner of remote assist interface

Once you click Complete, the assist session ends and control returns to the agent in the terminal, where it continues its work.

Session completed confirmation screen


The Agent Resumes: Same Session, No Restart

After the human signs in and closes the remote-assist session, the agent picks up exactly where it left off:

# Verify: agent checks where it is now
browser-act --session research-gh state

Output:

url=https://github.com/settings/profile
title=Your profile

[14]<a class=color-fg-default />
    Sarvar's (simplynadaf)
[15]<a class=btn btn-sm />
    Go to your personal profile
[16]<a />  Public profile
[17]<a />  Account
[18]<a />  Appearance

No login prompt. The agent now has full access to the authenticated GitHub session.

Agent state showing authenticated GitHub profile page


Proof: Navigating Authenticated Content

The agent can now access any authenticated resource without interruption:

browser-act --session research-gh navigate https://github.com/simplynadaf/devsecops-pipeline-demo

Output:

url=https://github.com/simplynadaf/devsecops-pipeline-demo
title=simplynadaf/devsecops-pipeline-demo: DevSecOps Pipeline Demo with Security Scanning
new_tab=False

Agent successfully accessing private repo after authentication

Because this is a private repo, seeing the repo title instead of a redirect to /login shows that the authenticated session is active and persisted through the handoff.

The agent continued from where it left off. Same session. Same browser state. No restart. No lost context.


The Handoff Flow

| Input | BrowserAct Action | Output |
|---|---|---|
| navigate github.com/settings/profile | Detects login redirect | Login wall identified |
| remote-assist --objective "Sign in to GitHub" | Generates live URL, pauses agent | URL sent to human via Slack |
| Human signs in + 2FA on phone | Session state preserved | Agent resumes with authenticated session |
| state | Reads authenticated page | Profile settings data extracted |

Morning Report Output

After the agent completes all checks (including the ones that needed human login), it posts this to #team-status:

DAILY INFRASTRUCTURE REPORT - Mon Jun 15, 2026 06:04 UTC

Grafana (prod):     All dashboards green. No alerts in 24h.
GitHub (org):       3 PRs merged overnight. 1 pending review.
AWS Health:         No scheduled maintenance. All regions healthy.
Vendor Portal:      SSL cert expires in 12 days. Ticket created.
Uptime Monitor:     99.97% across all endpoints (7-day avg).

Auth events:        1 remote-assist triggered (GitHub session expired).
                    Resolved in 38 seconds by on-call.

Next run: Tomorrow 06:00 UTC
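
The aligned columns in a report like this are easy to produce with `printf`. The sketch below is only an illustration: `report_line` is a hypothetical helper, and the 20-column label width is chosen to match the layout above.

```shell
# report_line LABEL MESSAGE: print "LABEL:" left-aligned and padded to
# 20 columns, followed by MESSAGE.
report_line() {
    printf '%-20s%s\n' "$1:" "$2"
}

# Usage (not run here):
#   report_line "Grafana (prod)" "All dashboards green. No alerts in 24h."
#   report_line "GitHub (org)"   "3 PRs merged overnight. 1 pending review."
```

Building each line through one helper keeps the columns aligned no matter how long the next label is, as long as it stays under the chosen width.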

Why This Is a Design Pattern

Most automation falls into two camps:

  • Fully automated (breaks when anything unexpected happens)
  • Fully manual (defeats the purpose of automation)

The human handoff is a third option: the agent does 95% of the work autonomously. When it hits the 5% that requires a human (a login, a 2FA prompt, a CAPTCHA it can't solve), it pauses, asks for help, and resumes.

I've been building automation for years. Every time I tried to make something "fully automated" that involved login-protected tools, it would break within a week. Session expired. MFA rotated. Cookie invalidated.

The answer was always "just add a human step", but there was never a clean way to do that without killing the whole automation. This is the clean way.


It Runs Headless - That's What Makes It Production Ready

This entire test ran on a Linux server with no display:

echo $DISPLAY
# (empty - no GUI)

No screen. No desktop. The agent and BrowserAct run completely headless. But when remote-assist triggers, it gives the human a visual interface to that headless browser through a URL.

You see the browser as if it were on your desktop even though it's running on a server with no monitor attached.

This means:

  • Your agent runs on any server, any cloud, any CI pipeline
  • No VNC, no desktop environment, no display needed
  • When human help is needed, the URL works from any device - phone, laptop, tablet
  • After the human is done, the headless agent continues

Where this fits: DevOps and SRE teams running agents on headless servers or in containers. When the agent needs a human, tap the link from your phone: on a train, in a meeting, or at 2 AM.


How This Runs in Production

Here's the actual workflow I built for my client's infrastructure monitoring:

6:00 AM - Cron triggers the agent

Agent (headless, on Linux server):
  → Opens parallel sessions on 5 dashboards
  → Extracts status data, takes screenshots
  → Hits a login wall on one dashboard (session expired overnight)
  → Sends remote assist URL to Slack

6:01 AM - Slack notification on the on-call engineer's phone

Engineer (half awake):
  → Taps the URL
  → Sees the login page
  → Signs in, taps MFA approve
  → Closes

6:02 AM - Agent resumes, finishes remaining checks, posts morning report to #team-status

Authentication happens maybe once or twice a week. The agent handles everything else every day. That's the ratio: 95% automated, 5% human, zero broken pipelines.


The Honest Review

What worked:

  • Human handoff works exactly as described. URL generates instantly, state persists after.
  • The agent-to-browser integration is clean. Commands are simple, outputs are agent-friendly.
  • Anti-detection gets through Cloudflare without the agent doing anything special.
  • Headless mode on a server with no display works perfectly with remote assist.
  • Parallel sessions are stable and independent.
  • JS eval gives the agent precision extraction without any scraping libraries.

What could be better:

  • Documentation is dense. The skill reference is thorough but overwhelming the first time.
  • Error messages aren't always helpful. "Connection closed" doesn't tell you much.
  • Speed is slower than raw Puppeteer. The anti-detection adds a few seconds per session.
  • You need an API key for the stealth features.

FAQ

Can AI agents handle login walls?

Not on their own. When an agent hits a login page, it can't type your password or tap your MFA prompt. It just gets stuck. BrowserAct solves this with remote assist: the agent pauses, sends you a link, you handle the login, and the agent picks up where it left off.

What is BrowserAct remote assist?

It's a feature that lets your agent ask a human for help mid-workflow. The agent generates a URL that opens the live browser on your phone or laptop. You do the human step (login, 2FA, CAPTCHA), close it, and the agent continues automatically. No restart, no lost state.

Does BrowserAct need an API key?

Yes, for the stealth browser features (anti-detection, fingerprint masking, proxy rotation). You can get one at browseract.com. There are free credits on signup to test with.

Can this run on a headless server?

Yes. That's how I run it: on a Linux server with no display and no desktop environment. The browser runs headless. When the agent needs a human, the remote-assist URL gives you a visual interface to that headless browser from any device.


Getting Started

If you want to try this with your own agent:

  1. Install: npx skills add browser-act/skills --skill browser-act --yes
  2. Install CLI: uv tool install browser-act-cli --python 3.12
  3. Get an API key from browseract.com
  4. Set it: browser-act auth set <your-key>
  5. Your agent can now browse.

Works with Kiro, Claude Code, Cursor, Codex, CrewAI, or any tool that can run shell commands. The agent doesn't need to be special; it just needs to call the CLI.


The Bottom Line

The browser was always the gap in agent automation. Not because agents can't reason about web content (they can), but because the web is built for humans, and the moment authentication enters the picture, pure automation dies. The human handoff pattern fixes this: the agent does 95% of the work, asks for help on the 5% it can't handle, and resumes without missing a beat. It's practical, it runs in production, and it replaced a workflow that used to break every other week.

If you're a DevOps engineer, SRE, or cloud architect running AI agents, and your agents can't touch the web, this is worth 30 minutes of your time to test.


📌 Wrapping Up

Thanks for reading! If this was helpful:

  • ❤️ Like if it added value
  • 💾 Save for later
  • 🔄 Share with your team

Follow me for more on: AWS architecture, FinOps, DevOps, and AI Infrastructure.

👉 Visit my website | Connect on LinkedIn | Email: simplynadaf@gmail.com

Happy Learning 🚀

Top comments (9)

Steven Ray

Interesting approach. How does BrowserAct manage browser session persistence when control is handed over to a human and then returned to the agent? Also, what underlying browser infrastructure or application is used at the backend to maintain the session state without interruption?

Sarvar Nadaf AWS Community Builders

Great question. BrowserAct keeps the same browser session active, allowing the user and agent to seamlessly share context. I believe it uses a persistent remote browser environment, though I'd be interested to hear more details from the BrowserAct team about the underlying architecture.

Steven Ray

Insightful Thanks!

Sarvar Nadaf AWS Community Builders

You're welcome 👍🏻

Salmankhan

Very much detailed for beginners and insightful. Thank you Sarvar.

Sarvar Nadaf AWS Community Builders

Yup, you're welcome!

Sanket Patharkar

Excellent post! The concept of combining AI-driven browser automation with human oversight is both practical and powerful. Thanks for breaking down the workflow in a way that's easy to understand. Looking forward to seeing how this space evolves.

Sarvar Nadaf AWS Community Builders

Thank you! I completely agree: human oversight is a critical piece of making AI agents practical in real-world scenarios. The ability to seamlessly switch between autonomous execution and human intervention opens up many possibilities for enterprise automation. Glad you found the workflow useful!

Parth Hawanna

Very much detailed for beginners and insightful. Thank you Sarvar.