Sarvar Nadaf for AWS Community Builders

Posted on Jun 16

My AI Agent Hit a Login Wall: BrowserAct Let It Ask for Help and Resume

#ai #agents #automation #discuss

👋 Hey there, Tech Enthusiasts!

I'm Sarvar, a Cloud Architect who loves turning complex tech problems into simple solutions. I've worked with AWS, Azure, DevOps, Data, Analytics, Generative AI, and Agentic AI, building real systems for real companies. In this article series, I'll share what I've learned in a way that's easy to follow, whether you're experienced or just getting started.

Let's get into it! 🚀

I'm a cloud architect. I manage infrastructure across multiple AWS accounts, run CI/CD pipelines, and keep monitoring dashboards healthy for clients. A lot of my day involves checking web-based tools (Grafana, GitHub, vendor portals, internal dashboards), most of which sit behind login walls and anti-bot protection.

But there was always a gap: my AI agent couldn't browse the web. It couldn't check a dashboard, read a protected page, or handle a login flow.

That changed when I integrated BrowserAct into my workflow. It's a browser layer that gives AI agents the ability to browse real websites with anti-detection, session management, and human handoff built in.

If you missed the first article where I covered the full setup, start there: I Gave My AI Agent a Real Browser - Here's What Actually Happened. This article focuses on the headless + human handoff pattern I've been running in production.

A Note on Tooling

I'm using Kiro as my AI agent; it's free during preview and can execute CLI commands directly. But BrowserAct works with anything that can run shell commands: Claude Code, Cursor, Codex, CrewAI, LangChain, or even a simple bash script. The pattern is the same regardless of agent.

The Setup

I run BrowserAct on a Linux server: no desktop, no display, just a terminal. This is how it runs in production for my client: headless on a server, triggered by cron or the agent.

Prerequisites

Before getting started, make sure the following components are installed on your system.

Verify Installed Versions

Run the following commands:

python3 --version
# Python 3.12+

node --version
# v18+

google-chrome --version
# Google Chrome 149.x.x.x
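If you'd rather have the agent check these up front, a small helper can report what's missing. This is a minimal sketch of my own: the function name `missing_tools` is hypothetical, and the tool list in the usage comment simply mirrors the prerequisites in this section.

```shell
# Print the name of each listed command that is not on PATH.
# Prints nothing when every command is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example call with the tools this article relies on (not run here):
#   missing_tools python3 node google-chrome uv
```

It only checks that each command exists, not which version is installed, so the version commands above are still worth running once.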

Install UV (If Not Already Installed)

BrowserAct uses Python tooling, and uv is the recommended package manager.

curl -LsSf https://astral.sh/uv/install.sh | sh

Install Google Chrome (If Not Already Installed)

Ubuntu / Debian

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb

Amazon Linux

wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo yum localinstall -y google-chrome-stable_current_x86_64.rpm

Creating a BrowserAct API Key

To allow your AI agent to control a real browser, you'll need a BrowserAct API key.

Step 1: Sign In

Step 2: Open API Key Management

Click your profile email address in the top-right corner.
Select API Keys from the dropdown menu.
Click Manage Keys.

Step 3: Create a New API Key

Click Create Key.
Enter a descriptive name such as:

Amazon-Q
MCP-Server
Development
Click Create.

Step 4: Save the API Key

Copy the generated API key and store it securely. For security reasons, you may not be able to view the complete key again after leaving the page.

Treat your API key like a password. Never share it publicly or commit it to source code repositories.

Configure BrowserAct Authentication

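Once you have your API key and the browser-act CLI installed (the install commands are listed under Getting Started near the end of this article), authenticate BrowserAct using the following command: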

browser-act auth set <your-api-key>

Successful authentication will return:

API key saved.

At this point, BrowserAct is connected and ready to provide browser access to your AI agent. The integration takes less than a minute and requires no additional configuration.

With authentication in place, the agent has a browser. One more step remains: create a stealth browser instance.

browser-act browser create --type stealth --name "research"

id=101758963005571124 name="research" type=stealth

That id is your browser ID. You'll use it every time you open a session. Think of it like a browser profile: it keeps its own fingerprint, cookies, and anti-detection settings. You create it once and reuse it across sessions.

Note: The browser ID shown in this article (101758963005571124) is from my account. When you run browser create, you'll get your own unique ID. Use that in place of mine throughout the examples.
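To avoid copy-pasting the ID by hand, a script can pull it out of the create output. This is a rough sketch that assumes the output keeps the `id=... name=... type=...` shape shown above; the function name `extract_browser_id` is my own, not part of the BrowserAct CLI.

```shell
# Read `browser create` output on stdin and print the numeric id
# from the first line of the form: id=<digits> name="..." type=...
extract_browser_id() {
  sed -n 's/^id=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical usage (not run here):
#   BROWSER_ID=$(browser-act browser create --type stealth --name "research" | extract_browser_id)
```

If the CLI ever changes its output format, the function prints nothing, so a caller should treat an empty result as a failure.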

Managing Sessions

Before starting new sessions, check if any are already running:

browser-act session list

session_name: research-gh
browser_type: stealth
browser_id: 101758963005571124
title: Trending repositories on GitHub today · GitHub
url: https://github.com/trending

session_name: research-hn
browser_type: stealth
browser_id: 101758963005571124
title: news.ycombinator.com
url: https://news.ycombinator.com/

session_name: research-ph
browser_type: stealth
browser_id: 101758963005571124
title: Product Hunt – The best new products in tech.
url: https://www.producthunt.com/

To close a specific session:

browser-act --session research-hn session close

session_name=research-hn closed=true

Tip: Always close sessions when you're done. Open sessions keep the browser running and consume resources. If you hit a "session already in use" error, it means that session name is still active. Either close it or use a different name.
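The cleanup can be scripted. The sketch below assumes `session list` keeps printing `session_name: <name>` lines as in the output above; both function names are hypothetical helpers of mine, and the only CLI calls used are the `session list` and `session close` commands already shown.

```shell
# Read `session list` output on stdin and print one session name per line.
session_names() {
  sed -n 's/^session_name:[[:space:]]*//p'
}

# Close every open session. Defined only; call it explicitly when needed.
close_all_sessions() {
  browser-act session list | session_names | while IFS= read -r name; do
    browser-act --session "$name" session close
  done
}
```

Running `close_all_sessions` at the end of a cron job is one way to make sure nothing is left running overnight.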

Real Scenario: Morning Tech Research

One of the things I do for a client is compile a daily tech digest: what's trending, what's launching, what competitors are shipping. It used to take me 30 minutes of tab-switching every morning.

Now my agent does it. Here's what that looks like.

Quick Extract: One Session, One Page

# Open a stealth browser session on the target page
browser-act --session research-hn browser open 101758963005571124 https://news.ycombinator.com

# Get the page state
browser-act --session research-hn state

The agent got back clean, structured content: page title, URL, and all interactive elements. From there it can extract exactly what it needs using JS eval:

browser-act --session research-hn eval 'JSON.stringify(Array.from(document.querySelectorAll(".athing")).slice(0,4).map(el => ({title: el.querySelector(".titleline a")?.textContent, points: el.nextElementSibling?.querySelector(".score")?.textContent})))'

[
  {"title": "AI agent bankrupted their operator while trying to scan DN42", "points": "171 points"},
  {"title": "Nobody ever gets credit for fixing problems that never happened", "points": "348 points"},
  {"title": "If you are asking for human attention, demonstrate human effort", "points": "537 points"},
  {"title": "Show HN: Homebrew 6.0.0", "points": "1145 points"}
]

Two commands to open, one to extract. The agent can summarize this, filter by topic, or flag anything relevant to the client.

Where this fits: Any team that needs a daily briefing on tech trends, industry news, or competitor launches. The agent grabs it, and the team reads a summary instead of spending 30 minutes browsing.

Parallel Research - Three Sites at Once

For the full morning digest, the agent opens three parallel sessions on the same browser.

You can use the browser you already created, or create a separate one to keep research isolated from other workflows:

browser-act browser create
# Returns: id=101764340218654773

Then open sessions on it:

# Session 1: GitHub Trending
browser-act --session research-gh browser open 101764340218654773 https://github.com/trending

# Session 2: Hacker News
browser-act --session research-hn browser open 101764340218654773 https://news.ycombinator.com

# Session 3: Product Hunt
browser-act --session research-ph browser open 101764340218654773 https://www.producthunt.com

All three run independently. No conflicts. The agent works through each one:

browser-act session list

session_name: research-gh
browser_type: stealth
browser_id: 101764340218654773
title: Trending repositories on GitHub today · GitHub
url: https://github.com/trending

session_name: research-hn
browser_type: stealth
browser_id: 101764340218654773
title: news.ycombinator.com
url: https://news.ycombinator.com/

session_name: research-ph
browser_type: stealth
browser_id: 101764340218654773
title: Product Hunt – The best new products in tech.
url: https://www.producthunt.com/

Where this fits: Product teams that need multi-source intelligence before standup. Marketing teams tracking launches. DevOps engineers checking status pages across providers. Anything where you'd normally open 5+ tabs.

Structured Data Extraction

Instead of parsing full page HTML, the agent runs targeted JavaScript and gets clean JSON:

browser-act --session research-gh eval "JSON.stringify(Array.from(document.querySelectorAll('article.Box-row')).slice(0,3).map(r => ({repo: r.querySelector('h2 a')?.textContent.trim(), stars: r.querySelector('span.d-inline-block.float-sm-right')?.textContent.trim()})))"

[
  {"repo":"iptv-org /\n\n      iptv","stars":"2,650 stars today"},
  {"repo":"teslamate-org /\n\n      teslamate","stars":"35 stars today"},
  {"repo":"Panniantong /\n\n      Agent-Reach","stars":"1,045 stars today"}]

The agent navigated within the same session (Python trending, then TypeScript) without opening a new browser, and took a screenshot for the report. I covered extraction patterns in depth in the previous article.

Where this fits: Competitor monitoring (prices, features, reviews). The agent extracts exactly the data points you need as structured JSON. No scraping framework. No maintenance when the page layout changes. BrowserAct isn't a standalone scraping tool; it's a browser layer. Your AI agent is the brain that decides what to do. BrowserAct is the eyes and hands that execute on the web.

Then the Agent Hits a Wall

Everything was going smoothly. The agent had data from three sources, screenshots saved, research compiling nicely. Then it tried to check my GitHub profile settings:

browser-act --session research-gh navigate https://github.com/settings/profile

Response:

url=https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fsettings%2Fprofile
title=Sign in to GitHub · GitHub

Redirected to login. The agent checked the page state:

[2]<label />
    Username or email address
[3]<input type=text name=login />
[5]<input type=password />
[7]<input type=submit value=Sign in />
[8]<button />
    Continue with Google

With most automation setups, this is where the workflow dies. Script crashes. Logs an error. Someone restarts it manually tomorrow.

Here's what my agent did instead.

The Agent Asks for Help

browser-act --session research-gh remote-assist --objective "Sign in to GitHub to access profile settings"

Remote assist session created.

Share this URL with the user:
  https://www.browseract.com/remote-cli/1c08b0f3e0cb46168c9dd836ead748d2
expires in 1h 0m

Human assist is now active - the browser is under user control.
Do not send browser commands until the user finishes the assist session.

The agent recognized it couldn't solve this. It generated a live URL and asked me for help.
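To forward that link somewhere useful, the agent first has to pick it out of the command output. Here is a minimal sketch, assuming the URL keeps the `https://www.browseract.com/remote-cli/<token>` shape shown above and that the token is alphanumeric; `extract_assist_url` is a name I made up.

```shell
# Read remote-assist output on stdin and print the first assist URL found.
extract_assist_url() {
  sed -n 's#.*\(https://www\.browseract\.com/remote-cli/[0-9A-Za-z]*\).*#\1#p' | head -n 1
}

# Hypothetical usage (not run here):
#   ASSIST_URL=$(browser-act --session research-gh remote-assist \
#     --objective "Sign in to GitHub to access profile settings" | extract_assist_url)
```

Since the link grants control of a live browser, send it only to a private channel or a direct message.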

Step 1: Open the Remote Assist URL

I opened that URL on my phone. I saw the actual browser: the GitHub login page, exactly as the agent left it.

After opening the link in your browser, you'll see the remote session interface. Click "Take Control" to interact with the browser directly. The session remains active for up to 1 hour.

Step 2: Complete the Login

Once you click Take Control, you'll see the GitHub login UI. Enter your credentials and complete the OTP/2FA to sign in.

Step 3: Confirm Access

Once you log in successfully, you'll see the GitHub profile page confirming the session is now authenticated.

Step 4: Hand Control Back to the Agent

Click "Complete" in the top-right corner to end the human assist session and return control to the agent.

Once you click Complete, the assist session ends and the agent can send browser commands again from the terminal.

The Agent Resumes: Same Session, No Restart

After the human signs in and closes the remote-assist session, the agent picks up exactly where it left off:

# Verify: agent checks where it is now
browser-act --session research-gh state

url=https://github.com/settings/profile
title=Your profile

[14]<a class=color-fg-default />
    Sarvar's (simplynadaf)
[15]<a class=btn btn-sm />
    Go to your personal profile
[16]<a />  Public profile
[17]<a />  Account
[18]<a />  Appearance

No login prompt. The agent now has full access to the authenticated GitHub session.

Proof: Navigating Authenticated Content

The agent can now access any authenticated resource without interruption:

browser-act --session research-gh navigate https://github.com/simplynadaf/devsecops-pipeline-demo

url=https://github.com/simplynadaf/devsecops-pipeline-demo
title=simplynadaf/devsecops-pipeline-demo: DevSecOps Pipeline Demo with Security Scanning
new_tab=False

It shows the repo title instead of redirecting to /login, proving the authenticated session is active and persisted through the handoff.

The agent continued from where it left off. Same session. Same browser state. No restart. No lost context.

The Handoff Flow

Input	BrowserAct Action	Output
`navigate github.com/settings/profile`	Detects login redirect	Login wall identified
`remote-assist --objective "Sign in to GitHub"`	Generates live URL, pauses agent	URL sent to human via Slack
Human signs in + 2FA on phone	Session state preserved	Agent resumes with authenticated session
`state`	Reads authenticated page	Profile settings data extracted
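The first two rows of the table can be sketched as one shell function. This is an illustration, not BrowserAct's own logic: the function name, the `BROWSER_ACT` override variable, and the return code 10 are all my assumptions, and the `/login` path check only fits sites like GitHub that redirect there. The only CLI calls are the `navigate` and `remote-assist --objective` commands shown earlier.

```shell
# CLI to call. Defaults to the real binary; a wrapper or test can override it.
BROWSER_ACT="${BROWSER_ACT:-browser-act}"

# navigate_or_handoff SESSION URL OBJECTIVE
# Returns 0 and prints the navigate output if the page loaded normally.
# Returns 10 and prints the remote-assist output if a login redirect was seen.
# Returns 1 if the CLI itself failed.
navigate_or_handoff() {
  nav_out=$("$BROWSER_ACT" --session "$1" navigate "$2") || return 1
  if printf '%s\n' "$nav_out" | grep -Eq '^url=https?://[^/]+/login([/?#]|$)'; then
    # Login wall: ask a human for help instead of failing the run.
    "$BROWSER_ACT" --session "$1" remote-assist --objective "$3" || return 1
    return 10
  fi
  printf '%s\n' "$nav_out"
}
```

A caller that gets 10 back should stop sending browser commands, forward the URL to a human, and only continue once the assist session is finished.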

Morning Report Output

After the agent completes all checks (including the ones that needed human login), it posts this to #team-status:

DAILY INFRASTRUCTURE REPORT - Mon Jun 15, 2026 06:04 UTC

Grafana (prod):     All dashboards green. No alerts in 24h.
GitHub (org):       3 PRs merged overnight. 1 pending review.
AWS Health:         No scheduled maintenance. All regions healthy.
Vendor Portal:      SSL cert expires in 12 days. Ticket created.
Uptime Monitor:     99.97% across all endpoints (7-day avg).

Auth events:        1 remote-assist triggered (GitHub session expired).
                    Resolved in 38 seconds by on-call.

Next run: Tomorrow 06:00 UTC
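The aligned columns in a report like this are easy to produce with `printf`. A tiny sketch follows; `report_line` is a hypothetical helper, and the 20-character label column is just what fits the labels above.

```shell
# Print "Label:" left-aligned in a 20-character column, then the status text.
report_line() {
  printf '%-20s%s\n' "$1:" "$2"
}

# Example calls (labels and text taken from the report above):
#   report_line "Grafana (prod)" "All dashboards green. No alerts in 24h."
#   report_line "AWS Health" "No scheduled maintenance. All regions healthy."
```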

Why This Is a Design Pattern

Most automation falls into two camps:

Fully automated (breaks when anything unexpected happens)
Fully manual (defeats the purpose of automation)

The human handoff is a third option: the agent does 95% of the work autonomously. When it hits the 5% that requires a human (a login, a 2FA prompt, a CAPTCHA it can't solve), it pauses, asks for help, and resumes.

I've been building automation for years. Every time I tried to make something "fully automated" that involved login-protected tools, it would break within a week. Session expired. MFA rotated. Cookie invalidated.

The answer was always "just add a human step", but there was never a clean way to do that without killing the whole automation. This is the clean way.

It Runs Headless - That's What Makes It Production Ready

This entire test ran on a Linux server with no display:

echo $DISPLAY
# (empty - no GUI)

No screen. No desktop. The agent and BrowserAct run completely headless. But when remote-assist triggers, it gives the human a visual interface to that headless browser through a URL.

You see the browser as if it were on your desktop even though it's running on a server with no monitor attached.

This means:

Your agent runs on any server, any cloud, any CI pipeline
No VNC, no desktop environment, no display needed
When human help is needed, the URL works from any device - phone, laptop, tablet
After the human is done, the headless agent continues

Where this fits: DevOps and SRE teams running agents on headless servers or in containers. When the agent needs a human, tap the link from your phone, whether you're on a train, in a meeting, or awake at 2 AM.
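If a script needs to confirm it is running without a GUI, the `echo $DISPLAY` check can be wrapped in a function. This is a sketch: the name `is_headless` is mine, and the extra `WAYLAND_DISPLAY` check is my addition for Wayland desktops, not something from the test run described here.

```shell
# Return 0 when neither an X11 nor a Wayland display is configured
# in this process's environment.
is_headless() {
  [ -z "${DISPLAY:-}" ] && [ -z "${WAYLAND_DISPLAY:-}" ]
}

# Hypothetical usage:
#   is_headless && echo "no GUI here; a human will need the remote-assist URL"
```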

How This Runs in Production

Here's the actual workflow I built for my client's infrastructure monitoring:

6:00 AM - Cron triggers the agent

Agent (headless, on Linux Server):
  → Opens parallel sessions on 5 dashboards
  → Extracts status data, takes screenshots
  → Hits a login wall on one dashboard (session expired overnight)
  → Sends remote assist URL to Slack

6:01 AM - Slack notification on the on-call engineer's phone

Engineer (half awake):
  → Taps the URL
  → Sees the login page
  → Signs in, taps MFA approve
  → Closes the assist session

6:02 AM - Agent resumes, finishes remaining checks, posts morning report to #team-status

Authentication happens maybe once or twice a week. The agent handles everything else every day. That's the ratio: 95% automated, 5% human, zero broken pipelines.
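The "sends remote assist URL to Slack" step is not a BrowserAct feature, so the agent has to do it. One plausible way is a Slack incoming webhook, which accepts a JSON body with a `text` field. The sketch below assumes that approach; the function names, the `SLACK_WEBHOOK_URL` variable, and the script path in the crontab comment are all hypothetical.

```shell
# Escape backslashes and double quotes so single-line text is safe in JSON.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the JSON body for a Slack incoming webhook: {"text":"..."}.
slack_payload() {
  printf '{"text":"%s"}' "$(json_escape "$1")"
}

# Post a message to the webhook stored in SLACK_WEBHOOK_URL.
# Defined only; nothing is sent until you call it.
notify_on_call() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "$(slack_payload "$1")" "$SLACK_WEBHOOK_URL"
}

# Hypothetical crontab entry for the 6:00 AM trigger (server local time):
#   0 6 * * * /path/to/morning-checks.sh
```

The escaping only covers single-line messages, which is enough for "login needed" plus a URL. Point the webhook at a private channel, because anyone who can read the message can open the browser.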

The Honest Review

What worked:

Human handoff works exactly as described. URL generates instantly, state persists after.
The agent-to-browser integration is clean. Commands are simple, outputs are agent-friendly.
Anti-detection gets through Cloudflare without the agent doing anything special.
Headless mode on a server with no display works perfectly with remote assist.
Parallel sessions are stable and independent.
JS eval gives the agent precision extraction without any scraping libraries.

What could be better:

Documentation is dense. The skill reference is thorough but overwhelming the first time.
Error messages aren't always helpful. "Connection closed" doesn't tell you much.
Speed is slower than raw Puppeteer. The anti-detection adds a few seconds per session.
You need an API key for the stealth features.

FAQ

Can AI agents handle login walls?

Not on their own. When an agent hits a login page, it can't type your password or tap your MFA prompt. It just gets stuck. BrowserAct solves this with remote assist: the agent pauses, sends you a link, you handle the login, and the agent picks up where it left off.

What is BrowserAct remote assist?

It's a feature that lets your agent ask a human for help mid-workflow. The agent generates a URL that opens the live browser on your phone or laptop. You do the human step (login, 2FA, CAPTCHA), close it, and the agent continues automatically. No restart, no lost state.

Does BrowserAct need an API key?

Yes, for the stealth browser features (anti-detection, fingerprint masking, proxy rotation). You can get one at browseract.com. There are free credits on signup to test with.

Can this run on a headless server?

Yes. That's how I run it: on a Linux server with no display and no desktop environment. The browser runs headless. When the agent needs a human, the remote-assist URL gives you a visual interface to that headless browser from any device.

Getting Started

If you want to try this with your own agent:

Install the skill: npx skills add browser-act/skills --skill browser-act --yes
Install CLI: uv tool install browser-act-cli --python 3.12
Get an API key from browseract.com
Set it: browser-act auth set <your-key>
Your agent can now browse.

Works with Kiro, Claude Code, Cursor, Codex, CrewAI, or any tool that can run shell commands. The agent doesn't need to be special; it just needs to call the CLI.

The Bottom Line

The browser was always the gap in agent automation. Not because agents can't reason about web content (they can), but because the web is built for humans, and the moment authentication enters the picture, pure automation dies. The human handoff pattern fixes this: the agent does 95% of the work, asks for help on the 5% it can't handle, and resumes without missing a beat. It's practical, it runs in production, and it replaced a workflow that used to break every other week.

If you're a DevOps engineer, SRE, or cloud architect running AI agents, and your agents can't touch the web, this is worth 30 minutes of your time to test.

Happy Learning 🚀

Top comments (24)

Steven Ray • Jun 16

Interesting approach. How does BrowserAct manage browser session persistence when control is handed over to a human and then returned to the agent? Also, what underlying browser infrastructure or application is used at the backend to maintain the session state without interruption?

Sarvar Nadaf AWS Community Builders • Jun 16

Great question. BrowserAct keeps the same browser session active, allowing the user and agent to seamlessly share context. I believe it uses a persistent remote browser environment, though I'd be interested to hear more details from the BrowserAct team about the underlying architecture.

Steven Ray • Jun 16

Insightful, thanks!

Sarvar Nadaf AWS Community Builders • Jun 16

You're welcome 👍🏻

Parth Hawanna • Jun 16

Very much detailed for beginners and insightful. Thank you Sarvar.

Sarvar Nadaf AWS Community Builders • Jun 16

Thank You Parth!

Sarvar Nadaf AWS Community Builders • Jun 21

Thanks

Salmankhan • Jun 16

Very much detailed for beginners and insightful. Thank you Sarvar.

Sarvar Nadaf AWS Community Builders • Jun 16

Yup, you're welcome!

Mudassir Khan • Jun 17

the 95/5 framing is what I keep coming back to. tried building fully automated scraping for a client last year and it lasted about 3 weeks before an expired MFA token took down the whole pipeline. the answer was always obvious but there was never a clean seam to insert the human step without rewriting the control flow. this does that cleanly. one thing I would want to stress test in prod: the remote assist URL. is it single use? a 1 hour open link with full browser control landing in the wrong Slack channel is a different kind of incident than an expired session

Sarvar Nadaf AWS Community Builders • Jun 17

Completely agree. The human-handoff pattern feels much more resilient than chasing full autonomy. And yes, the security model around the remote assist URL is a key consideration for production use: single-use access, expiration, and revocation controls would be important to validate before deploying it in sensitive environments.

Mudassir Khan • Jun 19

yeah exactly — revocation is the one I'd want as the default, not an afterthought. we've seen OAuth integrations where the token revocation endpoint was technically there but nobody had ever called it in prod. same failure mode: the security story is complete on paper but untested under pressure.

the other edge I'd add to the threat model: what happens to the URL if BrowserAct crashes mid session? is it expired on cleanup or does it just linger?

Sarvar Nadaf AWS Community Builders • Jun 19

That's a great point. Security controls are only as good as their behavior during failure scenarios. A crash, network partition, or orphaned session is exactly where I'd expect these mechanisms to be tested. Ideally, the assist URL should be tightly coupled to the browser session lifecycle and be automatically invalidated on termination, timeout, or unexpected failure. I'd be interested to know how BrowserAct handles those edge cases, as that's often where the difference between a demo and an enterprise-ready platform becomes apparent.

Alex • Jun 22

the lifecycle coupling piece is the right test. saw the same gap with OAuth refresh tokens — revoke on logout is in the spec, vendors say they implement it, but actual behavior only shows up when you simulate a force kill mid session. no one tests the failure path until they’re in it.

have you been able to trigger a crash scenario in testing, or is this still a ‘trust the vendor doc’ situation for now?

Sarvar Nadaf AWS Community Builders • Jun 22

Hey Alex,

My testing focused on validating the human handoff workflow rather than on security or resilience testing, so I haven't intentionally triggered a crash scenario or force-terminated a session. At this stage I can't confidently speak to how BrowserAct behaves under those conditions. I agree that's an important area to validate before production adoption, and it would be great to hear from the BrowserAct team on how session cleanup, URL invalidation, and failure recovery are handled in practice. Let me know if you have any other questions; happy to help and would love to collaborate.

Nazar Boyko • Jun 16

The human-handoff pattern is the real takeaway here, separate from the tool. Most automation dies the moment auth shows up, and "agent does 95%, taps a human for the 5% it can't" is a much saner design than chasing full autonomy and watching it break every week on an expired session. Honestly, the "What could be better" section sold me more than any of the marketing would. One thing I'd want to nail down before running this on client infra: that remote assist URL hands live browser control to whoever opens it for an hour. How's it scoped: single use, IP-bound, revocable if it leaks into the wrong Slack channel?

Sarvar Nadaf AWS Community Builders • Jun 16

Thanks for the thoughtful feedback. I completely agree the human-handoff pattern was the most interesting takeaway for me as well. Rather than aiming for 100% autonomy, designing systems that can gracefully involve a human when needed feels much more practical today. Your security question is a very important one. In my testing, I focused primarily on the workflow and user experience, so I haven't yet validated details such as single-use access, IP restrictions, or session revocation capabilities. Those would definitely be critical requirements before adopting a solution like this in enterprise or client environments. Hopefully the BrowserAct team can provide more insight into how those controls are implemented.

Sanket Patharkar • Jun 16

Excellent post! The concept of combining AI-driven browser automation with human oversight is both practical and powerful. Thanks for breaking down the workflow in a way that's easy to understand. Looking forward to seeing how this space evolves.

Sarvar Nadaf AWS Community Builders • Jun 16

Thank you! I completely agree human oversight is a critical piece of making AI agents practical in real-world scenarios. The ability to seamlessly switch between autonomous execution and human intervention opens up many possibilities for enterprise automation. Glad you found the workflow useful!