Sourabh Katti

Posted on Dec 2, 2025

Building AI browser agents that actually work: lessons from automating password changes

#ai #python #automation #showdev

Everyone talks about AI agents that code, write, and chat. But what about agents that actually do things on the web, like navigating websites, clicking buttons, and filling in forms?

I spent the last few months building an AI agent that automates password changes across websites. Here's what I learned about making browser automation agents that actually work in production.

The problem with traditional browser automation

Selenium and Playwright are great for predictable workflows. But real websites are messy:

Login flows change without warning
CAPTCHAs appear randomly
Elements have dynamic IDs
Modals pop up unexpectedly
Sites detect and block automation

Traditional scripted automation breaks constantly. You end up playing whack-a-mole with selectors.

Enter AI browser agents

The idea is simple: instead of scripting every click, let an LLM observe the page and decide what to do next. Tools like browser-use make this surprisingly accessible.

Here's the basic architecture:

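import asyncio

from browser_use import Agent, Browser
from langchain_openai import ChatOpenAI

async def main():
    # Initialize browser and LLM
    # (which LLM wrapper Agent accepts varies across browser-use versions)
    browser = Browser()
    llm = ChatOpenAI(model="gpt-4o-mini")

    # Create agent with a task
    agent = Agent(
        task="Navigate to example.com and click the login button",
        llm=llm,
        browser=browser,
    )

    # Run it: agent.run() is a coroutine, so it needs an event loop
    await agent.run()

asyncio.run(main())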

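The agent captures the current page state (a simplified tree of the page's interactive elements, plus a screenshot when the model supports vision), asks the LLM "Given this page state and your goal, what should you do next?", executes the chosen action, and repeats until the task is done or it gives up.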
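Stripped of browser details, that observe, decide, act cycle fits in a few lines. This is a minimal sketch, not browser-use's internals: `FakePage`, `decide`, and the action dictionaries are hypothetical stand-ins for a real browser page and a real LLM call (here `decide` is a hard-coded rule).

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page: a list of element labels."""
    elements: list[str]
    clicked: list[str] = field(default_factory=list)

    def observe(self) -> str:
        # A real agent would serialize interactive elements from the live page
        return "\n".join(f"[{i}] {label}" for i, label in enumerate(self.elements))

    def click(self, index: int) -> None:
        self.clicked.append(self.elements[index])


def decide(goal: str, page_state: str, history: list[dict]) -> dict:
    """Hypothetical LLM call, replaced by a rule: click the first element
    whose label appears in the goal, then report success."""
    if history:
        return {"action": "done"}
    for line in page_state.splitlines():
        index, label = line.split("] ", 1)
        if label.lower() in goal.lower():
            return {"action": "click", "index": int(index[1:])}
    return {"action": "fail"}


def run_agent(goal: str, page: FakePage, max_steps: int = 10) -> str:
    history: list[dict] = []
    for _ in range(max_steps):
        decision = decide(goal, page.observe(), history)  # observe + decide
        if decision["action"] in ("done", "fail"):
            return decision["action"]
        page.click(decision["index"])                     # act
        history.append(decision)
    return "max_steps reached"


page = FakePage(["Home", "Login", "Sign up"])
print(run_agent("Click the login button", page), page.clicked)  # done ['Login']
```

Note that the `max_steps` cap, not the decision function, is what guarantees the loop terminates.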

Five hard lessons from production

1. Constrain the agent ruthlessly

My first agents were too open-ended. I'd say "change the password on this site" and watch in horror as the agent:

Opened new tabs to search for help articles
Clicked "Forgot Password" links
Navigated to completely unrelated pages

Fix: Add strict rules to your prompts:

task = """
Change the password on example.com.

STRICT RULES:
- DO NOT open new tabs
- DO NOT use search engines
- DO NOT click "Forgot Password"
- If stuck after 5 actions, STOP and report failure
"""
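Prompt rules are requests, not guarantees, so it can help to enforce the same constraints in the code that executes the model's proposed actions. A minimal sketch, assuming actions arrive as dicts with hypothetical `type`, `url`, and `text` keys; `check_action`, `RuleViolation`, and the 15-action cap are illustrative choices, while the allowed host and blocked link text mirror the rules above.

```python
from urllib.parse import urlparse

ALLOWED_HOST = "example.com"
BLOCKED_TEXT = ("forgot password",)
MAX_ACTIONS = 15  # hypothetical hard cap on actions per task


class RuleViolation(Exception):
    pass


def check_action(action: dict, actions_taken: int) -> None:
    """Raise RuleViolation if a proposed action breaks the task's rules."""
    if actions_taken >= MAX_ACTIONS:
        raise RuleViolation("action budget exhausted")
    if action["type"] == "open_tab":
        raise RuleViolation("new tabs are not allowed")
    if action["type"] == "navigate":
        # Host allowlist: also blocks search engines and help sites
        host = urlparse(action["url"]).hostname or ""
        if host != ALLOWED_HOST and not host.endswith("." + ALLOWED_HOST):
            raise RuleViolation(f"navigation to {host!r} is not allowed")
    if action["type"] == "click":
        label = action.get("text", "").lower()
        if any(phrase in label for phrase in BLOCKED_TEXT):
            raise RuleViolation(f"clicking {label!r} is not allowed")
```

When a check fails, the executor can feed the violation message back to the model or end the run; either way, the forbidden action never reaches the browser.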

2. Never expose sensitive data to the LLM

This was my biggest security mistake. Early versions included passwords in the task prompt:

# DON'T DO THIS
task = f"Log in with password: {actual_password}"

That password has now been sent to your LLM provider and may sit in its request logs for as long as its retention policy allows.

Fix: Use custom actions that inject credentials without exposing them to the model:

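from browser_use import Controller

controller = Controller()  # pass controller=controller to Agent(...)

@controller.action("Enter password in focused field")
async def enter_password(page):
    # load_password() is a hypothetical helper reading from a secret store;
    # the LLM sees only the action name, never the value.
    # (Injected parameter names vary across browser-use versions.)
    await page.keyboard.type(load_password())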
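Another way to keep credentials out of the model's context is placeholder substitution: the model only ever sees a secret's name, and the executor swaps in the real value after the model's output has been produced. browser-use documents a `sensitive_data` option for `Agent` built on the same idea; the sketch below is a standalone illustration with a hypothetical `<secret>name</secret>` convention and an in-memory dict standing in for a real vault.

```python
import re

# Hypothetical secret store; in production this would be a vault or keychain
SECRETS = {"site_password": "correct horse battery staple"}

PLACEHOLDER = re.compile(r"<secret>(\w+)</secret>")


def prompt_secret_names() -> str:
    """What the LLM is told: the names of available secrets, never the values."""
    return "Available secrets: " + ", ".join(
        f"<secret>{name}</secret>" for name in SECRETS
    )


def resolve(text: str) -> str:
    """Replace placeholders in the model's proposed input with real values.
    Runs after the LLM call, so the values never enter the model's context."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in SECRETS:
            raise KeyError(f"unknown secret {name!r}")
        return SECRETS[name]

    return PLACEHOLDER.sub(lookup, text)
```

The executor types `resolve(model_text)` into the field, so only the placeholder ever appears in prompts or in anything the model echoes back.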

3. The DOM is not your friend

The accessibility tree, the same structure screen readers consume, is your friend. It exposes each element's role and accessible name, semantic meaning that raw HTML and CSS class names don't carry.

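# Instead of parsing HTML for buttons (BeautifulSoup)
soup.find_all('button', class_='submit-btn-v2')

# Use accessibility snapshots (Playwright sync API)
snapshot = page.accessibility.snapshot()
# Returns nested dicts such as {"role": "button", "name": "Submit"}
# Recent Playwright versions deprecate this in favor of locator.aria_snapshot()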

This makes your agent more robust to CSS class changes and layout shifts.
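Before a snapshot goes into a prompt, it usually pays to flatten it into a short, indexed list of interactive elements, so the model can answer with an index instead of a selector. A minimal sketch, assuming a nested dict with `role`, `name`, and `children` keys like the one `page.accessibility.snapshot()` returns; the sample tree and the `INTERACTIVE_ROLES` set are illustrative choices.

```python
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def interactive_elements(snapshot: dict) -> list[str]:
    """Depth-first walk that keeps only elements the agent can act on."""
    found: list[str] = []

    def walk(node: dict) -> None:
        if node.get("role") in INTERACTIVE_ROLES:
            found.append(f'[{len(found)}] {node["role"]} "{node.get("name", "")}"')
        for child in node.get("children", []):
            walk(child)

    walk(snapshot)
    return found


# Hand-written sample shaped like a Playwright snapshot
snapshot = {
    "role": "WebArea",
    "name": "Account settings",
    "children": [
        {"role": "heading", "name": "Change password", "level": 1},
        {"role": "textbox", "name": "Current password"},
        {"role": "textbox", "name": "New password"},
        {"role": "button", "name": "Save"},
        {"role": "link", "name": "Help"},
    ],
}
print("\n".join(interactive_elements(snapshot)))
# [0] textbox "Current password"
# [1] textbox "New password"
# [2] button "Save"
# [3] link "Help"
```

The heading is dropped because the agent can't act on it, which keeps the prompt short on large pages.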

4. Implement aggressive timeouts

AI agents can get stuck in loops. Without timeouts, they'll burn through your API budget clicking the same broken element forever.

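import asyncio

agent = Agent(task=task, llm=llm, browser=browser)

async def run_with_limits():
    # Hard limit on actions: max_steps is an argument to run()
    # Total time limit: asyncio.wait_for cancels the run after 120 s
    # Per-action limits depend on your browser-use version; check its Agent options
    return await asyncio.wait_for(agent.run(max_steps=15), timeout=120)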
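Step caps and timeouts bound the damage, but a loop detector can stop a stuck agent much earlier. A minimal asyncio sketch, assuming each agent step is a coroutine that returns a short description of the action it took; `run_steps`, its thresholds, and the status strings are hypothetical, and `asyncio.wait_for` supplies both the per-step and the total limit.

```python
import asyncio
from typing import Awaitable, Callable


async def run_steps(
    step: Callable[[], Awaitable[str]],
    max_steps: int = 15,
    step_timeout: float = 30.0,
    total_timeout: float = 120.0,
    max_repeats: int = 3,
) -> str:
    async def loop() -> str:
        last, repeats = None, 0
        for _ in range(max_steps):
            # Per-step limit: a hung page load or LLM call cannot stall forever
            action = await asyncio.wait_for(step(), timeout=step_timeout)
            if action == "done":
                return "done"
            # Count consecutive identical actions (same element, same input)
            repeats = repeats + 1 if action == last else 1
            last = action
            if repeats >= max_repeats:
                return f"stuck: repeated {action!r}"
        return "max_steps reached"

    try:
        # Total limit across all steps; a per-step timeout also lands here
        return await asyncio.wait_for(loop(), timeout=total_timeout)
    except asyncio.TimeoutError:
        return "timeout"
```

Clicking the same broken element three times in a row ends the run after three LLM calls instead of fifteen.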

5. Log everything (but redact credentials)

When things go wrong (and they will), you need visibility. But you also can't log passwords.

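import logging

logger = logging.getLogger("agent")

def log_action(action: str, target: str) -> None:
    # Redact targets that mention passwords (a keyword check on the label only)
    if "password" in target.lower():
        target = "[REDACTED]"
    # Lazy %-formatting: the message is only built if the record is emitted
    logger.info("Action: %s -> %s", action, target)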
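Keyword checks only catch secrets that arrive labeled as such. A complementary approach is a `logging.Filter` that scrubs known secret values from every record, whatever the message looks like. A minimal sketch; `SecretScrubber` is a hypothetical name, and the secret list would come from wherever your credentials are loaded.

```python
import io
import logging


class SecretScrubber(logging.Filter):
    """Replace any known secret value in a log record with [REDACTED]."""

    def __init__(self, secrets: list[str]):
        super().__init__()
        self.secrets = [s for s in secrets if s]  # ignore empty strings

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg and args first
        for secret in self.secrets:
            message = message.replace(secret, "[REDACTED]")
        record.msg, record.args = message, ()
        return True  # keep the record, just sanitized


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(SecretScrubber(["hunter2"]))
logger = logging.getLogger("agent")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Action: type -> %s", "hunter2")
print(stream.getvalue(), end="")  # Action: type -> [REDACTED]
```

Attaching the filter to the handler rather than the logger matters: logger-level filters only see records logged directly on that logger, while a handler filter also covers records propagated from child loggers such as `agent.browser`.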

When AI agents beat traditional automation

AI browser agents shine when:

Sites change frequently - The agent adapts to new layouts
Workflows vary by account - Different users see different flows
Edge cases are endless - 2FA prompts, CAPTCHA, cookie banners
You'd rather not maintain selectors - Let the AI figure it out

They struggle when:

Speed matters - LLM calls add latency
Cost matters - Each action = API call
Reliability must be 100% - AI agents are probabilistic

The architecture that worked

After many iterations, here's what works:

Playwright for browser control - Fast, reliable, good DevTools integration
GPT-4o-mini for decisions - Good enough for navigation, much cheaper than GPT-4
Accessibility tree for page state - More stable than DOM parsing
Custom actions for sensitive operations - Keep credentials out of LLM context
Aggressive constraints - Limit what the agent can do

Real-world results

While building The Password App, I've run thousands of password change operations. Success rates by site type:

Simple sites (basic forms): ~95%
Complex sites (multi-step, 2FA): ~70%
Anti-bot protected sites: ~40%

The 40% is brutal but honest. Sites with Cloudflare, DataDome, or aggressive bot detection still win most battles.
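Numbers like these are only as good as the bookkeeping behind them. One simple approach is to record an outcome per run, tagged with a site category, and compute rates per category. A minimal sketch; the `RunResult` record and the category names are hypothetical, and the test data is invented only to exercise the code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunResult:
    category: str   # e.g. "simple", "multi-step/2FA", "anti-bot"
    succeeded: bool


def success_rates(results: list[RunResult]) -> dict[str, float]:
    """Fraction of successful runs per site category."""
    totals: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.category] += 1
        wins[r.category] += int(r.succeeded)
    return {cat: wins[cat] / totals[cat] for cat in totals}
```

With only a few dozen runs in a category, a single failure moves its rate by several points, so it is worth reporting the run count next to each percentage.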

What's next for AI browser agents

We're still early. Current limitations:

CAPTCHA - The eternal enemy
Bot detection - Getting harder to evade
Cost - $0.01-0.05 per operation adds up
Latency - 30-60 seconds for simple tasks
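To see how quickly those figures compound, here is a back-of-the-envelope sketch using the ranges above ($0.01-0.05 and 30-60 seconds per operation), treating every operation as a simple one; the 10,000-operation volume is a hypothetical input.

```python
def budget(operations: int) -> dict[str, tuple[float, float]]:
    """Low/high LLM spend (USD) and sequential runtime (hours)."""
    cost = (operations * 0.01, operations * 0.05)
    hours = (operations * 30 / 3600, operations * 60 / 3600)
    return {
        "usd": tuple(round(x, 2) for x in cost),
        "hours": tuple(round(x, 1) for x in hours),
    }


print(budget(10_000))
# {'usd': (100.0, 500.0), 'hours': (83.3, 166.7)}
```

Run one at a time, 10,000 operations would take between three and a half and seven days, which is why throughput at this scale depends on running many browser sessions concurrently.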

But the trajectory is clear. As vision models improve and costs drop, AI browser agents will handle increasingly complex web tasks.

If you're building AI browser automation, I'd love to hear what's working for you. Drop a comment or find me on Twitter.

Building something similar? The Password App automates password changes using the techniques described here. Free tier available.
