Most browser automation tutorials skip the hard part: what happens when the site fights back.
You write a clean Playwright script. It works locally. You push it to prod and within 10 minutes you're seeing ERR_ACCESS_DENIED, infinite redirects, or a CAPTCHA that defeats every solver you throw at it.
I've spent the last two months building an AI-powered browser agent that signs up for accounts and fills forms on CAPTCHA-heavy sites. Here's the actual architecture — with real code.
The Problem With Traditional Automation
Most CAPTCHA tutorials treat the challenge as a one-time thing: detect it, solve it, continue. But modern bot protection (PerimeterX, DataDome, Cloudflare) is dynamic. The CAPTCHA is often just the surface layer. The real fingerprinting happens before you ever see a challenge:
- JavaScript canvas fingerprinting
- TLS fingerprint mismatch
- CDP Runtime.enable detection
- Mouse movement pattern analysis
- Request timing signatures
You can solve the CAPTCHA and still get blocked because your automation fingerprint is already flagged.
The Architecture: Claude Decides, Playwright Executes
The insight that changed everything: treat Claude as the reasoning layer, not the execution layer.
Instead of hardcoding "if CAPTCHA detected, call 2captcha", I give Claude a page snapshot and let it decide what to do next. This means the agent adapts to new blocking patterns without code changes.
Here's the core loop:
    import anthropic
    import asyncio
    import json
    from playwright.async_api import async_playwright

    # Async client, so the API call does not block the event loop
    client = anthropic.AsyncAnthropic()

    async def agent_step(page, task: str, history: list) -> dict:
        """Let Claude decide the next browser action."""
        # Structured DOM summary: URL, title, visible text, and the first 20 form controls
        snapshot = await page.evaluate("""() => ({
            url: window.location.href,
            title: document.title,
            bodyText: document.body.innerText.slice(0, 3000),
            inputs: Array.from(document.querySelectorAll('input,button,select')).map(el => ({
                type: el.type,
                name: el.name,
                id: el.id,
                placeholder: el.placeholder,
                visible: el.offsetParent !== null
            })).slice(0, 20)
        })""")
        messages = history + [{
            "role": "user",
            # Serialize the snapshot as JSON rather than a Python dict repr
            "content": f"Task: {task}\n\nCurrent page state:\n{json.dumps(snapshot)}\n\nWhat is the next single action? Reply with JSON: {{action, selector, value, reasoning}}"
        }]
        response = await client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=500,
            messages=messages
        )
        # parse_action (not shown) turns the JSON reply into a dict
        return parse_action(response.content[0].text)
The key is the page snapshot — instead of screenshots (slow, expensive), I extract a structured DOM summary. Claude can reason about it in under a second.
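The loop above calls parse_action without showing it. Here is a minimal sketch of what such a helper could look like, assuming the model replies with one JSON object that may be wrapped in extra prose. The ALLOWED_ACTIONS vocabulary is hypothetical; the original code does not say which actions the agent supports.

```python
import json
import re

# Hypothetical action vocabulary; adjust to whatever your executor supports
ALLOWED_ACTIONS = {"click", "fill", "select", "navigate", "wait", "done"}


def parse_action(reply: str) -> dict:
    """Extract a JSON object from a model reply and validate its shape."""
    # Grab everything from the first "{" to the last "}" so surrounding prose is ignored
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in reply")
    # json.JSONDecodeError is a ValueError subclass, so callers catch one exception type
    action = json.loads(match.group(0))
    name = action.get("action")
    if name not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {name!r}")
    return {
        "action": name,
        "selector": action.get("selector"),
        "value": action.get("value"),
        "reasoning": action.get("reasoning", ""),
    }
```

Rejecting unknown actions up front keeps a malformed reply from reaching Playwright. Note that the greedy match breaks if the prose after the JSON contains braces, so a stricter prompt or a retry on ValueError is a reasonable companion.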
Patching the Browser Fingerprint
PerimeterX and DataDome fingerprint your browser before page load. Standard Playwright gets flagged because of navigator.webdriver = true and missing Chrome-specific globals. This init script runs before every navigation:
    // stealth-patches.js: inject via addInitScript
    async function patchBrowser(page) {
      await page.addInitScript(() => {
        // Remove the webdriver flag
        Object.defineProperty(navigator, 'webdriver', {
          get: () => undefined
        });

        // Restore Chrome-specific properties PerimeterX checks for
        window.chrome = {
          runtime: {},
          loadTimes: () => {},
          csi: () => {},
          app: {}
        };

        // Fake a realistic plugin list
        Object.defineProperty(navigator, 'plugins', {
          get: () => [
            { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
            { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' },
            { name: 'Native Client', filename: 'internal-nacl-plugin' }
          ]
        });

        // Lock language to en-US to avoid locale fingerprinting
        Object.defineProperty(navigator, 'languages', {
          get: () => ['en-US', 'en']
        });
      });
    }
This handles initial detection. Mouse movement analysis requires ghost-cursor or similar — random straight-line moves are an instant flag.
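ghost-cursor is a Node library, so for the Python side of the agent here is a rough sketch of the same general idea: move along a curved, eased path instead of a straight line. This is not ghost-cursor's algorithm. The bezier_path and human_move names, the control-point spread, and the delay range are all assumptions chosen for illustration, and nothing here is tested against a real detector.

```python
import asyncio
import random


def bezier_path(start, end, steps=25, spread=0.3, rng=None):
    """Return steps + 1 points along a randomized cubic Bezier curve from start to end."""
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0

    def control(t):
        # Point at fraction t of the straight line, pushed sideways along the perpendicular (-dy, dx)
        offset = rng.uniform(-spread, spread)
        return (x0 + dx * t - dy * offset, y0 + dy * t + dx * offset)

    (x1, y1), (x2, y2) = control(1 / 3), control(2 / 3)
    points = []
    for i in range(steps + 1):
        t = i / steps
        t = t * t * (3 - 2 * t)  # smoothstep easing: slow start, slow finish
        u = 1 - t
        x = u**3 * x0 + 3 * u * u * t * x1 + 3 * u * t * t * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u * u * t * y1 + 3 * u * t * t * y2 + t**3 * y3
        points.append((x, y))
    return points


async def human_move(page, start, end):
    """Replay the curve through Playwright's mouse with small irregular pauses."""
    for x, y in bezier_path(start, end):
        await page.mouse.move(x, y)
        await asyncio.sleep(random.uniform(0.005, 0.02))
```

Calling await human_move(page, (100, 100), (640, 420)) before a click replaces a single teleporting move with a few dozen intermediate ones. Passing a seeded random.Random as rng makes a path reproducible while debugging.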
The CAPTCHA Decision Tree
When a challenge is detected, the agent runs strategies in priority order and logs every outcome to SQLite:
    async def handle_captcha(page, captcha_type: str) -> bool:
        # Strategies per challenge type, in priority order
        strategies = {
            'recaptcha_v2': [solve_2captcha, wait_and_retry, request_manual],
            'recaptcha_v3': [adjust_behavior_score, change_timing, request_manual],
            'hcaptcha': [solve_2captcha, solve_anticaptcha, request_manual],
            'perimeterx': [rotate_fingerprint, use_residential_proxy, request_manual],
            'cloudflare': [wait_5min_retry, rotate_proxy, request_manual],
        }
        # Unknown challenge types fall straight through to manual handling
        for strategy in strategies.get(captcha_type, [request_manual]):
            result = await strategy(page)
            if result.success:
                log_strategy_win(captcha_type, strategy.__name__)
                return True
            log_strategy_fail(captcha_type, strategy.__name__, result.error)
        return False
The log_strategy_win / log_strategy_fail calls write to a browser_memory table. Next time the agent runs on the same domain, it reads this history and skips known-failing strategies. The agent literally learns across sessions.
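To make the memory idea concrete, here is a minimal sketch of what a browser_memory table and its read/write helpers might look like, using only the standard library. The column layout, the record_outcome and known_failing names, and the "three failures with no wins" threshold are assumptions for illustration; the original log functions take no domain argument, so keying rows by domain is also an assumption based on the description above.

```python
import sqlite3

# Hypothetical schema: one row per (domain, challenge type, strategy)
SCHEMA = """
CREATE TABLE IF NOT EXISTS browser_memory (
    domain       TEXT NOT NULL,
    captcha_type TEXT NOT NULL,
    strategy     TEXT NOT NULL,
    wins         INTEGER NOT NULL DEFAULT 0,
    fails        INTEGER NOT NULL DEFAULT 0,
    last_error   TEXT,
    PRIMARY KEY (domain, captcha_type, strategy)
)
"""


def open_memory(path=":memory:"):
    """Open (or create) the memory database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_outcome(conn, domain, captcha_type, strategy, success, error=None):
    """Increment the win or fail counter for one strategy on one domain."""
    key = (domain, captcha_type, strategy)
    # Create the row on first sight, then update the counters
    conn.execute(
        "INSERT OR IGNORE INTO browser_memory (domain, captcha_type, strategy) "
        "VALUES (?, ?, ?)",
        key,
    )
    conn.execute(
        "UPDATE browser_memory SET wins = wins + ?, fails = fails + ?, "
        "last_error = COALESCE(?, last_error) "
        "WHERE domain = ? AND captcha_type = ? AND strategy = ?",
        (int(success), int(not success), error) + key,
    )
    conn.commit()


def known_failing(conn, domain, captcha_type, min_fails=3):
    """Names of strategies that have never won here and failed at least min_fails times."""
    rows = conn.execute(
        "SELECT strategy FROM browser_memory "
        "WHERE domain = ? AND captcha_type = ? AND wins = 0 AND fails >= ?",
        (domain, captcha_type, min_fails),
    ).fetchall()
    return {row[0] for row in rows}
```

With helpers like these, the strategy loop can filter its list with something like [s for s in candidates if s.__name__ not in known_failing(conn, domain, captcha_type)] before trying anything. Requiring several failures before skipping avoids writing off a strategy after one unlucky run.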
Here's the 2captcha call for reCAPTCHA v2:
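    import asyncio
    import requests

    async def solve_2captcha(page) -> StrategyResult:
        site_key = await page.evaluate(
            "() => document.querySelector('[data-sitekey]')?.dataset.sitekey"
        )
        if not site_key:
            return StrategyResult(success=False, error="no sitekey found")

        # requests is blocking, so run it in a worker thread to keep the event loop free
        resp = await asyncio.to_thread(requests.post, 'https://2captcha.com/in.php', data={
            'key': API_KEY,
            'method': 'userrecaptcha',
            'googlekey': site_key,
            'pageurl': page.url
        })
        # A successful submission looks like "OK|<task id>"; anything else is an error code
        if not resp.text.startswith('OK|'):
            return StrategyResult(success=False, error=resp.text)
        task_id = resp.text.split('|', 1)[1]

        # Poll for the solution: 20 tries, 3 seconds apart
        for _ in range(20):
            await asyncio.sleep(3)
            res = await asyncio.to_thread(
                requests.get, 'https://2captcha.com/res.php',
                params={'key': API_KEY, 'action': 'get', 'id': task_id}
            )
            if res.text.startswith('OK|'):
                token = res.text.split('|', 1)[1]
                # Pass the token as an argument instead of interpolating it into the JS source
                await page.evaluate("""(token) => {
                    document.querySelector('#g-recaptcha-response').value = token;
                    // Internal, minified path: it varies between sites and reCAPTCHA builds
                    ___grecaptcha_cfg.clients[0].aa.l.callback(token);
                }""", token)
                return StrategyResult(success=True)
        return StrategyResult(success=False, error="2captcha timeout")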
Results After ~40 Attempts
- PerimeterX sites: 70% bypass rate (30% need residential proxy)
- hCaptcha: 85% automated solve rate via 2captcha
- Cloudflare Bot Management: 60% (IP-dependent)
- DataDome: 40% — still actively debugging
The single biggest unlock: a residential proxy. IP reputation alone accounts for roughly half of all CAPTCHA triggers. A clean IP bypasses most challenges before they even load.
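For reference, wiring a proxy into Playwright is a small change at launch time. The sketch below assumes the provider hands you a single proxy URL with embedded credentials; build_proxy_settings and launch_with_proxy are hypothetical helper names, and proxy.example.com is a placeholder. The proxy option with server, username, and password keys is Playwright's own.

```python
from urllib.parse import unquote, urlsplit


def build_proxy_settings(proxy_url: str) -> dict:
    """Split scheme://user:pass@host:port into the dict Playwright's launch(proxy=...) accepts."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname or not parts.port:
        raise ValueError("expected scheme://[user:pass@]host:port")
    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    # Credentials in a URL may be percent-encoded, so decode them before handing them over
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings


async def launch_with_proxy(proxy_url: str, headless: bool = True):
    """Start Playwright and launch Chromium behind the proxy; the caller stops both."""
    # Imported lazily so the URL helper stays usable without Playwright installed
    from playwright.async_api import async_playwright

    pw = await async_playwright().start()
    browser = await pw.chromium.launch(
        headless=headless,
        proxy=build_proxy_settings(proxy_url),
    )
    return pw, browser
```

A caller would do pw, browser = await launch_with_proxy("http://user:pass@proxy.example.com:8000"), then await browser.close() and await pw.stop() when finished. Keeping credentials out of the server string means they do not end up in logs that print the launch options.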
What I Packaged Up
I packaged this into a reusable kit — stealth browser config, CAPTCHA decision tree, browser_memory SQLite schema, proxy rotation, session persistence, and the full Claude agent loop pre-wired together.
If you're building automation agents and want to skip two months of debugging PerimeterX, check out the Claude Browser Agent Starter Kit. The code above is the actual foundation — the kit just handles the plumbing so you can focus on your specific task.
Questions on the architecture or a specific CAPTCHA type you're stuck on? Drop them below.