vesper_finch

Posted on Mar 14

I Automated 5 Websites with Playwright — Here Are 7 Things That Broke

#webdev #automation #playwright #python

Every tutorial shows you page.click() and page.fill(). None of them prepare you for what happens when you actually try to automate a real website.

I spent 40+ hours automating Reddit, Gumroad, DEV.to, Twitter, and note.com. Here are the 7 walls I hit — and exactly how I got past each one.

1. Headless Chromium Gets Detected Instantly

My first script opened Reddit in headless Chromium. Blocked immediately.

Why: Sites detect headless browsers through navigator.webdriver, WebGL fingerprinting, canvas rendering differences, and missing browser plugins.

Fix: Switch to Firefox.

# ❌ Blocked on Reddit, Gumroad, and more
browser = await pw.chromium.launch(headless=True)

# ✅ Passes detection on most sites
browser = await pw.firefox.launch(headless=True)

My working theory: anti-bot vendors have put less effort into fingerprinting headless Firefox than headless Chromium, so fewer of its quirks get flagged. That matches what I saw on these five sites, but I can't prove it.

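For Cloudflare-protected sites, add stealth flags:

browser = await pw.chromium.launch(
    headless=True,
    # Stops Blink from advertising that the browser is automation-controlled
    args=['--disable-blink-features=AutomationControlled']
)
page = await browser.new_page()
# Init scripts run before any page script, so the override is in place first
await page.add_init_script("""
    Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
""")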

2. Every Run = Login + CAPTCHA + 2FA

My Reddit automation worked... once. Then every subsequent run hit the login page again.

Fix: Save and reuse sessions with storage_state:

# After logging in once:
await context.storage_state(path="reddit_session.json")

# Every future run:
context = await browser.new_context(storage_state="reddit_session.json")
# Already logged in. No CAPTCHA. No 2FA.

But don't blindly trust saved sessions. Verify first:

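async def is_session_valid(browser, session_path, check_url, selector):
    """Return True if a logged-in-only element shows up with the saved session."""
    ctx = await browser.new_context(storage_state=session_path)
    try:
        page = await ctx.new_page()
        await page.goto(check_url, wait_until="domcontentloaded")
        # Give client-side rendering a moment before looking for the element
        await page.wait_for_timeout(3000)
        return await page.locator(selector).count() > 0
    finally:
        # Close the context even if navigation fails
        await ctx.close()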
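A cheaper first pass is possible before any page load: the file written by storage_state is plain JSON with a "cookies" list, and each cookie carries an "expires" Unix timestamp (-1 for session cookies). Here's a minimal sketch that lists cookies already past their expiry. The function name `expired_cookie_names` is my own, and which cookie actually holds the login is site-specific, so treat this as a pre-filter, not a replacement for the live check above.

```python
import json
import time
from pathlib import Path


def expired_cookie_names(session_path, now=None):
    """Return the names of cookies in a storage_state file that have expired.

    Session cookies (expires == -1) are skipped because they carry no expiry.
    """
    now = time.time() if now is None else now
    state = json.loads(Path(session_path).read_text(encoding="utf-8"))
    expired = []
    for cookie in state.get("cookies", []):
        expires = cookie.get("expires", -1)
        if expires != -1 and expires <= now:
            expired.append(cookie["name"])
    return expired
```

If the site's auth cookie shows up in that list, the session is certainly stale and you can skip straight to re-login. An empty list proves nothing, since the server can invalidate a session early.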

When a session expires, open a visible browser for human login, save the session, then return to headless:

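if not valid:
    browser = await pw.firefox.launch(headless=False)  # Human sees this
    context = await browser.new_context()
    page = await context.new_page()
    await page.goto(login_url)  # login_url: placeholder for the site's login page
    # ... log in manually in the opened window ...
    await context.storage_state(path=session_path)  # Save
    await browser.close()
    # Back to headless for automation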

I built SessionKeeper to handle this pattern automatically.

3. ProseMirror Editors Ignore Everything You Type

This was the most frustrating bug. Gumroad uses ProseMirror for product descriptions. I tried every approach:

# ❌ Looks like it works, but nothing saves
await editor.fill("Product description")

# ❌ Changes DOM, but ProseMirror's internal state is unchanged
await page.evaluate('document.querySelector(".ProseMirror").innerHTML = "text"')

Why it fails: ProseMirror maintains its own document model. When you mutate the DOM directly, ProseMirror doesn't know. On save, it overwrites your changes with its internal state.

The fix that worked:

# ✅ ProseMirror recognizes this as real user input
await page.evaluate(
    '(text) => document.execCommand("insertText", false, text)',
    "Your product description here"
)

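execCommand is deprecated, but it was the only approach I found that goes through the browser's normal editing pipeline, so ProseMirror treats the text as user input and updates its internal state. The same trick should carry over to other editors built on ProseMirror or TipTap, such as Linear and Confluence, and many custom CMS apps.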

Full pattern:

# Find the large editor (skip small contenteditable URL fields)
editors = page.locator('[contenteditable="true"]')
editor = None
for i in range(await editors.count()):
    box = await editors.nth(i).bounding_box()
    if box and box['height'] > 80:
        editor = editors.nth(i)
        break
if editor is None:
    raise RuntimeError("No large contenteditable editor found")

# Clear and insert
await editor.click(force=True)
# ControlOrMeta (Playwright 1.45+) is Cmd+A on macOS and Ctrl+A elsewhere
await page.keyboard.press('ControlOrMeta+a')
await page.keyboard.press('Backspace')

lines = text.split('\n')
for i, line in enumerate(lines):
    if line.strip():
        await page.evaluate(
            '(t) => document.execCommand("insertText", false, t)', line
        )
    # Press Enter between lines, but not after the last one
    if i < len(lines) - 1:
        await page.keyboard.press('Enter')
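The insert loop is easier to reason about if the planning is split from the browser calls. This is a sketch of that split, not something the original script did: `plan_editor_input` and `apply_editor_input` are names I made up, and `page` is assumed to be a Playwright page with the editor already focused and cleared.

```python
def plan_editor_input(text):
    """Translate multi-line text into ("insert", line) and ("enter",) steps."""
    lines = text.split("\n")
    steps = []
    for index, line in enumerate(lines):
        # Blank lines produce no insert, only the Enter that follows them
        if line.strip():
            steps.append(("insert", line))
        if index < len(lines) - 1:
            steps.append(("enter",))
    return steps


async def apply_editor_input(page, steps):
    """Replay planned steps against the currently focused editor."""
    for step in steps:
        if step[0] == "insert":
            await page.evaluate(
                '(t) => document.execCommand("insertText", false, t)', step[1]
            )
        else:
            await page.keyboard.press("Enter")
```

Because the plan is a plain list, you can print it or unit test it without launching a browser, which helps when a description comes out with missing or doubled line breaks.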

4. Shadow DOM Swallows Your Selectors

Reddit's redesign uses custom faceplate-* elements with Shadow DOM. My CSS selectors couldn't find anything inside them.

Fix: Playwright's CSS locators pierce open shadow roots automatically:

# ✅ Works even inside Shadow DOM
radio = page.locator("faceplate-radio-input").first
await radio.click(force=True)

But document.querySelector() inside page.evaluate() does NOT cross shadow boundaries:

# ❌ Returns null
await page.evaluate('document.querySelector("shadow-element .inner")')

# ✅ Playwright's engine crosses shadow boundaries
element = page.locator("shadow-element .inner")

Use force=True when custom elements intercept pointer events.
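If you really need to stay inside page.evaluate(), the search has to walk shadow roots by hand. Below is a rough sketch of that idea; `DEEP_QUERY_JS` and `deep_text_content` are hypothetical names, and it only reaches open shadow roots. Note that the selector is matched inside each root separately, so a descendant selector spanning a shadow boundary still won't match. Pass the inner part only (for example ".inner").

```python
# JavaScript that searches the document, then every open shadow root, depth-first
DEEP_QUERY_JS = """(selector) => {
    const search = (root) => {
        const hit = root.querySelector(selector);
        if (hit) return hit;
        for (const el of root.querySelectorAll('*')) {
            if (el.shadowRoot) {
                const nested = search(el.shadowRoot);
                if (nested) return nested;
            }
        }
        return null;
    };
    const found = search(document);
    return found ? found.textContent : null;
}"""


async def deep_text_content(page, selector):
    """Return the text of the first match in any open shadow root, else None."""
    return await page.evaluate(DEEP_QUERY_JS, selector)
```

In most cases a locator is still the simpler choice. This only earns its place when you need the lookup to happen inside a larger evaluate() script.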

5. The `async_playwright()` Context Manager Trap

This one cost me 2 hours of debugging:

# ❌ WRONG: the Playwright object doesn't have __aexit__
pw = await async_playwright().__aenter__()
# Later: await pw.__aexit__()  → AttributeError!

# ✅ CORRECT: store the context manager separately
pw_cm = async_playwright()
pw = await pw_cm.start()
# Later: await pw_cm.__aexit__(None, None, None)
# Or use the documented shutdown method instead: await pw.stop()

The object returned by start() is the Playwright instance, not the context manager, so it has no __aexit__. Either keep the context manager around, or shut down with pw.stop().
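For long-lived scripts where `async with` doesn't fit, one way to avoid this trap is a small wrapper that owns the started instance and shuts it down through Playwright's documented stop() method. This is a sketch under my own naming (`PlaywrightRunner` is not part of Playwright). The factory argument is assumed to be `async_playwright` from `playwright.async_api`, passed in so the class can be exercised without a browser.

```python
class PlaywrightRunner:
    """Starts Playwright once and stops it at most once."""

    def __init__(self, factory):
        # factory is expected to be playwright.async_api.async_playwright
        self._factory = factory
        self._pw = None

    async def start(self):
        # Reuse the running instance if start() is called again
        if self._pw is None:
            self._pw = await self._factory().start()
        return self._pw

    async def stop(self):
        # Clear the reference first so a second stop() is a no-op
        if self._pw is not None:
            pw, self._pw = self._pw, None
            await pw.stop()
```

Usage would look like `runner = PlaywrightRunner(async_playwright)`, then `pw = await runner.start()` at startup and `await runner.stop()` in a finally block.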

6. Buttons That Refuse to Click

Gumroad's "Save and continue" button was visible, but Playwright's click() timed out every time.

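Why: Playwright waits until the element can actually receive the click. An overlapping element, or pointer-events: none on a parent container, means that check never passes.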

Fix: Use JavaScript click:

await page.evaluate('''() => {
    const buttons = document.querySelectorAll('button');
    for (const btn of buttons) {
        if (btn.textContent.includes('Save and continue')) {
            btn.click();
            return;
        }
    }
}''')

JavaScript btn.click() dispatches the click straight to the element, so it skips Playwright's actionability checks and ignores whatever is stacked on top of the button.
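A gentler variant is to try a normal click first and only fall back when it times out, using the locator's dispatch_event method instead of hand-written JavaScript. This is a sketch, not what my scripts originally did: `click_with_fallback` is a made-up helper, and the try/except around the import is only there so the snippet loads on a machine without Playwright installed.

```python
try:
    from playwright.async_api import TimeoutError as PlaywrightTimeoutError
except ImportError:
    # Stand-in so the sketch can be imported without Playwright installed
    PlaywrightTimeoutError = TimeoutError


async def click_with_fallback(locator, timeout_ms=3000):
    """Click normally; if that times out, dispatch a click event directly.

    Returns "click" or "dispatch_event" so callers can log which path ran.
    """
    try:
        await locator.click(timeout=timeout_ms)
        return "click"
    except PlaywrightTimeoutError:
        # Skips actionability checks, like btn.click() in page JavaScript
        await locator.dispatch_event("click")
        return "dispatch_event"
```

For the Gumroad button, the locator could be `page.get_by_role("button", name="Save and continue")`. Logging the returned value tells you how often the site is actually blocking real clicks.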

7. Domain Migrations Break Your Sessions

Gumroad is migrating from app.gumroad.com to gumroad.com. My saved session for one domain didn't work on the other.

Fix: Cookies are scoped to the domain that set them, so always check which domain the login page redirects to before saving, and save the session from that domain.
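One way to catch this early is to compare the cookie domains in the saved file against the URL you actually landed on. Here's a minimal sketch; `session_covers_url` is a hypothetical helper, and it assumes the storage_state JSON layout, where each cookie has a "domain" field and a leading dot means the cookie also applies to subdomains.

```python
import json
from pathlib import Path
from urllib.parse import urlsplit


def session_covers_url(session_path, url):
    """Return True if any saved cookie's domain applies to the URL's host."""
    host = (urlsplit(url).hostname or "").lower()
    state = json.loads(Path(session_path).read_text(encoding="utf-8"))
    for cookie in state.get("cookies", []):
        domain = cookie.get("domain", "").lower()
        if domain.startswith("."):
            # Domain cookie: matches the bare domain and any subdomain
            if host == domain[1:] or host.endswith(domain):
                return True
        elif domain and host == domain:
            # Host-only cookie: must match exactly
            return True
    return False
```

After `await page.goto(...)`, pass `page.url` as the second argument. If it returns False, the site has moved you to a domain the saved session knows nothing about, and it's time to log in again and re-save.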

Quick Reference

Problem	Solution
Bot detection	Use Firefox, not Chromium
Login/CAPTCHA	`storage_state` — log in once, reuse until the session expires
ProseMirror	`execCommand('insertText')`
Shadow DOM	Playwright locators + `force=True`
`__aexit__` crash	Store context manager: `pw_cm = async_playwright()`
Unclickable buttons	JavaScript `btn.click()` via `evaluate()`
Domain migration	Check redirects before saving sessions

I compiled everything above (plus complete code templates and edge cases) into a free guide. Check out SessionKeeper — it handles session persistence automatically so you never solve the same CAPTCHA twice.

Also built SnapForge — a self-hosted screenshot & PDF API powered by Playwright, if you need that kind of thing.

What's the worst Playwright bug you've hit? Drop it in the comments — I probably ran into it too.