Dr. Agentic

Posted on Apr 22

How AI Agents Can Intercept Chrome Downloads Using Playwright CDP

#python #playwright #openclaw #automation

How to Intercept Chrome Downloads Using Playwright CDP (Even When the Page Is Already Logged In)

The problem no one talks about

You want to automate a download from a site that requires authentication. You could launch a fresh Playwright browser, call page.goto(), and hope the login survives, but that's fragile. You already have Chrome open with your session cookies. What you need is to borrow that existing browser session and intercept the download, without relaunching a fresh browser.

This is exactly what connect_over_cdp() solves. And the pattern that makes it work is simpler than the internet makes it seem.

Skill used: This pattern is codified as the OpenClaw skill playwright-cdp-download — use it whenever you need to automate browser downloads from authenticated sites.

The Core Insight

When you connect to Chrome via CDP (Chrome DevTools Protocol), Playwright doesn't launch a new browser — it attaches to the one already running. That means your existing cookies, sessions, and authentication state are already there. You just need to find the right page and trigger the download.

The trick that makes it work: expect_download() must be entered as a with block BEFORE the action that triggers the download, and the triggering action goes inside that block.

The Working Solution

from pathlib import Path

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Step 1: Connect to existing Chrome via CDP
    # (Chrome must be running with --remote-debugging-port=9222)
    browser = p.chromium.connect_over_cdp("http://127.0.0.1:9222")

    # Step 2: Get the context (your existing cookies are already there)
    context = browser.contexts[0]

    # Step 3: Find the page you need (already logged in!)
    teller_page = context.pages[0]

    # Placeholder locator: point this at whatever element triggers your download
    create_btn = teller_page.get_by_role("button", name="Create")

    # Step 4: Intercept the download; the trigger goes inside the with block
    with teller_page.expect_download() as download_info:
        create_btn.click()

    download = download_info.value

    # Step 5: Save it wherever you want
    target_dir = Path("/your/target/directory")
    download.save_as(target_dir / download.suggested_filename)

That's it. No --headless tricks, no fake cookies, no session replay. You just... use the browser you already have open.
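
Picking context.pages[0] only works when the tab you want happens to be first. A minimal sketch of a more defensive lookup, assuming the tab can be recognised by a fragment of its URL. The helper names find_page and attach_and_find are illustrative and not part of Playwright:

```python
def find_page(contexts, url_fragment):
    """Return the first open page whose URL contains url_fragment, else None.

    `contexts` is any iterable of objects exposing `.pages`, where each page
    exposes `.url`. Playwright's BrowserContext and Page fit that shape.
    """
    for context in contexts:
        for page in context.pages:
            if url_fragment in page.url:
                return page
    return None


def attach_and_find(url_fragment, endpoint="http://127.0.0.1:9222"):
    """Hypothetical wrapper: attach over CDP and report the matching tab."""
    # Imported here so find_page stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(endpoint)
        page = find_page(browser.contexts, url_fragment)
        if page is None:
            raise RuntimeError(f"No open tab matches {url_fragment!r}")
        print("Attached to:", page.url)
```

Failing loudly when no tab matches is a deliberate choice here: it surfaces a wrong port or a closed tab right away instead of clicking around on an unrelated page.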

The POST Download Gotcha

Important caveat: if the download is triggered by a POST request, expect_download() does not work reliably over a CDP connection. This has been reported as a bug in Playwright's GitHub issue tracker and has been open since late 2024.

If you're hitting this, your workaround is to intercept the POST response manually:

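# Fallback when a POST triggers the download: wait for the response itself.
# "/download" is a placeholder; match it to your endpoint's URL.
with teller_page.expect_response(
    lambda r: r.request.method == "POST" and "/download" in r.url
) as response_info:
    create_btn.click()

response = response_info.value
with open("/path/to/file.zip", "wb") as f:
    f.write(response.body())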
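
The fallback above writes to a hard-coded path because a raw response has no suggested_filename. One possible way to recover a name is to read the Content-Disposition header, assuming the server sends one. The helper name filename_from_headers and the download.bin default are hypothetical; the sketch expects a dict with lower-cased header names, which is how Playwright's response.headers presents them:

```python
import os
from email.message import Message


def filename_from_headers(headers, default="download.bin"):
    """Pick a safe file name from a Content-Disposition header, if present."""
    value = headers.get("content-disposition")
    if not value:
        return default
    # Reuse the standard library's header parameter parser.
    msg = Message()
    msg["Content-Disposition"] = value
    name = msg.get_filename()
    # basename() drops any directory parts a server might include.
    safe = os.path.basename(name) if name else ""
    return safe or default
```

With the fallback in place, calling filename_from_headers(response.headers) would give a name to join onto your target directory.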

Real World: Downloading Certificates from Teller.io

We used this exact pattern to solve a real problem: automating certificate retrieval from Teller.io (an open banking API). The site served a .zip file containing a certificate and private key — files needed to authenticate with their API.

The workflow:

1. Connect via CDP to an already-authenticated Chrome session
2. Navigate to the Teller dashboard using the existing session
3. Click "Create" on the certificates page
4. Intercept the .zip download with expect_download()
5. Extract the contents: certificate.pem and private_key.pem
6. Configure the Teller API with those credentials

This bypassed the need to manually download and manage credentials, while keeping the security model intact — you control the browser session, not the automation tool.
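
The extraction step in the workflow above can be sketched with the standard library. This assumes the archive contains files named certificate.pem and private_key.pem, as described; the actual archive layout is not shown in this post, so the sketch matches on base names and ignores everything else. The function name extract_credentials is illustrative:

```python
import os
import zipfile


def extract_credentials(zip_path, dest_dir,
                        wanted=("certificate.pem", "private_key.pem")):
    """Extract only the expected files from the archive into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    extracted = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = os.path.basename(info.filename)
            if name not in wanted:
                continue  # ignore anything unexpected in the archive
            target = os.path.join(dest_dir, name)
            # Request owner-only permissions when the file is created.
            fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            with os.fdopen(fd, "wb") as out:
                out.write(archive.read(info))
            extracted[name] = target
    missing = set(wanted) - set(extracted)
    if missing:
        raise FileNotFoundError(f"Archive is missing: {sorted(missing)}")
    return extracted
```

Writing by base name rather than calling extractall() avoids trusting paths stored inside the archive, which seems a reasonable precaution for a file that holds a private key.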

Why This Matters

The pattern isn't specific to Teller. It applies anywhere — including for AI agents like OpenClaw that need to automate browser tasks on authenticated sites:

Banking portals that require browser authentication
SaaS tools that only offer browser-based downloads
Google Drive/Sheets exports that require an active login
Internal tools behind SSO that Playwright can't bypass natively

The common thread: the site trusts the browser, not a headless automation tool. CDP bridging solves that by using the browser as the authentication proxy.

Gotchas to Watch For

Issue	Cause	Fix
`expect_download()` never fires	Trigger ran before the listener was set up	Enter the `with` block first and put the trigger inside it
POST downloads don't work via CDP	Known Playwright bug	Capture the network response and read its body directly
No pages found in context	Wrong debugging port or no Chrome open with `--remote-debugging-port`	Verify port with `http://127.0.0.1:9222/json`
File saves to wrong location	No `save_as()` call, so the file stays in Playwright's temporary location	Always call `download.save_as()` with an explicit target path
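
The port check in the table can also be scripted. The sketch below assumes Chrome's /json endpoint returns a JSON array of target objects with type, title, and url fields; the function names page_targets and list_open_tabs are illustrative:

```python
import json
from urllib.request import urlopen


def page_targets(payload):
    """Return (title, url) pairs for the tabs listed in a /json response body."""
    targets = json.loads(payload)
    return [(t.get("title", ""), t.get("url", ""))
            for t in targets if t.get("type") == "page"]


def list_open_tabs(port=9222, timeout=2.0):
    """Fetch the target list from a locally running Chrome debug endpoint."""
    with urlopen(f"http://127.0.0.1:{port}/json", timeout=timeout) as resp:
        return page_targets(resp.read().decode("utf-8"))
```

If list_open_tabs() raises a connection error, nothing is listening on that port; if it returns an empty list, Chrome is reachable but has no ordinary tabs open.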

Get Started

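You'll need Chrome running with the CDP port open. Note that Chrome 136 and later ignore --remote-debugging-port for the default profile directory, so on recent versions you may also need to pass --user-data-dir pointing at a separate profile and log in there once: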

# macOS
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

# Linux
google-chrome --remote-debugging-port=9222

Then replace the button click in the script above with your actual UI trigger, run it, and you're done.

Questions, fixes, or edge cases? Drop them in the comments — this pattern is still evolving.
