Annabelle

Posted on Apr 9 • Edited on Apr 29

How to Scrape JavaScript Websites with Playwright (Using Proxies)

#automation #webscraping #python #javascript

To scrape JavaScript-heavy websites using Playwright with proxies, launch a browser instance by passing a proxy object into the launch method. This object should include the server URL and, optionally, a username and password. Use page.goto() to navigate; by default it resolves once the page's load event fires, so content that JavaScript renders afterwards still needs an explicit wait (for example, on a selector) before extraction.

Example (Node.js):

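const { chromium } = require('playwright');

(async () => {
  // Route all browser traffic through the proxy
  const browser = await chromium.launch({
    proxy: {
      server: 'http://myproxy.com:8080',
      username: 'user',
      password: 'pwd'
    }
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();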

Common proxy providers used in scraping and automation workflows include Bright Data, Oxylabs, Smartproxy, and Squid Proxies. Each provider offers different strengths depending on the scale and requirements of the project.

What is Playwright and why use it for scraping?

Playwright is a browser automation tool that allows you to interact with websites just like a real user. It’s especially useful for scraping JavaScript-heavy websites where content is loaded dynamically.

If you’ve tried scraping modern websites using requests, you’ve probably noticed:

Missing data
Empty HTML
Incomplete page content

That’s because many websites render content using JavaScript.

If you're still working with basic HTTP requests, this guide on how to rotate proxies in Python for reliable data collection explains how to handle proxy rotation before moving to browser-based scraping.

Why do traditional scraping methods fail on JavaScript sites?

Traditional scraping fails because tools like requests only fetch raw HTML and do not execute JavaScript.

Modern websites rely on:

Client-side rendering
API calls triggered by JavaScript
Dynamic content loading

Without executing JavaScript, you won’t see the actual data.
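
To make this concrete, here is a minimal sketch using only Python's standard library. The HTML string is a made-up example of what a client-rendered page might return to a plain HTTP client; RAW_HTML, TextCollector, and visible_text are illustrative names, not taken from any real site or library.

```python
from html.parser import HTMLParser

# Hypothetical response body from a client-rendered page: the markup is
# only a shell, and the product data would be filled in later by app.js.
RAW_HTML = """
<html>
  <body>
    <div id="root"></div>
    <script src="/static/app.js"></script>
  </body>
</html>
"""


class TextCollector(HTMLParser):
    """Collects visible text, skipping anything inside <script> tags."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self.in_script:
            self.chunks.append(text)


def visible_text(html):
    parser = TextCollector()
    parser.feed(html)
    return parser.chunks


# The shell contains no visible text at all, so this prints an empty list.
print(visible_text(RAW_HTML))
```

A parser has nothing to extract here because the data only exists after a browser runs the script, which is the gap Playwright fills.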

How do you install Playwright in Python?

You can install Playwright with:

pip install playwright
playwright install

How do you scrape a page using Playwright?

Here’s a simple example:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    page.goto("https://example.com")
    content = page.content()

    print(content)

    browser.close()

This loads the page in a real browser environment.

How do you wait for dynamic content?

You can wait for elements to load before extracting data.

page.goto("https://example.com")

page.wait_for_selector("div.product")

data = page.locator("div.product").all_text_contents()
print(data)

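This blocks until a matching element is visible (or raises a timeout error, after 30 seconds by default), so the extraction runs against rendered content rather than an empty shell.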

How do you use proxies with Playwright?

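You can configure a proxy when launching the browser. Playwright's documented way to authenticate is through separate username and password fields; credentials embedded in the server URL may not be honored.

browser = p.chromium.launch(
    proxy={
        "server": "http://proxy-ip:port",
        "username": "username",
        "password": "password"
    }
)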

This routes all traffic through a proxy.
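
If your provider hands you proxies as a single URL with embedded credentials, one option is to split it into the separate server, username, and password fields that Playwright's proxy option accepts. The helper below is a sketch: to_playwright_proxy is a hypothetical name, and the address is a documentation-range placeholder.

```python
from urllib.parse import unquote, urlsplit


def to_playwright_proxy(proxy_url):
    """Split 'scheme://user:pass@host:port' into Playwright's proxy fields."""
    parts = urlsplit(proxy_url)
    if not parts.hostname:
        raise ValueError(f"Not a valid proxy URL: {proxy_url!r}")

    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"

    proxy = {"server": server}
    # Credentials are optional; only add them when present in the URL.
    # They may be percent-encoded, so decode them before use.
    if parts.username:
        proxy["username"] = unquote(parts.username)
    if parts.password:
        proxy["password"] = unquote(parts.password)
    return proxy


print(to_playwright_proxy("http://user:p%40ss@203.0.113.10:8080"))
```

The returned dict can be passed directly as the proxy argument to p.chromium.launch().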

If you're evaluating different options, many developers compare the best US residential proxy providers based on reliability, geographic targeting, and success rate.

How do you rotate proxies in Playwright?

Playwright doesn’t rotate proxies automatically; you need to manage rotation yourself.

Example:

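import random

from playwright.sync_api import sync_playwright

# Placeholder proxies; replace the servers and credentials with your own
proxy_list = [
    {"server": "http://ip1:port", "username": "user", "password": "pass"},
    {"server": "http://ip2:port", "username": "user", "password": "pass"},
    {"server": "http://ip3:port", "username": "user", "password": "pass"},
]

def get_proxy():
    # Pick a proxy at random for each browser launch
    return random.choice(proxy_list)

with sync_playwright() as p:
    proxy = get_proxy()

    browser = p.chromium.launch(proxy=proxy)

    page = browser.new_page()
    page.goto("https://example.com")

    print(page.content())

    browser.close()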
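
Launching a new browser for every proxy is simple but slow. A possible alternative, sketched below, is to launch once and give each URL its own browser context with the next proxy in the pool. It relies on the proxy option of browser.new_context(); some older Playwright releases reportedly needed a placeholder proxy at launch for this to work on Chromium, so check the behavior on your version. PROXIES, URLS, assign_proxies, and scrape_all are illustrative names, and the addresses are documentation-range placeholders.

```python
from itertools import cycle

# Placeholder proxies; replace with your own servers and credentials.
PROXIES = [
    {"server": "http://203.0.113.1:8080"},
    {"server": "http://203.0.113.2:8080"},
]

URLS = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]


def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy, wrapping around the pool."""
    return list(zip(urls, cycle(proxies)))


def scrape_all(urls, proxies):
    # Imported here so assign_proxies works without Playwright installed.
    from playwright.sync_api import sync_playwright

    results = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        for url, proxy in assign_proxies(urls, proxies):
            # Each context has its own proxy, cookies, and cache.
            context = browser.new_context(proxy=proxy)
            page = context.new_page()
            page.goto(url)
            results[url] = page.content()
            context.close()
        browser.close()
    return results


# Show the round-robin assignment without opening a browser.
for url, proxy in assign_proxies(URLS, PROXIES):
    print(url, "->", proxy["server"])
```

Calling scrape_all(URLS, PROXIES) would then fetch each page through its assigned proxy. Round-robin spreads load evenly, whereas random choice can hit the same proxy several times in a row.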

How do you avoid detection when scraping?

To reduce detection:

Rotate proxies
Use realistic user agents
Add delays between actions
Avoid aggressive scraping patterns

Example:

page.wait_for_timeout(2000)
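
A fixed two-second pause is itself a regular pattern. As a rough sketch, the delay can be randomized and combined with a custom user agent set on the browser context. The user-agent string below is only an example value, and jittered_delay_ms and visit_politely are hypothetical helper names.

```python
import random

# Example desktop user-agent string; treat it as a placeholder and keep
# it consistent with the browser version you actually run.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def jittered_delay_ms(base_ms=2000, jitter_ms=1000):
    """Return a delay within base_ms +/- jitter_ms, never below zero."""
    return max(0, base_ms + random.randint(-jitter_ms, jitter_ms))


def visit_politely(urls):
    # Imported here so the delay helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    pages = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(user_agent=USER_AGENT)
        page = context.new_page()
        for url in urls:
            page.goto(url)
            pages[url] = page.content()
            # Pause for a varying amount of time between page loads.
            page.wait_for_timeout(jittered_delay_ms())
        browser.close()
    return pages
```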

How do you scale Playwright scraping?

For larger systems:

Use multiple browser instances
Distribute tasks across workers
Combine with proxy rotation
Implement retry logic

This builds a more reliable scraping system.
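
As an illustration of the retry and worker ideas, here is a small sketch built on the standard library's thread pool. Each worker starts its own Playwright instance, since the sync API is not meant to be shared across threads. The names with_retries, fetch_html, and scrape_many are hypothetical, and the defaults (three attempts, four workers) are arbitrary starting points.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(task, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


def fetch_html(url):
    # One Playwright instance per call, so nothing is shared between threads.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.content()
        finally:
            browser.close()


def scrape_many(urls, workers=4, fetch=fetch_html, attempts=3, base_delay=1.0):
    """Fetch URLs in parallel; map each URL to its HTML, or None on failure."""

    def safe_fetch(url):
        try:
            return with_retries(lambda: fetch(url), attempts, base_delay)
        except Exception:
            return None  # every attempt failed for this URL

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Calling scrape_many(["https://example.com"]) would return a dict keyed by URL. Passing a different fetch function makes it straightforward to plug in proxy rotation per request.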

FAQs

Is Playwright better than Selenium?

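For scraping dynamic pages, often yes. Playwright is the newer tool, and its actions wait automatically for elements to be ready, which removes much of the manual waiting that Selenium scripts typically need.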

Can Playwright handle CAPTCHAs?

Not directly. You’ll need external services or manual solving.

Do I always need proxies with Playwright?

Not always, but for large-scale scraping, proxies become essential.

Is scraping JavaScript websites legal?

It depends on how you use the data and the website’s terms of service.

Final Thoughts

Modern websites rely heavily on JavaScript, which makes traditional scraping methods less effective.

Playwright addresses this by driving a real browser that executes the page's JavaScript.

When combined with proxy rotation and proper request handling, it becomes a powerful tool for reliable data collection.
