Annabelle


How to Scrape JavaScript Websites with Playwright (Using Proxies)

To scrape JavaScript-heavy websites with Playwright through a proxy, pass a proxy object to the launch method when you start the browser. The object takes the server URL plus an optional username and password. Navigate with page.goto(), which by default resolves once the page's load event has fired. Content that JavaScript adds after that point still needs an explicit wait (for example, on a selector) before extraction.

Example (Node.js):

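const { chromium } = require('playwright');

(async () => {
  // Route all browser traffic through the proxy
  const browser = await chromium.launch({
    proxy: {
      server: 'http://myproxy.com:8080',
      username: 'user',
      password: 'pwd'
    }
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();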

What is Playwright and why use it for scraping?

Playwright is a browser automation tool that allows you to interact with websites just like a real user. It’s especially useful for scraping JavaScript-heavy websites where content is loaded dynamically.

If you’ve tried scraping modern websites with Python’s requests library, you’ve probably noticed:

  • Missing data
  • Empty HTML
  • Incomplete page content

That’s because many websites render content using JavaScript.

If you're still working with basic HTTP requests, this guide on how to rotate proxies in Python for reliable data collection explains how to handle proxy rotation before moving to browser-based scraping.

Why do traditional scraping methods fail on JavaScript sites?

Traditional scraping fails because tools like requests only fetch raw HTML and do not execute JavaScript.

Modern websites rely on:

  • Client-side rendering
  • API calls triggered by JavaScript
  • Dynamic content loading

Without executing JavaScript, you won’t see the actual data.
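As a small offline illustration of this point, the sketch below parses a made-up HTML snippet the way an HTTP client would receive it. The markup and the `product` class name are assumptions for the example, not taken from a real site. The product only exists inside a script that never runs, so the parser finds nothing.

```python
from html.parser import HTMLParser

# Hypothetical raw response: the product is created by JavaScript at runtime
RAW_HTML = """
<html><body>
  <div id="products"></div>
  <script>
    const item = document.createElement("div");
    item.className = "product";
    item.textContent = "Widget";
    document.getElementById("products").appendChild(item);
  </script>
</body></html>
"""


class ProductParser(HTMLParser):
    """Collects the text inside <div class="product"> elements."""

    def __init__(self):
        super().__init__()
        self.in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "product") in attrs:
            self.in_product = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_product = False

    def handle_data(self, data):
        if self.in_product and data.strip():
            self.products.append(data.strip())


parser = ProductParser()
parser.feed(RAW_HTML)
print(parser.products)  # [] because the script was never executed
```

A browser would run the script and add the "Widget" element, which is the gap that Playwright fills.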

How do you install Playwright in Python?

You can install Playwright with pip, then download the browser binaries it drives:

pip install playwright
playwright install

How do you scrape a page using Playwright?

Here’s a simple example:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    page.goto("https://example.com")
    content = page.content()

    print(content)

    browser.close()

This loads the page in a real browser environment.

How do you wait for dynamic content?

You can wait for elements to load before extracting data.

page.goto("https://example.com")

page.wait_for_selector("div.product")

data = page.locator("div.product").all_text_contents()
print(data)

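This ensures at least one matching element has rendered before you extract. If nothing matches within the timeout (30 seconds by default), wait_for_selector raises a timeout error instead of returning empty data.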
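To try the waiting pattern without depending on a live site, here is a minimal sketch. The helper name `wait_and_extract`, the `DEMO_HTML` page, and the 500 ms delay are all made up for illustration. The page injects its content after load, which is the situation an explicit wait is meant to handle.

```python
def wait_and_extract(page, selector, timeout_ms=10_000):
    """Wait until `selector` matches, then return the text of every match.

    `page` is a Playwright sync-API Page. wait_for_selector raises
    playwright.sync_api.TimeoutError if nothing matches in time.
    """
    page.wait_for_selector(selector, timeout=timeout_ms)
    return page.locator(selector).all_text_contents()


# Hypothetical page that adds its content 500 ms after loading
DEMO_HTML = """
<div id="list"></div>
<script>
  setTimeout(() => {
    const item = document.createElement("div");
    item.className = "product";
    item.textContent = "Widget";
    document.getElementById("list").appendChild(item);
  }, 500);
</script>
"""


def run_demo():
    """Load DEMO_HTML in headless Chromium and extract the delayed content."""
    # Imported here so wait_and_extract can be reused without a browser installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.set_content(DEMO_HTML)
            return wait_and_extract(page, "div.product")
        finally:
            browser.close()
```

Calling run_demo() requires the browser binaries from playwright install. Passing a shorter timeout_ms is a quick way to see the timeout error for a selector that never appears.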

How do you use proxies with Playwright?

You can configure a proxy when launching the browser.

browser = p.chromium.launch(
    proxy={
        "server": "http://proxy-ip:port",
        "username": "username",
        "password": "password",
    }
)

This routes all traffic through a proxy.
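Playwright's proxy setting takes the username and password as fields separate from the server, while many providers hand out a single URL with the credentials embedded. A small converter can bridge the two. This is a sketch: `to_playwright_proxy` is a hypothetical helper name, the address is a documentation placeholder, and it assumes a hostname or IPv4 address with an explicit port.

```python
from urllib.parse import unquote, urlsplit


def to_playwright_proxy(proxy_url: str) -> dict:
    """Split 'scheme://user:pass@host:port' into Playwright's proxy fields."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname or parts.port is None:
        raise ValueError(f"Expected scheme://host:port, got {proxy_url!r}")

    proxy = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    # Credentials are optional; percent-decode them in case they contain
    # characters such as '@' or ':' that had to be escaped in the URL
    if parts.username:
        proxy["username"] = unquote(parts.username)
    if parts.password:
        proxy["password"] = unquote(parts.password)
    return proxy


print(to_playwright_proxy("http://user:p%40ss@203.0.113.10:8080"))
# {'server': 'http://203.0.113.10:8080', 'username': 'user', 'password': 'p@ss'}
```

The result can be passed straight to the launch call, for example p.chromium.launch(proxy=to_playwright_proxy(url)).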

If you're evaluating different options, many developers compare the best US residential proxy providers based on reliability, geographic targeting, and success rate.

How do you rotate proxies in Playwright?

Playwright doesn’t rotate proxies automatically; you need to manage rotation yourself.

Example:

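import random

from playwright.sync_api import sync_playwright

# Placeholder proxies; credentials go in separate fields, not in the URL
proxy_list = [
    {"server": "http://ip1:port", "username": "user", "password": "pass"},
    {"server": "http://ip2:port", "username": "user", "password": "pass"},
    {"server": "http://ip3:port", "username": "user", "password": "pass"},
]

def get_proxy():
    # Pick a proxy at random for this browser launch
    return random.choice(proxy_list)

with sync_playwright() as p:
    proxy = get_proxy()

    browser = p.chromium.launch(proxy=proxy)

    page = browser.new_page()
    page.goto("https://example.com")

    print(page.content())

    browser.close()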
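A random pick can reuse the same proxy several times in a row. When you have a list of URLs, a round-robin assignment spreads them evenly instead. The sketch below is one way to do that, with hypothetical helper names (`assign_proxies`, `scrape_all`) and the assumption that each proxy is already a dict in Playwright's proxy format.

```python
from itertools import cycle


def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy in round-robin order."""
    if not proxies:
        raise ValueError("proxies must not be empty")
    rotation = cycle(proxies)
    return [(url, next(rotation)) for url in urls]


def scrape_all(urls, proxies):
    """Fetch each URL through its assigned proxy and return {url: html}."""
    # Imported here so assign_proxies works without Playwright installed
    from playwright.sync_api import sync_playwright

    results = {}
    with sync_playwright() as p:
        for url, proxy in assign_proxies(urls, proxies):
            # A fresh browser per URL keeps each request on exactly one proxy
            browser = p.chromium.launch(proxy=proxy)
            try:
                page = browser.new_page()
                page.goto(url)
                results[url] = page.content()
            finally:
                browser.close()
    return results
```

Launching a browser per URL is simple but slow. It trades speed for a clear one-proxy-per-request guarantee, which is usually acceptable for small jobs.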

How do you avoid detection when scraping?

To reduce detection:

  • Rotate proxies
  • Use realistic user agents
  • Add delays between actions
  • Avoid aggressive scraping patterns

Example (a fixed two-second pause between actions):

page.wait_for_timeout(2000)
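A fixed pause is itself a regular pattern, so a randomized delay combined with a per-session user agent is a common next step. The following is a rough sketch under stated assumptions: the user-agent strings are illustrative examples that will go stale, the 1 to 3 second range is arbitrary, and `polite_visit` is a hypothetical helper that expects an already launched Playwright Browser.

```python
import random

# Illustrative user-agent strings; replace them with current ones for real use
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]


def random_delay_ms(min_ms=1000, max_ms=3000):
    """Return a randomized pause length so actions are not evenly spaced."""
    return random.randint(min_ms, max_ms)


def polite_visit(browser, urls):
    """Visit each URL with one random user agent, pausing between pages.

    `browser` is a Playwright sync-API Browser.
    """
    context = browser.new_context(user_agent=random.choice(USER_AGENTS))
    try:
        page = context.new_page()
        pages = {}
        for url in urls:
            page.goto(url)
            pages[url] = page.content()
            page.wait_for_timeout(random_delay_ms())
        return pages
    finally:
        context.close()
```

None of this guarantees a scraper goes unnoticed. It only removes two of the more obvious signals, a default automation user agent and perfectly regular timing.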

How do you scale Playwright scraping?

For larger systems:

  • Use multiple browser instances
  • Distribute tasks across workers
  • Combine with proxy rotation
  • Implement retry logic

This builds a more reliable scraping system.
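Of the items above, retry logic is the easiest to sketch in isolation. The helper below is a minimal version, assuming exponential backoff is the policy you want. The name `with_retries` and its parameters are made up for this example, and `task` stands for any zero-argument callable, such as a function that scrapes one URL through a freshly chosen proxy.

```python
import time


def with_retries(task, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds, backing off exponentially between tries.

    Re-raises the last exception if every attempt fails. `sleep` is
    injectable so the backoff can be tested without real waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next try
            sleep(base_delay * 2 ** (attempt - 1))
```

In a worker, this could wrap a single-page scrape, for example with_retries(lambda: scrape_one(url)), where scrape_one is your own function. Catching every Exception is a deliberate simplification. A production system would likely retry only on timeouts and network errors.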

FAQs

Is Playwright better than Selenium?

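For scraping dynamic pages, often yes. Playwright is the newer tool, and it automatically waits for elements to be ready before acting on them, which reduces the manual waits that dynamic content otherwise requires.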

Can Playwright handle CAPTCHAs?

Not directly. You’ll need external services or manual solving.

Do I always need proxies with Playwright?

Not always, but for large-scale scraping, proxies become essential.

Is scraping JavaScript websites legal?

It depends on how you use the data and the website’s terms of service.

Final Thoughts

Modern websites rely heavily on JavaScript, which makes traditional scraping methods less effective.

Playwright solves this by driving a real browser that executes the page’s JavaScript.

When combined with proxy rotation and proper request handling, it becomes a powerful tool for reliable data collection.
