agenthustler
JavaScript Rendering: Puppeteer vs Playwright vs Selenium in 2026


Most modern websites rely on JavaScript to render at least part of their content. If your scraper only fetches raw HTML, you are missing most of the data. This guide compares the three major browser automation tools for web scraping in 2026.

The Problem: JavaScript-Rendered Content

import httpx

# This returns an empty shell - data loads via JavaScript
response = httpx.get("https://spa-website.com/products")
print(len(response.text))  # 2KB of loader HTML
# The actual content requires JS execution to appear

You need a real browser engine to execute JavaScript and render the page.
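
You can often detect this case before reaching for a browser. A rough heuristic (my own, not a standard API) is to check whether the raw HTML is little more than an empty SPA mount point:

```python
import re

# Common single-page-app mount points that ship with no server-rendered text
SHELL_MARKERS = ('id="root"', 'id="app"', 'id="__next"')

def looks_js_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: treat a page as JS-rendered if it contains an SPA
    mount-point div and very little visible text outside of tags."""
    visible_text = re.sub(r"<[^>]+>", " ", html)        # strip tags
    visible_text = re.sub(r"\s+", " ", visible_text).strip()
    has_marker = any(marker in html for marker in SHELL_MARKERS)
    return has_marker and len(visible_text) < min_text_chars

shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(looks_js_rendered(shell))  # True - this page needs a browser to render
```

The thresholds and marker list are guesses you would tune per target; the point is to decide cheaply whether plain HTTP is enough before paying for a full browser.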

Playwright: The 2026 Default Choice

Playwright has become the de facto standard for browser automation in Python. It supports Chromium, Firefox, and WebKit with a single API.

from playwright.sync_api import sync_playwright

def scrape_spa(url: str) -> list[dict]:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # Intercept API calls to get structured data directly
        api_responses = []
        def handle_response(response):
            if "/api/products" in response.url:
                try:
                    api_responses.append(response.json())
                except Exception:
                    pass  # non-JSON body; fall back to DOM scraping

        page.on("response", handle_response)
        page.goto(url, wait_until="networkidle")

        try:
            # Option 1: Parse from intercepted API responses
            if api_responses:
                return api_responses[0]

            # Option 2: Extract from the rendered DOM
            results = []
            for product in page.query_selector_all(".product-card"):
                results.append({
                    "name": product.query_selector("h3").inner_text(),
                    "price": product.query_selector(".price").inner_text(),
                })
            return results
        finally:
            browser.close()  # runs on both return paths

Playwright Strengths

  • Auto-wait: Automatically waits for elements before interacting
  • Network interception: Capture API calls to skip HTML parsing entirely
  • Multiple browser engines: Chromium, Firefox, WebKit
  • Async-first: Native asyncio support
  • Stealth: Better at evading detection than Selenium
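
The native asyncio support pairs naturally with a semaphore to cap how many pages render at once. A minimal sketch of that pattern (the `fetch` callable stands in for an async Playwright page fetch; the function names are mine):

```python
import asyncio
from typing import Awaitable, Callable

async def gather_bounded(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = 5,
) -> list[str]:
    # Cap concurrency so we never hold more than `limit` pages open at once
    sem = asyncio.Semaphore(limit)

    async def worker(url: str) -> str:
        async with sem:
            return await fetch(url)

    # Results come back in input order, regardless of completion order
    return await asyncio.gather(*(worker(u) for u in urls))
```

In practice `fetch` would open a page on a shared browser context, `goto` the URL, and return `page.content()`.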

Puppeteer (via Pyppeteer)

Puppeteer is Chrome-only but has a mature ecosystem in Node.js. The unofficial Python port, pyppeteer, works but lags behind the original and sees little active maintenance:

import asyncio
from pyppeteer import launch

async def scrape_with_puppeteer(url: str) -> str:
    browser = await launch(
        headless=True,
        args=["--no-sandbox", "--disable-dev-shm-usage"]
    )
    page = await browser.newPage()
    await page.setUserAgent(
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    )

    await page.goto(url, {"waitUntil": "networkidle0"})

    # Wait for specific content to load
    await page.waitForSelector(".data-table", {"timeout": 10000})

    content = await page.content()
    await browser.close()
    return content

result = asyncio.run(scrape_with_puppeteer("https://target.com"))

Puppeteer Strengths

  • Chrome DevTools Protocol: Direct access to Chrome internals
  • Large ecosystem: Many stealth plugins available
  • PDF generation: Best PDF rendering of the three

Puppeteer Weaknesses

  • Chrome only: No Firefox or Safari testing
  • Python port is unofficial: pyppeteer often lags behind Node.js version
  • Maintenance concerns: Less active Python community

Selenium: The Legacy Option

Selenium has been around since 2004. It still works but shows its age:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def scrape_with_selenium(url: str) -> list[dict]:
    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--disable-blink-features=AutomationControlled")

    driver = webdriver.Chrome(options=options)
    driver.get(url)

    # Explicit waits - Selenium does not auto-wait
    WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".product-card"))
    )

    products = driver.find_elements(By.CSS_SELECTOR, ".product-card")
    results = []
    for product in products:
        results.append({
            "name": product.find_element(By.TAG_NAME, "h3").text,
            "price": product.find_element(By.CLASS_NAME, "price").text,
        })

    driver.quit()
    return results

Selenium Strengths

  • WebDriver standard: W3C standardized protocol
  • Maximum browser support: Chrome, Firefox, Safari, Edge
  • Undetected-chromedriver: Best anti-detection library available

Selenium Weaknesses

  • No auto-wait: Manual waits required everywhere
  • Slower: WebDriver protocol adds overhead
  • Verbose API: More code for the same result

Head-to-Head Comparison

Feature            Playwright   Puppeteer     Selenium
Speed              Fast         Fast          Moderate
Auto-wait          Yes          Partial       No
Network intercept  Excellent    Good          Limited
Multi-browser      Yes          Chrome only   Yes
Python support     Official     Unofficial    Official
Anti-detection     Good         Good          Best (UC)
Async support      Native       Native        No
Memory usage       Moderate     Moderate      High
Learning curve     Low          Low           Medium

When to Skip Browser Automation Entirely

Browser automation is resource-intensive. Before reaching for Playwright, check if a scraping API can handle rendering for you. ScraperAPI supports JavaScript rendering — just add render=true to your request and get back fully rendered HTML without managing browsers.

For sites behind anti-bot protection, pairing a rendering API with residential proxies from ThorData often achieves better success rates than running your own browser fleet.

Performance Optimization Tips

from playwright.sync_api import sync_playwright

def optimized_scrape(urls: list[str]) -> list[str]:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context()

        # Block unnecessary resources to speed things up
        context.route("**/*.{png,jpg,jpeg,gif,svg,woff,woff2}", lambda route: route.abort())
        context.route("**/analytics**", lambda route: route.abort())
        context.route("**/ads**", lambda route: route.abort())

        results = []
        for url in urls:
            page = context.new_page()
            page.goto(url, wait_until="domcontentloaded")  # Faster than networkidle
            results.append(page.content())
            page.close()  # Reuse context, close pages

        browser.close()
        return results
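Glob patterns can miss image CDNs that serve files without extensions. Playwright also exposes `request.resource_type` on each intercepted request, so the handler can filter on type instead. A sketch of that predicate (the set of blocked types and URL hints is my choice, not a Playwright default):

```python
# Resource types as reported by Playwright's request.resource_type
BLOCKED_TYPES = {"image", "font", "media", "stylesheet"}
BLOCKED_URL_HINTS = ("analytics", "/ads/")

def should_abort(resource_type: str, url: str) -> bool:
    """Decide whether a context.route handler should abort this request."""
    if resource_type in BLOCKED_TYPES:
        return True
    return any(hint in url for hint in BLOCKED_URL_HINTS)

# Wiring it in (inside the sync_playwright block above):
# context.route("**/*", lambda route: route.abort()
#               if should_abort(route.request.resource_type, route.request.url)
#               else route.continue_())
```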

Recommendation

Use Playwright unless you have a specific reason not to. It has the best API, official Python support, and hits the sweet spot between power and ease of use. Use ScrapeOps to benchmark whether a proxy API is more cost-effective than running your own browser farm at your specific scale.

For anti-detection specifically, Selenium with undetected-chromedriver remains the gold standard — but Playwright with stealth plugins is closing the gap fast.
