# JavaScript Rendering: Puppeteer vs Playwright vs Selenium in 2026
More than 70% of modern websites rely on JavaScript to render content. If your scraper only fetches raw HTML, you are missing most of the data. This guide compares the three major browser automation tools for web scraping in 2026.
## The Problem: JavaScript-Rendered Content
```python
import httpx

# This returns an empty shell - data loads via JavaScript
response = httpx.get("https://spa-website.com/products")
print(len(response.text))  # ~2 KB of loader HTML
# The actual content requires JS execution to appear
```
You need a real browser engine to execute JavaScript and render the page.
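Before spinning up a browser, it is worth confirming that a page really is JS-rendered. One rough heuristic (the threshold below is an arbitrary assumption, not a standard) is to check how much visible text survives once tags and script bodies are stripped:

```python
import re

def looks_js_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Rough heuristic: an SPA shell ships lots of markup but little visible text."""
    # Drop script/style bodies, then strip all remaining tags
    stripped = re.sub(r"(?s)<(script|style).*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", stripped)
    visible = " ".join(text.split())
    return len(visible) < min_text_chars

shell = "<html><body><div id='root'></div><script>/* 50KB of JS */</script></body></html>"
print(looks_js_rendered(shell))  # True - almost no visible text
```

If this returns `False`, plain `httpx` plus an HTML parser is probably enough and you can skip browser automation entirely.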
## Playwright: The 2026 Default Choice
Playwright has become the de facto standard for browser automation in Python. It supports Chromium, Firefox, and WebKit with a single API.
```python
from playwright.sync_api import sync_playwright

def scrape_spa(url: str) -> list[dict]:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # Intercept API calls to get structured data directly
        api_responses = []

        def handle_response(response):
            if "/api/products" in response.url:
                api_responses.append(response.json())

        page.on("response", handle_response)
        page.goto(url, wait_until="networkidle")

        try:
            # Option 1: Parse from intercepted API responses
            if api_responses:
                return api_responses[0]

            # Option 2: Extract from rendered DOM
            results = []
            for product in page.query_selector_all(".product-card"):
                results.append({
                    "name": product.query_selector("h3").inner_text(),
                    "price": product.query_selector(".price").inner_text(),
                })
            return results
        finally:
            browser.close()  # runs on both return paths
```
### Playwright Strengths
- Auto-wait: Automatically waits for elements before interacting
- Network interception: Capture API calls to skip HTML parsing entirely
- Multiple browser engines: Chromium, Firefox, WebKit
- Async-first: Native asyncio support
- Stealth: Better at evading detection than Selenium
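When interception succeeds, the payload shape is whatever the site's API happens to return. A small normalizer (the `items`/`name`/`price` keys here are hypothetical, not from any real API) keeps the API path and the DOM-parsing path producing the same output structure:

```python
def normalize_products(payload: dict) -> list[dict]:
    """Map a hypothetical /api/products payload onto the scraper's output shape."""
    products = []
    for item in payload.get("items", []):
        products.append({
            "name": item.get("name", ""),
            "price": str(item.get("price", "")),
        })
    return products

sample = {"items": [{"name": "Widget", "price": 9.99}, {"name": "Gadget"}]}
print(normalize_products(sample))
```

Normalizing early means downstream code never needs to know which extraction path was taken.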
## Puppeteer (via Pyppeteer)
Puppeteer is Chrome-only but has a mature ecosystem. Its Python port, pyppeteer, works but lags behind the Node.js original:
```python
import asyncio
from pyppeteer import launch

async def scrape_with_puppeteer(url: str) -> str:
    browser = await launch(
        headless=True,
        args=["--no-sandbox", "--disable-dev-shm-usage"],
    )
    page = await browser.newPage()
    await page.setUserAgent(
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    )
    await page.goto(url, {"waitUntil": "networkidle0"})
    # Wait for specific content to load
    await page.waitForSelector(".data-table", {"timeout": 10000})
    content = await page.content()
    await browser.close()
    return content

result = asyncio.run(scrape_with_puppeteer("https://target.com"))
```
### Puppeteer Strengths
- Chrome DevTools Protocol: Direct access to Chrome internals
- Large ecosystem: Many stealth plugins available
- PDF generation: Best PDF rendering of the three
### Puppeteer Weaknesses
- Chrome only: No Firefox or Safari testing
- Python port is unofficial: pyppeteer often lags behind the Node.js version
- Maintenance concerns: Less active Python community
## Selenium: The Legacy Option
Selenium has been around since 2004. It still works but shows its age:
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def scrape_with_selenium(url: str) -> list[dict]:
    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--disable-blink-features=AutomationControlled")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Explicit waits - Selenium does not auto-wait
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".product-card"))
        )
        results = []
        for product in driver.find_elements(By.CSS_SELECTOR, ".product-card"):
            results.append({
                "name": product.find_element(By.TAG_NAME, "h3").text,
                "price": product.find_element(By.CLASS_NAME, "price").text,
            })
        return results
    finally:
        driver.quit()  # always release the browser, even if the wait times out
```
### Selenium Strengths
- WebDriver standard: W3C standardized protocol
- Maximum browser support: Chrome, Firefox, Safari, Edge
- Undetected-chromedriver: Best anti-detection library available
### Selenium Weaknesses
- No auto-wait: Manual waits required everywhere
- Slower: WebDriver protocol adds overhead
- Verbose API: More code for the same result
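Because Selenium does not auto-wait, scrapers end up repeating `WebDriverWait` boilerplate for every interaction. A generic polling helper (a sketch, not part of Selenium's API) can wrap any condition, including ones `expected_conditions` does not cover:

```python
import time

def wait_until(condition, timeout: float = 10.0, interval: float = 0.25):
    """Poll `condition` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")

# Usage with Selenium (driver assumed to exist):
#   cards = wait_until(lambda: driver.find_elements(By.CSS_SELECTOR, ".product-card"))
```

Returning the condition's truthy result (here, the element list) saves a second lookup after the wait.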
## Head-to-Head Comparison
| Feature | Playwright | Puppeteer | Selenium |
|---|---|---|---|
| Speed | Fast | Fast | Moderate |
| Auto-wait | Yes | Partial | No |
| Network intercept | Excellent | Good | Limited |
| Multi-browser | Yes | Chrome only | Yes |
| Python support | Official | Unofficial | Official |
| Anti-detection | Good | Good | Best (UC) |
| Async support | Native | Native | No |
| Memory usage | Moderate | Moderate | High |
| Learning curve | Low | Low | Medium |
## When to Skip Browser Automation Entirely
Browser automation is resource-intensive. Before reaching for Playwright, check if a scraping API can handle rendering for you. ScraperAPI supports JavaScript rendering: just add `render=true` to your request and get back fully rendered HTML without managing browsers.
For sites behind anti-bot protection, pairing a rendering API with residential proxies from ThorData often achieves better success rates than running your own browser fleet.
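As a sketch of the API-based approach, the request builder below follows ScraperAPI's documented query parameters (`api_key`, `url`, `render`); verify the current docs before relying on the exact parameter names:

```python
from urllib.parse import urlencode

def build_render_request(api_key: str, target_url: str, render: bool = True) -> str:
    """Build a ScraperAPI-style request URL that offloads JS rendering."""
    params = {"api_key": api_key, "url": target_url}
    if render:
        params["render"] = "true"
    return "https://api.scraperapi.com/?" + urlencode(params)

print(build_render_request("YOUR_KEY", "https://spa-website.com/products"))
```

Fetching that URL with `httpx.get()` returns the rendered HTML, so the rest of your pipeline can stay browser-free.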
## Performance Optimization Tips
```python
from playwright.sync_api import sync_playwright

def optimized_scrape(urls: list[str]) -> list[str]:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context()
        # Block unnecessary resources to speed things up
        context.route("**/*.{png,jpg,jpeg,gif,svg,woff,woff2}", lambda route: route.abort())
        context.route("**/analytics**", lambda route: route.abort())
        context.route("**/ads**", lambda route: route.abort())
        results = []
        for url in urls:
            page = context.new_page()
            page.goto(url, wait_until="domcontentloaded")  # Faster than networkidle
            results.append(page.content())
            page.close()  # Reuse the context, close pages
        browser.close()
        return results
```
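The separate route rules above can also be collapsed into one predicate, which is easier to unit-test than inline lambdas. The extension and keyword lists here are assumptions to tune per site:

```python
BLOCKED_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".woff", ".woff2")
BLOCKED_KEYWORDS = ("analytics", "/ads/", "doubleclick")

def should_block(url: str) -> bool:
    """Decide whether a request URL is worth blocking for scrape speed."""
    path = url.split("?", 1)[0].lower()
    return path.endswith(BLOCKED_EXTENSIONS) or any(k in url.lower() for k in BLOCKED_KEYWORDS)

# Wire into Playwright with a single catch-all route:
#   context.route("**/*", lambda r: r.abort() if should_block(r.request.url) else r.continue_())
```

Stripping the query string before the extension check avoids letting cache-busting parameters like `logo.png?v=2` slip through.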
## Recommendation
Use Playwright unless you have a specific reason not to. It has the best API, official Python support, and hits the sweet spot between power and ease of use. Use ScrapeOps to benchmark whether a proxy API is more cost-effective than running your own browser farm at your specific scale.
For anti-detection specifically, Selenium with undetected-chromedriver remains the gold standard — but Playwright with stealth plugins is closing the gap fast.