Playwright has quickly become the go-to tool for scraping JavaScript-rendered pages in 2026. If you've been wrestling with Selenium or hitting walls with requests + BeautifulSoup on SPAs, this tutorial will show you why Playwright is worth the switch — and how to use it effectively.
Why Playwright for Web Scraping?
Playwright is an open-source browser automation library built by Microsoft. Compared to Selenium, it offers several meaningful advantages for scraping:
| Feature | Playwright | Selenium |
|---|---|---|
| Auto-wait | Built-in, smart | Explicit waits required |
| Async support | Native asyncio | Bolted on |
| Browser contexts | Lightweight isolation | Full browser per session |
| Speed | Faster | Slower |
| Network interception | First-class | Limited |
The auto-wait feature alone saves hours of debugging flaky scrapers. Playwright waits for elements to be visible and actionable before interacting — no more `time.sleep(3)` guesswork.
Setup
Install Playwright and its browser binaries:
This downloads Chromium, Firefox, and WebKit. For most scraping tasks, Chromium is the default choice.
Here's the basic async Python setup:
Basic Scraping Example
Let's extract the page title and some text content from a real page:
This gives you structured data from a server-rendered page. But the real power comes with JavaScript-heavy sites.
Handling JavaScript-Rendered Pages
Static HTML scrapers break on React, Vue, and Angular apps because the content is injected by JavaScript after page load. Playwright handles this natively.
Consider scraping a product listing page built with React:
Key methods for JS-heavy pages:
- `wait_until="networkidle"` — wait until there have been no network requests for 500 ms
- `wait_for_selector()` — wait for a CSS selector to appear in the DOM
- `wait_for_load_state("domcontentloaded")` — lighter than `networkidle`
For infinite scroll, you can trigger it programmatically:
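One way to sketch it: keep scrolling until the page height stops growing, with a round limit as a safety valve:

```python
async def scroll_to_bottom(page, max_rounds: int = 10) -> None:
    """Scroll until document height stops growing (or max_rounds is hit)."""
    previous_height = 0
    for _ in range(max_rounds):
        # Jump to the bottom, then give lazy-loaded content a moment to arrive
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await page.wait_for_timeout(1000)
        height = await page.evaluate("document.body.scrollHeight")
        if height == previous_height:
            break  # nothing new loaded: we've reached the real bottom
        previous_height = height
```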
Taking Screenshots
Screenshots are invaluable for debugging scrapers and visual verification:
This is especially useful when your selectors stop matching — a screenshot tells you exactly what the page looks like at scrape time.
Intercepting Network Requests
This is the technique that separates beginner scrapers from pro ones. Most modern SPAs fetch data from a JSON API. Instead of parsing the rendered DOM, you can intercept those API calls directly.
The advantage: API responses are clean JSON, not messy HTML. No brittle CSS selectors. This approach is also significantly faster since you're not parsing the DOM.
You can also block requests you don't need to speed things up:
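One way to sketch a blocking handler keyed on resource type:

```python
BLOCKED_TYPES = {"image", "media", "font", "stylesheet"}

async def block_heavy(route):
    # Abort requests for heavy assets; let everything else through
    if route.request.resource_type in BLOCKED_TYPES:
        await route.abort()
    else:
        await route.continue_()

# Register it before navigating:
#     await page.route("**/*", block_heavy)
```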
Handling Authentication & Cookies
Many scraping targets require authentication. Playwright lets you save and restore browser state so you don't have to log in on every run:
This is clean and reliable for sites with JWT tokens, session cookies, or OAuth flows.
Scaling Up
When moving from a one-off script to a production scraper, these patterns matter:
Use browser contexts, not new browsers:
Block unnecessary resources:
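A compact sketch that aborts asset requests context-wide by file extension (the extension list is a reasonable default, not gospel):

```python
async def install_asset_blocking(context) -> None:
    # Abort images, fonts, and CSS for every page opened in this context
    await context.route(
        "**/*.{png,jpg,jpeg,gif,svg,webp,woff,woff2,css}",
        lambda route: route.abort(),
    )
```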
Set realistic timeouts:
```python
page = await context.new_page()
page.set_default_timeout(15000)  # 15 seconds max per action
```
Rotate user agents:
```python
context = await browser.new_context(
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
)
```
When to Use Managed Solutions
Running Playwright at scale on your own infrastructure means dealing with:
- IP bans and CAPTCHAs
- Proxy rotation and residential IPs
- Browser fingerprinting
- Infrastructure maintenance
For production workloads, managed services take the ops burden off your plate:
ScrapeOps is a scraping operations platform that handles proxy rotation, scheduling, monitoring, and alerting. It integrates cleanly with Playwright scrapers and gives you observability across all your scraping jobs — useful once you're running dozens of scrapers.
ThorData provides residential proxies sourced from real devices, which are significantly harder for anti-bot systems to detect than datacenter IPs. If you're hitting blocks on major e-commerce or social platforms, residential proxies are often the fix.
Apify actors let you run managed Playwright scrapers in the cloud without managing infrastructure. Apify handles browser rendering, scheduling, and output storage — you just write your scraping logic. Good option if you want serverless scale without the DevOps overhead.
Conclusion
Playwright is the right tool for modern web scraping in 2026. The auto-wait behavior eliminates most flakiness, async support makes concurrent scraping clean, and network interception is a genuinely powerful technique that most scrapers overlook.
The path from prototype to production typically goes:
- Playwright script locally → works on the target
- Add error handling, retries, and logging
- Containerize and schedule
- Add proxy rotation when you hit IP limits
- Move to managed infrastructure for scale
Start with the basics from this tutorial, and reach for managed solutions when the ops complexity starts costing more than the service fees. Happy scraping.