How to Scrape Dynamic Websites with JavaScript in 2026

Dynamic websites built with React, Vue, Angular, and other JavaScript frameworks present unique challenges for web scraping. Unlike traditional server-rendered pages, these Single Page Applications (SPAs) load content asynchronously, making simple HTTP requests insufficient.

In this guide, I'll show you practical techniques to scrape dynamic websites effectively in 2026.

Why Dynamic Sites Are Hard to Scrape

When you fetch a React or Vue page with requests, you get a nearly empty HTML shell:

import requests

response = requests.get("https://example-spa.com/products")
print(response.text)
# <div id="root"></div> — no actual content!

The content loads via JavaScript after the initial page load. You need different strategies.

Strategy 1: Find the Hidden API

Most SPAs fetch data from REST or GraphQL APIs. Open DevTools → Network tab → filter by XHR/Fetch:

import requests

# The actual API endpoint behind the SPA
api_url = "https://example-spa.com/api/v2/products?page=1&limit=50"
headers = {"Accept": "application/json"}

response = requests.get(api_url, headers=headers)
data = response.json()

for product in data["results"]:
    print(f"{product['name']} — ${product['price']}")

This is the fastest and most reliable approach. Always check for APIs first.
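
If the API is paginated, you can usually walk every page with a pooled requests.Session(). Here's a minimal sketch that reuses the hypothetical endpoint, the page/limit parameters, and the results key from the example above — adjust them to whatever the real API actually returns:

import requests

session = requests.Session()
session.headers.update({"Accept": "application/json"})

all_products = []
page = 1
while True:
    # Same hypothetical endpoint and parameters as the example above
    response = session.get(
        "https://example-spa.com/api/v2/products",
        params={"page": page, "limit": 50},
        timeout=10,
    )
    response.raise_for_status()
    results = response.json().get("results", [])
    if not results:
        break  # no more pages
    all_products.extend(results)
    page += 1

print(f"Fetched {len(all_products)} products")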

Strategy 2: Extract window.__INITIAL_STATE__

Many frameworks embed pre-rendered data in the HTML as a JavaScript variable:

import requests
import re
import json

response = requests.get("https://example-spa.com/products")
html = response.text

# Extract the embedded state object
match = re.search(r'window\.__INITIAL_STATE__\s*=\s*({.*?});', html, re.DOTALL)
if match:
    state = json.loads(match.group(1))
    products = state["catalog"]["products"]
    for p in products:
        print(f"{p['title']}{p['price']}")

Look for variations like __NEXT_DATA__ (Next.js), __NUXT__ (Nuxt.js), or window.__PRELOADED_STATE__.
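
For example, Next.js ships its page data as JSON inside a script tag with the id __NEXT_DATA__, so you can pull it out with a parser instead of a regex. A quick sketch — the URL and the keys under pageProps are placeholders, but the props/pageProps structure is standard Next.js:

import requests
import json
from bs4 import BeautifulSoup

response = requests.get("https://example-nextjs-site.com/products")
soup = BeautifulSoup(response.text, "html.parser")

# Next.js embeds the page data as JSON in this script tag
script = soup.find("script", id="__NEXT_DATA__")
if script:
    data = json.loads(script.string)
    page_props = data["props"]["pageProps"]  # standard Next.js layout; keys vary per site
    print(list(page_props.keys()))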

Strategy 3: Browser Automation with Playwright

When there's no API and no embedded state, use a headless browser:

from playwright.sync_api import sync_playwright

def scrape_dynamic_page(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")

        # Wait for specific content to load
        page.wait_for_selector(".product-card")

        products = page.query_selector_all(".product-card")
        results = []
        for product in products:
            name = product.query_selector(".name").inner_text()
            price = product.query_selector(".price").inner_text()
            results.append({"name": name, "price": price})

        browser.close()
        return results

data = scrape_dynamic_page("https://example-spa.com/products")
for item in data:
    print(item)

Strategy 4: Use a Scraping API

For production workloads, a scraping API handles JavaScript rendering, proxy rotation, and CAPTCHAs for you:

import requests

# ScraperAPI renders JavaScript automatically
params = {
    "api_key": "YOUR_KEY",
    "url": "https://example-spa.com/products",
    "render": "true"
}

response = requests.get("https://api.scraperapi.com", params=params)
html = response.text  # Fully rendered HTML with all dynamic content
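
From there, parse the rendered HTML like any static page. Continuing from the snippet above, and assuming the same .product-card markup as the Playwright example:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
for card in soup.select(".product-card"):
    # .name and .price are the same assumed selectors as in Strategy 3
    name = card.select_one(".name").get_text(strip=True)
    price = card.select_one(".price").get_text(strip=True)
    print(f"{name}: {price}")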

Try ScraperAPI free — 5,000 API credits

Handling Infinite Scroll

Many SPAs use infinite scroll instead of pagination:

from playwright.sync_api import sync_playwright
import time

def scrape_infinite_scroll(url, max_scrolls=10):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")

        previous_height = 0
        for i in range(max_scrolls):
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            time.sleep(2)

            current_height = page.evaluate("document.body.scrollHeight")
            if current_height == previous_height:
                break  # No more content to load
            previous_height = current_height

        # Now extract all loaded content
        items = page.query_selector_all(".item")
        results = [el.inner_text() for el in items]
        browser.close()
        return results

Performance Tips

  1. Always try the API first — it's 10-100x faster than browser automation
  2. Check for __INITIAL_STATE__ before launching a browser
  3. Block unnecessary resources (images, fonts, analytics) in Playwright to speed up rendering (see the sketch after this list)
  4. Use connection pooling with requests.Session() for API-based scraping
  5. Respect rate limits — add delays between requests
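
Here's what tip 3 looks like in practice — a minimal sketch that blocks images, fonts, and media before navigation (the URL is a placeholder):

from playwright.sync_api import sync_playwright

BLOCKED_RESOURCE_TYPES = {"image", "font", "media"}

def block_heavy_resources(route):
    # Abort requests we don't need for data extraction, let the rest through
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        route.abort()
    else:
        route.continue_()

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.route("**/*", block_heavy_resources)  # intercept every request
    page.goto("https://example-spa.com/products", wait_until="networkidle")
    # ...extract content as in Strategy 3...
    browser.close()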

Conclusion

Dynamic websites require smarter scraping strategies, but they're far from impossible. Start with API discovery, check for embedded state data, and only fall back to browser automation when needed. For production-scale scraping, ScraperAPI handles the complexity of JavaScript rendering and proxy management so you can focus on the data.

Happy scraping!
