How to Scrape Dynamic Websites with JavaScript in 2026

Dynamic websites built with React, Vue, Angular, and other JavaScript frameworks present unique challenges for web scraping. Unlike traditional server-rendered pages, these Single Page Applications (SPAs) load content asynchronously, making simple HTTP requests insufficient.

In this guide, I'll show you practical techniques to scrape dynamic websites effectively in 2026.

Why Dynamic Sites Are Hard to Scrape

When you fetch a React or Vue page with requests, you get a nearly empty HTML shell:

import requests

response = requests.get("https://example-spa.com/products")
print(response.text)
# <div id="root"></div> — no actual content!

The content loads via JavaScript after the initial page load. You need different strategies.

Strategy 1: Find the Hidden API

Most SPAs fetch data from REST or GraphQL APIs. Open DevTools → Network tab → filter by XHR/Fetch:

import requests

# The actual API endpoint behind the SPA
api_url = "https://example-spa.com/api/v2/products?page=1&limit=50"
headers = {"Accept": "application/json"}

response = requests.get(api_url, headers=headers)
data = response.json()

for product in data["results"]:
    print(f"{product['name']} — ${product['price']}")

This is the fastest and most reliable approach. Always check for APIs first.
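
If the API is paginated, you can usually walk every page with a pooled requests.Session(). Here's a minimal sketch that reuses the hypothetical endpoint, the page/limit parameters, and the results key from the example above — adjust them to whatever the real API actually returns:

import requests

session = requests.Session()
session.headers.update({"Accept": "application/json"})

all_products = []
page = 1
while True:
    # Same hypothetical endpoint and parameters as the example above
    response = session.get(
        "https://example-spa.com/api/v2/products",
        params={"page": page, "limit": 50},
        timeout=10,
    )
    response.raise_for_status()
    results = response.json().get("results", [])
    if not results:
        break  # no more pages
    all_products.extend(results)
    page += 1

print(f"Fetched {len(all_products)} products")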

Strategy 2: Extract window.__INITIAL_STATE__

Many frameworks embed pre-rendered data in the HTML as a JavaScript variable:

import requests
import re
import json

response = requests.get("https://example-spa.com/products")
html = response.text

# Extract the embedded state object
match = re.search(r'window\.__INITIAL_STATE__\s*=\s*({.*?});', html, re.DOTALL)
if match:
    state = json.loads(match.group(1))
    products = state["catalog"]["products"]
    for p in products:
        print(f"{p['title']}{p['price']}")

Look for variations like __NEXT_DATA__ (Next.js), __NUXT__ (Nuxt.js), or window.__PRELOADED_STATE__.
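
For example, Next.js ships its page data as JSON inside a script tag with the id __NEXT_DATA__, so you can pull it out with a parser instead of a regex. A quick sketch — the URL and the keys under pageProps are placeholders, but the props/pageProps structure is standard Next.js:

import requests
import json
from bs4 import BeautifulSoup

response = requests.get("https://example-nextjs-site.com/products")
soup = BeautifulSoup(response.text, "html.parser")

# Next.js embeds the page data as JSON in this script tag
script = soup.find("script", id="__NEXT_DATA__")
if script:
    data = json.loads(script.string)
    page_props = data["props"]["pageProps"]  # standard Next.js layout; keys vary per site
    print(list(page_props.keys()))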

Strategy 3: Browser Automation with Playwright

When there's no API and no embedded state, use a headless browser:

from playwright.sync_api import sync_playwright

def scrape_dynamic_page(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")

        # Wait for specific content to load
        page.wait_for_selector(".product-card")

        products = page.query_selector_all(".product-card")
        results = []
        for product in products:
            name = product.query_selector(".name").inner_text()
            price = product.query_selector(".price").inner_text()
            results.append({"name": name, "price": price})

        browser.close()
        return results

data = scrape_dynamic_page("https://example-spa.com/products")
for item in data:
    print(item)

Strategy 4: Use a Scraping API

For production workloads, a scraping API handles JavaScript rendering, proxy rotation, and CAPTCHAs for you:

import requests

# ScraperAPI renders JavaScript automatically
params = {
    "api_key": "YOUR_KEY",
    "url": "https://example-spa.com/products",
    "render": "true"
}

response = requests.get("https://api.scraperapi.com", params=params)
html = response.text  # Fully rendered HTML with all dynamic content
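
From there, parse the rendered HTML like any static page. Continuing from the snippet above, and assuming the same .product-card markup as the Playwright example:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
for card in soup.select(".product-card"):
    # .name and .price are the same assumed selectors as in Strategy 3
    name = card.select_one(".name").get_text(strip=True)
    price = card.select_one(".price").get_text(strip=True)
    print(f"{name}: {price}")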

Try ScraperAPI free — 5,000 API credits

Handling Infinite Scroll

Many SPAs use infinite scroll instead of pagination:

from playwright.sync_api import sync_playwright
import time

def scrape_infinite_scroll(url, max_scrolls=10):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")

        previous_height = 0
        for i in range(max_scrolls):
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            time.sleep(2)

            current_height = page.evaluate("document.body.scrollHeight")
            if current_height == previous_height:
                break  # No more content to load
            previous_height = current_height

        # Now extract all loaded content
        items = page.query_selector_all(".item")
        results = [el.inner_text() for el in items]
        browser.close()
        return results

Performance Tips

  1. Always try the API first — it's 10-100x faster than browser automation
  2. Check for __INITIAL_STATE__ before launching a browser
  3. Block unnecessary resources (images, fonts, analytics) in Playwright to speed up rendering (see the sketch after this list)
  4. Use connection pooling with requests.Session() for API-based scraping
  5. Respect rate limits — add delays between requests
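
Here's what tip 3 looks like in practice — a minimal sketch that blocks images, fonts, and media before navigation (the URL is a placeholder):

from playwright.sync_api import sync_playwright

BLOCKED_RESOURCE_TYPES = {"image", "font", "media"}

def block_heavy_resources(route):
    # Abort requests we don't need for data extraction, let the rest through
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        route.abort()
    else:
        route.continue_()

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.route("**/*", block_heavy_resources)  # intercept every request
    page.goto("https://example-spa.com/products", wait_until="networkidle")
    # ...extract content as in Strategy 3...
    browser.close()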

Conclusion

Dynamic websites require smarter scraping strategies, but they're far from impossible. Start with API discovery, check for embedded state data, and only fall back to browser automation when needed. For production-scale scraping, ScraperAPI handles the complexity of JavaScript rendering and proxy management so you can focus on the data.

Happy scraping!
