Dynamic websites built with React, Vue, Angular, and other JavaScript frameworks present unique challenges for web scraping. Unlike traditional server-rendered pages, these Single Page Applications (SPAs) load content asynchronously, making simple HTTP requests insufficient.
In this guide, I'll show you practical techniques to scrape dynamic websites effectively in 2026.
Why Dynamic Sites Are Hard to Scrape
When you fetch a React or Vue page with requests, you get a nearly empty HTML shell:
```python
import requests

response = requests.get("https://example-spa.com/products")
print(response.text)
# <div id="root"></div> — no actual content!
```
The content loads via JavaScript after the initial page load. You need different strategies.
Strategy 1: Find the Hidden API
Most SPAs fetch data from REST or GraphQL APIs. Open DevTools → Network tab → filter by XHR/Fetch:
```python
import requests

# The actual API endpoint behind the SPA
api_url = "https://example-spa.com/api/v2/products?page=1&limit=50"
headers = {"Accept": "application/json"}

response = requests.get(api_url, headers=headers)
data = response.json()

for product in data["results"]:
    print(f"{product['name']} — ${product['price']}")
```
This is the fastest and most reliable approach. Always check for APIs first.
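The same approach works when the site uses GraphQL instead of REST: the Network tab will show POST requests to a single endpoint with a JSON body. Here is a minimal sketch; the endpoint URL, query shape, and field names are assumptions, so copy the real ones from the request your browser sends.

```python
import requests

# Hypothetical GraphQL endpoint -- take the real one from the Network tab
GRAPHQL_URL = "https://example-spa.com/graphql"

def build_products_query(page: int, limit: int) -> dict:
    """Build the JSON payload a GraphQL products query expects.
    The query text and variable names here are illustrative."""
    query = """
    query Products($page: Int!, $limit: Int!) {
      products(page: $page, limit: $limit) {
        name
        price
      }
    }
    """
    return {"query": query, "variables": {"page": page, "limit": limit}}

def fetch_products(page: int = 1, limit: int = 50) -> list:
    """POST the query; GraphQL APIs take a single JSON body."""
    payload = build_products_query(page, limit)
    response = requests.post(GRAPHQL_URL, json=payload, timeout=30)
    return response.json()["data"]["products"]
```

Replaying the exact query the site itself sends (often visible verbatim in DevTools) is usually the quickest way to get well-structured data.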
Strategy 2: Extract window.__INITIAL_STATE__
Many frameworks embed pre-rendered data in the HTML as a JavaScript variable:
```python
import requests
import re
import json

response = requests.get("https://example-spa.com/products")
html = response.text

# Extract the embedded state object
match = re.search(r'window\.__INITIAL_STATE__\s*=\s*({.*?});', html, re.DOTALL)
if match:
    state = json.loads(match.group(1))
    products = state["catalog"]["products"]
    for p in products:
        print(f"{p['title']} — {p['price']}")
```
Look for variations like __NEXT_DATA__ (Next.js), __NUXT__ (Nuxt.js), or window.__PRELOADED_STATE__.
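Next.js is a slightly different case: it embeds its state as pure JSON inside a `<script id="__NEXT_DATA__" type="application/json">` tag, so no regex over JavaScript assignments is needed. A small sketch (the `props.pageProps.title` path in the sample is made up; real pages nest their data differently):

```python
import json
import re

def extract_next_data(html: str):
    """Pull the JSON blob Next.js embeds in its __NEXT_DATA__ script tag.
    Returns the parsed dict, or None if the tag is absent."""
    match = re.search(
        r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>',
        html,
        re.DOTALL,
    )
    return json.loads(match.group(1)) if match else None

# Minimal example with an inline HTML snippet
sample = ('<script id="__NEXT_DATA__" type="application/json">'
          '{"props": {"pageProps": {"title": "Widget"}}}</script>')
data = extract_next_data(sample)
print(data["props"]["pageProps"]["title"])  # Widget
```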
Strategy 3: Browser Automation with Playwright
When there's no API and no embedded state, use a headless browser:
```python
from playwright.sync_api import sync_playwright

def scrape_dynamic_page(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")

        # Wait for specific content to load
        page.wait_for_selector(".product-card")

        products = page.query_selector_all(".product-card")
        results = []
        for product in products:
            name = product.query_selector(".name").inner_text()
            price = product.query_selector(".price").inner_text()
            results.append({"name": name, "price": price})

        browser.close()
        return results

data = scrape_dynamic_page("https://example-spa.com/products")
for item in data:
    print(item)
```
Strategy 4: Use a Scraping API
For production workloads, a scraping API handles JavaScript rendering, proxy rotation, and CAPTCHAs for you:
```python
import requests

# ScraperAPI renders JavaScript automatically
params = {
    "api_key": "YOUR_KEY",
    "url": "https://example-spa.com/products",
    "render": "true",
}
response = requests.get("https://api.scraperapi.com", params=params)
html = response.text  # Fully rendered HTML with all dynamic content
```
Try ScraperAPI free — 5,000 API credits
Handling Infinite Scroll
Many SPAs use infinite scroll instead of pagination:
```python
from playwright.sync_api import sync_playwright
import time

def scrape_infinite_scroll(url, max_scrolls=10):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")

        previous_height = 0
        for i in range(max_scrolls):
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            time.sleep(2)
            current_height = page.evaluate("document.body.scrollHeight")
            if current_height == previous_height:
                break  # No more content to load
            previous_height = current_height

        # Now extract all loaded content
        items = page.query_selector_all(".item")
        results = [el.inner_text() for el in items]
        browser.close()
        return results
```
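An alternative worth knowing: each scroll usually triggers an XHR/Fetch call, so instead of re-parsing the DOM you can listen for those responses directly with Playwright's `page.on("response")` hook. A sketch under assumptions: the `/api/` and `items` URL filter, the feed URL, and the scroll count are all placeholders for what your target site actually does.

```python
def is_item_api(url: str) -> bool:
    """Filter for the (hypothetical) endpoint the scroll triggers --
    match this against what the Network tab shows for your site."""
    return "/api/" in url and "items" in url

def capture_scroll_payloads(url, max_scrolls=10):
    # Imported here so is_item_api stays usable without Playwright installed
    from playwright.sync_api import sync_playwright
    import time

    collected = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Record the JSON payload of every matching response as it arrives
        page.on(
            "response",
            lambda r: collected.append(r.json()) if is_item_api(r.url) else None,
        )
        page.goto(url, wait_until="networkidle")
        for _ in range(max_scrolls):
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            time.sleep(2)
        browser.close()
    return collected
```

This gives you structured JSON straight from the site's own API, which is more robust than scraping rendered markup that may change with the next redesign.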
Performance Tips
- Always try the API first — it's 10-100x faster than browser automation
- Check for `__INITIAL_STATE__` before launching a browser
- Block unnecessary resources (images, fonts, analytics) in Playwright to speed up rendering
- Use connection pooling with `requests.Session()` for API-based scraping
- Respect rate limits — add delays between requests
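The resource-blocking tip above can be sketched with Playwright's `page.route()`: abort requests for heavy resource types before they download. The set of blocked types below is a starting-point assumption; tune it per site (blocking stylesheets, for instance, can break selectors that depend on rendered layout).

```python
# Resource types to skip -- an assumption to tune per target site
BLOCKED_RESOURCE_TYPES = {"image", "font", "media"}

def should_block(resource_type: str) -> bool:
    return resource_type in BLOCKED_RESOURCE_TYPES

def open_lightweight_page(url):
    # Imported here so should_block stays usable without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Abort requests for heavy resource types, let everything else through
        page.route(
            "**/*",
            lambda route: route.abort()
            if should_block(route.request.resource_type)
            else route.continue_(),
        )
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html
```

On image-heavy pages this can cut load time substantially, since the browser never fetches the assets you don't scrape anyway.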
Conclusion
Dynamic websites require smarter scraping strategies, but they're far from impossible. Start with API discovery, check for embedded state data, and only fall back to browser automation when needed. For production-scale scraping, ScraperAPI handles the complexity of JavaScript rendering and proxy management so you can focus on the data.
Happy scraping!