If you've ever tried collecting data from a modern website and ended up with empty HTML containers instead of real content, you're not alone.
Many developers run into this issue when working with websites built using frameworks like React, Vue, or Angular. Instead of delivering fully rendered HTML, these sites load content dynamically using JavaScript after the page loads.
So when you use a basic HTTP request to fetch the page, the data you're looking for often isn't there yet.
This is where Selenium becomes extremely useful.
Selenium allows you to automate a real browser session. That means the page loads exactly as it would for a human visitor, JavaScript included. Once everything renders, you can access the fully populated page and extract the information you need.
Let’s walk through how this works.
Why Traditional Scraping Fails on Dynamic Websites
When you fetch a page using a library like requests in Python, you receive the initial HTML response from the server.
However, many modern websites work differently:
- The server sends minimal HTML.
- JavaScript runs in the browser.
- JavaScript requests data from APIs.
- The page dynamically inserts the content.
Your script only sees step one.
This is why you might open a page in your browser and see dozens of products or listings, but your script only finds empty `<div>` elements.
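You can see the gap with nothing but the standard library. The HTML below is a simplified, illustrative stand-in for the kind of initial response a JavaScript-rendered site returns: an empty mount point with no text content at all.

```python
from html.parser import HTMLParser

# Illustrative initial HTML, as a request library would receive it
# from a JavaScript-rendered site: an empty mount point, no data yet.
INITIAL_HTML = '<html><body><div id="root"></div></body></html>'

class TextCollector(HTMLParser):
    """Collects any non-whitespace text found in the document."""
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

parser = TextCollector()
parser.feed(INITIAL_HTML)
print(parser.text)  # [] -- there is no content until JavaScript runs
```

The parser finds nothing because the content only exists after the browser executes the site's JavaScript.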
Selenium solves this problem by actually running the browser and executing the JavaScript before extracting data.
Installing Selenium
First, install Selenium using pip:
```
pip install selenium
```
Next, you need a browser driver. Recent versions of Selenium (4.6 and later) ship with Selenium Manager, which downloads the correct driver automatically. If you're on an older version, download the driver manually.
Common options include:
- ChromeDriver for Google Chrome
- GeckoDriver for Firefox
- EdgeDriver for Microsoft Edge
If you manage the driver yourself, make sure its version matches your installed browser version.
Basic Selenium Example
Here’s a minimal Selenium script using Python:
```python
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")
print(driver.title)
driver.quit()
```
This script:
- Launches a Chrome browser
- Opens a webpage
- Prints the page title
- Closes the browser session
By the time Selenium retrieves the page content, the browser has already executed any JavaScript needed to render the page.
Extracting Elements from the Page
Once the page loads, you can locate elements using Selenium selectors.
Example:
```python
from selenium.webdriver.common.by import By

products = driver.find_elements(By.CSS_SELECTOR, ".product-card")
for product in products:
    print(product.text)
```
Selenium supports several ways to locate elements:
- `By.CSS_SELECTOR`
- `By.XPATH`
- `By.ID`
- `By.CLASS_NAME`
- `By.TAG_NAME`
Most developers prefer CSS selectors because they are easier to maintain and usually more readable.
Waiting for Dynamic Content
Dynamic pages often load content asynchronously, so the elements you're looking for might not appear immediately.
Instead of using fixed delays with `time.sleep()`, Selenium provides explicit waits.
Example:
```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

items = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "product-card"))
)
```
This tells Selenium to wait until the elements appear before continuing.
Explicit waits make automation scripts significantly more reliable.
Handling Infinite Scroll Pages
Many websites load additional content when the user scrolls down the page.
You can simulate this behavior with Selenium by executing JavaScript.
Example:
```python
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
```
If you're collecting multiple batches of content, you can repeat this action in a loop:
```python
import time

for _ in range(5):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
```
Each scroll triggers the website to load more entries.
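A fixed number of scrolls is a guess, though. A more robust sketch (a hypothetical helper, not part of Selenium's API) keeps scrolling until the page height stops growing, which signals that no more content is being lazy-loaded:

```python
import time

def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops
    growing, then return the number of scrolls performed."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the site time to fetch the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds + 1  # height stabilized: nothing more to load
        last_height = new_height
    return max_rounds
```

Call it as `scroll_until_stable(driver)` after the page's first batch of content has loaded; tune `pause` to the site's loading speed.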
Running Selenium in Headless Mode
When running automation on servers or cloud environments, you typically don't want a visible browser window.
Selenium supports headless mode, which runs the browser without a graphical interface.
Example:
```python
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
```
Headless mode reduces resource usage and makes automation easier to deploy in backend systems.
Avoiding IP Blocks When Scaling
When collecting large amounts of data, repeatedly accessing a website from the same IP address can trigger rate limits or temporary blocks.
To avoid this, many developers add proxy infrastructure to their automation stack, often integrating residential proxy providers such as Squid Proxies for workflows that require stable IP rotation and consistent connections.
Using proxies alongside Selenium can significantly improve reliability when running larger automation tasks.
When Selenium Is the Right Tool
Selenium works best when:
- Pages rely heavily on JavaScript
- Content loads after user interactions
- Infinite scrolling is used
- Data appears only after the page renders
For static websites, lightweight HTTP libraries are usually faster. But for modern dynamic applications, Selenium is often the simplest and most reliable solution.
Final Thoughts
Dynamic websites are now the standard across much of the web. Because so many platforms rely on JavaScript to render content, traditional request-based methods often fail to retrieve the data you need.
Selenium solves this problem by automating a real browser environment, allowing developers to render JavaScript-heavy pages and interact with them just like a user would.
When combined with proxy infrastructure and thoughtful automation design, Selenium becomes a powerful tool for building reliable data collection pipelines and automation workflows.