DEV Community

Annabelle
How to Scrape Dynamic Websites with Selenium

If you've ever tried collecting data from a modern website and ended up with empty HTML containers instead of real content, you're not alone.

Many developers run into this issue when working with websites built using frameworks like React, Vue, or Angular. Instead of delivering fully rendered HTML, these sites load content dynamically using JavaScript after the page loads.

So when you use a basic HTTP request to fetch the page, the data you're looking for often isn't there yet.

This is where Selenium becomes extremely useful.

Selenium allows you to automate a real browser session. That means the page loads exactly as it would for a human visitor, JavaScript included. Once everything renders, you can access the fully populated page and extract the information you need.

Let’s walk through how this works.

Why Traditional Scraping Fails on Dynamic Websites

When you fetch a page using a library like requests in Python, you receive the initial HTML response from the server.

However, many modern websites work differently:

  1. The server sends minimal HTML.
  2. JavaScript runs in the browser.
  3. JavaScript requests data from APIs.
  4. The page dynamically inserts the content.

Your script only sees step one.

This is why you might open a page in your browser and see dozens of products or listings, but your script only finds empty <div> elements.

Selenium solves this problem by actually running the browser and executing the JavaScript before extracting data.
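To see the problem concretely, here's a sketch using a hardcoded, hypothetical server response (no network call involved). The page "exists", but the data the browser eventually displays isn't in the HTML a plain request would receive:

```python
# Hypothetical "shell" response -- typical of what a plain HTTP request
# returns for a JavaScript-rendered page. The container exists, but the
# data the browser eventually displays is not in the HTML yet.
initial_html = """
<html>
  <body>
    <div id="app"></div>                <!-- empty mount point -->
    <script src="/bundle.js"></script>  <!-- content injected later by JS -->
  </body>
</html>
"""

# A request-based scraper searching this response finds nothing:
assert "product-card" not in initial_html
print("The raw HTML contains no product data")
```

Only after the browser downloads and runs `bundle.js` does the real content appear inside that empty `div`.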

Installing Selenium

First, install Selenium using pip:

```shell
pip install selenium
```

Next, Selenium needs a driver for your browser. If you're on Selenium 4.6 or later, the bundled Selenium Manager downloads a matching driver automatically, so you can usually skip this step.

If you manage drivers manually, common options include:

  • ChromeDriver for Google Chrome
  • GeckoDriver for Firefox
  • EdgeDriver for Microsoft Edge

Make sure the driver version matches your installed browser version.

Basic Selenium Example

Here’s a minimal Selenium script using Python:

```python
from selenium import webdriver

driver = webdriver.Chrome()

driver.get("https://example.com")

print(driver.title)

driver.quit()
```

This script:

  1. Launches a Chrome browser
  2. Opens a webpage
  3. Prints the page title
  4. Closes the browser session

By the time Selenium retrieves the page content, the browser has already executed any JavaScript needed to render the page.

Extracting Elements from the Page

Once the page loads, you can locate elements using Selenium selectors.

Example:

```python
from selenium.webdriver.common.by import By

products = driver.find_elements(By.CSS_SELECTOR, ".product-card")

for product in products:
    print(product.text)
```

Selenium supports several ways to locate elements:

  • By.CSS_SELECTOR
  • By.XPATH
  • By.ID
  • By.CLASS_NAME
  • By.TAG_NAME

Most developers prefer CSS selectors because they are easier to maintain and usually more readable.

Waiting for Dynamic Content

Dynamic pages often load content asynchronously, so the elements you're looking for might not appear immediately.

Instead of using fixed delays with time.sleep(), Selenium provides explicit waits.

Example:

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

items = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "product-card"))
)
```

This tells Selenium to wait until the elements appear before continuing.

Explicit waits make automation scripts significantly more reliable.
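Conceptually, an explicit wait is just a polling loop: check a condition, sleep briefly, repeat until it succeeds or a deadline passes. A minimal sketch of that idea in plain Python (illustrative names, not Selenium's actual internals):

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError("condition not met within timeout")
```

WebDriverWait follows the same pattern: each expected condition is a callable that keeps returning a falsy value until the page is ready.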

Handling Infinite Scroll Pages

Many websites load additional content when the user scrolls down the page.

You can simulate this behavior with Selenium by executing JavaScript.

Example:

```python
driver.execute_script(
    "window.scrollTo(0, document.body.scrollHeight);"
)
```

If you're collecting multiple batches of content, you can repeat this action in a loop:

```python
import time

for _ in range(5):
    driver.execute_script(
        "window.scrollTo(0, document.body.scrollHeight);"
    )
    time.sleep(2)
```

Each scroll triggers the website to load more entries.
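A common refinement is to keep scrolling until the page height stops growing, instead of guessing a fixed number of rounds. The loop logic is sketched here with injected callables so it can stand alone; in a real script they would wrap `driver.execute_script` calls (an assumption about how you'd wire it up, shown in the docstring):

```python
import time

def scroll_until_stable(scroll, get_height, pause=2.0, max_rounds=20):
    """Scroll until the document height stops changing (no new content loads).

    In a real script:
      scroll     = lambda: driver.execute_script(
                       "window.scrollTo(0, document.body.scrollHeight);")
      get_height = lambda: driver.execute_script(
                       "return document.body.scrollHeight")
    """
    last_height = get_height()
    for _ in range(max_rounds):
        scroll()
        time.sleep(pause)  # give the site time to load the next batch
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new appeared; we've reached the bottom
        last_height = new_height
    return last_height
```

The `max_rounds` cap keeps the loop from running forever on pages that genuinely never stop loading.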

Running Selenium in Headless Mode

When running automation on servers or cloud environments, you typically don't want a visible browser window.

Selenium supports headless mode, which runs the browser without a graphical interface.

Example:

```python
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")  # the "new" headless mode in Chrome 109+

driver = webdriver.Chrome(options=options)
```

Headless mode reduces resource usage and makes automation easier to deploy in backend systems.
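On servers, headless mode is often paired with a few additional flags. The set below is a common starting point for containerized environments, not a mandatory one; adjust it to your deployment:

```python
def headless_chrome_flags(width=1920, height=1080):
    """A typical (assumed, not universal) set of Chrome flags for headless runs."""
    return [
        "--headless=new",            # modern headless mode (Chrome 109+)
        "--no-sandbox",              # often required inside containers
        "--disable-dev-shm-usage",   # avoids /dev/shm exhaustion in Docker
        f"--window-size={width},{height}",  # sites may render differently when tiny
    ]

# Usage sketch:
# for flag in headless_chrome_flags():
#     options.add_argument(flag)
```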

Avoiding IP Blocks When Scaling

When collecting large amounts of data, repeatedly accessing a website from the same IP address can trigger rate limits or temporary blocks.

To avoid this, many developers add proxy infrastructure to their automation stack. Developers often integrate providers of high-quality residential proxies like Squid Proxies when running workflows that require stable IP rotation and consistent connections.

Using proxies alongside Selenium can significantly improve reliability when running larger automation tasks.
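One simple pattern is to rotate through a pool of proxy endpoints, starting each driver session with a different `--proxy-server` flag. The endpoints below are placeholders; your provider would supply real hosts and credentials:

```python
from itertools import cycle

# Placeholder endpoints -- substitute the hosts/credentials your provider gives you.
PROXY_POOL = cycle([
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
])

def proxy_arg(proxy_url):
    """Chrome flag that routes the session's traffic through proxy_url."""
    return f"--proxy-server={proxy_url}"

# Usage sketch (requires a real proxy endpoint):
# options = Options()
# options.add_argument(proxy_arg(next(PROXY_POOL)))
# driver = webdriver.Chrome(options=options)
```

Each new session picks up the next endpoint in the pool, spreading requests across IP addresses.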

When Selenium Is the Right Tool

Selenium works best when:

  • Pages rely heavily on JavaScript
  • Content loads after user interactions
  • Infinite scrolling is used
  • Data appears only after the page renders

For static websites, lightweight HTTP libraries are usually faster. But for modern dynamic applications, Selenium is often the simplest and most reliable solution.

Final Thoughts

Dynamic websites are now the standard across much of the web. Because so many platforms rely on JavaScript to render content, traditional request-based methods often fail to retrieve the data you need.

Selenium solves this problem by automating a real browser environment, allowing developers to render JavaScript-heavy pages and interact with them just like a user would.

When combined with proxy infrastructure and thoughtful automation design, Selenium becomes a powerful tool for building reliable data collection pipelines and automation workflows.
