
Valentina Skakun for HasData


Scrape Data with Selenium in Python

This is the third and last post in a series about scraping with Selenium in Python. In this one, we’ll focus on extracting data: locating elements, reading text, handling the Shadow DOM, and exporting your results.

Table of Contents

  1. Step 1: Locate Elements Using the By API
  2. Step 2: Handle Shadow DOM and Nested Elements
  3. Step 3: Extract Text and Attributes
  4. Step 4: Parse Tables and Lists
  5. Step 5: Export Data to CSV or JSON

Step 1: Locate Elements Using the By API

The old find_element_by_* methods were removed in Selenium 4. Now you use the By class instead.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# Examples of locators
title = driver.find_element(By.TAG_NAME, "h1")
links = driver.find_elements(By.CSS_SELECTOR, "a")

print(title.text)
print(f"Found {len(links)} links")

driver.quit()

Keep your selectors structured; By.CSS_SELECTOR is usually all you need.
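If a script reuses the same selectors in several places, it can help to keep them in one spot. A minimal sketch of that idea, assuming the page structure shown earlier (the locator names and the a.next selector are just placeholders):

from selenium.webdriver.common.by import By

# Hypothetical central registry of locators for one page
LOCATORS = {
    "title": (By.TAG_NAME, "h1"),
    "links": (By.CSS_SELECTOR, "a"),
    "next_page": (By.CSS_SELECTOR, "a.next"),
}

# Unpack the (strategy, selector) tuple when locating
title = driver.find_element(*LOCATORS["title"])
links = driver.find_elements(*LOCATORS["links"])

This way, when the markup changes, you only update the selector in one place.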

Step 2: Handle Shadow DOM and Nested Elements

Some modern pages hide data inside shadow roots (common in web components). Selenium doesn’t access them directly, but you can reach them with a bit of JavaScript.

shadow_host = driver.find_element(By.CSS_SELECTOR, "custom-element")
shadow_root = driver.execute_script("return arguments[0].shadowRoot", shadow_host)
inner_button = shadow_root.find_element(By.CSS_SELECTOR, "button")
inner_button.click()

You won’t need this often, but it’s good to know when a normal locator suddenly stops working.
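On recent Selenium 4 releases with Chromium-based browsers, you can also reach the shadow root without JavaScript through the element’s shadow_root property (support depends on the browser and driver version, so treat this as an alternative, not a replacement):

shadow_host = driver.find_element(By.CSS_SELECTOR, "custom-element")
shadow_root = shadow_host.shadow_root  # Selenium 4+, Chromium-based browsers
inner_button = shadow_root.find_element(By.CSS_SELECTOR, "button")
inner_button.click()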

Step 3: Extract Text and Attributes

.text gives you the visible content of an element. For hidden or internal data, use .get_attribute().

item = driver.find_element(By.CSS_SELECTOR, ".product")
name = item.text
price = item.get_attribute("data-price")

print(name, price)
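.get_attribute() also works for standard HTML attributes, not just data-* ones. A common case is pulling URLs out of links or images (the selectors here are just examples):

link = driver.find_element(By.CSS_SELECTOR, "a.product-link")
url = link.get_attribute("href")

image = driver.find_element(By.CSS_SELECTOR, "img.product-photo")
src = image.get_attribute("src")

print(url, src)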

Step 4: Parse Tables and Lists

You can easily scrape structured data like tables or lists using simple loops.

rows = driver.find_elements(By.CSS_SELECTOR, "table tr")

for row in rows:
    cells = [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
    print(cells)
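If the table has a header row, you can zip the headers with each row’s cells to get dicts that are ready for the export step later. A sketch, assuming a simple table with th headers and no merged cells:

headers = [th.text for th in driver.find_elements(By.CSS_SELECTOR, "table th")]
rows = driver.find_elements(By.CSS_SELECTOR, "table tbody tr")

table_data = []
for row in rows:
    cells = [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
    if cells:  # skip rows that only contain header cells
        table_data.append(dict(zip(headers, cells)))

print(table_data)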

Or, for a list of items:

items = driver.find_elements(By.CSS_SELECTOR, ".item")
data = [i.text for i in items]
print(data)

This keeps your scraper fast and easy to debug.

Step 5: Export Data to CSV or JSON

Once you’ve collected data, save it in a structured format.

import csv, json

data = [
    {"name": "Alice", "age": "101"},
    {"name": "Bob", "age": "11"}
]

# CSV
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)

# JSON
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

Simple and reusable: you can plug this into any scraping script.
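If you already have pandas installed, the same export is a one-liner each way. This is purely optional; the csv/json version above needs no extra dependencies:

import pandas as pd

df = pd.DataFrame(data)
df.to_csv("output.csv", index=False, encoding="utf-8")
df.to_json("output.json", orient="records", indent=2)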

Final Notes

This was the last post in the series, but here are some useful resources:

If you want any examples I might have missed, leave a comment and I’ll add them.

Top comments (2)

OnlineProxy

Roll with Selenium 4’s By API-use CSS for speed and sanity, bust out XPath only when you need text predicates or ancestor magic. Grab rendered text via .text/innerText, and use get_attribute for the hidden goodies. Be patient with explicit waits, keep implicitly_wait tiny, run headless with a set window size, batch your infinite scroll, and lock in stable selectors. For SPAs, wait for real “ready” signals, poke shadowRoot, hop into iframes, and mix Selenium for auth/discovery with requests/BS4 for speed-dedupe by stable IDs and scrub text.

Valentina Skakun HasData

Great summary :)
Most of these points were actually covered in the main blog post I linked at the end. And I totally agree about mixing requests/BeautifulSoup when possible. The article here just focused more on Selenium itself.