
Valentina Skakun for HasData


Scrape Data with Selenium in Python

This is the third and last post in a series about scraping with Selenium in Python. In this one, we’ll focus on extracting data: locating elements, reading text, handling the Shadow DOM, and exporting your results.

Table of Contents

  1. Step 1: Locate Elements Using the By API
  2. Step 2: Handle Shadow DOM and Nested Elements
  3. Step 3: Extract Text and Attributes
  4. Step 4: Parse Tables and Lists
  5. Step 5: Export Data to CSV or JSON

Step 1: Locate Elements Using the By API

The old find_element_by_* methods were removed in Selenium 4. Now you use the By class instead.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# Examples of locators
title = driver.find_element(By.TAG_NAME, "h1")
links = driver.find_elements(By.CSS_SELECTOR, "a")

print(title.text)
print(f"Found {len(links)} links")

driver.quit()

Keep your selectors structured; By.CSS_SELECTOR is usually all you need.
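If a script reuses the same selectors in several places, it can help to keep them in one spot. A minimal sketch of that idea, assuming the page structure shown earlier (the locator names and the a.next selector are just placeholders):

from selenium.webdriver.common.by import By

# Hypothetical central registry of locators for one page
LOCATORS = {
    "title": (By.TAG_NAME, "h1"),
    "links": (By.CSS_SELECTOR, "a"),
    "next_page": (By.CSS_SELECTOR, "a.next"),
}

# Unpack the (strategy, selector) tuple when locating
title = driver.find_element(*LOCATORS["title"])
links = driver.find_elements(*LOCATORS["links"])

This way, when the markup changes, you only update the selector in one place.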

Step 2: Handle Shadow DOM and Nested Elements

Some modern pages hide data inside shadow roots (common in web components). Selenium doesn’t access them directly, but you can reach them with a bit of JavaScript.

shadow_host = driver.find_element(By.CSS_SELECTOR, "custom-element")
shadow_root = driver.execute_script("return arguments[0].shadowRoot", shadow_host)
inner_button = shadow_root.find_element(By.CSS_SELECTOR, "button")
inner_button.click()

You won’t need this often, but it’s good to know when a normal locator suddenly stops working.
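On recent Selenium 4 releases with Chromium-based browsers, you can also reach the shadow root without JavaScript through the element’s shadow_root property (support depends on the browser and driver version, so treat this as an alternative, not a replacement):

shadow_host = driver.find_element(By.CSS_SELECTOR, "custom-element")
shadow_root = shadow_host.shadow_root  # Selenium 4+, Chromium-based browsers
inner_button = shadow_root.find_element(By.CSS_SELECTOR, "button")
inner_button.click()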

Step 3: Extract Text and Attributes

.text gives you the visible content of an element. For hidden or internal data, use .get_attribute().

item = driver.find_element(By.CSS_SELECTOR, ".product")
name = item.text
price = item.get_attribute("data-price")

print(name, price)
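.get_attribute() also works for standard HTML attributes, not just data-* ones. A common case is pulling URLs out of links or images (the selectors here are just examples):

link = driver.find_element(By.CSS_SELECTOR, "a.product-link")
url = link.get_attribute("href")

image = driver.find_element(By.CSS_SELECTOR, "img.product-photo")
src = image.get_attribute("src")

print(url, src)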

Step 4: Parse Tables and Lists

You can easily scrape structured data like tables or lists using simple loops.

rows = driver.find_elements(By.CSS_SELECTOR, "table tr")

for row in rows:
    cells = [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
    print(cells)
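If the table has a header row, you can zip the headers with each row’s cells to get dicts that are ready for the export step later. A sketch, assuming a simple table with th headers and no merged cells:

headers = [th.text for th in driver.find_elements(By.CSS_SELECTOR, "table th")]
rows = driver.find_elements(By.CSS_SELECTOR, "table tbody tr")

table_data = []
for row in rows:
    cells = [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
    if cells:  # skip rows that only contain header cells
        table_data.append(dict(zip(headers, cells)))

print(table_data)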

Or, for a list of items:

items = driver.find_elements(By.CSS_SELECTOR, ".item")
data = [i.text for i in items]
print(data)

This keeps your scraper fast and easy to debug.

Step 5: Export Data to CSV or JSON

Once you’ve collected data, save it in a structured format.

import csv, json

data = [
    {"name": "Alice", "age": "101"},
    {"name": "Bob", "age": "11"}
]

# CSV
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)

# JSON
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

Simple and reusable: you can plug this into any scraping script.
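If you already have pandas installed, the same export is a one-liner each way. This is purely optional; the csv/json version above needs no extra dependencies:

import pandas as pd

df = pd.DataFrame(data)
df.to_csv("output.csv", index=False, encoding="utf-8")
df.to_json("output.json", orient="records", indent=2)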

Final Notes

This was the last post in the series, but here are some useful resources:

If you want any examples I might have missed, leave a comment and I’ll add them.

Top comments (2)

OnlineProxy

Roll with Selenium 4’s By API-use CSS for speed and sanity, bust out XPath only when you need text predicates or ancestor magic. Grab rendered text via .text/innerText, and use get_attribute for the hidden goodies. Be patient with explicit waits, keep implicitly_wait tiny, run headless with a set window size, batch your infinite scroll, and lock in stable selectors. For SPAs, wait for real “ready” signals, poke shadowRoot, hop into iframes, and mix Selenium for auth/discovery with requests/BS4 for speed-dedupe by stable IDs and scrub text.

Valentina Skakun HasData

Great summary :)
Most of these points were actually covered in the main blog post I linked at the end. And I totally agree about mixing requests/BeautifulSoup when possible. The article here just focused more on Selenium itself.