Valentina Skakun for HasData

Scrape Data with Selenium in Python

This is the third and final post in a series about scraping with Selenium in Python. In this one, we’ll focus on extracting data: locating elements, reading text, handling the Shadow DOM, and exporting your results.

Table of Contents

  1. Step 1: Locate Elements Using the By API
  2. Step 2: Handle Shadow DOM and Nested Elements
  3. Step 3: Extract Text and Attributes
  4. Step 4: Parse Tables and Lists
  5. Step 5: Export Data to CSV or JSON

Step 1: Locate Elements Using the By API

The old find_element_by_* methods were removed in Selenium 4. Use the By class instead.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# Examples of locators
title = driver.find_element(By.TAG_NAME, "h1")
links = driver.find_elements(By.CSS_SELECTOR, "a")

print(title.text)
print(f"Found {len(links)} links")

driver.quit()

Keep your selectors structured; By.CSS_SELECTOR is usually all you need.
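
The other By strategies hang off the same call. A quick sketch; the selectors below are illustrative and assume elements that may not exist on example.com:

# Hypothetical selectors, shown only to illustrate the other strategies
heading = driver.find_element(By.ID, "main-title")
submit = driver.find_element(By.XPATH, "//button[text()='Submit']")
footer_links = driver.find_elements(By.CLASS_NAME, "footer-link")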

Step 2: Handle Shadow DOM and Nested Elements

Some modern pages hide data inside shadow roots (common in web components). Selenium’s regular locators can’t pierce them, but you can reach inside with a bit of JavaScript.

shadow_host = driver.find_element(By.CSS_SELECTOR, "custom-element")
# Returns a shadow root you can search like a regular element
shadow_root = driver.execute_script("return arguments[0].shadowRoot", shadow_host)
inner_button = shadow_root.find_element(By.CSS_SELECTOR, "button")
inner_button.click()

You won’t need this often, but it’s good to know when a normal locator suddenly stops working.
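
If you’re on Selenium 4.1 or newer with a Chromium-based browser, you can skip the JavaScript entirely: elements expose a shadow_root property that works as a search context. A minimal sketch, assuming the same custom-element host as above:

shadow_host = driver.find_element(By.CSS_SELECTOR, "custom-element")
# shadow_root raises NoSuchShadowRootException if the element has no shadow root
inner_button = shadow_host.shadow_root.find_element(By.CSS_SELECTOR, "button")
inner_button.click()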

Step 3: Extract Text and Attributes

.text gives you the visible content of an element. For hidden or internal data, use .get_attribute().

item = driver.find_element(By.CSS_SELECTOR, ".product")
name = item.text
price = item.get_attribute("data-price")

print(name, price)
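The same pattern covers the attributes you’ll reach for most often. A short sketch; the selectors are illustrative:

link = driver.find_element(By.CSS_SELECTOR, "a")
print(link.get_attribute("href"))  # fully resolved URL

# .text returns only visible text; textContent includes hidden nodes
hidden = driver.find_element(By.CSS_SELECTOR, ".hidden-note")  # hypothetical class
print(hidden.get_attribute("textContent"))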

Step 4: Parse Tables and Lists

You can easily scrape structured data like tables or lists using simple loops.

rows = driver.find_elements(By.CSS_SELECTOR, "table tr")

for row in rows:
    cells = [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
    print(cells)
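If the table has a header row, you can zip the th texts with each data row to get dictionaries instead of bare lists. A sketch, assuming a regular table with th headers:

headers = [th.text for th in driver.find_elements(By.CSS_SELECTOR, "table th")]
rows = driver.find_elements(By.CSS_SELECTOR, "table tr")

records = []
for row in rows:
    cells = [td.text for td in row.find_elements(By.TAG_NAME, "td")]
    if cells:  # the header row has no td cells, so this skips it
        records.append(dict(zip(headers, cells)))

print(records)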

Or, for a list of items:

items = driver.find_elements(By.CSS_SELECTOR, ".item")
data = [i.text for i in items]
print(data)

This keeps your scraper fast and easy to debug.

Step 5: Export Data to CSV or JSON

Once you’ve collected data, save it in a structured format.

import csv, json

data = [
    {"name": "Alice", "age": "101"},
    {"name": "Bob", "age": "11"}
]

# CSV
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)

# JSON
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

Simple and reusable: you can plug this into any scraping script.
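
If pandas is already part of your stack, the same export is a one-liner each way. A sketch, assuming the list-of-dicts shape from above:

import pandas as pd

df = pd.DataFrame(data)
df.to_csv("output.csv", index=False)
df.to_json("output.json", orient="records", indent=2)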

Final Notes

This was the last post in the series. If you want any examples I might have missed, leave a comment and I’ll add them.
