This is the third and final post in the series about scraping with Selenium in Python. In this one, we’ll focus on extracting data: locating elements, reading text, handling the Shadow DOM, and exporting your results.
Table of Contents
- Step 1: Locate Elements Using the By API
- Step 2: Handle Shadow DOM and Nested Elements
- Step 3: Extract Text and Attributes
- Step 4: Parse Tables and Lists
- Step 5: Export Data to CSV or JSON
Step 1: Locate Elements Using the By API
The old find_element_by_* methods are gone as of Selenium 4. Now you should use the By class.
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://example.com")
# Examples of locators
title = driver.find_element(By.TAG_NAME, "h1")
links = driver.find_elements(By.CSS_SELECTOR, "a")
print(title.text)
print(f"Found {len(links)} links")
driver.quit()
Keep your selectors structured; By.CSS_SELECTOR is usually all you need.
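If you do need other strategies, the By class covers them all: By.ID, By.NAME, By.XPATH, By.LINK_TEXT, and so on. A quick sketch locating the same hypothetical login button three ways (the selectors are assumptions for illustration):
# Three equivalent locators for a hypothetical <button id="login">
button = driver.find_element(By.ID, "login")
button = driver.find_element(By.CSS_SELECTOR, "#login")
button = driver.find_element(By.XPATH, "//button[@id='login']")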
Step 2: Handle Shadow DOM and Nested Elements
Some modern pages hide data inside shadow roots (common with web components). Selenium’s regular locators can’t pierce them, but you can reach inside with a bit of JavaScript.
shadow_host = driver.find_element(By.CSS_SELECTOR, "custom-element")
shadow_root = driver.execute_script("return arguments[0].shadowRoot", shadow_host)
inner_button = shadow_root.find_element(By.CSS_SELECTOR, "button")
inner_button.click()
You won’t need this often, but it’s good to know when a normal locator suddenly stops working.
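Recent Selenium 4 releases (when driving Chromium-based browsers) also expose a shadow_root property on elements, which does the same thing without JavaScript:
shadow_host = driver.find_element(By.CSS_SELECTOR, "custom-element")
shadow_root = shadow_host.shadow_root  # returns a ShadowRoot you can search within
inner_button = shadow_root.find_element(By.CSS_SELECTOR, "button")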
Step 3: Extract Text and Attributes
.text gives you the visible content of an element. For hidden or internal data, use .get_attribute().
item = driver.find_element(By.CSS_SELECTOR, ".product")
name = item.text
price = item.get_attribute("data-price")
print(name, price)
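One gotcha: .text only returns visible text, so it comes back empty for elements hidden with CSS. Reading the textContent attribute works regardless of visibility (the .sr-only selector below is just a hypothetical hidden element):
hidden = driver.find_element(By.CSS_SELECTOR, ".sr-only")
print(hidden.text)                          # "" for a hidden element
print(hidden.get_attribute("textContent"))  # full text, visible or not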
Step 4: Parse Tables and Lists
You can easily scrape structured data like tables or lists using simple loops.
rows = driver.find_elements(By.CSS_SELECTOR, "table tr")
for row in rows:
    # Header rows contain <th> cells, so this list is empty for them
    cells = [cell.text for cell in row.find_elements(By.TAG_NAME, "td")]
    if cells:
        print(cells)
Or, for a list of items:
items = driver.find_elements(By.CSS_SELECTOR, ".item")
data = [i.text for i in items]
print(data)
This keeps your scraper fast and easy to debug.
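If the table has a header row, you can go a step further and zip the headers with each row’s cells to build dictionaries that drop straight into the export step below. A minimal sketch, assuming a standard <thead>/<tbody> layout:
headers = [th.text for th in driver.find_elements(By.CSS_SELECTOR, "table thead th")]
records = []
for row in driver.find_elements(By.CSS_SELECTOR, "table tbody tr"):
    cells = [td.text for td in row.find_elements(By.TAG_NAME, "td")]
    records.append(dict(zip(headers, cells)))
print(records)  # e.g. [{"Name": "...", "Price": "..."}, ...]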
Step 5: Export Data to CSV or JSON
Once you’ve collected data, save it in a structured format.
import csv, json
data = [
{"name": "Alice", "age": "101"},
{"name": "Bob", "age": "11"}
]
# CSV
with open("output.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=data[0].keys())
writer.writeheader()
writer.writerows(data)
# JSON
with open("output.json", "w", encoding="utf-8") as f:
json.dump(data, f, indent=2)
Simple and reusable: you can plug this into any scraping script.
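One small note: json.dump escapes non-ASCII characters by default. If your scraped text includes accents or other scripts, pass ensure_ascii=False to keep the output readable:
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)  # keep non-ASCII characters as-is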
Final Notes
That wraps up the series, but here are some useful resources:
- The Complete Guide to Web Scraping with Selenium in Python
- Join our Discord
- Selenium Scraping Examples in Python and NodeJS (GitHub)
If you want any examples I might have missed, leave a comment and I’ll add them.