In this step-by-step tutorial, we show you how to scrape data from a website using Python with Selenium. Learn how to locate and wait for elements, take screenshots, and execute JavaScript code with examples.
What is Selenium?
Until 2003, the word selenium was known only as a chemical element, but in 2004, it became the name of one of the most popular software testing frameworks. Initially designed for cross-browser end-to-end tests, Selenium is a powerful open-source browser automation platform that supports Java, Python, C#, Ruby, JavaScript, and Kotlin.
Why is it called Selenium? The name came from a joke in an email by its creator, Jason Huggins. Wishing to mock his competitor, Mercury Interactive Corporation, Huggins quipped that you could cure mercury poisoning by taking selenium supplements. Thus the name Selenium caught on, and the rest, as they say, is history.
Why use Selenium with Python for web scraping?
Python is by far the most popular choice of programming language for scraping web data. Combining it with Selenium WebDriver provides an easy API to write functional tests and web scraping scripts.
Selenium offers several ways to interact with websites, such as clicking buttons, filling in forms, scrolling pages, taking screenshots, and executing JavaScript code. That means Selenium can be used to scrape dynamically loaded content. Add to this its cross-language and cross-browser support, and it's little wonder that Selenium is one of the preferred frameworks for web scraping in Python.
Fun fact: The Python language was not named after the snake. When Guido van Rossum was implementing the language, he wanted a name for it that would be short, unique, and somewhat mysterious. It just so happened that he was reading the published scripts of Monty Python and the Flying Circus at the time. That influenced him to go with the name Python.
How to scrape a website with Selenium in Python
With that brief introduction out of the way, it's time to show you how to scrape a website using Python with Selenium.
Setting up the environment for web scraping
To follow this tutorial, you'll need to have the following installed:
Python 3.6 or later
The Selenium package (pip install selenium)
The ChromeDriver that matches your Chrome browser version
You'll have to import the necessary packages for your Selenium script. For this tutorial, you'll need:
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
🚀 Launching the browser and navigating to the website
Once you have the required packages, you can launch the browser with the webdriver module and navigate to the website you want to scrape. In this case, we'll be using Chrome as the browser and navigating to the Monty Python online store: https://www.montypythononlinestore.com/.
from selenium.webdriver.chrome.service import Service

DRIVER_PATH = "/usr/local/bin/chromedriver"  # This path works for macOS
service = Service(DRIVER_PATH)
driver = webdriver.Chrome(service=service)
driver.get("https://www.montypythononlinestore.com/")
Note that in Selenium 4, the old executable_path argument was replaced by a Service object.
If you want to switch to headless Chrome, you need to instantiate an Options object (here via webdriver.ChromeOptions()) and pass --headless=new to its add_argument method.
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
driver.get("https://www.montypythononlinestore.com/")
Selenium has deprecated the convenience property (options.headless = True) following Chrome's switch to its new headless mode, so use the --headless=new argument instead.
🔎 Locating and interacting with elements
Now that you've navigated to the website, you'll need to locate elements on the page and interact with them. For example, you might want to search for a product in the e-shop.
search_box = driver.find_element(By.ID, "search-field")
search_box.send_keys("t-shirt")
search_box.send_keys(Keys.ENTER)
With Selenium WebDriver, you can use find_element for a single element (the first match) or find_elements for a list of all matches. For example, if you want to select the first <h2> element in an HTML document:
h2 = driver.find_element(By.TAG_NAME, 'h2')
If you want to select all elements with the class name 'product' on a page:
all_products = driver.find_elements(By.CLASS_NAME, 'product')
Waiting for elements to load
Sometimes, content on a web page is loaded dynamically after the initial page load. In such cases, you can wait for the required element to appear using the WebDriverWait class.
In the example below, we wait up to 10 seconds for an <h2> element to be present in the DOM.
wait = WebDriverWait(driver, 10)
element = wait.until(EC.presence_of_element_located((By.TAG_NAME, "h2")))
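Under the hood, WebDriverWait.until simply polls the condition repeatedly until it returns a truthy value or the timeout expires. Here's a minimal sketch of that polling logic in plain Python (the wait_until function and the example condition are hypothetical illustrations, not part of Selenium's API):

```python
import time

def wait_until(condition, timeout=10, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Mirrors the polling behaviour of Selenium's WebDriverWait.until().
    """
    end_time = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() > end_time:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(poll_interval)

# Example: a value that only becomes available after a short delay,
# standing in for an element that appears after the page finishes loading.
start = time.monotonic()
value = wait_until(lambda: "loaded" if time.monotonic() - start > 0.2 else None,
                   timeout=2, poll_interval=0.05)
print(value)  # "loaded"
```

This is why explicit waits are preferable to a fixed time.sleep: they return as soon as the condition is met, and fail fast with a timeout error when it never is.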
Once the element is loaded, you can scrape its content using the element.text property.
element_text = element.text
📸 Taking a screenshot
If you need to take a screenshot of the website at any point, you can do that in your script using the save_screenshot() method.
driver.save_screenshot("screenshot.png")
🪓 Executing JavaScript code
To execute JavaScript code, use the execute_script()
method. For example, if you want to scroll to the bottom of the page to take a screenshot:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
You can use time.sleep to give the browser time to finish scrolling and render any lazy-loaded content before taking the screenshot. In the example below, we wait 5 seconds after scrolling down.
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
driver.save_screenshot("screenshot.png")
🚪 Closing the browser
When you're done, you can close the browser window with the driver.quit() method. Note that quit is different from close: close() only closes the current window, leaving the WebDriver session active, whereas quit() closes all browser windows and ends the WebDriver session.
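A common pattern (not covered in the snippets above) is to guarantee that quit() runs even when your scraping code raises an exception, using try/finally or a context manager. The sketch below uses a hypothetical FakeDriver stand-in so it runs without a real browser; in a real script you would pass webdriver.Chrome instead:

```python
from contextlib import contextmanager

class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome, so this sketch runs without a browser."""
    def __init__(self):
        self.session_active = True
    def quit(self):
        self.session_active = False

@contextmanager
def managed_driver(factory):
    # Create the driver, then guarantee quit() runs even if scraping raises.
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()

with managed_driver(FakeDriver) as driver:
    pass  # your scraping code goes here

print(driver.session_active)  # False: quit() ran automatically
```

Without this safeguard, an unhandled exception mid-scrape can leave orphaned browser processes running in the background.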
📝 Final code
Now let's put it all together into a script for scraping the Monty Python store:
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service

DRIVER_PATH = "/usr/local/bin/chromedriver"
service = Service(DRIVER_PATH)
driver = webdriver.Chrome(service=service)
driver.get("https://www.montypythononlinestore.com/")
search_box = driver.find_element(By.ID, "search-field")
search_box.send_keys("t-shirt")
search_box.send_keys(Keys.ENTER)
wait = WebDriverWait(driver, 10)
element = wait.until(EC.presence_of_element_located((By.TAG_NAME, "h2")))
element_text = element.text
print(element_text)
all_products = driver.find_elements(By.CLASS_NAME, 'product')
print(f'There are {len(all_products)} t-shirts on the page.')
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
driver.save_screenshot("screenshot.png")
driver.quit()
Conclusion & further reading
We've shown you how to use Selenium with Python to scrape the Monty Python online store, but you can apply what you've learned here to scrape data from any site you like by pointing the driver.get method at a different URL.
If you want to learn more about Python, Selenium, and other powerful web scraping and browser automation tools, check out the online literature below.
Selenium for web scraping, testing, and automation:
🔖 Selenium documentation
🔖 Playwright vs. Selenium: which one to use for web scraping?
🔖 Puppeteer vs. Selenium for automation
🔖 Cypress vs. Selenium: choosing the right web testing and automation framework for your project
Web scraping with Python:
🔖 The Apify SDK for Python
🔖 Web scraping with Python
🔖 Web scraping with JavaScript vs. Python
🔖 Why is Python used for web scraping?