Web crawling with Python and Selenium

#python

Intro

For a recent job interview I was tasked with scraping a web table and converting the data into a CSV file. To get started, I first searched around for the proper tools for the job. I knew I would need to install Selenium with pip install -U selenium. The docs then state that 'Selenium requires a driver to interface with the chosen browser.' For this I chose the Chrome webdriver. Once it is downloaded, make sure it's in your PATH, e.g., place it in /usr/bin or /usr/local/bin. If you do not do this, Chrome may fail to open.
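Before launching a browser, it can help to confirm that the driver really is on your PATH. Here is a minimal sketch using only the standard library. The helper name find_driver is my own invention for illustration, and it assumes the executable is called chromedriver, which may differ on your system:

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable on PATH, or None."""
    # shutil.which searches the directories listed in the PATH
    # environment variable, the same way a shell would.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("chromedriver was not found on PATH")
    else:
        print(f"chromedriver found at {path}")
```

If this reports that the driver was not found, move the downloaded file into one of the PATH directories mentioned above and try again.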

Learning to crawl

To get started, import webdriver from selenium, set a variable equal to webdriver.Chrome(), and then call get('url here') on that variable.

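from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Start Chrome (this is where the Chrome webdriver is needed)
browser = webdriver.Chrome()

# Load the page and confirm it is the one we expect
browser.get('http://www.yahoo.com')
assert 'Yahoo' in browser.title

# find_element_by_name was removed in Selenium 4; use find_element with By.NAME
elem = browser.find_element(By.NAME, 'p')  # Find the search box
elem.send_keys('seleniumhq' + Keys.RETURN)

# Close the browser and end the driver session
browser.quit()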
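One thing to watch for in the snippet above: if the assert fails, the script stops before browser.quit() runs and the browser window stays open. Below is a rough sketch of one way to guard against that with try/finally. The helper title_contains is a hypothetical name of my own, not part of Selenium, and it assumes it is handed any object with get, title and quit, such as a Selenium driver:

```python
def title_contains(driver, url, expected):
    """Open url with the given driver and report whether expected is in the title.

    The driver is always closed, even if loading the page raises an error.
    """
    try:
        driver.get(url)
        return expected in driver.title
    finally:
        # quit() closes the browser window and ends the driver session.
        driver.quit()


# Example usage (assumes selenium and a browser driver are installed):
# from selenium import webdriver
# print(title_contains(webdriver.Chrome(), 'http://www.yahoo.com', 'Yahoo'))
```

Returning a boolean instead of asserting inside the helper leaves it up to the caller to decide what a missing title should mean.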

Congrats, you just learned to crawl. In my next post I will go over using pandas to turn a web table into a DataFrame and then a CSV file.

Make sure to check out the Selenium docs for more information!
