Web crawling with Python and Selenium


For a recent job interview I was tasked with scraping a web table and converting the data into a CSV file. To get started, I first searched around for the proper tools for the job. I knew I would need to install Selenium with pip install -U selenium. Then, from reading the docs: 'Selenium requires a driver to interface with the chosen browser.' For this I chose the Chrome webdriver (chromedriver). Once it is downloaded, make sure it’s in your PATH, e.g., place it in /usr/bin or /usr/local/bin. If you do not do this, Chrome will fail to open.
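If Chrome refuses to open, a quick way to sanity-check the PATH step is a sketch like the one below. It only uses the standard library, and find_driver is a helper name I made up for illustration, not part of Selenium. Newer Selenium releases may locate or download a driver on their own, so treat this as a diagnostic for setups where you manage the driver by hand.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    # shutil.which searches the directories listed in the PATH variable,
    # the same way the shell does when you type a command name.
    return shutil.which(name)


driver_path = find_driver()
if driver_path is None:
    print("chromedriver was not found on PATH; try moving it to /usr/local/bin")
else:
    print("chromedriver found at", driver_path)
```

If this prints a path, the driver is visible to Python, and any remaining problem is more likely a version mismatch between Chrome and the driver.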

Learning to crawl

To get started, import webdriver from selenium, set a variable equal to webdriver.Chrome(), and then call get('url here') on that variable to load a page.

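from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

browser = webdriver.Chrome()  # Start Chrome through chromedriver
browser.get('https://www.yahoo.com')  # Load the page before inspecting it

assert 'Yahoo' in browser.title

elem = browser.find_element(By.NAME, 'p')  # Find the search box
elem.send_keys('seleniumhq' + Keys.RETURN)  # Type the query and press Enter

browser.quit()  # Close the browser and end the session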

Congrats, you just learned to crawl. In my next blog post I will go over using pandas to turn a web table into a DataFrame and then a CSV.
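Until then, here is a rough sketch of the CSV half of the task using only the standard library csv module instead of pandas. The rows_to_csv helper and the sample rows are made up for illustration; in a real scrape the rows would come from the text of the table cells you located with Selenium.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (each a list of cell strings) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data standing in for text scraped from a table.
sample_rows = [
    ["name", "price"],
    ["widget", "9.99"],
    ["gadget, large", "19.50"],
]

print(rows_to_csv(sample_rows))
```

The csv writer quotes any cell that contains a comma, so values like "gadget, large" stay in a single column when the file is opened later.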

Make sure to check out the Selenium docs for more information!

