Intro
For a recent job interview I was tasked with scraping a web table and converting the data into a CSV file. To get started I searched around for the proper tools for the job. I knew I would need to install Selenium with pip install -U selenium, and the docs state that 'Selenium requires a driver to interface with the chosen browser.' For this I chose the Chrome webdriver. Once it is downloaded, make sure it's in your PATH, e.g., place it in /usr/bin or /usr/local/bin. If you do not do this, Chrome will fail to open.
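Before writing any scraping code, it's worth confirming the driver is actually discoverable. One way to check, as a quick sketch (assuming the driver executable is named chromedriver), uses only the standard library:

```python
import shutil

# shutil.which mimics the shell's PATH lookup: it returns the full path
# to the executable, or None if it is not on your PATH.
driver_path = shutil.which("chromedriver")
if driver_path is None:
    print("chromedriver not found on PATH - Chrome will fail to open")
else:
    print("driver found at", driver_path)
```

If this prints the "not found" message, move the driver into one of the PATH directories before going any further.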
Learning to crawl
To get started you must import webdriver from selenium, set a variable equal to webdriver.Chrome(), and then call variable.get('url here'):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

browser = webdriver.Chrome()  # uses the chromedriver found on your PATH
browser.get('http://www.yahoo.com')
assert 'Yahoo' in browser.title
elem = browser.find_element(By.NAME, 'p')  # find the search box
elem.send_keys('seleniumhq' + Keys.RETURN)
browser.quit()
Congrats, you just learned to crawl. In my next blog I will go over using pandas to turn a web table into a dataframe and then a CSV.
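As a small preview of that workflow, here is a sketch (assuming pandas and an HTML parser such as lxml are installed; the HTML string below is a made-up stand-in for a real page, which in practice would come from browser.page_source):

```python
from io import StringIO

import pandas as pd

# Tiny stand-in for a scraped page; a real run would feed in the HTML
# that Selenium fetched from the live site.
html = """
<table>
  <tr><th>name</th><th>score</th></tr>
  <tr><td>alice</td><td>10</td></tr>
  <tr><td>bob</td><td>7</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per table it finds
df = pd.read_html(StringIO(html))[0]
df.to_csv("table.csv", index=False)  # write the table out as CSV
print(df)
```

The same two lines - read_html, then to_csv - carry over unchanged once the HTML comes from a real page.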
Make sure to check out the Selenium docs for more information!