For a recent job interview, I was tasked with scraping a web table and converting the data into a CSV file. To get started, I searched around for the proper tools for the job. I knew I would need to install Selenium with pip install -U selenium. The docs then state that 'Selenium requires a driver to interface with the chosen browser.' For this I chose the Chrome webdriver. Once it is downloaded, make sure it's in your PATH, e.g., place it in /usr/bin or /usr/local/bin. If you do not do this, Chrome will fail to open.
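If you are not sure whether the driver ended up on your PATH, you can ask Python before launching anything. This is only a small sketch: it uses shutil.which from the standard library, the helper name find_chromedriver is my own, and I am assuming the executable is called chromedriver on your system.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path of the driver executable, or None if it is not on PATH.

    The default name "chromedriver" is an assumption; adjust it if your
    download uses a different file name.
    """
    return shutil.which(name)


driver_path = find_chromedriver()
if driver_path is None:
    print("chromedriver was not found on PATH")
else:
    print(f"chromedriver found at {driver_path}")
```

If it prints the "not found" message, move the driver into one of the directories mentioned above (or add its folder to PATH) and run the check again.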
To get started, import webdriver from selenium, set a variable equal to webdriver.Chrome(), and then call get('url here') on that variable:
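    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    # Open Chrome; this needs the Chrome webdriver on your PATH
    browser = webdriver.Chrome()
    browser.get('http://www.yahoo.com')
    assert 'Yahoo' in browser.title

    # find_element_by_name was removed in Selenium 4; use find_element with By.NAME
    elem = browser.find_element(By.NAME, 'p')  # Find the search box
    elem.send_keys('seleniumhq' + Keys.RETURN)

    browser.quit()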
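One thing to watch out for: if the assert fails or the page errors out, browser.quit() never runs and the browser window is left open. A try/finally block takes care of that. The sketch below is just one way to do it. The function name fetch_title and its make_browser parameter are hypothetical names of mine, and it only relies on the get(), title, and quit() members already used in this post.

```python
def fetch_title(make_browser, url):
    """Open url in a freshly created browser and return the page title.

    make_browser is any zero-argument callable that returns a browser
    object (for example webdriver.Chrome). The browser is always closed,
    even if loading the page raises an exception.
    """
    browser = make_browser()
    try:
        browser.get(url)
        return browser.title
    finally:
        # Runs on success and on error, so no browser window is left behind
        browser.quit()


# Example usage with Selenium installed (left as a comment so the
# sketch can be read without launching a browser):
#
#   from selenium import webdriver
#   print(fetch_title(webdriver.Chrome, 'http://www.yahoo.com'))
```

Passing the browser constructor in as an argument also makes it easy to swap Chrome for another browser later without touching the function.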
Congrats, you just learned to crawl! In my next blog post, I will go over using pandas to turn a web table into a DataFrame and then a CSV file.
Make sure to check out the selenium docs for more information!