Bernice Waweru

Web Scraping with Python and Selenium.

Data is vital for informed decision making because it provides reliable insights. However, we need to source the data before we can analyze it. Web scraping is one way to tap the large amount of data available on the web: it extracts specific information from a website using selectors.
For a better understanding of selectors you can check this resource.

Selenium

Selenium is a tool for automating web browsers. It works with various browsers and operating systems and can be used from several programming languages.
For this project, I will use Selenium and the Chrome browser.
You need ChromeDriver, which can be downloaded here; pick the release that matches your Chrome version.

We are going to scrape PURPINK, a Kenyan online gift shop.

Scraping
Import webdriver from Selenium for browser automation and for locating elements on the page.
Import Options to run Chrome in headless mode, which launches the browser without opening a window.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

Specify the path to ChromeDriver and the URL of the website.

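from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument("--headless")  # prevents opening browser window
# Raw string so the backslashes in the Windows path are kept literally.
PATH = r"C:\Program Files (x86)\chromedriver.exe"
# Selenium 4 style: pass the driver path via Service and hand over the options.
driver = webdriver.Chrome(service=Service(PATH), options=options)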

driver.get("https://www.purpink.co.ke/collections/her")

Inspect the page in the browser's developer tools to find the class names that hold each product's name and price, then retrieve those elements and write the results to a CSV file.

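import csv
from selenium.webdriver.common.by import By

# The class names below were found by inspecting the product cards.
product_names = driver.find_elements(By.CLASS_NAME, "product-thumbnail__title")
product_prices = driver.find_elements(By.CLASS_NAME, "money")

filename = "purpink.csv"

# csv.writer quotes product names that contain commas.
with open(filename, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Brand", "Price(Ksh)"])
    for product, price in zip(product_names, product_prices):
        # Remove the "KSh" label and thousands separators from the price text.
        final_price = price.text.replace("KSh", "").replace(",", "").strip()
        writer.writerow([product.text, final_price])

driver.quit()  # close the browser once the data is saved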
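
The price clean-up in the loop can also be pulled into a small helper, which makes it easy to check on its own. This is a minimal sketch, assuming the price text looks like "KSh 1,500" (the exact format on the site is an assumption); `clean_price` is a hypothetical name, not part of the original project. Note that `str.strip("KSh")` treats its argument as a set of characters to trim from both ends, not as a literal prefix, so the sketch uses `replace` instead.

```python
def clean_price(raw: str) -> str:
    """Turn a price label such as 'KSh 1,500' into the plain digits '1500'."""
    # replace() removes the literal "KSh" label and the thousands separators;
    # the final strip() drops any whitespace left around the number.
    return raw.replace("KSh", "").replace(",", "").strip()


# Sample inputs are made up for illustration.
print(clean_price("KSh 1,500"))     # 1500
print(clean_price(" KSh 12,000 "))  # 12000
```

Inside the loop, the helper would be called as `clean_price(price.text)`.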

The complete script:

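import csv

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")  # prevents opening browser window
# Raw string so the backslashes in the Windows path are kept literally.
PATH = r"C:\Program Files (x86)\chromedriver.exe"
# Selenium 4 style: pass the driver path via Service and hand over the options.
driver = webdriver.Chrome(service=Service(PATH), options=options)

driver.get("https://www.purpink.co.ke/collections/her")

product_names = driver.find_elements(By.CLASS_NAME, "product-thumbnail__title")
product_prices = driver.find_elements(By.CLASS_NAME, "money")

filename = "purpink.csv"

# csv.writer quotes product names that contain commas.
with open(filename, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Brand", "Price(Ksh)"])
    for product, price in zip(product_names, product_prices):
        # Remove the "KSh" label and thousands separators from the price text.
        final_price = price.text.replace("KSh", "").replace(",", "").strip()
        writer.writerow([product.text, final_price])

driver.quit()  # close the browser once the data is saved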
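
One thing to watch when pairing two separately collected lists: `zip` stops at the shorter list without complaint, so if the page yields more prices than names (or the reverse), rows can end up mismatched. The sketch below shows one way to guard against that while writing the file with Python's `csv` module. `save_products` is a hypothetical helper name, and the sample rows in the usage note are made up for illustration.

```python
import csv


def save_products(names, prices, path):
    """Write (name, price) pairs to a CSV file and return the number of rows."""
    # zip() silently stops at the shorter list, so a length mismatch would
    # pair names with the wrong prices; fail loudly instead.
    if len(names) != len(prices):
        raise ValueError(f"{len(names)} names but {len(prices)} prices")

    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Brand", "Price(Ksh)"])
        # csv.writer quotes any field that contains a comma.
        writer.writerows(zip(names, prices))
    return len(names)
```

For example, `save_products(["Mug, large", "Candle"], ["1500", "800"], "demo.csv")` writes a header plus two rows, with the first name quoted so its comma does not split the column.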

The code and CSV results can be found on my GitHub.