Web Scraping with Python
For my first web scraping project, I followed a tutorial on YouTube by a creator named Tinkernut.
Importing the libraries
from bs4 import BeautifulSoup
import requests
import csv
Here, we import the basic libraries for scraping the data and writing it into a CSV file.
url_to_scrape = requests.get('https://quotes.toscrape.com/')
soup = BeautifulSoup(url_to_scrape.text, 'html.parser')
quotes = soup.find_all("span", attrs={"class": "text"})
authors = soup.find_all("small", attrs={"class": "author"})
Here, we specify the URL we will be scraping the data from, as well as the tags and classes where the data we want is located. On this page, each quote sits in a span element with the class "text", and each author name sits in a small element with the class "author", so we select exactly those.
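To see how this selection works without hitting the live site, here is a small self-contained sketch. The HTML snippet below is a hypothetical sample I wrote to mimic the structure of quotes.toscrape.com, not content fetched from the page:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the page structure (not fetched live)
sample_html = """
<div class="quote">
  <span class="text">“Be yourself; everyone else is already taken.”</span>
  <small class="author">Oscar Wilde</small>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
quotes = soup.find_all("span", attrs={"class": "text"})
authors = soup.find_all("small", attrs={"class": "author"})

print(quotes[0].text)   # the quote text
print(authors[0].text)  # the author name
```

find_all returns a list of all matching elements, and .text strips away the tags, leaving only the inner text.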
file = open("quotes.csv", "w", newline="", encoding="utf-8")
writer = csv.writer(file)
The file is opened in write mode, and csv.writer returns a writer object that converts our data into delimited rows in the CSV file.
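To see what the writer object actually produces, here is a quick sketch that writes to an in-memory buffer instead of a file (the example row is made up for illustration):

```python
import csv
import io

# Write to an in-memory buffer so we can inspect the raw CSV output
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Quotes", "Author"])
writer.writerow(["Simplicity is the ultimate sophistication.", "Jane Doe"])

print(buffer.getvalue())
```

Each call to writerow takes a list and emits one comma-separated line, quoting fields automatically when they contain commas or quote characters.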
writer.writerow(["Quotes", "Author"])
for quote, author in zip(quotes, authors):
    print(quote.text + "." + author.text)
    writer.writerow([quote.text, author.text])
file.close()
This writes the headers "Quotes" and "Author" to the CSV file. It then iterates through the paired quote and author elements, printing each pair to the console and writing each quote and author to a new row in the CSV file, before finally closing the file.
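Putting the steps together, the whole script could also be structured like the sketch below, using a `with` block so the file is closed automatically even if an error occurs. The helper name `save_quotes` is my own choice, not something from the tutorial:

```python
import csv
from bs4 import BeautifulSoup

def save_quotes(html, out_path="quotes.csv"):
    """Parse quote/author pairs out of page HTML and write them to a CSV file."""
    soup = BeautifulSoup(html, "html.parser")
    quotes = soup.find_all("span", attrs={"class": "text"})
    authors = soup.find_all("small", attrs={"class": "author"})
    # newline="" stops the csv module from adding blank lines between rows on Windows
    with open(out_path, "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["Quotes", "Author"])
        for quote, author in zip(quotes, authors):
            writer.writerow([quote.text, author.text])

# Usage (needs network access):
# import requests
# save_quotes(requests.get("https://quotes.toscrape.com/").text)
```

Separating the fetching from the parsing also makes the parsing logic easy to test against a small HTML snippet.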