Extracting data from a website using BeautifulSoup

#python #beautifulsoup #webscraping

There are two main ways to extract data from a website:

Use an API (if one is available) to retrieve the data.
Access the HTML of the webpage and extract the useful information from it.

In this article, we will take the second approach and extract Billboard magazine's Year-End Hot 100 singles of 1970 from the Wikipedia page "Billboard Year-End Hot 100 singles of 1970".

Task:

Perform web scraping and extract all 100 songs with their artists.
Create a Python dictionary whose keys are the titles of the singles and whose values are lists of artists.

Installation
We need to install requests and bs4. The requests module allows you to send HTTP requests using Python. Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files.

pip install requests
pip install bs4

Import the libraries

import requests
from bs4 import BeautifulSoup
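
Before touching the real page, a tiny offline example can show what Beautiful Soup gives us. The HTML string below is made up for illustration and is not taken from the Wikipedia page. It also shows a detail that matters later: `.string` returns `None` when a tag has more than one child, while `get_text()` always returns the combined text.

```python
from bs4 import BeautifulSoup

# Made-up HTML snippet, used only to illustrate the parsing calls.
html = (
    '<p>Hello, <a href="/wiki/Example"><i>nested</i> link</a> '
    'and <a href="/wiki/Plain">plain link</a></p>'
)

# "html.parser" is the parser that ships with Python, so nothing extra is needed.
soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")          # every <a> tag in the document
first_string = links[0].string      # None, because this <a> has two children
first_text = links[0].get_text()    # text of the tag and all of its descendants
second_string = links[1].string     # works, because this <a> holds a single string

print(first_string, "|", first_text, "|", second_string)
```

For scraping real pages, `get_text()` is usually the safer choice, since link text often contains nested tags such as `<i>`.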

Sending request

url = "https://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_1970"
response = requests.get(url)
print(response.url) # print the final URL of the request
print(response.status_code) # print the HTTP status code (200 means success)
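
Printing the status is fine for a quick look, but a script may prefer to stop when the request fails. The sketch below is one way to do that with `raise_for_status()`. The helper name `fetch_html`, the timeout value, and the User-Agent string are assumptions made for this example. Some sites, Wikipedia included, ask scripts to identify themselves with a descriptive User-Agent. The demonstration at the bottom runs offline on a hand-built `Response` object.

```python
import requests


# Hypothetical helper (not part of the original walkthrough): fetch a page and
# fail loudly on HTTP errors instead of parsing an error page by accident.
def fetch_html(url, timeout=10):
    # Placeholder User-Agent; replace it with something that identifies your script.
    headers = {"User-Agent": "billboard-1970-tutorial/0.1 (educational example)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx status codes
    return response.text


# Offline demonstration: build a Response by hand and give it an error status.
fake = requests.Response()
fake.status_code = 404
try:
    fake.raise_for_status()
    failed = False
except requests.HTTPError:
    failed = True

print(failed)
```

With this helper, `fetch_html(url)` could stand in for the `requests.get(url)` call above and would return the page's HTML as a string.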

songSoup = BeautifulSoup(response.text, "html.parser") # parse the HTML with Python's built-in parser

data_dictionary = {}

for song in songSoup.find_all('tr')[1:101]: # rows 1 to 100; row 0 is the table header
  # Uncomment to print each of the 100 table rows
  # print(song)

  links = song.find_all('a')
  title = links[0].get_text() # first link in the row: the song title

  artist = links[1].get_text() # second link in the row: the first listed artist
  # Print the title and artist
  print(title, ',', artist)

  # Add the entry to the dictionary
  data_dictionary[title] = [artist]
print(data_dictionary)
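
The loop above stores only one artist per song, although our task asks for a list of artists. Here is a minimal sketch of how every linked artist in a row could be collected. It assumes each row has three cells (position, title, artists), which should be checked against the real page. The sample table is made up so that the example runs offline, and "Song A", "Artist X" and so on are placeholders, not chart data.

```python
from bs4 import BeautifulSoup

# Made-up table that imitates the assumed layout: position, title, artist(s).
SAMPLE_HTML = """
<table class="wikitable">
<tr><th>No.</th><th>Title</th><th>Artist(s)</th></tr>
<tr><td>1</td><td>"<a href="/wiki/A">Song A</a>"</td><td><a href="/wiki/X">Artist X</a></td></tr>
<tr><td>2</td><td>"<a href="/wiki/B">Song B</a>"</td><td><a href="/wiki/Y">Artist Y</a> and <a href="/wiki/Z">Artist Z</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

songs = {}
for row in soup.find("table").find_all("tr")[1:]:  # skip the header row
    cells = row.find_all("td")
    if len(cells) < 3:
        continue  # ignore rows that do not match the assumed layout
    # The title cell wraps the link in quotation marks, so strip them off.
    title = cells[1].get_text(strip=True).strip('"')
    # Collect every linked artist in the artist cell.
    artists = [a.get_text(strip=True) for a in cells[2].find_all("a")]
    if not artists:
        # Fall back to the plain cell text when the artist is not linked.
        artists = [cells[2].get_text(strip=True)]
    songs[title] = artists

print(songs)
```

Reading cells by position instead of taking the first two links in the row also avoids an IndexError on rows where the title or artist is not a link.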
