DEV Community

Prince


Web Scraping with Python: Getting the Table of Countries and Country Codes from countrycode.org

Ever needed some data that is sitting on a webpage you cannot easily copy and paste from?
I'm going to show you how to get data from webpages (web scraping, essentially) with Python. Specifically, we'll scrape the table on https://countrycode.org/ to get each country's name, ISO codes and phone (calling) code, along with its population, area and GDP.

Before we get into the code proper, we need to install two Python packages: requests and beautifulsoup4.
pip install requests beautifulsoup4

Then follow these steps.

Import required packages

from bs4 import BeautifulSoup
import json
import requests

Get the webpage's content using the requests package

url = "https://countrycode.org/"
r = requests.get(url, timeout=30) # fail instead of hanging forever if the server stops responding
r.raise_for_status()

The last line, r.raise_for_status(), raises a requests.HTTPError exception if the server answered with an error status code (4xx or 5xx), which stops the program before it tries to parse an error page.
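Stopping the program is fine for a one-off script. If you would rather report the failure and carry on, you can catch requests.HTTPError yourself. This is a minimal sketch: response_ok is a hypothetical helper name, and the two responses are built by hand (instead of coming from requests.get) so the example runs without a network connection. Note that connection problems and timeouts raise other exceptions, such as requests.ConnectionError and requests.Timeout, not HTTPError.

```python
import requests


def response_ok(r: requests.Response) -> bool:
    """Hypothetical helper: True for a successful response, False for a 4xx/5xx one."""
    try:
        r.raise_for_status()  # raises requests.HTTPError for 4xx and 5xx status codes
    except requests.HTTPError as err:
        print(f"Request failed: {err}")
        return False
    return True


# Responses built by hand so the behaviour can be seen without touching the network
ok = requests.Response()
ok.status_code = 200

not_found = requests.Response()
not_found.status_code = 404
not_found.reason = "Not Found"
not_found.url = "https://countrycode.org/no-such-page"  # hypothetical URL, for illustration only

print(response_ok(ok))         # True
print(response_ok(not_found))  # prints the "404 Client Error" message, then False
```

In the scraping script you would call response_ok(r) right after requests.get(url) and skip the parsing step when it returns False.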

Create the 'soup' and select all the rows in the table's body. The 'soup' object parses the page's HTML so that we can pick out particular elements from it.

soup = BeautifulSoup(r.content, 'html.parser')
rows = soup.select("tbody > tr") # select all the tr (table row) elements that are direct children of a tbody element
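select() takes a CSS selector. In HTML a table row is a tr element (there is no row element), and ">" restricts the match to direct children. The snippet below runs a few selectors against a tiny hand-written table, not the real countrycode.org markup, to show which rows each one picks up. One caveat: html.parser keeps the markup exactly as the server sent it, so if the raw HTML has no tbody (browser developer tools add one to the DOM they display), a "tbody > tr" selector finds nothing.

```python
from bs4 import BeautifulSoup

# A tiny stand-in table, not the real countrycode.org markup
html = """
<table>
  <thead>
    <tr><th>Country</th><th>Country Code</th></tr>
  </thead>
  <tbody>
    <tr><td><a href="#">Ghana</a></td><td>233</td></tr>
    <tr><td><a href="#">Kenya</a></td><td>254</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <tr> that is a direct child of a <tbody>, so the header row is skipped
body_rows = soup.select("tbody > tr")
print(len(body_rows))  # 2

# "row" is not an HTML element, so a selector that names it matches nothing
print(len(soup.select("tbody > row")))  # 0

# "tr" on its own matches every row, including the one inside <thead>
print(len(soup.select("tr")))  # 3
```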

Get the countries from the table

keys = ["name", "country_code", "iso_codes", "population", "area/km2", "gdp $USD"] # the table's columns, in order

list_of_countries = []
for row in rows:
    country_object = dict.fromkeys(keys, '') # one empty entry per column for this row

    # zip() pairs each column name with a td cell in this row and stops when either runs out
    for key, cell in zip(keys, row.find_all('td')):
        link = cell.find('a')
        if key == "name" and link is not None:
            # the country name is the text of the hyperlink in the first cell
            country_object[key] = link.get_text(strip=True)
        else:
            # get the text found in the cell, without surrounding whitespace
            country_object[key] = cell.get_text(strip=True)
    list_of_countries.append(country_object)
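If you want to reuse or test the row-parsing logic, it can be wrapped in a function that takes raw HTML. This is a sketch under a few assumptions: parse_country_rows is a hypothetical name, it keeps the rule of reading the name from the cell's hyperlink (falling back to the cell text when there is no link), and it strips surrounding whitespace with get_text(strip=True). The sample row is hand-written, not copied from the site, and deliberately has only three cells to show that missing columns stay empty.

```python
from typing import Dict, List

from bs4 import BeautifulSoup

# Column names in table order (the same keys used in the loop above)
KEYS = ["name", "country_code", "iso_codes", "population", "area/km2", "gdp $USD"]


def parse_country_rows(html: str) -> List[Dict[str, str]]:
    """Hypothetical helper: turn each <tr> directly inside a <tbody> into a dict keyed by KEYS."""
    soup = BeautifulSoup(html, "html.parser")
    countries = []
    for row in soup.select("tbody > tr"):
        country = dict.fromkeys(KEYS, "")  # columns missing from the row stay ''
        # zip() pairs column names with <td> cells and stops at whichever runs out first
        for key, cell in zip(KEYS, row.find_all("td")):
            link = cell.find("a")
            if key == "name" and link is not None:
                country[key] = link.get_text(strip=True)  # the name sits inside a hyperlink
            else:
                country[key] = cell.get_text(strip=True)
        countries.append(country)
    return countries


# Hand-written sample row, not the site's real markup; it only has three cells
sample = """
<table><tbody>
  <tr>
    <td> <a href="/ghana">Ghana</a> </td>
    <td> 233 </td>
    <td> GH / GHA </td>
  </tr>
</tbody></table>
"""

print(parse_country_rows(sample))
```

With the page fetched as before, list_of_countries = parse_country_rows(r.text) would take the place of the loop.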

Save the list to a JSON file

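with open("countries.json", "w", encoding="utf-8") as f: # replace countries.json with whatever filename you want
    json.dump(list_of_countries, f, ensure_ascii=False, indent=2) # keep non-ASCII names readable and pretty-print the file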
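To check that the file came out right, you can load it back with json.load (or json.loads on the raw text). This sketch writes a one-entry list to a temporary directory; the entry is an illustrative sample, not scraped output. It also shows why ensure_ascii=False paired with encoding="utf-8" suits country names: the accented name is stored as readable text instead of a \u00f4 escape.

```python
import json
import os
import tempfile

# Illustrative sample entry, not scraped output; the accented name is there on purpose
countries = [{"name": "Côte d'Ivoire", "country_code": "225"}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "countries.json")

    # ensure_ascii=False writes "ô" as-is, so the file is opened with an explicit UTF-8 encoding
    with open(path, "w", encoding="utf-8") as f:
        json.dump(countries, f, ensure_ascii=False, indent=2)

    # Read the raw text back while the temporary directory still exists
    with open(path, encoding="utf-8") as f:
        raw = f.read()

loaded = json.loads(raw)
print("Côte d'Ivoire" in raw)  # True: stored as readable text, not as an escape sequence
print(loaded == countries)     # True: nothing was lost in the round trip
```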

VOILA! You have successfully scraped the list of countries along with their phone codes, ISO codes, population, surface area and GDP.
