Getting data from modern websites is not the same as it used to be. Today, most websites render their data dynamically, making it hard for traditional web scraping tools to obtain any data.
In this project, the Africa Energy project, we use different tools to obtain data about energy indicators across 54 African countries for the years 2000–2022 from the Africa Energy Portal.
The project features a web scraper that extracts JSON data from the portal's API network responses.
Project Overview
The Africa Energy Portal is a dynamic webpage that contains information about energy indicators across 54 African countries. The indicators cover energy access, energy supply, and technical aspects of the energy sector.
The indicators are further broken down into sub-indicators such as 'Population access to electricity-National', which shows the percentage of people with access to electricity at the national level.
The Process
The project uses the following technologies:
a. Python for developing the web scraping logic
b. Selenium for browser automation and capturing network responses
c. Pandas for data handling and manipulation
d. MongoDB, a NoSQL database, for storing and querying the data (a minimal storage/query sketch follows this list)
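For context on the storage side, here is a minimal sketch of storing and querying one record with pymongo. It assumes a local MongoDB instance, and the database, collection, field names, and values are illustrative placeholders rather than the project's actual schema:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumes a local MongoDB instance
collection = client["africa_energy"]["indicators"]

# Store one flattened record per country-year-indicator (placeholder values only).
collection.insert_one({
    "name": "Population access to electricity-National",
    "region_name": "Kenya",
    "year": 2020,
    "score": None,     # placeholder; the scraper fills in the real value
    "unit": "%",
})

# Query the stored data, e.g. every indicator scraped for Kenya in 2020.
for doc in collection.find({"region_name": "Kenya", "year": 2020}):
    print(doc["name"], doc["score"])
```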
The scraper uses Selenium to automate browser interactions such as loading the page and selecting all required themes, years, and countries for precise data extraction.
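As a rough illustration of the response-capture part, the sketch below shows one common way to read the JSON that a dynamic page fetches in the background, using Selenium with Chrome's performance (DevTools) log. The portal URL, the JSON mime-type filter, and the variable names are assumptions for illustration, not the project's exact code; the theme/year/country selections themselves would be driven by ordinary Selenium clicks before the log is read:

```python
import json

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
# Keep Chrome's performance (DevTools) log so network activity can be
# inspected after the page has loaded.
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})

driver = webdriver.Chrome(options=options)
driver.get("https://africa-energy-portal.org")  # placeholder URL for the portal

# ... drive the theme, year, and country selectors with Selenium clicks here ...

api_payloads = []
for entry in driver.get_log("performance"):
    message = json.loads(entry["message"])["message"]
    if message.get("method") != "Network.responseReceived":
        continue
    response = message["params"]["response"]
    # Keep only JSON responses, which is where the indicator data comes back.
    if "application/json" not in response.get("mimeType", ""):
        continue
    try:
        body = driver.execute_cdp_cmd(
            "Network.getResponseBody",
            {"requestId": message["params"]["requestId"]},
        )
        if not body.get("base64Encoded"):
            api_payloads.append(json.loads(body["body"]))
    except Exception:
        # Some response bodies are no longer available by the time we ask.
        pass

driver.quit()
```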
The Results
The scraper obtains all the selected fields of the data, i.e.:
- id, name, score, unit, region name, indicator topic, indicator source, indicator name, indicator group, year, url
The data is extracted in JSON format and appended to a list before being flattened and converted to CSV format.
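A minimal sketch of that flattening step, assuming each captured payload is a list of record dictionaries with the fields above (the variable and file names are illustrative):

```python
import pandas as pd

records = []
for payload in api_payloads:      # api_payloads: the JSON responses captured earlier
    records.extend(payload)       # append every record to one flat list

# Flatten any nested objects into columns and write the result to CSV.
df = pd.json_normalize(records)
df.to_csv("africa_energy_indicators.csv", index=False)
```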
Working on this project has been nothing short of a learning experience, from understanding the problem, to exploring different ways to execute it, to the final implementation.
You can check out the project on GitHub, and feel free to reach out for inquiries or collaboration!
Top comments (1)
Complete coverage means every indicator has a value or explicit null for all 54 countries from 2000–2022, aiming for 95% overall and capturing units/source/version. Success = fast throughput, high completeness, tight accuracy, and freshness under 24–72 h, keeping it snappy and up-to-date. We normalize to ISO-3166, validate with JSON Schema/pydantic, flatten via pandas.json_normalize, track lineage, and stash docs in MongoDB per country-year-indicator with compound indexes, upserts, and unit conversions. To keep things smooth, we throttle, run bounded parallelism, cache with ETag/If-Modified-Since, log like pros, retry smartly, play nice with ToS/robots.txt, ship via Docker, and schedule idempotent delta runs with Airflow/Prefect.
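As a rough sketch of the "compound indexes and upserts" idea this comment describes (not the project's actual code), assuming cleaned record dictionaries keyed by country, year, and indicator name, with all names and values as placeholders:

```python
from pymongo import ASCENDING, MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")
collection = client["africa_energy"]["indicators"]

# One document per country-year-indicator, enforced with a unique compound index.
collection.create_index(
    [("region_name", ASCENDING), ("year", ASCENDING), ("name", ASCENDING)],
    unique=True,
)

cleaned_records = [
    # placeholder record; a real run would pass the validated scraper output here
    {"region_name": "Kenya", "year": 2020,
     "name": "Population access to electricity-National", "score": None, "unit": "%"},
]

# Upserting keys each write on the compound index, so re-running a delta load
# updates existing documents instead of inserting duplicates.
ops = [
    UpdateOne(
        {"region_name": r["region_name"], "year": r["year"], "name": r["name"]},
        {"$set": r},
        upsert=True,
    )
    for r in cleaned_records
]
if ops:
    collection.bulk_write(ops)
```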