Introduction
When working with data, access to well-structured and reliable information can make all the difference. I recently developed a project called Africa Energy Coverage Data Extractor, which focuses on collecting energy data for all African countries between 2000 and 2022. This project uses Python and Selenium to automatically gather data from the Africa Energy Portal and then stores it in a format that can be analyzed or uploaded to a cloud database.
The goal was simple: create a tool that could automatically go through the Africa Energy Portal, capture yearly statistics for every country, and organize everything into one clean dataset.
What the Project Does
The Africa Energy Coverage Data Extractor uses Selenium WebDriver to browse through the website just like a human would. It navigates across multiple pages and year sliders to pull together information on different energy indicators.
Once the data is collected, it is saved into a CSV file for local use. You can also choose to upload it to MongoDB Atlas, which is a cloud database that makes it easier to store and manage large datasets.
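To give a sense of how the automation starts, here is a minimal sketch of the browser setup. It uses webdriver-manager so you never download ChromeDriver by hand; the portal URL and the headless flag are assumptions for illustration, not the project's exact configuration:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Optional: run Chrome without opening a visible window
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")

# webdriver-manager downloads a ChromeDriver that matches the installed Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get("https://africa-energy-portal.org")  # Africa Energy Portal landing page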
Main Features
Here are some of the key things the extractor can do:
- Collects energy data for all 54 African countries
- Extracts yearly data from 2000 to 2022
- Handles dynamic website elements using Selenium (see the short waiting sketch after this list)
- Gathers a wide range of energy indicators such as electricity access, generation, and consumption
- Saves data as a structured CSV file
- Optionally uploads the data to MongoDB Atlas
- Organizes code into simple, modular scripts
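Handling dynamic elements mostly comes down to explicit waits: the scraper pauses until the JavaScript-rendered content actually exists before reading it. The sketch below shows the pattern; the CSS selector is a placeholder, not the portal's real markup:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 15 seconds for the data table to appear after the page's scripts run
wait = WebDriverWait(driver, 15)
table = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "table.data-table")))
rows = table.find_elements(By.TAG_NAME, "tr")  # one row per country/metric entry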
The Tools Behind the Project
The project was built with tools that are common in data work but powerful when combined:
- Python 3.x for programming
- Selenium WebDriver for browser automation
- Pandas and NumPy for data handling
- webdriver-manager for managing browser drivers
- MongoDB Atlas and PyMongo for data storage and access
- Google Chrome as the browser used during automation
This combination makes it easy to manage both scraping and storage without needing complicated setups.
How the Project is Organized
The code is divided into a few Python files that handle different parts of the process:
- scrape.py – contains the main scraping logic
- mongodb.py – manages the connection to MongoDB and data uploads
- main.py – runs the full process from scraping to saving
- requirements.txt – lists the Python packages you need
- README.md – contains documentation
This structure keeps everything clear and easy to maintain.
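To show how the pieces fit together, here is a rough sketch of what main.py does; scrape_all_countries and upload_records are assumed names standing in for the repository's actual functions:

import os
import pandas as pd

from scrape import scrape_all_countries   # assumed helper in scrape.py
from mongodb import upload_records        # assumed helper in mongodb.py

def run():
    # Scrape every country for every year in the covered range
    records = scrape_all_countries(start_year=2000, end_year=2022)
    # Always write a local CSV copy
    pd.DataFrame(records).to_csv("africa_energy_data.csv", index=False)
    # Upload only when MongoDB credentials are configured
    if os.getenv("MONGO_URI"):
        upload_records(records)

if __name__ == "__main__":
    run()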
Getting Started
To run the project, you’ll need to have Python 3.8 or higher, Google Chrome, and Git installed.
After cloning the project repository, you can set up your environment by creating a virtual environment and installing the dependencies:
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
Once that’s done, you can run the main script to start scraping:
python main.py
This will collect the data and save it into a CSV file on your computer.
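Once the run finishes, you can sanity-check the output with pandas. The filename below is an assumption; substitute whatever name the script actually writes:

import pandas as pd

df = pd.read_csv("africa_energy_data.csv")
print(df.shape)                             # total rows and columns collected
print(df["country"].nunique())              # should approach 54 countries
print(df["year"].min(), df["year"].max())   # expected range: 2000 to 2022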
Connecting to MongoDB Atlas
If you’d like to upload the extracted data to MongoDB, you’ll need to create a free cluster on MongoDB Atlas.
After that, set up your environment variables with your database connection details:
MONGO_URI="your_mongodb_connection_uri"
MONGO_DATABASE="AfricaEnergyData"
Then you can upload the data by running:
python main.py
Once the upload finishes, the data is available in your MongoDB Atlas database, where you can access it anytime.
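For reference, the upload step boils down to a few PyMongo calls. This is a minimal sketch assuming the MONGO_URI and MONGO_DATABASE environment variables shown above and a collection named energy_records (the project's actual collection name may differ):

import os
import pandas as pd
from pymongo import MongoClient

client = MongoClient(os.environ["MONGO_URI"])                      # Atlas connection string
db = client[os.environ.get("MONGO_DATABASE", "AfricaEnergyData")]  # target database

records = pd.read_csv("africa_energy_data.csv").to_dict("records") # one document per row
db["energy_records"].insert_many(records)                          # bulk insert
print(f"Uploaded {len(records)} documents")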
Understanding the Data
Each record in the dataset contains information about a specific energy metric for a country and year.
Here’s what each column represents:
- country – name of the country
- country_serial – ISO3 country code
- metric – energy metric (for example, access to electricity)
- value – the numeric value of the metric
- unit – the unit of measurement
- sector – the main sector category
- sub_sector – a more specific classification
- sub_sub_sector – an even finer breakdown
- source_link – link to the source on the Africa Energy Portal
- source – the name of the source (default: Africa Energy Portal)
- year – the year of the data (2000–2022)
This makes it easy to work with the data in analytics tools or dashboards later on.
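As a concrete example, a single record looks roughly like the dictionary below; the values are illustrative, not real figures from the portal:

record = {
    "country": "Kenya",
    "country_serial": "KEN",
    "metric": "Access to electricity",
    "value": 76.5,                      # illustrative number only
    "unit": "% of population",
    "sector": "Electricity",
    "sub_sector": "Access",
    "sub_sub_sector": "National",
    "source_link": "https://africa-energy-portal.org/...",  # link back to the source page
    "source": "Africa Energy Portal",
    "year": 2022,
}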
How It Works in Practice
The process starts with Selenium opening the Africa Energy Portal. It then loops through each year from 2000 to 2022, collecting data for all available countries. Once all the information is gathered, it’s organized into a CSV file.
You can decide whether to stop there or upload the data to MongoDB Atlas for storage and sharing.
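Put together, the core loop looks roughly like this; set_year_filter and extract_rows are assumed names standing in for whatever scrape.py actually calls:

# Sketch of the per-year loop, assuming the driver from the setup sketch above
all_records = []
for year in range(2000, 2023):                       # 2000 through 2022 inclusive
    set_year_filter(driver, year)                    # move the portal's year slider
    all_records.extend(extract_rows(driver, year))   # one record per country/metric

pd.DataFrame(all_records).to_csv("africa_energy_data.csv", index=False)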
Contact
Author: Evalyn Njagi
Email: evalynnjagi02@gmail.com
LinkedIn: Evalyn Njagi
If you are interested in contributing, improving the scraper, or expanding it to new datasets, feel free to connect or open an issue on GitHub.
Final Thoughts
Working on this project taught me a lot about data automation and cloud storage. It showed how a simple Python workflow can help collect large-scale information in a structured and reusable format.
This tool can be extended for other kinds of regional or sectoral data, and it serves as a practical example of how data engineering can support research and development in different fields.