Introduction
When working with data, access to well-structured and reliable information can make all the difference. I recently developed a project called Africa Energy Coverage Data Extractor, which focuses on collecting energy data for all African countries between 2000 and 2022. This project uses Python and Selenium to automatically gather data from the Africa Energy Portal and then stores it in a format that can be analyzed or uploaded to a cloud database.
The goal was simple: create a tool that could automatically go through the Africa Energy Portal, capture yearly statistics for every country, and organize everything into one clean dataset.
What the Project Does
The Africa Energy Coverage Data Extractor uses Selenium WebDriver to browse through the website just like a human would. It navigates across multiple pages and year sliders to pull together information on different energy indicators.
Once the data is collected, it is saved into a CSV file for local use. You can also choose to upload it to MongoDB Atlas, which is a cloud database that makes it easier to store and manage large datasets.
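To give a sense of how the automation starts, here is a minimal sketch of the browser setup. It uses webdriver-manager so you never download ChromeDriver by hand; the portal URL and the headless flag are assumptions for illustration, not the project's exact configuration:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Optional: run Chrome without opening a visible window
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")

# webdriver-manager downloads a ChromeDriver that matches the installed Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get("https://africa-energy-portal.org")  # Africa Energy Portal landing page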
Main Features
Here are some of the key things the extractor can do:
- Collects energy data for all 54 African countries
- Extracts yearly data from 2000 to 2022
- Handles dynamic website elements using Selenium (see the short waiting sketch after this list)
- Gathers a wide range of energy indicators such as electricity access, generation, and consumption
- Saves data as a structured CSV file
- Optionally uploads the data to MongoDB Atlas
- Organizes code into simple, modular scripts
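Handling dynamic elements mostly comes down to explicit waits: the scraper pauses until the JavaScript-rendered content actually exists before reading it. The sketch below shows the pattern; the CSS selector is a placeholder, not the portal's real markup:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 15 seconds for the data table to appear after the page's scripts run
wait = WebDriverWait(driver, 15)
table = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "table.data-table")))
rows = table.find_elements(By.TAG_NAME, "tr")  # one row per country/metric entry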
The Tools Behind the Project
The project was built with tools that are common in data work but powerful when combined:
- Python 3.x for programming
- Selenium WebDriver for browser automation
- Pandas and NumPy for data handling
- webdriver-manager for managing browser drivers
- MongoDB Atlas and PyMongo for data storage and access
- Google Chrome as the browser used during automation
This combination makes it easy to manage both scraping and storage without needing complicated setups.
How the Project is Organized
The code is divided into a few Python files that handle different parts of the process:
- scrape.py – contains the main scraping logic
- mongodb.py – manages the connection to MongoDB and data uploads
- main.py – runs the full process from scraping to saving
- requirements.txt – lists the Python packages you need
- README.md – contains documentation
This structure keeps everything clear and easy to maintain.
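To show how the pieces fit together, here is a rough sketch of what main.py does; scrape_all_countries and upload_records are assumed names standing in for the repository's actual functions:

import os
import pandas as pd

from scrape import scrape_all_countries   # assumed helper in scrape.py
from mongodb import upload_records        # assumed helper in mongodb.py

def run():
    # Scrape every country for every year in the covered range
    records = scrape_all_countries(start_year=2000, end_year=2022)
    # Always write a local CSV copy
    pd.DataFrame(records).to_csv("africa_energy_data.csv", index=False)
    # Upload only when MongoDB credentials are configured
    if os.getenv("MONGO_URI"):
        upload_records(records)

if __name__ == "__main__":
    run()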
Getting Started
To run the project, you’ll need to have Python 3.8 or higher, Google Chrome, and Git installed.
After cloning the project repository, you can set up your environment by creating a virtual environment and installing the dependencies:
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
Once that’s done, you can run the main script to start scraping:
python main.py
This will collect the data and save it into a CSV file on your computer.
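Once the run finishes, you can sanity-check the output with pandas. The filename below is an assumption; substitute whatever name the script actually writes:

import pandas as pd

df = pd.read_csv("africa_energy_data.csv")
print(df.shape)                             # total rows and columns collected
print(df["country"].nunique())              # should approach 54 countries
print(df["year"].min(), df["year"].max())   # expected range: 2000 to 2022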
Connecting to MongoDB Atlas
If you’d like to upload the extracted data to MongoDB, you’ll need to create a free cluster on MongoDB Atlas.
After that, set up your environment variables with your database connection details:
MONGO_URI="your_mongodb_connection_uri"
MONGO_DATABASE="AfricaEnergyData"
Then you can upload the data by running:
python main.py
Once the upload finishes, the data is available in your MongoDB Atlas database, where you can access it anytime.
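For reference, the upload step boils down to a few PyMongo calls. This is a minimal sketch assuming the MONGO_URI and MONGO_DATABASE environment variables shown above and a collection named energy_records (the project's actual collection name may differ):

import os
import pandas as pd
from pymongo import MongoClient

client = MongoClient(os.environ["MONGO_URI"])                      # Atlas connection string
db = client[os.environ.get("MONGO_DATABASE", "AfricaEnergyData")]  # target database

records = pd.read_csv("africa_energy_data.csv").to_dict("records") # one document per row
db["energy_records"].insert_many(records)                          # bulk insert
print(f"Uploaded {len(records)} documents")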
Understanding the Data
Each record in the dataset contains information about a specific energy metric for a country and year.
Here’s what each column represents:
- country – name of the country
- country_serial – ISO3 country code
- metric – energy metric (for example, access to electricity)
- value – the numeric value of the metric
- unit – the unit of measurement
- sector – the main sector category
- sub_sector – a more specific classification
- sub_sub_sector – an even finer breakdown
- source_link – link to the source on the Africa Energy Portal
- source – the name of the source (default: Africa Energy Portal)
- year – the year of the data (2000–2022)
This makes it easy to work with the data in analytics tools or dashboards later on.
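As a concrete example, a single record looks roughly like the dictionary below; the values are illustrative, not real figures from the portal:

record = {
    "country": "Kenya",
    "country_serial": "KEN",
    "metric": "Access to electricity",
    "value": 76.5,                      # illustrative number only
    "unit": "% of population",
    "sector": "Electricity",
    "sub_sector": "Access",
    "sub_sub_sector": "National",
    "source_link": "https://africa-energy-portal.org/...",  # link back to the source page
    "source": "Africa Energy Portal",
    "year": 2022,
}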
How It Works in Practice
The process starts with Selenium opening the Africa Energy Portal. It then loops through each year from 2000 to 2022, collecting data for all available countries. Once all the information is gathered, it’s organized into a CSV file.
You can decide whether to stop there or upload the data to MongoDB Atlas for storage and sharing.
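Put together, the core loop looks roughly like this; set_year_filter and extract_rows are assumed names standing in for whatever scrape.py actually calls:

# Sketch of the per-year loop, assuming the driver from the setup sketch above
all_records = []
for year in range(2000, 2023):                       # 2000 through 2022 inclusive
    set_year_filter(driver, year)                    # move the portal's year slider
    all_records.extend(extract_rows(driver, year))   # one record per country/metric

pd.DataFrame(all_records).to_csv("africa_energy_data.csv", index=False)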
Contact
Author: Evalyn Njagi
Email: evalynnjagi02@gmail.com
LinkedIn: Evalyn Njagi
If you are interested in contributing, improving the scraper, or expanding it to new datasets, feel free to connect or open an issue on GitHub.
Final Thoughts
Working on this project taught me a lot about data automation and cloud storage. It showed how a simple Python workflow can help collect large-scale information in a structured and reusable format.
This tool can be extended for other kinds of regional or sectoral data, and it serves as a practical example of how data engineering can support research and development in different fields.