Accurate policy analysis, research, and innovation across the continent depend on access to reliable energy data. I worked on a data extraction and structuring project, building a clean pipeline to process multiple energy datasets.
🎯 Project Goal
To collect, clean, format and upload Africa's energy-related datasets into a centralized MongoDB database, ready for analytics dashboards, automation and future API integration.
✅ What Was Achieved
a. File Structure and Raw Data Review
I gathered over 30 CSV energy files covering access rates, generation, imports, exports, renewables, consumption trends and installed capacity.
Designed a consistent file naming strategy using a `formatted_` prefix (e.g., `formatted_<indicator>.csv`).
b. Data Cleaning & Extraction
Standardized missing values, column casing, and data types.
Ensured a uniform schema across the different energy indicators so they could be integrated later; a cleaning sketch follows below.
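To make the cleaning step concrete, here is a minimal pandas sketch. The `year` and `value` column names and the placeholder strings are assumptions for illustration, since each indicator file has its own quirks:

```python
import pandas as pd

def clean_energy_csv(path: str) -> pd.DataFrame:
    """Load one raw energy CSV and apply the shared cleaning rules."""
    df = pd.read_csv(path)

    # Normalize column casing to snake_case so every file shares one style.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

    # Treat common placeholder strings as proper missing values.
    df = df.replace({"..": pd.NA, "n/a": pd.NA})

    # Coerce year and value columns to numeric (hypothetical column names).
    for col in ("year", "value"):
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")

    return df
```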
c. Master Dataset Creation
Merged individual metric files into a single master dataset, `master_energy_dataset.csv`, for centralized analytics.
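A rough sketch of the merge, assuming the `formatted_*.csv` naming pattern from above and Python 3.9+ for `removeprefix`/`removesuffix`:

```python
import glob
import pandas as pd

frames = []
for path in sorted(glob.glob("formatted_*.csv")):  # assumed naming pattern
    df = pd.read_csv(path)
    # Tag each row with the indicator it came from, derived from the filename.
    df["indicator"] = path.removeprefix("formatted_").removesuffix(".csv")
    frames.append(df)

# Stack all indicators into one long-format master dataset.
master = pd.concat(frames, ignore_index=True)
master.to_csv("master_energy_dataset.csv", index=False)
```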
d. MongoDB Integration
Connected to MongoDB Atlas using a connection URI string.
Built a production-ready Python script (sketched after this list) to:
i. Loop through all formatted CSV files.
ii. Create separate collections automatically.
iii. Insert clean data using `insert_many()` with dedupe logic and a consistent schema.
iv. Upload `master_energy_dataset.csv` as a unified collection.
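A condensed sketch of what that script could look like. The `MONGODB_URI` environment variable, the `africa_energy` database name, and the drop-and-reinsert dedupe strategy are my assumptions here, not the exact production logic:

```python
import glob
import os

import pandas as pd
from pymongo import MongoClient

# Connection URI is read from the environment rather than hard-coded.
client = MongoClient(os.environ["MONGODB_URI"])
db = client["africa_energy"]  # assumed database name

for path in sorted(glob.glob("formatted_*.csv")):
    collection_name = path.removeprefix("formatted_").removesuffix(".csv")
    records = pd.read_csv(path).to_dict(orient="records")

    # Simple dedupe: drop the collection and re-insert, so reruns stay idempotent.
    db.drop_collection(collection_name)
    if records:
        db[collection_name].insert_many(records)

client.close()
```

Reading the URI from an environment variable keeps Atlas credentials out of version control, which matters once the repository is public on GitHub.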
e. GitHub & Version Control
Initialized Git repository.
Committed the dataset extraction notebook `EDA.ipynb`.
🛠 Tech Stack Used
| Task | Tool |
| --- | --- |
| Data cleaning | Pandas |
| Storage | MongoDB Atlas (NoSQL) |
| Upload logic | PyMongo + automation script |
| Version control | Git + GitHub |
| Future plans | FastAPI + dashboard integration |
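Looking ahead to the FastAPI plan, a first read-only endpoint over the uploaded collections might look like this sketch; the route shape, database name, and environment variable are all hypothetical:

```python
import os

from fastapi import FastAPI
from pymongo import MongoClient

app = FastAPI(title="African Energy Data API")
db = MongoClient(os.environ["MONGODB_URI"])["africa_energy"]  # assumed names

@app.get("/indicators/{name}")
def read_indicator(name: str, limit: int = 100):
    # Serve documents from one indicator collection, hiding Mongo's _id field.
    return list(db[name].find({}, {"_id": 0}).limit(limit))
```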
This project marks the first building block of a scalable African Energy Data Platform. Clean, well-structured, accessible data is the foundation - and now that foundation exists.