Accurate policy analysis, research and innovation across the continent all depend on access to reliable energy data. I worked on a data extraction and structuring project, building a clean pipeline to process multiple energy datasets.
🎯 Project Goal
To collect, clean, format and upload Africa's energy-related datasets into a centralized MongoDB database, ready for analytics dashboards, automation and future API integration.
What Was Achieved
a. File Structure and Raw Data Review
- I gathered over 30 CSV energy files covering access rates, generation, imports, exports, renewables, consumption trends and installed capacity. 
- Designed a consistent file naming strategy using a formatted_*.csv pattern, so cleaned files are easy to discover programmatically.
b. Data Cleaning & Extraction
- Standardized missing values, column casing and data types. 
- Ensured a uniform schema across the different energy indicators to enable later integration (see the cleaning sketch after this list).
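To make this concrete, here is a minimal pandas cleaning sketch in the spirit of this step. The column names (country, year, value) and the missing-value placeholders are illustrative assumptions, not the project's actual schema.

```python
import pandas as pd

def clean_energy_csv(path: str) -> pd.DataFrame:
    """Load one raw energy CSV and normalize it to a shared schema."""
    df = pd.read_csv(path)

    # Normalize column casing: trimmed, lowercase, snake_case headers.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

    # Standardize missing values: treat common placeholders as NA.
    df = df.replace({"": pd.NA, "n/a": pd.NA, "-": pd.NA, "..": pd.NA})

    # Enforce data types (assumed columns: country, year, value).
    df["year"] = pd.to_numeric(df["year"], errors="coerce").astype("Int64")
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    df["country"] = df["country"].astype("string").str.strip()

    return df
```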
c. Master Dataset Creation
- Merged the individual metric files into a single master dataset, master_energy_dataset.csv, for centralized analytics (sketched below).
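A merge like this can be sketched with glob and pd.concat. The formatted_*.csv pattern, the data/ folder and the indicator tag derived from the file name are assumptions for illustration, not the exact implementation.

```python
import glob
import os

import pandas as pd

frames = []
for path in sorted(glob.glob("data/formatted_*.csv")):
    df = pd.read_csv(path)
    # Tag each row with the metric it came from, derived from the file name.
    indicator = os.path.basename(path).removeprefix("formatted_").removesuffix(".csv")
    df["indicator"] = indicator
    frames.append(df)

# Stack all metric files into one long-format master dataset.
master = pd.concat(frames, ignore_index=True)
master.to_csv("master_energy_dataset.csv", index=False)
```

Tagging each row with its source indicator keeps the master file in long format, which is generally the easiest shape to filter and aggregate in dashboards later.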
d. MongoDB Integration
- Connected to MongoDB Atlas using a connection URI string. 
- Built a production-ready Python script (a condensed sketch follows this list) to: 
 i. Loop through all formatted CSV files.
 ii. Create separate collections automatically.
 iii. Insert clean data using insert_many() with dedupe logic and a consistent document schema.
 iv. Upload master_energy_dataset.csv as a unified collection.
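Here is a condensed sketch of what such an upload loop might look like with PyMongo. The MONGODB_URI environment variable, the africa_energy database name and the unique compound index used for deduplication are assumptions rather than the project's exact details.

```python
import glob
import os

import pandas as pd
from pymongo import ASCENDING, MongoClient
from pymongo.errors import BulkWriteError

# Connect to MongoDB Atlas via the URI string (assumed to live in an env var).
client = MongoClient(os.environ["MONGODB_URI"])
db = client["africa_energy"]  # assumed database name

for path in sorted(glob.glob("data/formatted_*.csv")):
    name = os.path.basename(path).removeprefix("formatted_").removesuffix(".csv")
    collection = db[name]  # one collection per formatted CSV file

    # Dedupe logic: a unique compound index rejects documents already present.
    collection.create_index([("country", ASCENDING), ("year", ASCENDING)], unique=True)

    records = pd.read_csv(path).to_dict("records")
    try:
        # ordered=False lets valid documents insert even when duplicates fail.
        collection.insert_many(records, ordered=False)
    except BulkWriteError:
        pass  # duplicate-key errors are expected on re-runs

# Upload the merged master dataset as its own unified collection.
master = pd.read_csv("master_energy_dataset.csv").to_dict("records")
db["master_energy_dataset"].insert_many(master, ordered=False)
```

Pushing deduplication into a unique index, rather than checking each row in Python, makes re-running the script idempotent for data that is already uploaded.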
e. GitHub & Version Control
- Initialized a Git repository. 
- Committed the dataset extraction notebook, EDA.ipynb. 
🛠 Tech Stack Used
Data Cleaning - Pandas
Storage - MongoDB Atlas (NoSQL)
Upload Logic - PyMongo + Automation Script
Version Control - Git + GitHub
Future Plans - FastAPI + Dashboard Integration
This project marks the first building block of a scalable African Energy Data Platform. Clean, well-structured, accessible data is the foundation - and now that foundation exists.