Accurate policy analysis, research and innovation across the continent all depend on access to reliable energy data. I worked on a data extraction and structuring project, building a clean pipeline to process multiple energy datasets.
🎯 Project Goal
To collect, clean, format and upload Africa's energy-related datasets into a centralized MongoDB database, ready for analytics dashboards, automation and future API integration.
What Was Achieved
a. File Structure and Raw Data Review
- I gathered over 30 CSV energy files covering access rates, generation, imports, exports, renewables, consumption trends and installed capacity. 
- Designed a consistent file naming strategy using a formatted_*.csv pattern, so cleaned files are easy to discover programmatically.
b. Data Cleaning & Extraction
- Standardized missing values, column casing and data types. 
- Ensured a uniform schema across the different energy indicators to enable later integration (see the cleaning sketch after this list).
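To make this concrete, here is a minimal pandas cleaning sketch in the spirit of this step. The column names (country, year, value) and the missing-value placeholders are illustrative assumptions, not the project's actual schema.

```python
import pandas as pd

def clean_energy_csv(path: str) -> pd.DataFrame:
    """Load one raw energy CSV and normalize it to a shared schema."""
    df = pd.read_csv(path)

    # Normalize column casing: trimmed, lowercase, snake_case headers.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

    # Standardize missing values: treat common placeholders as NA.
    df = df.replace({"": pd.NA, "n/a": pd.NA, "-": pd.NA, "..": pd.NA})

    # Enforce data types (assumed columns: country, year, value).
    df["year"] = pd.to_numeric(df["year"], errors="coerce").astype("Int64")
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    df["country"] = df["country"].astype("string").str.strip()

    return df
```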
c. Master Dataset Creation
- Merged the individual metric files into a single master dataset, master_energy_dataset.csv, for centralized analytics (sketched below).
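A merge like this can be sketched with glob and pd.concat. The formatted_*.csv pattern, the data/ folder and the indicator tag derived from the file name are assumptions for illustration, not the exact implementation.

```python
import glob
import os

import pandas as pd

frames = []
for path in sorted(glob.glob("data/formatted_*.csv")):
    df = pd.read_csv(path)
    # Tag each row with the metric it came from, derived from the file name.
    indicator = os.path.basename(path).removeprefix("formatted_").removesuffix(".csv")
    df["indicator"] = indicator
    frames.append(df)

# Stack all metric files into one long-format master dataset.
master = pd.concat(frames, ignore_index=True)
master.to_csv("master_energy_dataset.csv", index=False)
```

Tagging each row with its source indicator keeps the master file in long format, which is generally the easiest shape to filter and aggregate in dashboards later.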
d. MongoDB Integration
- Connected to MongoDB Atlas using a connection URI string. 
- Built a production-ready Python script (a condensed sketch follows this list) to: 
 i. Loop through all formatted CSV files.
 ii. Create separate collections automatically.
 iii. Insert clean data using insert_many() with dedupe logic and a consistent document schema.
 iv. Upload master_energy_dataset.csv as a unified collection.
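Here is a condensed sketch of what such an upload loop might look like with PyMongo. The MONGODB_URI environment variable, the africa_energy database name and the unique compound index used for deduplication are assumptions rather than the project's exact details.

```python
import glob
import os

import pandas as pd
from pymongo import ASCENDING, MongoClient
from pymongo.errors import BulkWriteError

# Connect to MongoDB Atlas via the URI string (assumed to live in an env var).
client = MongoClient(os.environ["MONGODB_URI"])
db = client["africa_energy"]  # assumed database name

for path in sorted(glob.glob("data/formatted_*.csv")):
    name = os.path.basename(path).removeprefix("formatted_").removesuffix(".csv")
    collection = db[name]  # one collection per formatted CSV file

    # Dedupe logic: a unique compound index rejects documents already present.
    collection.create_index([("country", ASCENDING), ("year", ASCENDING)], unique=True)

    records = pd.read_csv(path).to_dict("records")
    try:
        # ordered=False lets valid documents insert even when duplicates fail.
        collection.insert_many(records, ordered=False)
    except BulkWriteError:
        pass  # duplicate-key errors are expected on re-runs

# Upload the merged master dataset as its own unified collection.
master = pd.read_csv("master_energy_dataset.csv").to_dict("records")
db["master_energy_dataset"].insert_many(master, ordered=False)
```

Pushing deduplication into a unique index, rather than checking each row in Python, makes re-running the script idempotent for data that is already uploaded.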
e. GitHub & Version Control
- Initialized a Git repository. 
- Committed the dataset extraction notebook, EDA.ipynb. 
🛠 Tech Stack Used
Data Cleaning - Pandas
Storage - MongoDB Atlas (NoSQL)
Upload Logic - PyMongo + Automation Script
Version Control - Git + GitHub
Future Plans - FastAPI + Dashboard Integration
This project marks the first building block of a scalable African Energy Data Platform. Clean, well-structured, accessible data is the foundation - and now that foundation exists.