DEV Community

anangwemike

Building a Clean Energy Data Pipeline for Africa (from Raw CSVs to MongoDB)

Accurate policy analysis, research and innovation across the continent depend on access to accurate energy data. I worked on a data extraction and structuring project in which I built a clean pipeline to process multiple energy datasets.

🎯 Project Goal

To collect, clean, format and upload Africa's energy-related datasets into a centralized MongoDB database, ready for analytics dashboards, automation and future API integration.

What was achieved

a. File Structure and Raw Data Review

  • Gathered more than 30 energy CSV files covering access rates, generation, imports, exports, renewables, consumption trends and installed capacity.

  • Designed a consistent file naming strategy using formatted_.csv.

b. Data Cleaning & Extraction

  • Standardized missing values, column casing and data types.

  • Ensured uniform schema across different energy indicators to enable integration later.
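To make the cleaning step concrete, here is a minimal pandas sketch of how missing values, column casing and data types could be standardized. The list of missing-value markers, the `country` text column and the function name `clean_energy_frame` are assumptions for illustration, not the exact rules used in the project.

```python
import numpy as np
import pandas as pd

# Assumed placeholders that raw files might use for "no data".
MISSING_MARKERS = ["", "NA", "N/A", "n/a", "-", "..", "null"]


def clean_energy_frame(df: pd.DataFrame, text_columns=("country",)) -> pd.DataFrame:
    """Return a copy with snake_case column names, NaN for missing
    markers, and numeric dtypes for every non-text column."""
    out = df.copy()

    # Column casing: " Access Rate " -> "access_rate"
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    )

    # Missing values: map the assumed markers to NaN
    out = out.replace(MISSING_MARKERS, np.nan)

    # Data types: keep text columns as strings, coerce the rest to numbers
    for col in out.columns:
        if col in text_columns:
            out[col] = out[col].astype("string").str.strip()
        else:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out
```

Running every indicator file through the same function is one way to end up with a uniform schema, because each output shares the same column naming and dtype rules. Note that `errors="coerce"` silently turns unparseable values into NaN, so it is worth checking the missing-value counts afterwards.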

c. Master Dataset Creation

  • Merged individual metric files into a single master dataset master_energy_dataset.csv for centralized analytics.
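The merge step could look roughly like the sketch below. It assumes the cleaned files follow a hypothetical `formatted_<indicator>.csv` pattern, share the same columns, and are stacked in long format with an added `indicator` column. The post does not say how the merge was actually done, so treat the function `build_master` and the `indicator` column as illustrative.

```python
from pathlib import Path

import pandas as pd


def build_master(data_dir, output_name="master_energy_dataset.csv") -> pd.DataFrame:
    """Stack every formatted_*.csv in data_dir into one long table
    and write it next to the source files."""
    data_dir = Path(data_dir)
    frames = []
    for path in sorted(data_dir.glob("formatted_*.csv")):
        df = pd.read_csv(path)
        # Tag each row with the metric it came from (assumed naming scheme)
        df["indicator"] = path.stem.removeprefix("formatted_")
        frames.append(df)

    if not frames:
        raise FileNotFoundError(f"no formatted_*.csv files found in {data_dir}")

    master = pd.concat(frames, ignore_index=True)
    master.to_csv(data_dir / output_name, index=False)
    return master
```

A long format keeps the schema stable when new indicators are added, since each new file contributes rows rather than columns. A wide merge on shared keys would be a reasonable alternative if every file covers the same countries and years.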

d. MongoDB Integration

  • Connected to MongoDB Atlas using a connection URI string.

  • Built a production-ready Python script to:
    i. Loop through all formatted CSV files.
    ii. Create separate collections automatically.
    iii. Insert clean data using insert_many() with dedupe logic and schema structure.
    iv. Upload master_energy_dataset.csv as a unified collection.
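The upload steps above can be sketched as follows. This is a simplified illustration rather than the project's actual script: the `MONGODB_URI` environment variable, the `africa_energy` database name, the `data` folder and the `formatted_<name>.csv` pattern are all assumptions, and the dedupe logic shown here only drops exact duplicate rows within a file before calling `insert_many()`.

```python
from __future__ import annotations

import os
from pathlib import Path

import pandas as pd


def collection_name(csv_path: Path) -> str:
    """Derive a collection name from the file name (assumed prefix convention)."""
    return csv_path.stem.removeprefix("formatted_")


def frame_to_records(df: pd.DataFrame) -> list[dict]:
    """Drop exact duplicate rows and turn NaN into None (stored as null)."""
    rows = df.drop_duplicates().to_dict("records")
    return [{k: (None if pd.isna(v) else v) for k, v in row.items()} for row in rows]


def upload_csv(db, csv_path: Path, name: str | None = None) -> int:
    """Insert one CSV into its own collection; return the number of documents."""
    records = frame_to_records(pd.read_csv(csv_path))
    if not records:  # insert_many() rejects an empty list
        return 0
    result = db[name or collection_name(csv_path)].insert_many(records)
    return len(result.inserted_ids)


def upload_all(db, data_dir: Path) -> dict[str, int]:
    """Upload every formatted file, then the master dataset if present."""
    counts = {}
    for path in sorted(data_dir.glob("formatted_*.csv")):
        counts[collection_name(path)] = upload_csv(db, path)
    master = data_dir / "master_energy_dataset.csv"
    if master.exists():
        counts["master_energy_dataset"] = upload_csv(db, master, "master_energy_dataset")
    return counts


def main() -> None:
    # Imported here so the helpers above can be exercised without a live cluster.
    from pymongo import MongoClient

    client = MongoClient(os.environ["MONGODB_URI"])  # assumed env variable
    print(upload_all(client["africa_energy"], Path("data")))  # assumed names
```

Calling `main()` with `MONGODB_URI` set to an Atlas connection string runs the whole upload. Because the dedupe here only works within a single file, running the script twice would insert the same documents again. A unique index on the identifying fields, or upserts, would be one way to guard against that.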

e. GitHub & Version Control

  • Initialized Git repository.

  • Committed dataset extraction notebook EDA.ipynb.

🛠 Tech Stack Used

Data Cleaning - Pandas
Storage - MongoDB Atlas (NoSQL)
Upload Logic - PyMongo + Automation Script
Version Control - Git + GitHub
Future Plans - FastAPI + Dashboard Integration

This project marks the first building block of a scalable African Energy Data Platform. Clean, well-structured, accessible data is the foundation - and now that foundation exists.
