DEV Community

Rahmah Abubakar

No ArcGIS, No QGIS—Just Python and a Problem Worth Solving

This was a 2024 project I completed during my internship with HNG TECH, focused on data cleansing and anomaly detection in electoral datasets. I recently revisited it to document the methodology and insights more clearly, as part of building my professional portfolio and showcasing real-world analytical impact.

Section 1: Why This Matters

Electoral credibility is foundational to democracy. In this project, I used Python to analyze polling unit-level data from Zamfara State, identifying statistical anomalies that could signal irregularities, data entry errors, or procedural lapses. This is a lightweight, reproducible approach to election auditing—no GIS tools required.

Section 2: Methodology Overview

📥 Data Source

  • CSV file with polling unit-level results
  • Fields: Latitude, Longitude, Registered Voters, Accredited Voters, Votes per Party, PU Metadata.
  • Excel was used to manually clean and organize the geospatial coordinates (longitude and latitude)
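
The coordinate cleanup in this project was done by hand in Excel. For anyone who would rather keep that step in Python, here is a minimal sketch of an equivalent check. The column names (`PU_Code`, `LGA`, `Latitude`, and so on) and the sample rows are assumptions made up for illustration, not the real dataset.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the real CSV; column names are assumptions.
SAMPLE_CSV = """PU_Code,LGA,Latitude,Longitude,Registered_Voters,Accredited_Voters,APC,PDP
PU-001,LGA-A,12.10,6.20,500,310,150,140
PU-002,LGA-A,12.11,,450,280,130,120
PU-003,LGA-B,95.00,6.25,600,400,210,170
PU-004,LGA-B,12.15,6.30,520,300,160,130
"""


def load_polling_units(source):
    """Read the CSV and split rows into usable and rejected by coordinate validity."""
    df = pd.read_csv(source)
    # Coerce coordinates to numbers; anything unparseable becomes NaN.
    for col in ("Latitude", "Longitude"):
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # A row is usable only if both coordinates exist and fall in valid global ranges.
    valid = df["Latitude"].between(-90, 90) & df["Longitude"].between(-180, 180)
    return df[valid].reset_index(drop=True), df[~valid].reset_index(drop=True)


clean, rejected = load_polling_units(io.StringIO(SAMPLE_CSV))
print(len(clean), "usable rows;", len(rejected), "rows with bad coordinates")
```

With a real file, the `io.StringIO(...)` argument would be replaced by the CSV path. Keeping the rejected rows in their own frame makes it easy to review them instead of silently dropping them.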

Outlier Detection

  • Calculated outlier scores using z-scores and domain heuristics
  • Ranked polling units by anomaly severity per party
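
To make the scoring step concrete, here is a rough sketch of per-party z-scores and ranking in pandas. It covers only the z-score part, not the domain heuristics, and it is not the exact notebook code. The function names, the `PU_Code` column, and the vote figures are hypothetical.

```python
import pandas as pd


def party_outlier_scores(df, party_cols):
    """Absolute z-score of each polling unit's votes, computed per party column."""
    votes = df[party_cols].astype(float)
    # Population standard deviation; a zero std becomes NaN to avoid dividing by zero.
    std = votes.std(ddof=0).replace(0, float("nan"))
    z = (votes - votes.mean()) / std
    # Undefined scores (constant columns, missing votes) are treated as 0.
    return z.abs().fillna(0.0).add_suffix("_outlier_score")


def rank_by_party(df, scores, party, id_col="PU_Code"):
    """Polling units sorted from most to least anomalous for one party."""
    col = f"{party}_outlier_score"
    ranked = df[[id_col]].join(scores[col])
    return ranked.sort_values(col, ascending=False).reset_index(drop=True)


# Made-up vote counts for illustration only.
units = pd.DataFrame(
    {
        "PU_Code": ["PU-A", "PU-B", "PU-C", "PU-D", "PU-E"],
        "APC": [100, 110, 90, 105, 400],
        "PDP": [80, 85, 75, 90, 82],
    }
)
scores = party_outlier_scores(units, ["APC", "PDP"])
apc_ranking = rank_by_party(units, scores, "APC")
print(apc_ranking.head(3))
```

A plain z-score is sensitive to the very outliers it is trying to find, because extreme values inflate the standard deviation. That is one reason to pair it with domain heuristics rather than rely on a single threshold.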

Section 3: Key Findings

  • PU 108 & PU 104 (Birnin Magaji): High outlier scores for APC and PDP
  • PU 321 & PU 124 (Tsafe): Negative transcription counts flagged
  • Geospatial Clustering: Anomalies concentrated in specific LGAs
  • Cross-Party Deviations: Irregularities not isolated to one party
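
Findings like negative counts and LGA-level concentration can be checked with simple rules. The sketch below is one possible way to do it, under stated assumptions: the "accredited voters exceed registered voters" rule is a heuristic I am adding for illustration, and the column names and rows are made up.

```python
import pandas as pd


def flag_rule_violations(df, count_cols):
    """Boolean checks per polling unit; rule names and columns are assumptions."""
    checks = pd.DataFrame(index=df.index)
    # Vote counts can never be negative, so any negative value is a data error.
    checks["negative_count"] = (df[count_cols] < 0).any(axis=1)
    # Assumed heuristic: accredited voters should not exceed registered voters.
    checks["over_accredited"] = df["Accredited_Voters"] > df["Registered_Voters"]
    checks["flagged"] = checks.any(axis=1)
    return checks


def flagged_share_by_lga(df, checks, lga_col="LGA"):
    """Count and share of flagged polling units in each LGA."""
    summary = checks["flagged"].groupby(df[lga_col]).agg(flagged="sum", total="count")
    summary["share"] = summary["flagged"] / summary["total"]
    return summary.sort_values("share", ascending=False)


# Made-up rows for illustration only.
units = pd.DataFrame(
    {
        "PU_Code": ["X1", "X2", "X3", "X4", "X5"],
        "LGA": ["LGA-A", "LGA-A", "LGA-B", "LGA-B", "LGA-C"],
        "Registered_Voters": [500, 400, 600, 550, 500],
        "Accredited_Voters": [300, 450, 350, 320, 250],
        "APC": [150, 200, -5, 160, 120],
        "PDP": [140, 180, 170, 150, 110],
    }
)
checks = flag_rule_violations(units, ["APC", "PDP"])
summary = flagged_share_by_lga(units, checks)
print(summary)
```

Reporting the share alongside the raw count matters because LGAs differ in how many polling units they contain.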

Section 4: Technical Highlights

  • No GIS tools—just Python (Pandas, Matplotlib, NumPy)
  • Google Colab for cloud execution
  • Scalable framework for other states or elections
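
As an illustration of mapping without GIS software, a latitude/longitude scatter plot in Matplotlib already shows where high scores sit. This is a minimal sketch with made-up coordinates and scores; the function name `plot_outlier_map` is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this also runs outside a notebook

import matplotlib.pyplot as plt
import numpy as np


def plot_outlier_map(lon, lat, scores, title="Outlier scores by polling unit"):
    """Scatter polling units by coordinate, colored by outlier score."""
    fig, ax = plt.subplots(figsize=(6, 5))
    points = ax.scatter(lon, lat, c=scores, cmap="viridis", s=40)
    fig.colorbar(points, ax=ax, label="Outlier score")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title(title)
    return fig, ax


# Made-up coordinates and scores for illustration only.
lon = np.array([6.20, 6.25, 6.30, 6.40])
lat = np.array([12.10, 12.12, 12.15, 12.20])
scores = np.array([0.2, 0.4, 1.9, 0.3])

fig, ax = plot_outlier_map(lon, lat, scores)
```

In Google Colab the `matplotlib.use("Agg")` line can be dropped so the figure renders inline. Elsewhere, `fig.savefig(...)` writes it to disk.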

Section 5: Recommendations

  • Audit flagged polling units
  • Fix metadata gaps and transcription errors
  • Integrate anomaly detection into official result verification

Closing Thoughts

This project taught me how to apply statistical thinking to real-world problems, especially in politically sensitive domains. It’s a reminder that data science isn’t

For the full analysis, including code, tables, and transcription breakdowns, see the complete notebook on my LinkedIn page (https://www.linkedin.com/posts/rahmah-abubakar-243058288_analysis-process-activity-7253521166152163328-zh2n?utm_source=share&utm_medium=member_android&rcm=ACoAAEXGTFMBU8WaWDL8Z4Wl7HzBBVCiQx_eYO4).
