DEV Community

Black Lover

Reverse Engineering India’s Electoral Roll System: Why Can’t We Have Digital Voter Data?

The Problem That Started It All

A few weeks ago, someone approached me with what seemed like a simple request: "Can you help convert our assembly constituency's electoral roll into an Excel sheet? We need to verify voter data digitally for our campaign."

Simple enough, right? Just download the PDFs and convert them to a spreadsheet.

Except it wasn't simple at all.

What I discovered was a maze of bureaucratic digital infrastructure that keeps India's electoral data locked in PDFs, making digital verification nearly impossible for citizens, candidates, and researchers alike.

This is the story of why India's 900+ million voter records exist in digital limbo — technically online, but practically inaccessible for any meaningful digital analysis.


The Digital Paradox: Data That Exists But Doesn't

Here's the irony: India's Election Commission (ECI) has spent millions digitizing electoral rolls. The data exists in databases. APIs serve this data in real-time for searches. Yet, when you want to actually work with this data — verify voters, analyze demographics, or cross-check registrations — you're stuck with:

  • Scanned PDF images (not even searchable text in many cases)
  • No CSV/Excel exports available
  • No bulk data access for researchers
  • Individual lookups only through a web interface with captchas

Why does this matter?

Imagine you're a candidate in an Assembly Constituency with 200,000 voters spread across 10 parts. You want to:

  • Verify if your supporters are registered
  • Check for duplicate registrations
  • Analyze demographic distribution
  • Plan booth-level strategies

Current solution: Manually look through 10 PDF files, each 200–300 pages long. No search, no filter, no sort. Just Ctrl+F if you're lucky and the PDF has text layers.

What you actually need: A spreadsheet. A database. Anything digital.

But that doesn't exist. At least, not officially.


Discovery: The Hidden Infrastructure Nobody Talks About

While trying to solve this PDF-to-Excel problem, I started reverse engineering ECI's website. What I found was surprising.

The Hidden Gateway APIs

Standard subdomain enumeration tools show you the obvious ECI subdomains: www.eci.gov.in, voters.eci.gov.in, results.eci.gov.in. But two critical subdomains were completely invisible to enumeration:

gateway-voters.eci.gov.in
gateway-officials.eci.gov.in

Why is this significant?

These gateways appear to handle ALL the backend operations: voter searches, electoral roll generation, PDF downloads, and real-time data queries.

The data IS digital. It IS structured. It IS in databases. The APIs prove it. But there's deliberately no public interface to export this structured data.

The Complete Infrastructure Map

Active Subdomains (14 with IP Resolution; excerpt):

| Subdomain | IP Address | Purpose | Cloudflare |
|---|---|---|---|
| voters.eci.gov.in | 2.16.10.151 | Main voter portal | OFF |
| results.eci.gov.in | 2.19.198.57 | Election results | OFF |
| cvigil.eci.gov.in | 164.100.229.90 | Citizen vigilance | OFF |
| suvidha.eci.gov.in | 164.100.85.125 | Official portal | OFF |
| ems.eci.gov.in | 164.100.229.206 | Election management | OFF |

Inactive/Non-Resolving Subdomains (6; excerpt):

  • api.eci.gov.in (suggests there WAS an API portal)
  • dev.eci.gov.in (development environment)
  • resultapi.eci.gov.in (result APIs)
  • voterhelpline.eci.gov.in (helpline portal)

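Security observation: none of these subdomains is behind Cloudflare. That alone doesn't mean they are all directly exposed, though: the 2.16.x.x and 2.19.x.x addresses fall inside Akamai's address block (2.16.0.0/13), so voters.eci.gov.in and results.eci.gov.in are likely served through Akamai's CDN, while the 164.100.x.x hosts sit on NIC (National Informatics Centre) networks.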
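To reproduce the active versus non-resolving split yourself, plain DNS lookups are enough; no special tooling is needed. The sketch below is a minimal standard-library version. The host list is just a few names mentioned in this post, `resolve` and `classify` are illustrative helper names, and the answers you get will depend on your resolver and on when you run it.

```python
import socket
from typing import Callable, Dict, List, Optional, Tuple

# A few names mentioned in this post; replace with your own list.
SUBDOMAINS = [
    "voters.eci.gov.in",
    "results.eci.gov.in",
    "gateway-voters.eci.gov.in",
    "api.eci.gov.in",
    "dev.eci.gov.in",
]


def resolve(host: str) -> Optional[str]:
    """Return an IPv4 address for host, or None if the lookup fails."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def classify(hosts: List[str],
             resolver: Callable[[str], Optional[str]] = resolve
             ) -> Tuple[Dict[str, str], List[str]]:
    """Split hosts into resolving (host -> IP) and non-resolving names."""
    active: Dict[str, str] = {}
    inactive: List[str] = []
    for host in hosts:
        ip = resolver(host)
        if ip is None:
            inactive.append(host)
        else:
            active[host] = ip
    return active, inactive
```

Usage: `active, inactive = classify(SUBDOMAINS)`. The `resolver` parameter lets you swap in a different lookup (or a fake one for testing) without touching the classification logic.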


The API Architecture: Digital Data That You Can't Access

Gateway-Voters API Structure

Base URL: https://gateway-voters.eci.gov.in/api/v1/

The API is extensive and well-structured.

1. Master Data APIs (Publicly Accessible)

GET /common/states
GET /common/districts/{stateCode}
GET /common/constituencies?stateCode=S22
GET /common/acs/S2223

These work. No authentication. You can get lists of all states, districts, and constituencies. The data is RIGHT THERE.
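Because these master-data endpoints need no authentication, a few lines of standard-library Python are enough to query them. This is a sketch under assumptions: the base URL and paths are the ones observed above, the response schema is undocumented so the code simply returns parsed JSON, and the gateway may reject clients that don't send browser-like headers. `master_data_url` and `fetch_json` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Base URL as observed in this post; undocumented and may change without notice.
BASE_URL = "https://gateway-voters.eci.gov.in/api/v1/"


def master_data_url(path: str, **params: str) -> str:
    """Build a URL such as .../common/constituencies?stateCode=S22."""
    url = urllib.parse.urljoin(BASE_URL, path.lstrip("/"))
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """GET url and decode the JSON body; the response schema is not documented."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `fetch_json(master_data_url("common/states"))` or `fetch_json(master_data_url("common/constituencies", stateCode="S22"))`.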

2. Search APIs (Captcha Protected)

POST /elastic/search-by-epic-from-national-display
{
  "epicNumber": "232452452",
  "stateCd": "S22",
  "captchaData": "j692pe",
  "captchaId": "BBB1BD...",
  "securityKey": "na"
}

You can search for individual voters. One at a time. With captcha.

Notice the /elastic/ endpoint: it strongly suggests the data lives in Elasticsearch, a search engine built for very large datasets. If so, the 900M+ records are already structured, indexed, and queryable, and a bulk export would be comparatively simple to build. Bulk queries, however, aren't allowed.

3. Electoral Roll Publishing APIs

POST /printing-publish/get-publish-part-list
{
  "acNumber": 186,
  "stateCd": "S22",
  "year": 2026,
  "rollTypeRefId": "SIR-DraftRoll"
}

This returns 7 parts for AC 186, in both English and Tamil — that's 14 PDF files you need to download manually.
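The part-list request can be sent the same way. The sketch below copies the field names from the request body shown above. `part_list_body`, `post_json` and `expected_pdf_count` are hypothetical helpers. The endpoint may also expect extra headers or cookies that a browser would send, and because the reply format is undocumented, the code only decodes it.

```python
import json
import urllib.request
from typing import Any, Dict

# Endpoint and field names copied from the request shown above; undocumented.
PART_LIST_URL = ("https://gateway-voters.eci.gov.in/api/v1/"
                 "printing-publish/get-publish-part-list")


def part_list_body(ac_number: int, state_cd: str, year: int,
                   roll_type: str = "SIR-DraftRoll") -> Dict[str, Any]:
    """Build the JSON body for one Assembly Constituency."""
    return {
        "acNumber": ac_number,
        "stateCd": state_cd,
        "year": year,
        "rollTypeRefId": roll_type,
    }


def post_json(url: str, body: Dict[str, Any], timeout: float = 30.0) -> Any:
    """POST body as JSON and decode the JSON reply (schema not documented)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def expected_pdf_count(parts: int, languages: int = 2) -> int:
    """Each part is published once per language, e.g. 7 parts x 2 = 14 PDFs."""
    return parts * languages
```

Usage: `post_json(PART_LIST_URL, part_list_body(186, "S22", 2026))`.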

4. The Mystery Parameter: "securityKey": "na"

Every request body I inspected includes this field, always set to "na". It may be left over from an authentication scheme that was removed, or a placeholder for security that never shipped. Either way, it's just... there.


The PDF Problem: Why PDFs Are Digital Jail

Before 2025: The Individual PDF Nightmare

The old URL pattern:

https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/186/2026-EROLLGEN-S22-186-SIR-DraftRoll-Revision1-ENG-1-WI.pdf
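The file name in this old pattern encodes the year, state, AC number, roll type, revision, language, and a trailing number. Here is a sketch that rebuilds it, assuming (unverified) that the number after the language code is the part number and that `WI` is a fixed suffix. The language code for Tamil files isn't shown in this post, so `ENG` is the only value known to be valid. `part_pdf_url` is a hypothetical helper.

```python
def part_pdf_url(year: int, state_code: str, ac_number: int, part: int,
                 lang: str = "ENG", roll_type: str = "SIR-DraftRoll",
                 revision: int = 1, suffix: str = "WI") -> str:
    """Rebuild the pre-2025 per-part PDF URL.

    Assumptions: `part` fills the slot after the language code, and `suffix`
    ("WI" in the observed URL, meaning unknown) is the same for every file.
    """
    filename = (f"{year}-EROLLGEN-{state_code.upper()}-{ac_number}-{roll_type}-"
                f"Revision{revision}-{lang}-{part}-{suffix}.pdf")
    return (f"https://voters.eci.gov.in/eroll/{year}/{state_code.lower()}/"
            f"{roll_type.lower()}/{ac_number}/{filename}")


if __name__ == "__main__":
    # AC 186 in Tamil Nadu (S22) returned 7 parts in the part-list call.
    for part in range(1, 8):
        print(part_pdf_url(2026, "S22", 186, part))
```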

For Tamil Nadu's 234 Assembly Constituencies:

  • 234 ACs × 7 parts average × 2 languages = 3,276 individual PDF files
  • Each PDF: 200–500 pages
  • Total: roughly 655,000 to 1.64 million pages of voter data (3,276 files × 200–500 pages)
  • Format: Often scanned images, not searchable text
  • Size: ~50–100 GB of data
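Multiplying out the figures above gives the file count and the page range they imply. This is plain arithmetic on the post's own averages, not measured data.

```python
# Averages from the list above (not measured values).
ACS = 234                        # Assembly Constituencies in Tamil Nadu
PARTS_PER_AC = 7                 # average parts per AC
LANGUAGES = 2                    # English and Tamil
PAGES_MIN, PAGES_MAX = 200, 500  # pages per PDF

files = ACS * PARTS_PER_AC * LANGUAGES
pages_min = files * PAGES_MIN
pages_max = files * PAGES_MAX

print(f"{files:,} PDFs, {pages_min:,} to {pages_max:,} pages")
# -> 3,276 PDFs, 655,200 to 1,638,000 pages
```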

To digitize this, you'd need to download 3,276 files (with captchas), OCR the scanned pages, extract inconsistent tables, clean and structure everything, and de-duplicate across files. Estimated time: weeks to months.

This is why digital voter verification doesn't exist.

2025: The ZIP File Revolution (Sort Of)

ECI recently added bulk download options:

Per-AC ZIP Files:

https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/185-eroll.zip

State-wide Bulk Portal:

https://voters.eci.gov.in/download-sir-draft-roll?stateCode=S22

The catch? It's still PDFs inside those ZIPs. The fundamental problem is unchanged.

Why PDFs Are The Wrong Format

PDFs are designed for printing, not data analysis:

❌ Can't sort by age, gender, or locality

❌ Can't filter duplicate names

❌ Can't programmatically verify registrations

❌ Can't merge with other datasets

❌ OCR introduces errors (especially with Indian names)

❌ No export to Excel/CSV available

What campaigns and researchers actually need:

✅ CSV/Excel with structured columns

✅ JSON from the API endpoints

✅ Direct database exports (even read-only)

✅ One-click export to Excel for digital verification

The technology exists. The databases exist. The APIs exist. But PDFs force you back to manual, paper-based verification even though the data is digital.


Breaking The Lock: Understanding the URLs

Since ECI won't provide structured data, here are the URL patterns — understanding them helps you see how the system works.

Direct Download URLs (No Code Needed)

Individual AC ZIP files:

https://voters.eci.gov.in/eroll/2026/{state_code}/sir-draftroll/{ac_number}-eroll.zip

# Examples:
https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/185-eroll.zip
https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/186-eroll.zip

State bulk download portal:

https://voters.eci.gov.in/download-sir-draft-roll?stateCode=S22

SIR Search (check name in new rolls):

https://voters.eci.gov.in/searchInSIR/{UNIQUE_ID}
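The three URL patterns above are simple templates, so you can generate them instead of typing them. In this sketch, `ac_zip_url`, `state_portal_url` and `sir_search_url` are illustrative names. Treating the year and roll-type segments as parameters assumes the same layout holds for other years and roll types, which hasn't been checked.

```python
from urllib.parse import quote, urlencode

BASE = "https://voters.eci.gov.in"


def ac_zip_url(state_code: str, ac_number: int, year: int = 2026,
               roll_type: str = "sir-draftroll") -> str:
    """Per-AC ZIP. Parameterizing year/roll_type assumes the layout is stable."""
    return f"{BASE}/eroll/{year}/{state_code.lower()}/{roll_type}/{ac_number}-eroll.zip"


def state_portal_url(state_code: str) -> str:
    """State-wide bulk download page (state code upper-cased, as in S22)."""
    return f"{BASE}/download-sir-draft-roll?{urlencode({'stateCode': state_code.upper()})}"


def sir_search_url(unique_id: str) -> str:
    """SIR search page for one ID, percent-encoded in case it has odd characters."""
    return f"{BASE}/searchInSIR/{quote(unique_id, safe='')}"


if __name__ == "__main__":
    for ac in (185, 186):
        print(ac_zip_url("S22", ac))
    print(state_portal_url("s22"))
```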

The Download Challenge

Browser download: click the URL → ZIP downloads instantly ✅

Command-line download (wget/curl): gets blocked with 403 Forbidden

Why? The server appears to check request headers, at least the User-Agent (and possibly the Referer). Mimicking a browser gets past this:

wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
     --referer="https://voters.eci.gov.in/" \
     "https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/185-eroll.zip"

No complex code needed. Just proper headers.
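If you prefer Python to wget, you can do the same thing by sending the two headers from the command above and streaming the response to disk. This is a minimal standard-library sketch that assumes those header values are enough; the server's exact checks aren't documented. `download_all` fetches files one at a time with a pause between requests, so a bulk run doesn't hammer the server. All function names are illustrative.

```python
import shutil
import time
import urllib.request
from pathlib import Path
from typing import Iterable

# Header values copied from the wget command above. Whether the server needs both
# (or checks anything else) is an assumption, not documented behaviour.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Referer": "https://voters.eci.gov.in/",
}


def build_request(url: str) -> urllib.request.Request:
    """Wrap url in a Request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=HEADERS)


def download(url: str, dest: Path, timeout: float = 120.0) -> Path:
    """Stream url to dest via a temporary .part file, renamed only on success."""
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp, \
            open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)
    return dest


def download_all(urls: Iterable[str], out_dir: Path, delay_s: float = 5.0) -> None:
    """Download URLs sequentially, skipping files already on disk."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for url in urls:
        dest = out_dir / url.rsplit("/", 1)[-1]
        if dest.exists():
            continue  # already fetched on a previous run
        download(url, dest)
        time.sleep(delay_s)  # pause between requests to avoid overloading the server
```

Usage: `download_all(["https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/185-eroll.zip"], Path("downloads"))`. Writing to a `.part` file first means an interrupted run never leaves a truncated ZIP that a later run would skip.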


The Workflow That Actually Works

Based on my experience helping that campaign digitize their AC data:

Step 1: Identify Your URLs

# Your constituency ZIP:
https://voters.eci.gov.in/eroll/2026/{state_code}/sir-draftroll/{ac_number}-eroll.zip

# State bulk download:
https://voters.eci.gov.in/download-sir-draft-roll?stateCode={STATE_CODE}

Step 2: Download ZIPs

Manual (easiest): Open browser, navigate to the ZIP URL, click download.

Command line (faster for bulk):

wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)" \
     --referer="https://voters.eci.gov.in/" \
     "https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/185-eroll.zip"

Step 3: Extract PDFs

Unzip downloaded files, pick the English version for easier processing.
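You can script the extraction so that only the English PDFs are kept. This sketch assumes the files inside the ZIP follow the same naming as the older per-part PDFs, with an `-ENG-` tag in the file name. Check a real ZIP's listing first and adjust `lang_tag` if it differs. Taking only the base name of each entry also stops a crafted archive from writing outside `out_dir`. `extract_language` is a hypothetical helper.

```python
import shutil
import zipfile
from pathlib import Path
from typing import List


def extract_language(zip_path: Path, out_dir: Path, lang_tag: str = "-ENG-") -> List[Path]:
    """Extract PDFs whose file name contains lang_tag (assumed naming) into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Keep only the base name: flattens folders and blocks "../" path tricks.
            name = Path(info.filename).name
            if info.is_dir() or not name.lower().endswith(".pdf") or lang_tag not in name:
                continue
            target = out_dir / name
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Usage: `extract_language(Path("downloads/185-eroll.zip"), Path("pdf/185"))` returns the list of extracted English PDFs.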

Step 4: PDF to Excel Conversion

  • Adobe Acrobat (Export to Excel)
  • Tabula (Free, open-source)
  • Online converters (smallpdf.com, ilovepdf.com)
  • Python with tabula-py (for automation)
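For the tabula-py route, the core call is `tabula.read_pdf(..., pages="all")`, which returns a list of pandas DataFrames. Treat the sketch below with caution. tabula only reads PDFs that have a text layer, so scanned pages need OCR first, and it requires a Java runtime. Electoral-roll pages are also often laid out as a grid of voter boxes rather than a plain table, so expect the output to need further restructuring. `clean_header` and `pdf_to_excel` are illustrative names.

```python
import re
from pathlib import Path


def clean_header(name: object) -> str:
    """Normalize a column header, e.g. 'EPIC No.' -> 'epic_no'."""
    return re.sub(r"[^0-9a-z]+", "_", str(name).strip().lower()).strip("_")


def pdf_to_excel(pdf_path: Path, xlsx_path: Path) -> int:
    """Write every table tabula-py detects to one Excel sheet; return the row count.

    Needs tabula-py (plus a Java runtime), pandas and openpyxl. Text-layer PDFs only:
    scanned pages must be OCRed before tabula can see anything.
    """
    import pandas as pd  # lazy imports: clean_header stays usable without them
    import tabula

    # lattice=True suits tables drawn with ruling lines; try stream=True otherwise.
    tables = tabula.read_pdf(str(pdf_path), pages="all", lattice=True)
    tables = [t for t in tables if not t.empty]
    if not tables:
        return 0
    for t in tables:
        t.columns = [clean_header(c) for c in t.columns]
    combined = pd.concat(tables, ignore_index=True)
    combined.to_excel(xlsx_path, index=False)  # openpyxl writes the .xlsx
    return len(combined)
```

Usage: `pdf_to_excel(Path("pdf/185/part-1.pdf"), Path("part-1.xlsx"))`, where both paths are placeholders. A return value of 0 usually means the page is an image and tabula found no text.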

Step 5: Data Cleaning

pip install requests pandas tabula-py openpyxl

Remove duplicates, fix OCR name errors, standardize formats, validate EPIC numbers.
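These cleaning steps (deduplicate, fix OCR errors, standardize, validate EPIC numbers) can be sketched with the standard library alone. Several assumptions are built in, and each is labeled in the code. The column names `epic` and `name` are hypothetical. The EPIC format of three letters plus seven digits is common on current cards, but older cards differ. The O-to-0 substitution is only an OCR heuristic. Rows that fail validation are set aside for manual review rather than discarded.

```python
import re
from typing import Dict, Iterable, List

# Assumption: modern EPIC numbers are three letters followed by seven digits
# (e.g. "ABC1234567"). Older cards used other formats, so failures go to review.
EPIC_RE = re.compile(r"^[A-Z]{3}[0-9]{7}$")


def normalize_epic(raw: str) -> str:
    """Remove whitespace, uppercase, and (heuristic) read 'O' as '0' in the digit part."""
    epic = re.sub(r"\s+", "", raw).upper()
    if len(epic) == 10:
        epic = epic[:3] + epic[3:].replace("O", "0")
    return epic


def normalize_name(raw: str) -> str:
    """Collapse repeated whitespace and uppercase the name."""
    return " ".join(raw.split()).upper()


def clean_records(rows: Iterable[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Normalize fields, drop duplicate EPICs, and separate rows needing review."""
    seen = set()
    clean: List[Dict[str, str]] = []
    review: List[Dict[str, str]] = []
    for row in rows:
        rec = dict(row)  # hypothetical columns: "epic", "name"
        rec["epic"] = normalize_epic(rec.get("epic", ""))
        rec["name"] = normalize_name(rec.get("name", ""))
        if not EPIC_RE.match(rec["epic"]):
            review.append(rec)
            continue
        if rec["epic"] in seen:
            continue  # duplicate EPIC: keep the first occurrence
        seen.add(rec["epic"])
        clean.append(rec)
    return {"clean": clean, "review": review}
```

If you already have a pandas DataFrame, the deduplication step alone is `df.drop_duplicates(subset="epic", keep="first")` once the column has been normalized.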

Total time for a single AC: 2–3 days first time, a few hours after the learning curve.

What it should take: 5 minutes with a CSV download button.


The Real Question: Why Is This Data Hidden?

The data is already digital. The infrastructure proves it. So why can't citizens access structured electoral data?

Privacy Concerns (Legitimate) — Electoral rolls contain full name, age, address, EPIC number. Counter-argument: this data is already public. Anyone can visit the BLO and get printed copies. PDFs are freely downloadable. Making it Excel instead of PDF doesn't change privacy — it changes usability.

Preventing Misuse (Legitimate) — Bulk data could enable targeted misinformation or voter profiling. Counter-argument: malicious actors already have this. Professional data brokers and well-funded campaigns have digitized it. Only honest citizens, small candidates, and researchers are locked out.

Bureaucratic Inertia (Likely) — "This is how we've always done it." No technical understanding at the policy level. No incentive to change.

Political Control (Speculative) — Keeping data inaccessible favors established parties with resources to digitize, creates dependency on party databases, and limits citizen oversight.


Similar Systems Worldwide

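| Country | Approach |
|---|---|
| USA | Many states make voter files available in digital form (often delimited text/CSV), free or for a fee; who may obtain them, and for what purpose, varies by state |
| UK | Two versions: the open (edited) register can be bought by anyone; the full register is supplied only to specified recipients such as political parties, candidates, and credit reference agencies |
| Australia | Electoral rolls provided to registered political parties in digital format |
| India | World's largest democracy. Apparent Elasticsearch backend. Data locked in PDFs. |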

What Should Change: A Proposal

1. Provide Structured Data Exports
CSV/Excel downloads for each AC, updated with each revision, no captcha for bulk downloads.

2. Create Legitimate API Access
Developer portal with API keys, rate-limited but functional, documented endpoints, terms of use with penalties for misuse.

3. Maintain Privacy Balance
Remove exact addresses (keep locality), provide age brackets instead of birthdates, audit trail for access.
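The privacy balance proposed here could be applied as a small transformation before export. In this sketch, the field names and bracket edges are hypothetical choices; only the lower bound of 18 (India's voting age) is fixed. The exact address is dropped, locality is kept, and age becomes a bracket. An access audit trail would live on the server side and isn't shown.

```python
from typing import Dict

# Hypothetical bracket edges; 18 is India's minimum voting age.
BRACKETS = [(18, 25), (26, 35), (36, 45), (46, 60)]


def age_bracket(age: int) -> str:
    """Map an exact age to a coarse bracket string."""
    if age < 18:
        raise ValueError("electors must be at least 18")
    for low, high in BRACKETS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "61+"


def redact(record: Dict[str, object]) -> Dict[str, object]:
    """Keep EPIC, name and locality; replace age with a bracket; drop the address."""
    return {
        "epic": record["epic"],
        "name": record["name"],
        "locality": record.get("locality", ""),
        "age_bracket": age_bracket(int(record["age"])),
    }
```

Building the output from an explicit allow-list of fields, rather than deleting sensitive ones, means a new column added upstream is excluded by default.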

4. The Gateway APIs Already Exist!

The infrastructure is ALREADY THERE. The databases exist. The APIs work. Just add:

GET /api/v1/bulk/export-ac-data?acNumber=185&format=csv

That's it. One endpoint could change digital democracy in India.


Conclusion: Data Liberation Is a Democratic Right

India prides itself on being the world's largest democracy. Yet our electoral data — the foundation of that democracy — is locked in a digital jail of PDFs.

The data exists. The technology exists. What's missing is the will to make it truly public.

To the Election Commission: You've done the hard work of digitization. Now make it truly accessible. Structured data isn't a security risk — it's a democratic necessity.

To researchers and activists: The tools and methods exist to digitize this data. It's tedious, but doable. Don't wait for permission.

To developers: Build the tools that should exist. PDF-to-Excel converters specifically for electoral rolls. Bulk downloaders. Verification APIs. Make them open source.

To candidates and campaigns: Demand better. You have the right to structured voter data for your constituency. Pressure ECI for CSV exports.


The Original Question Answered

Remember that person who asked me to digitize their AC electoral roll?

What I told them:

  • "It's technically possible"
  • "It will take 2–3 weeks"
  • "You'll need someone who can code"
  • "The data quality won't be perfect"
  • "It shouldn't be this hard"

What I should have been able to say:

  • "Download this CSV from the ECI website"
  • "Takes 5 minutes"
  • "Here's the Excel file"

That's the difference between a digital democracy and a PDF democracy.


Found this useful? Share it with candidates, researchers, and civic tech developers. Let's make electoral data actually accessible.

Have you tried digitizing voter data? What challenges did you face? Drop your experiences in the comments!


Legal note: Electoral rolls are public information under Indian law. Accessing publicly available data via public URLs is legal. Respect rate limits and don't overwhelm servers.
