The Problem That Started It All
A few weeks ago, someone approached me with what seemed like a simple request: "Can you help convert our assembly constituency's electoral roll into an Excel sheet? We need to verify voter data digitally for our campaign."
Simple enough, right? Just download the PDFs and convert them to a spreadsheet.
Except it wasn't simple at all.
What I discovered was a maze of bureaucratic digital infrastructure that keeps India's electoral data locked in PDFs, making digital verification nearly impossible for citizens, candidates, and researchers alike.
This is the story of why India's 900+ million voter records exist in digital limbo — technically online, but practically inaccessible for any meaningful digital analysis.
The Digital Paradox: Data That Exists But Doesn't
Here's the irony: the Election Commission of India (ECI) has spent millions digitizing electoral rolls. The data exists in databases. APIs serve this data in real time for searches. Yet when you want to actually work with this data (verify voters, analyze demographics, or cross-check registrations), you're stuck with:
- Scanned PDF images (not even searchable text in many cases)
- No CSV/Excel exports available
- No bulk data access for researchers
- Individual lookups only through a web interface with captchas
Why does this matter?
Imagine you're a candidate in an Assembly Constituency with 200,000 voters spread across 10 parts. You want to:
- Verify if your supporters are registered
- Check for duplicate registrations
- Analyze demographic distribution
- Plan booth-level strategies
Current solution: Manually look through 10 PDF files, each 200–300 pages long. No search, no filter, no sort. Just Ctrl+F if you're lucky and the PDF has text layers.
What you actually need: A spreadsheet. A database. Anything digital.
But that doesn't exist. At least, not officially.
Discovery: The Hidden Infrastructure Nobody Talks About
While trying to solve this PDF-to-Excel problem, I started reverse engineering ECI's website. What I found was surprising.
The Hidden Gateway APIs
Standard subdomain enumeration tools show you the obvious ECI subdomains: www.eci.gov.in, voters.eci.gov.in, results.eci.gov.in. But two critical subdomains were completely invisible to enumeration:
gateway-voters.eci.gov.in
gateway-officials.eci.gov.in
Why is this significant?
These gateways handle ALL the backend operations: voter searches, electoral roll generation, PDF downloads, and real-time data queries.
The data IS digital. It IS structured. It IS in databases. The APIs prove it. But there's deliberately no public interface to export this structured data.
The Complete Infrastructure Map
Active Subdomains (14 with IP Resolution):
| Subdomain | IP Address | Purpose | Cloudflare |
|---|---|---|---|
| voters.eci.gov.in | 2.16.10.151 | Main voter portal | OFF |
| results.eci.gov.in | 2.19.198.57 | Election results | OFF |
| cvigil.eci.gov.in | 164.100.229.90 | Citizen vigilance | OFF |
| suvidha.eci.gov.in | 164.100.85.125 | Official portal | OFF |
| ems.eci.gov.in | 164.100.229.206 | Election management | OFF |
Inactive/Non-Resolving Subdomains (6):
- api.eci.gov.in (suggests there WAS an API portal)
- dev.eci.gov.in (development environment)
- resultapi.eci.gov.in (result APIs)
- voterhelpline.eci.gov.in (helpline portal)
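Security observation: ZERO Cloudflare protection on any subdomain. The 164.100.x.x hosts appear to be served directly. The 2.16.x.x and 2.19.x.x addresses look like they fall in a CDN range (Akamai's), so those two may sit behind a different CDN rather than being directly exposed.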
The API Architecture: Digital Data That You Can't Access
Gateway-Voters API Structure
Base URL: https://gateway-voters.eci.gov.in/api/v1/
The API is extensive and well-structured.
1. Master Data APIs (Publicly Accessible)
GET /common/states
GET /common/districts/{stateCode}
GET /common/constituencies?stateCode=S22
GET /common/acs/S2223
These work. No authentication. You can get lists of all states, districts, and constituencies. The data is RIGHT THERE.
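As a minimal sketch, the master-data endpoints above could be called from Python like this. The helper names `master_url` and `fetch_json` are my own, and since the response schema isn't documented here, the code assumes nothing about it beyond it being JSON.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the section above.
BASE_URL = "https://gateway-voters.eci.gov.in/api/v1"


def master_url(path, **params):
    """Build a master-data URL, optionally with query parameters."""
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(url, timeout=30.0):
    """GET a URL and decode the body as JSON (schema not assumed)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


states_url = master_url("/common/states")
constituencies_url = master_url("/common/constituencies", stateCode="S22")
print(states_url)
print(constituencies_url)
# To actually fetch (performs a real network request):
# data = fetch_json(states_url)
```

The network call is left commented out on purpose: run it sparingly and inspect the response before building anything on top of it.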
2. Search APIs (Captcha Protected)
POST /elastic/search-by-epic-from-national-display
{
"epicNumber": "232452452",
"stateCd": "S22",
"captchaData": "j692pe",
"captchaId": "BBB1BD...",
"securityKey": "na"
}
You can search for individual voters. One at a time. With captcha.
Notice the /elastic/ path segment. It strongly suggests the data lives in Elasticsearch, a search engine purpose-built for massive datasets. If so, the data is already structured, indexed, and queryable for 900M+ records, and a bulk export should be straightforward to implement. But bulk queries aren't allowed.
3. Electoral Roll Publishing APIs
POST /printing-publish/get-publish-part-list
{
"acNumber": 186,
"stateCd": "S22",
"year": 2026,
"rollTypeRefId": "SIR-DraftRoll"
}
This returns 7 parts for AC 186, in both English and Tamil — that's 14 PDF files you need to download manually.
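To show how small that request really is, here is a sketch that serializes the body above and works out the resulting file count. The function names are hypothetical, and the code only builds the payload; it does not call the endpoint.

```python
import json


def part_list_payload(ac_number, state_cd, year, roll_type):
    """Serialize the request body shown above for get-publish-part-list."""
    return json.dumps({
        "acNumber": ac_number,
        "stateCd": state_cd,
        "year": year,
        "rollTypeRefId": roll_type,
    })


def pdf_count(parts, languages=2):
    """Each part is published once per language."""
    return parts * languages


body = part_list_payload(186, "S22", 2026, "SIR-DraftRoll")
print(body)
print(pdf_count(7))  # 7 parts x 2 languages = 14
```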
4. The Mystery Parameter: "securityKey": "na"
Every single API call includes this, always set to "na". This suggests a legacy authentication system that was removed, or a placeholder for future security that never landed. Either way — it's just... there.
The PDF Problem: Why PDFs Are Digital Jail
Before 2025: The Individual PDF Nightmare
The old URL pattern:
https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/186/
2026-EROLLGEN-S22-186-SIR-DraftRoll-Revision1-ENG-1-WI.pdf
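To make the pattern concrete, here is a minimal sketch that rebuilds that URL from its parts. It assumes the two lines above are one wrapped URL, that `ENG` is the language code, and that the number after it is the part number. `Revision1` and the `WI` suffix are copied verbatim because their meaning isn't stated anywhere. The function name is mine.

```python
def part_pdf_url(year, state_code, roll_type, ac_number, language, part_number):
    """Rebuild the per-part PDF URL from the pattern shown above."""
    directory = (
        f"https://voters.eci.gov.in/eroll/{year}/{state_code.lower()}/"
        f"{roll_type.lower()}/{ac_number}/"
    )
    # "Revision1" and "WI" are copied as-is; their meaning is an open question.
    filename = (
        f"{year}-EROLLGEN-{state_code.upper()}-{ac_number}-{roll_type}-"
        f"Revision1-{language}-{part_number}-WI.pdf"
    )
    return directory + filename


url = part_pdf_url(2026, "S22", "SIR-DraftRoll", 186, "ENG", 1)
print(url)
```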
For Tamil Nadu's 234 Assembly Constituencies:
- 234 ACs × 7 parts average × 2 languages = 3,276 individual PDF files
- Each PDF: 200–500 pages
- Total: ~700,000 pages of voter data
- Format: Often scanned images, not searchable text
- Size: ~50–100 GB of data
To digitize this, you'd need to download 3,276 files (with captchas), OCR the scanned pages, extract inconsistent tables, clean and structure everything, and de-duplicate across files. Estimated time: weeks to months.
This is why digital voter verification doesn't exist.
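The scale figures above are simple arithmetic, sketched here so the numbers can be re-run for another state. The 7-parts average and the 200 to 500 page range are the estimates from the list above, not measured values.

```python
acs = 234            # Assembly Constituencies in Tamil Nadu
parts_per_ac = 7     # average, as estimated above
languages = 2        # English and Tamil

files = acs * parts_per_ac * languages
low_pages = files * 200
high_pages = files * 500

print(files)                  # 3276
print(low_pages, high_pages)  # 655200 1638000
```

The ~700,000 page figure sits near the low end of that range, so the real total could be considerably higher.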
2025: The ZIP File Revolution (Sort Of)
ECI recently added bulk download options:
Per-AC ZIP Files:
https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/185-eroll.zip
State-wide Bulk Portal:
https://voters.eci.gov.in/download-sir-draft-roll?stateCode=S22
The catch? It's still PDFs inside those ZIPs. The fundamental problem is unchanged.
Why PDFs Are The Wrong Format
PDFs are designed for printing, not data analysis:
❌ Can't sort by age, gender, or locality
❌ Can't filter duplicate names
❌ Can't programmatically verify registrations
❌ Can't merge with other datasets
❌ OCR introduces errors (especially with Indian names)
❌ No export to Excel/CSV available
What campaigns and researchers actually need:
✅ CSV/Excel with structured columns
✅ JSON from the API endpoints
✅ Direct database exports (even read-only)
✅ One-click export to Excel for digital verification
The technology exists. The databases exist. The APIs exist. But PDFs force you back to manual, paper-based verification even though the data is digital.
Breaking The Lock: Understanding the URLs
Since ECI won't provide structured data, here are the URL patterns — understanding them helps you see how the system works.
Direct Download URLs (No Code Needed)
Individual AC ZIP files:
https://voters.eci.gov.in/eroll/2026/{state_code}/sir-draftroll/{ac_number}-eroll.zip
# Examples:
https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/185-eroll.zip
https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/186-eroll.zip
State bulk download portal:
https://voters.eci.gov.in/download-sir-draft-roll?stateCode=S22
SIR Search (check name in new rolls):
https://voters.eci.gov.in/searchInSIR/{UNIQUE_ID}
The Download Challenge
Browser download: click the URL → ZIP downloads instantly ✅
Command-line download (wget/curl): gets blocked with 403 Forbidden ❌
Why? The server appears to reject requests that don't look like they come from a browser. Sending a browser User-Agent and a Referer header gets past this:
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36" \
--referer="https://voters.eci.gov.in/" \
"https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/185-eroll.zip"
No complex code needed. Just proper headers.
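For scripted downloads, the same headers can be sent from Python's standard library. This is a sketch under the assumption that the header check described above is all the server does; `zip_url`, `build_request`, and `download` are names I made up for illustration.

```python
import shutil
import urllib.request

# Same header values as the wget example above.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
REFERER = "https://voters.eci.gov.in/"


def zip_url(state_code, ac_number, year=2026):
    """Build the per-AC ZIP URL from the pattern above."""
    return (
        f"https://voters.eci.gov.in/eroll/{year}/{state_code.lower()}/"
        f"sir-draftroll/{ac_number}-eroll.zip"
    )


def build_request(url):
    """Attach browser-like headers to the request."""
    return urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT, "Referer": REFERER}
    )


def download(url, destination, timeout=60.0):
    """Stream the response body to a local file."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        with open(destination, "wb") as out_file:
            shutil.copyfileobj(response, out_file)


request = build_request(zip_url("S22", 185))
print(request.full_url)
# download(request.full_url, "185-eroll.zip")  # real network request
```

If you loop this over many constituencies, add a pause between requests so you don't overwhelm the server.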
The Workflow That Actually Works
Based on my experience helping that campaign digitize their AC data:
Step 1: Identify Your URLs
# Your constituency ZIP:
https://voters.eci.gov.in/eroll/2026/{state_code}/sir-draftroll/{ac_number}-eroll.zip
# State bulk download:
https://voters.eci.gov.in/download-sir-draft-roll?stateCode={STATE_CODE}
Step 2: Download ZIPs
Manual (easiest): Open browser, navigate to the ZIP URL, click download.
Command line (faster for bulk):
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)" \
--referer="https://voters.eci.gov.in/" \
"https://voters.eci.gov.in/eroll/2026/s22/sir-draftroll/185-eroll.zip"
Step 3: Extract PDFs
Unzip the downloaded files and pick the English version for easier processing.
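This step can be scripted with the standard `zipfile` module. The sketch below assumes the English PDFs inside the ZIP carry `-ENG-` in their filenames, as the older per-part URLs did; check the actual names in your archive and adjust the `marker` argument if they differ.

```python
import zipfile


def extract_english_pdfs(zip_path, out_dir, marker="-ENG-"):
    """Extract only PDFs whose names contain `marker`; return their paths."""
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Filename convention is an assumption, see the note above.
            if name.lower().endswith(".pdf") and marker in name:
                extracted.append(archive.extract(name, out_dir))
    return sorted(extracted)


# Usage (assuming the ZIP from Step 2 is in the current directory):
# pdfs = extract_english_pdfs("185-eroll.zip", "185-english")
```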
Step 4: PDF to Excel Conversion
- Adobe Acrobat (Export to Excel)
- Tabula (Free, open-source)
- Online converters (smallpdf.com, ilovepdf.com)
- Python with tabula-py (for automation)
Step 5: Data Cleaning
pip install requests pandas tabula-py openpyxl
Remove duplicates, fix OCR name errors, standardize formats, validate EPIC numbers.
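A rough sketch of the de-duplication and EPIC validation, using only the standard library so it runs without pandas. It assumes each extracted row is a dict with hypothetical `epic` and `name` keys, and that a valid EPIC number is three letters followed by seven digits. Older rolls may use other formats, so treat rejected rows as "needs manual review", not "wrong".

```python
import re

# Assumed EPIC format: three letters followed by seven digits.
EPIC_PATTERN = re.compile(r"^[A-Z]{3}[0-9]{7}$")


def normalize_epic(raw):
    """Strip whitespace and upper-case the EPIC number."""
    return re.sub(r"\s+", "", raw).upper()


def clean_rows(rows):
    """Return (valid, rejected): valid rows are de-duplicated by EPIC."""
    seen = set()
    valid, rejected = [], []
    for row in rows:
        epic = normalize_epic(row.get("epic", ""))
        name = " ".join(row.get("name", "").split())  # collapse stray spaces
        cleaned = {**row, "epic": epic, "name": name}
        if not EPIC_PATTERN.match(epic):
            rejected.append(cleaned)  # often an OCR slip such as O vs 0
        elif epic not in seen:
            seen.add(epic)
            valid.append(cleaned)
    return valid, rejected


rows = [
    {"epic": "abc1234567", "name": "  Ravi   Kumar "},
    {"epic": "ABC1234567", "name": "Ravi Kumar"},
    {"epic": "AB01234567", "name": "Meena S"},
]
valid, rejected = clean_rows(rows)
print(len(valid), len(rejected))
```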
Total time for a single AC: 2–3 days first time, a few hours after the learning curve.
What it should take: 5 minutes with a CSV download button.
The Real Question: Why Is This Data Hidden?
The data is already digital. The infrastructure proves it. So why can't citizens access structured electoral data?
Privacy Concerns (Legitimate) — Electoral rolls contain full name, age, address, EPIC number. Counter-argument: this data is already public. Anyone can visit the BLO and get printed copies. PDFs are freely downloadable. Making it Excel instead of PDF doesn't change privacy — it changes usability.
Preventing Misuse (Legitimate) — Bulk data could enable targeted misinformation or voter profiling. Counter-argument: malicious actors already have this. Professional data brokers and well-funded campaigns have digitized it. Only honest citizens, small candidates, and researchers are locked out.
Bureaucratic Inertia (Likely) — "This is how we've always done it." No technical understanding at the policy level. No incentive to change.
Political Control (Speculative) — Keeping data inaccessible favors established parties with resources to digitize, creates dependency on party databases, and limits citizen oversight.
Similar Systems Worldwide
| Country | Approach |
|---|---|
| USA | Most states publish voter files in CSV format; available for purchase or free download |
| UK | Electoral registers available for purchase in two versions (full and open) |
| Australia | Electoral rolls accessible to registered political parties in digital format |
| India | World's largest democracy. Elasticsearch backend. Data locked in PDFs. |
What Should Change: A Proposal
1. Provide Structured Data Exports
CSV/Excel downloads for each AC, updated with each revision, no captcha for bulk downloads.
2. Create Legitimate API Access
Developer portal with API keys, rate-limited but functional, documented endpoints, terms of use with penalties for misuse.
3. Maintain Privacy Balance
Remove exact addresses (keep locality), provide age brackets instead of birthdates, audit trail for access.
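As an illustration of how little work this redaction would be, here is a sketch that turns a full record into a privacy-reduced one. The record fields (`name`, `age`, `address`, `locality`) and the ten-year bracket width are assumptions of mine, not anything ECI publishes.

```python
def age_bracket(age, width=10):
    """Map an exact age to a bracket such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def redact(record):
    """Keep name and locality, drop the exact address, bucket the age."""
    return {
        "name": record["name"],
        "locality": record["locality"],
        "age_bracket": age_bracket(record["age"]),
    }


full = {
    "name": "Ravi Kumar",
    "age": 37,
    "address": "12 Example Street",
    "locality": "Example Nagar",
}
public = redact(full)
print(public)
```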
4. The Gateway APIs Already Exist!
The infrastructure is ALREADY THERE. The databases exist. The APIs work. Just add:
GET /api/v1/bulk/export-ac-data?acNumber=185&format=csv
That's it. One endpoint. Change digital democracy in India.
Conclusion: Data Liberation Is a Democratic Right
India prides itself on being the world's largest democracy. Yet our electoral data — the foundation of that democracy — is locked in a digital jail of PDFs.
The data exists. The technology exists. What's missing is the will to make it truly public.
To the Election Commission: You've done the hard work of digitization. Now make it truly accessible. Structured data isn't a security risk — it's a democratic necessity.
To researchers and activists: The tools and methods exist to digitize this data. It's tedious, but doable. Don't wait for permission.
To developers: Build the tools that should exist. PDF-to-Excel converters specifically for electoral rolls. Bulk downloaders. Verification APIs. Make them open source.
To candidates and campaigns: Demand better. You have the right to structured voter data for your constituency. Pressure ECI for CSV exports.
The Original Question Answered
Remember that person who asked me to digitize their AC electoral roll?
What I told them:
- "It's technically possible"
- "It will take 2–3 weeks"
- "You'll need someone who can code"
- "The data quality won't be perfect"
- "It shouldn't be this hard"
What I should have been able to say:
- "Download this CSV from the ECI website"
- "Takes 5 minutes"
- "Here's the Excel file"
That's the difference between a digital democracy and a PDF democracy.
Found this useful? Share it with candidates, researchers, and civic tech developers. Let's make electoral data actually accessible.
Have you tried digitizing voter data? What challenges did you face? Drop your experiences in the comments!
Legal note: Electoral rolls are public information under Indian law. Accessing publicly available data via public URLs is legal. Respect rate limits and don't overwhelm servers.