Every country has address quirks. India's are in a league of their own.
There is no universal street numbering system. The same city can appear as "Bengaluru", "Bangalore", "BLR" or "ಬೆಂಗಳೂರು" depending on who's writing it.
Addresses frequently include landmarks instead of street names. "Near SBI ATM, Opposite Ganesh Temple" is a real, functional address in India. And 23,915 pincodes cover a population of 1.4 billion people.
If you are building e-commerce, logistics, fintech or govtech in India, you have hit this wall. The existing options are either paid APIs (Google or India Post's SOAP service), patented solutions (Delhivery) or rolling your own regex and hoping for the best.
I did my research. libpostal is the go-to open source address parser, but it doesn't handle Indian formats well.
So I built bharataddress.
It is an open source Python library that parses unstructured Indian addresses into structured JSON, entirely offline. No API keys, no network calls, no rate limits. Parsing is deterministic: no LLM, no ML, predictable output every time.
What's inside:
v0.1.1: 44% exact match on 200 real-world Indian addresses, deterministic parser, no ML, 23k pincodes, MIT.
Full embedded dataset of 23,915 Indian pincodes with state, district and post office mappings.
Freeform address string parsing: throw a messy address at it and get structured JSON back
Pincode validation and reverse lookup (pincode → state, district, post offices)
MIT licensed, zero external service dependencies
pip install bharataddress
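To make the pincode validation and reverse lookup idea concrete, here is a minimal sketch in plain Python. It is not bharataddress's actual API: the names `find_pincode`, `lookup_state` and `SAMPLE_PINCODE_STATES` are hypothetical, and the three-row table is a hand-written stand-in for the embedded 23,915-pincode dataset. The only format rule it assumes is that an Indian pincode is six digits with a non-zero first digit, sometimes typed with a space in the middle.

```python
import re

# Hypothetical stand-in for the embedded dataset (illustration only).
SAMPLE_PINCODE_STATES = {
    "110001": "Delhi",
    "400001": "Maharashtra",
    "560001": "Karnataka",
}

# Six digits, first digit 1-9, optional space after the third digit,
# and not part of a longer digit run such as a phone number.
PINCODE_RE = re.compile(r"(?<!\d)([1-9]\d{2})\s?(\d{3})(?!\d)")


def find_pincode(address):
    """Return the first pincode-shaped token in the address, or None."""
    match = PINCODE_RE.search(address)
    if match is None:
        return None
    return match.group(1) + match.group(2)


def lookup_state(address):
    """Return (pincode, state) for the address; state is None if unknown."""
    pincode = find_pincode(address)
    if pincode is None:
        return None
    return pincode, SAMPLE_PINCODE_STATES.get(pincode)


print(lookup_state("12, MG Road, Bengaluru 560001"))
print(lookup_state("Connaught Place, New Delhi - 110 001"))
```

Because the lookup is a dictionary access against data shipped with the package, the same input gives the same output on every run, with no network involved.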
The hard problems I'm solving next:
Fuzzy matching is the big one. "Bangalroe" should resolve to Bangalore. "Mumabi" should resolve to Mumbai. Indian addresses are frequently hand-typed on mobile keyboards, and typos are the norm, not the exception.
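One deterministic way to approach this, sketched here with the standard library's `difflib` rather than anything bharataddress ships today, is to compare the typed name against a list of known city names and accept the closest one above a similarity cutoff. The `KNOWN_CITIES` list, the `resolve_city` name and the 0.75 cutoff are all assumptions for illustration.

```python
import difflib

# Hypothetical short list; a real implementation would use the full dataset.
KNOWN_CITIES = ["Bangalore", "Mumbai", "Chennai", "Delhi", "Kolkata", "Hyderabad"]

# Lowercased lookup so matching ignores case but returns the canonical spelling.
_CANONICAL = {city.lower(): city for city in KNOWN_CITIES}


def resolve_city(typed, cutoff=0.75):
    """Return the closest known city for a possibly misspelled name, or None."""
    candidates = difflib.get_close_matches(
        typed.strip().lower(), list(_CANONICAL), n=1, cutoff=cutoff
    )
    if not candidates:
        return None
    return _CANONICAL[candidates[0]]


print(resolve_city("Bangalroe"))
print(resolve_city("Mumabi"))
```

The cutoff is the knob that matters: too low and short names start colliding with each other, too high and common transpositions stop matching. It would need tuning against real address data.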
Devanagari and regional script support is next after that. A huge portion of domestic addresses in India are written in Hindi, Tamil, Telugu, Kannada or other regional scripts. The library needs to handle that gracefully.
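A plausible first step, before any script-specific parsing, is simply detecting which script an address is written in so it can be routed accordingly. The sketch below is an assumption about how that could look, not the library's design: it uses `unicodedata.name` and treats the first word of each letter's Unicode name (for example `KANNADA` or `DEVANAGARI`) as the script label. The function name `detect_script` is hypothetical.

```python
import unicodedata
from collections import Counter


def detect_script(text):
    """Return the most common script label among the letters in text, or None."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces and combining marks
        name = unicodedata.name(ch, "")
        if name:
            # Unicode names start with the script, e.g. "KANNADA LETTER BA".
            counts[name.split(" ")[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(detect_script("ಬೆಂಗಳೂರು"))
print(detect_script("दिल्ली"))
print(detect_script("Bengaluru"))
```

Counting letters rather than checking only the first character keeps the result stable for mixed input, such as a regional-script address that ends with a pincode or an English building name.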
And then there's the landmark problem. "Near [X]" and "Opp. [Y]" are structural components of Indian addresses that don't map to any Western address parsing model. I am still thinking about how to handle these, likely as metadata rather than trying to geocode them.
I welcome your feedback and comments on that.
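As a rough illustration of the "landmarks as metadata" direction, the sketch below pulls landmark phrases out of an address with a regular expression and returns them as relation/name pairs, with no attempt at geocoding. The keyword list, the output shape and the `extract_landmarks` name are assumptions made for this example, not a settled design.

```python
import re

# Relation keywords followed by the landmark name, up to the next comma.
LANDMARK_RE = re.compile(
    r"\b(near|opp(?:osite)?\.?|behind|beside|next to)\s+([^,]+)",
    re.IGNORECASE,
)


def extract_landmarks(address):
    """Return a list of {"relation", "name"} dicts found in the address."""
    landmarks = []
    for match in LANDMARK_RE.finditer(address):
        relation = match.group(1).lower().rstrip(".")
        if relation == "opp":
            relation = "opposite"  # normalise the common abbreviation
        landmarks.append({"relation": relation, "name": match.group(2).strip()})
    return landmarks


print(extract_landmarks("Near SBI ATM, Opposite Ganesh Temple"))
print(extract_landmarks("Flat 4B, Opp. City Mall, MG Road"))
```

Keeping the relation and the name separate means downstream code can store, display or ignore landmarks without the parser having to decide what they mean geographically.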
GitHub: https://github.com/Neelagiri65/bharataddress
If you have built anything that touches Indian address data, I would like to hear what broke for you. That is what shapes the roadmap for bharataddress.