Built a Python toolkit for Indian addresses. 26,700+ pincodes, no standard format, landmarks instead of street names, multiple scripts. The usual chaos.
bharataddress handles parsing, formatting, validation, geocoding, address similarity, batch processing and DIGIPIN encoding. All offline. No API keys. No ML. 4.3MB total.
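DIGIPIN is a good example of why "all offline, no API keys" is possible. The code is a pure function of latitude and longitude: a fixed bounding box over India is split into a 4x4 grid ten times, and each level emits one symbol. Below is a minimal sketch based on my reading of India Post's published reference algorithm. It is not bharataddress's source, and `encode_digipin_sketch` is a hypothetical name.

```python
import math

# Hypothetical re-implementation based on India Post's published DIGIPIN
# reference algorithm; this is NOT bharataddress's own code.

# 4x4 symbol grid: row 0 is the northern strip, column 0 the western strip.
DIGIPIN_GRID = (
    ("F", "C", "9", "8"),
    ("J", "3", "2", "7"),
    ("K", "4", "5", "6"),
    ("L", "M", "P", "T"),
)

# Fixed bounding box (degrees) that DIGIPIN subdivides.
MIN_LAT, MAX_LAT = 2.5, 38.5
MIN_LON, MAX_LON = 63.5, 99.5


def encode_digipin_sketch(lat: float, lon: float) -> str:
    """Encode lat/lng as 10 DIGIPIN symbols, hyphenated as XXX-XXX-XXXX."""
    if not (MIN_LAT <= lat <= MAX_LAT and MIN_LON <= lon <= MAX_LON):
        raise ValueError("coordinates are outside the DIGIPIN bounding box")

    min_lat, max_lat, min_lon, max_lon = MIN_LAT, MAX_LAT, MIN_LON, MAX_LON
    out = []
    for level in range(1, 11):
        lat_step = (max_lat - min_lat) / 4
        lon_step = (max_lon - min_lon) / 4
        # Rows are counted from the north edge, columns from the west edge.
        row = 3 - math.floor((lat - min_lat) / lat_step)
        col = math.floor((lon - min_lon) / lon_step)
        # Clamp so points exactly on the outer edge still map to a cell.
        row = min(max(row, 0), 3)
        col = min(max(col, 0), 3)
        out.append(DIGIPIN_GRID[row][col])
        if level in (3, 6):
            out.append("-")
        # Shrink the box to the chosen cell (right-hand sides use the old bounds).
        min_lat, max_lat = min_lat + lat_step * (3 - row), min_lat + lat_step * (4 - row)
        min_lon, max_lon = min_lon + lon_step * col, min_lon + lon_step * (col + 1)
    return "".join(out)


print(encode_digipin_sketch(28.622788, 77.213033))  # 39J-49L-L8T4 (central New Delhi)
```

Each level shrinks the cell by 4x in each direction. After ten levels, the cell is a few metres on a side. No lookup table or network call is involved.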
62.5% exact match on a public 200-address gold set. Tested head-to-head against Shiprocket's 760MB TinyBERT NER model on the same gold set: bharataddress wins on 6 of 9 fields. Fully reproducible.
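The post doesn't spell out the scoring rules. A common reading is that "exact match" means every field of an address agrees with the gold label, and a field "win" means strictly higher per-field accuracy. Here is a sketch under those assumptions. The field names, the normalisation rule and the helper functions are all hypothetical, not the project's benchmark script.

```python
from typing import Dict, List

Record = Dict[str, str]  # field name -> value, e.g. {"pincode": "560001"}


def normalise(value: str) -> str:
    # Assumed rule: compare case-insensitively and ignore extra whitespace.
    return " ".join(value.lower().split())


def field_matches(pred: Record, gold: Record, field: str) -> bool:
    # A missing field counts as an empty string on either side.
    return normalise(pred.get(field, "")) == normalise(gold.get(field, ""))


def exact_match_rate(preds: List[Record], golds: List[Record], fields: List[str]) -> float:
    """Fraction of addresses where every listed field matches the gold label."""
    hits = sum(all(field_matches(p, g, f) for f in fields) for p, g in zip(preds, golds))
    return hits / len(golds)


def per_field_accuracy(preds: List[Record], golds: List[Record], fields: List[str]) -> Dict[str, float]:
    """Accuracy of each field on its own, used for head-to-head comparison."""
    return {
        f: sum(field_matches(p, g, f) for p, g in zip(preds, golds)) / len(golds)
        for f in fields
    }


def fields_won(ours: Dict[str, float], theirs: Dict[str, float]) -> List[str]:
    """Fields where the first system scores strictly higher (ties are not wins)."""
    return [f for f in ours if ours[f] > theirs[f]]


fields = ["pincode", "city", "state"]
gold = [
    {"pincode": "560001", "city": "Bengaluru", "state": "Karnataka"},
    {"pincode": "400001", "city": "Mumbai", "state": "Maharashtra"},
]
ours = [
    {"pincode": "560001", "city": "bengaluru", "state": "Karnataka"},
    {"pincode": "400001", "city": "Mumbai", "state": ""},
]
theirs = [
    {"pincode": "560001", "city": "Bangalore", "state": "Karnataka"},
    {"pincode": "400001", "city": "Mumbai", "state": "Maharashtra"},
]

print(exact_match_rate(ours, gold, fields))  # 0.5
print(fields_won(per_field_accuracy(ours, gold, fields),
                 per_field_accuracy(theirs, gold, fields)))  # ['city']
```

Exact match is the stricter metric: one wrong field sinks the whole address. That is why it usually reads lower than any single per-field accuracy.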
What you get:
- parse() turns messy address strings into structured JSON
- geocode() gives you lat/lng from pincode centroids for 16,400+ pincodes
- encode_digipin() generates India Post's new 10-char geo-code
- format() outputs India Post, single-line or shipping-label styles
- validate() checks consistency and flags whether an address is deliverable
- address_similarity() gives you a 0-1 score for dedup
- parse_csv() and parse_dataframe() for bulk processing
- extract_state_from_gstin() pulls state from GST numbers
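Pulling a state from a GSTIN works because its first two digits are the state code. The remaining characters are the holder's PAN, an entity number and a check character. Below is a minimal sketch of that idea using a deliberately partial code table and a shape-only regex with no checksum validation. `state_from_gstin_sketch` is a hypothetical name, not the library's `extract_state_from_gstin()`.

```python
import re
from typing import Optional

# Partial table of GST state codes (first two GSTIN digits); extend as needed.
GST_STATE_CODES = {
    "07": "Delhi",
    "09": "Uttar Pradesh",
    "19": "West Bengal",
    "24": "Gujarat",
    "27": "Maharashtra",
    "29": "Karnataka",
    "32": "Kerala",
    "33": "Tamil Nadu",
    "36": "Telangana",
    "37": "Andhra Pradesh",
}

# Shape only: 2-digit state code, PAN (5 letters, 4 digits, 1 letter), 3 more chars.
GSTIN_SHAPE = re.compile(r"\d{2}[A-Z]{5}\d{4}[A-Z][0-9A-Z]{3}")


def state_from_gstin_sketch(gstin: str) -> Optional[str]:
    """Return the state for a GSTIN, or None if malformed or not in the table."""
    gstin = gstin.strip().upper()
    if not GSTIN_SHAPE.fullmatch(gstin):
        return None
    return GST_STATE_CODES.get(gstin[:2])


print(state_from_gstin_sketch("29ABCDE1234F1Z5"))  # Karnataka (dummy GSTIN)
```

A full implementation would carry the complete state-code list and verify the check character. This sketch only shows why no external lookup is needed.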
pip install bharataddress
https://github.com/Neelagiri65/bharataddress
100 tests. MIT licensed. First open-source Indian address parser with DIGIPIN support.