If you use UPI for everything (and at this point, who in India doesn't?), your bank statement is a wall of cryptic merchant strings: UPI/zomatoonline@paytm/..., UPI/SWIGGY-BANGALORE/..., UPI/9876543210@oksbi/.... Trying to figure out where your money actually goes by reading that mess is a lost cause.
I built a 95-line Python script that ingests a UPI statement CSV from any major Indian bank, classifies each transaction into a sensible category, and spits out a monthly summary. No paid services, no APIs, no LLMs in the loop — just regex, a keyword map, and pandas.
Here's exactly how it works.
The problem
Every bank exports UPI transactions a little differently, but the merchant string is almost always there. The trick is that real merchant names leak through:
- Zomato, Swiggy, Dunzo → food delivery
- BigBasket, Blinkit, Zepto, DMart → groceries
- Uber, Ola, Rapido, IRCTC → transport
- Airtel, Jio, Vi, Tata Power, BESCOM → bills
- Amazon, Flipkart, Myntra → shopping
If we maintain a keyword → category dictionary and walk every row, we can categorize 80–90% of transactions automatically. The remaining slice falls into "Other", and you can label those by hand once a month.
The full code
Save this as upi_categorizer.py:
```python
import re
import sys
from pathlib import Path

import pandas as pd

CATEGORIES = {
    "Food": [
        "zomato", "swiggy", "dunzo", "eatfit", "faasos", "behrouz",
        "dominos", "pizzahut", "kfc", "mcdonalds", "starbucks", "cafe",
    ],
    "Groceries": [
        "bigbasket", "blinkit", "grofers", "zepto", "dmart", "instamart",
        "reliancefresh", "natures", "spencer", "more-retail",
    ],
    "Transport": [
        "uber", "olacabs", "rapido", "irctc", "redbus", "abhibus",
        "namma-yatri", "blusmart", "yulu", "vogo",
    ],
    "Bills": [
        "airtel", "jio", "vodafone", "vi-mobile", "bsnl", "tatapower",
        "bescom", "adani-electricity", "torrent-power", "mahanagargas",
        "act-fibernet", "hathway",
    ],
    "Shopping": [
        "amazon", "flipkart", "myntra", "ajio", "nykaa", "meesho",
        "tatacliq", "firstcry", "lenskart", "boat-lifestyle",
    ],
    "Investments": [
        "zerodha", "groww", "upstox", "kuvera", "paytmmoney",
        "indmoney", "angelone", "smallcase",
    ],
    "Health": [
        "pharmeasy", "1mg", "netmeds", "apollo", "practo",
        "cult-fit", "healthifyme",
    ],
    "Entertainment": [
        "netflix", "hotstar", "primevideo", "sonyliv", "zee5",
        "spotify", "gaana", "bookmyshow", "pvr", "inox",
    ],
}

UPI_PATTERN = re.compile(r"upi[/-]([a-z0-9.\-_@]+)", re.IGNORECASE)


def extract_merchant(narration: str) -> str:
    """Pull the merchant slug out of a UPI narration string."""
    if not isinstance(narration, str):
        return ""
    match = UPI_PATTERN.search(narration.lower())
    return match.group(1) if match else narration.lower()


def categorize(narration: str) -> str:
    merchant = extract_merchant(narration)
    for category, keywords in CATEGORIES.items():
        for keyword in keywords:
            if keyword in merchant:
                return category
    return "Other"


def load_statement(path: Path) -> pd.DataFrame:
    df = pd.read_csv(path)
    # Normalize the columns most banks use; adjust as needed.
    df.columns = [c.strip().lower() for c in df.columns]
    rename_map = {
        "transaction date": "date", "txn date": "date", "value date": "date",
        "narration": "narration", "description": "narration", "details": "narration",
        "withdrawal amt.": "debit", "withdrawal": "debit", "debit": "debit",
        "deposit amt.": "credit", "deposit": "credit", "credit": "credit",
    }
    df = df.rename(columns={k: v for k, v in rename_map.items() if k in df.columns})
    if "debit" not in df.columns:
        df["debit"] = 0
    df["debit"] = pd.to_numeric(df["debit"], errors="coerce").fillna(0)
    df["date"] = pd.to_datetime(df["date"], errors="coerce", dayfirst=True)
    return df.dropna(subset=["date"])


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    df = df[df["debit"] > 0].copy()
    df["category"] = df["narration"].apply(categorize)
    df["month"] = df["date"].dt.to_period("M")
    pivot = df.pivot_table(
        index="month", columns="category", values="debit",
        aggfunc="sum", fill_value=0,
    )
    pivot["Total"] = pivot.sum(axis=1)
    return pivot.round(0)


if __name__ == "__main__":
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("statement.csv")
    summary = summarize(load_statement(path))
    print(summary.to_string())
    summary.to_csv("upi_summary.csv")
    print("\nSaved upi_summary.csv")
```
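To see what the regex actually captures, here's the script's extraction and lookup logic in isolation, with the keyword map trimmed to two categories for brevity:

```python
import re

UPI_PATTERN = re.compile(r"upi[/-]([a-z0-9.\-_@]+)", re.IGNORECASE)
CATEGORIES = {"Food": ["zomato", "swiggy"], "Transport": ["uber", "irctc"]}

def extract_merchant(narration: str) -> str:
    # Capture the slug right after "UPI/" or "UPI-"; stops at the next "/".
    match = UPI_PATTERN.search(narration.lower())
    return match.group(1) if match else narration.lower()

def categorize(narration: str) -> str:
    merchant = extract_merchant(narration)
    for category, keywords in CATEGORIES.items():
        if any(keyword in merchant for keyword in keywords):
            return category
    return "Other"

print(categorize("UPI/zomatoonline@paytm/12345/food order"))  # Food
print(categorize("UPI-IRCTC-WEB/ref9981"))                    # Transport
print(categorize("UPI/9876543210@oksbi/transfer"))            # Other
```

Note that a bare phone-number handle like `9876543210@oksbi` carries no keyword, which is exactly why person-to-person transfers land in "Other".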
How to use it
- Log in to net banking and download your last 3-6 months of statements as CSV.
- Save it as `statement.csv` in the same folder as the script.
- Run `python upi_categorizer.py statement.csv`.
You'll get a monthly pivot like:
```
month    Bills  Food  Groceries  Shopping  Transport  Other  Total
2026-02   3450  6280       9120      4500       2150   1820  27320
2026-03   3450  5840      10100      6700       1980   2310  30380
2026-04   3450  7120       9550      3200       2620   1950  27890
```
Three months in, you have a ground-truth picture of where roughly ₹85,000 went.
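A quick follow-up worth doing: divide each category by the monthly total to get spend shares, which makes month-over-month drift obvious. A sketch, reusing two columns from the table above:

```python
import pandas as pd

# Two categories from the example pivot above, hard-coded for illustration.
summary = pd.DataFrame(
    {"Food": [6280, 5840], "Groceries": [9120, 10100], "Total": [27320, 30380]},
    index=["2026-02", "2026-03"],
)

# Each category's percentage of that month's total spend.
shares = summary.drop(columns="Total").div(summary["Total"], axis=0).mul(100).round(1)
print(shares)
```

In this sample, groceries hold steady around a third of spend while food delivery's share drops a few points between the two months.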
Tweaks worth making
The keyword list is the soul of this script. Spend ten minutes adding the merchants YOUR statement actually shows — the long tail is regional (Big Bazaar in some cities, Star Bazaar in others, local kirana with their own UPI handle). Every keyword you add today saves you from "Other" forever.
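To find which keywords to add, it helps to surface the most frequent merchants hiding in "Other". A sketch, with the script's `extract_merchant` inlined so it runs standalone (the sample narrations are made up):

```python
import re
import pandas as pd

UPI_PATTERN = re.compile(r"upi[/-]([a-z0-9.\-_@]+)", re.IGNORECASE)

def extract_merchant(narration: str) -> str:
    match = UPI_PATTERN.search(narration.lower())
    return match.group(1) if match else narration.lower()

def top_uncategorized(df: pd.DataFrame, n: int = 10) -> pd.Series:
    # Count the merchant slugs behind every "Other" row, most frequent first.
    other = df[df["category"] == "Other"]
    return other["narration"].map(extract_merchant).value_counts().head(n)

sample = pd.DataFrame({
    "narration": ["UPI/starbazaar-chn@ybl/412", "UPI/starbazaar-chn@ybl/887",
                  "UPI/localkirana@okaxis/12"],
    "category": ["Other", "Other", "Other"],
})
print(top_uncategorized(sample))
```

Anything that shows up more than a couple of times is a keyword candidate; one-off handles aren't worth the dictionary entry.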
If you want to go further:
- Auto-tag recurring UPI handles: track per-handle medians and flag transactions that deviate by 3x.
- Detect EMIs: same amount, same merchant, monthly cadence — pull these into their own bucket so they don't pollute "Shopping".
- Track the savings rate: subtract `Total` from monthly credits and chart the gap.
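The EMI idea is simple to sketch in pandas: group by (merchant, amount) and count distinct months. The `flag_emis` helper and its columns are hypothetical, assuming you've already run `extract_merchant` to get a `merchant` column; the merchant name in the sample is made up:

```python
import pandas as pd

def flag_emis(df: pd.DataFrame, min_months: int = 3) -> pd.DataFrame:
    # A (merchant, amount) pair seen in min_months distinct months
    # looks like an EMI or subscription.
    df = df.copy()
    df["month"] = df["date"].dt.to_period("M")
    recurrence = df.groupby(["merchant", "debit"])["month"].transform("nunique")
    df["is_emi"] = recurrence >= min_months
    return df

txns = pd.DataFrame({
    "date": pd.to_datetime(["2026-02-05", "2026-03-05", "2026-04-05", "2026-03-12"]),
    "merchant": ["bajajfinserv", "bajajfinserv", "bajajfinserv", "amazon"],
    "debit": [2499.0, 2499.0, 2499.0, 1200.0],
})
print(flag_emis(txns)[["merchant", "debit", "is_emi"]])
```

Keying on distinct months rather than raw row count avoids flagging two same-priced orders placed in the same month.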
I run a version of this on the 1st of every month via cron. Takes 2 seconds, replaces a 30-minute manual sort, and the only thing I update is the keyword list.
Why I stopped using budget apps for this
Two reasons. One, I don't love handing over read-only bank credentials to a third party I haven't audited. Two, every budget app eventually starts pushing premium tiers, and the ones that survive long enough to be useful tend to be acquired and shut down. A 95-line script that runs on my laptop has none of those problems.
The keyword map is the only thing that needs maintenance, and that's a feature — your categories should match how YOU spend, not how a startup's data scientist thought you'd spend.
If you want more scripts like this (PDF invoice parsers, GST reconciliation, WhatsApp/Telegram automations), follow me on Twitter @automate_archit for daily AI automation tips.