DEV Community

Hardik X


How I Built an AI Bank Statement Parser for Indian CAs Using Python and Claude API

As a solo founder, I built AarogyamFin — an AI-powered bank statement analyzer for Indian Chartered Accountants. Here's the technical story behind it.
The Problem
Indian bank statements come in 30+ different PDF formats. SBI looks completely different from HDFC. Kotak's format changes every 6 months. Manual parsing was impossible to scale.
The Architecture
PDF Upload → Bank Detection → Parser → OCR Fallback → AI Analysis
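A minimal sketch of how those five stages could be chained, assuming each stage is a plain function and parsers are modeled as callables. `Transaction`, `Parser`, `run_pipeline`, and the callable parameters are hypothetical names for illustration, not AarogyamFin's actual code; the sign convention for `amount` is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Transaction:
    date: str
    description: str
    amount: float  # assumed convention: debits negative, credits positive

# A parser takes a PDF path and returns the transactions it could extract
Parser = Callable[[str], List[Transaction]]

def run_pipeline(
    pdf_path: str,
    detect: Callable[[str], Parser],
    ocr_parse: Parser,
    analyze: Callable[[List[Transaction]], dict],
) -> dict:
    """PDF Upload -> Bank Detection -> Parser -> OCR Fallback -> AI Analysis."""
    parser = detect(pdf_path)            # Bank Detection chooses a bank-specific parser
    transactions = parser(pdf_path)      # Parser reads the PDF's text layer
    if not transactions:                 # OCR Fallback for image-based PDFs
        transactions = ocr_parse(pdf_path)
    return analyze(transactions)         # AI Analysis: categorization, chat context
```

Passing the stages in as arguments keeps the orchestration testable with stubs, for example a fake OCR function that records whether it was called.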
Tech Stack:

Backend: Flask (Python)
PDF Parsing: PyMuPDF + pdfplumber
OCR: Tesseract + pdf2image (for scanned PDFs)
AI: Anthropic API (transaction categorization + chat)
Database: Supabase (PostgreSQL)
Payments: Razorpay
Hosting: Railway
CDN/WAF: Cloudflare

Bank Detection Logic
Instead of asking users which bank they're from, I built an auto-detector:
```python
def detect_bank(pdf_path):
    # extract_text() and the parser classes are defined elsewhere in the project
    text = extract_text(pdf_path)
    if "State Bank of India" in text:
        return SBIParser()
    elif "HDFC Bank" in text:
        return HDFCParser()
    # ... 30+ banks
    else:
        return UniversalParser()
```
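The if/elif chain works, but it grows with every bank and matches case-sensitively against the whole document, where a transfer narration that mentions another bank's name could trigger the wrong parser. One alternative, sketched here under assumptions: keep the signatures in a table, normalize case and whitespace, and pass in only the header text of the first page. `BANK_SIGNATURES` and `detect_bank_name` are hypothetical, and the patterns are illustrative rather than a verified list of what each bank prints.

```python
import re
from typing import Optional

# Hypothetical signature table: (bank key, regex patterns that identify it).
# Order matters: put more specific patterns before more generic ones.
BANK_SIGNATURES = [
    ("SBI", [r"state bank of india"]),
    ("HDFC", [r"hdfc bank"]),
    ("ICICI", [r"icici bank"]),
    ("KOTAK", [r"kotak mahindra bank"]),
]

def detect_bank_name(header_text: str) -> Optional[str]:
    """Return the key of the first bank whose pattern appears, else None."""
    # Collapse whitespace so a bank name split across lines still matches
    text = re.sub(r"\s+", " ", header_text).lower()
    for bank, patterns in BANK_SIGNATURES:
        if any(re.search(p, text) for p in patterns):
            return bank
    return None  # caller falls back to the universal parser
```

The returned key could then index a hypothetical `PARSERS` dict, e.g. `PARSERS.get(bank, UniversalParser)()`, so adding a bank means adding one table row and one parser class.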
The OCR Fallback
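Some banks (Bank of Baroda, Punjab National Bank) generate image-based PDFs, for which PyMuPDF returns empty text. The fix is an OCR fallback:
```python
if not transactions:
    # Image-based PDF: the text layer is empty, so fall back to OCR
    from parsers.ocr_fallback import ocr_parse
    transactions = ocr_parse(pdf_path)
```
pdf2image renders each page as an image, and Tesseract extracts the text from it.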
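The fallback above triggers only when no transactions come out at all. Since OCR is slow, a per-page variant may help when only some pages of a PDF are scanned: read the text layer with PyMuPDF, then render and OCR only the pages whose text is nearly empty. This is a sketch, not the post's `ocr_parse`; the function names and the 20-character threshold are assumptions. It relies on `fitz` (PyMuPDF), `pdf2image.convert_from_path` (which needs Poppler installed), and `pytesseract.image_to_string` (which needs the Tesseract binary).

```python
from typing import Dict, List

MIN_CHARS_PER_PAGE = 20  # assumed threshold; tune against real statements

def pages_needing_ocr(page_texts: List[str], min_chars: int = MIN_CHARS_PER_PAGE) -> List[int]:
    """Indices of pages whose text layer is (nearly) empty, i.e. likely scanned images."""
    return [i for i, t in enumerate(page_texts) if len(t.strip()) < min_chars]

def extract_page_texts(pdf_path: str) -> List[str]:
    """Text layer of each page via PyMuPDF."""
    import fitz  # PyMuPDF, imported lazily so the helpers above work without it
    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]

def ocr_pages(pdf_path: str, page_indices: List[int], dpi: int = 300) -> Dict[int, str]:
    """Render only the given pages with pdf2image and OCR them with Tesseract."""
    from pdf2image import convert_from_path
    import pytesseract
    results = {}
    for i in page_indices:
        # pdf2image page numbers are 1-based
        images = convert_from_path(pdf_path, dpi=dpi, first_page=i + 1, last_page=i + 1)
        results[i] = pytesseract.image_to_string(images[0])
    return results

def text_with_ocr_fallback(pdf_path: str) -> List[str]:
    """Per-page text, with OCR output substituted for image-only pages."""
    texts = extract_page_texts(pdf_path)
    for i, ocr_text in ocr_pages(pdf_path, pages_needing_ocr(texts)).items():
        texts[i] = ocr_text
    return texts
```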
Anthropic API for Intelligence
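After extraction, the Anthropic API categorizes transactions and powers the AI chat:
```python
import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed; the post doesn't name a model
    max_tokens=1024,
    system=f"You are analyzing this bank statement data: {transactions}",
    messages=[{"role": "user", "content": user_query}],
)
```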
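Two practical details around that call, sketched with hypothetical helpers. First, interpolating `{transactions}` directly puts the Python `repr` of the data into the system prompt; a compact one-line-per-transaction format is usually smaller and easier for the model to read. Second, for categorization it helps to request JSON and validate the reply, since the model may return an unknown category or wrap the JSON in a code fence. `CATEGORIES`, `format_transactions`, `build_categorization_prompt`, and `parse_categories` are illustrative names, and transactions are assumed to be dicts with `date`, `description`, and `amount` keys.

```python
import json
from typing import Dict, Sequence

# Hypothetical category set; a real one would follow the CA's chart of accounts
CATEGORIES = {"Salary", "Rent", "GST", "Loan EMI", "Cash Withdrawal", "Transfer", "Other"}

def format_transactions(txns: Sequence[dict]) -> str:
    """One compact 'index|date|description|amount' line per transaction."""
    return "\n".join(
        f"{i}|{t['date']}|{t['description']}|{t['amount']:.2f}"
        for i, t in enumerate(txns)
    )

def build_categorization_prompt(txns: Sequence[dict]) -> str:
    """User message asking for one category per transaction index, as JSON."""
    return (
        "Assign each transaction exactly one category from: "
        + ", ".join(sorted(CATEGORIES))
        + '.\nReply with only a JSON object mapping index to category, e.g. {"0": "Rent"}.\n\n'
        + format_transactions(txns)
    )

def parse_categories(reply_text: str, n: int) -> Dict[int, str]:
    """Map each of the n indices to a category; missing or unknown values become 'Other'."""
    # Tolerate prose or code fences around the JSON by slicing from the first '{' to the last '}'
    start, end = reply_text.find("{"), reply_text.rfind("}")
    try:
        raw = json.loads(reply_text[start:end + 1]) if start != -1 and end > start else {}
    except json.JSONDecodeError:
        raw = {}
    if not isinstance(raw, dict):
        raw = {}
    result = {}
    for i in range(n):
        cat = raw.get(str(i), "Other")
        result[i] = cat if isinstance(cat, str) and cat in CATEGORIES else "Other"
    return result
```

The prompt would go in the `messages` list of `client.messages.create`, and `response.content[0].text` is the reply text to hand to `parse_categories`.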
What I Learned

Indian PDFs are wildly inconsistent — regex-based parsers break constantly
OCR is slow but necessary for scanned statements
Anthropic API offers the best cost/performance ratio for this use case
Per-parse pricing (₹49) works better than subscriptions for CA workflows
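On the regex point, amounts are a frequent breaking spot: Indian statements use lakh grouping (1,23,456.78), and many mark debits and credits with a Dr/Cr suffix instead of a sign. A tolerant amount parser avoids hard-coding one bank's layout. The sketch below is a hypothetical `parse_inr_amount`, with an assumed convention that `Dr` or a leading minus means a negative value.

```python
import re
from typing import Optional

# Optional currency marker, an optional leading minus, digits with any comma
# grouping (Indian 1,23,456 or Western 123,456), optional decimals, optional Dr/Cr.
AMOUNT_RE = re.compile(
    r"^\s*(?:₹|rs\.?|inr)?\s*(?P<sign>-)?\s*"
    r"(?P<num>\d[\d,]*(?:\.\d+)?)\s*(?P<drcr>dr|cr)?\.?\s*$",
    re.IGNORECASE,
)

def parse_inr_amount(raw: str) -> Optional[float]:
    """Parse strings like '1,23,456.78', '₹ 2,000.00 Dr' or '500.00 CR' into a float.

    Assumed convention: a 'Dr' suffix or a leading '-' means a negative amount.
    Returns None when the string is not a recognizable amount.
    """
    m = AMOUNT_RE.match(raw)
    if not m:
        return None
    value = float(m.group("num").replace(",", ""))
    is_debit = m.group("sign") == "-" or (m.group("drcr") or "").lower() == "dr"
    return -value if is_debit else value
```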

Result
AarogyamFin now supports 30+ Indian banks, deployed on Railway, serving CAs, DSAs and NBFCs across India.
Live at: aarogyamfin.com
DPIIT Recognised Startup — DIPP265941
