Everyone thinks parsing a bank statement should be simple. It's just a list of transactions, right?
Wrong.
We've built parsers for dozens of document types, and bank statements remain one of the most deceptively complex. Here's what we learned handling 500+ different formats.
The Format Explosion
There are roughly 4,500 FDIC-insured banks in the US alone. Add credit unions, international banks, and neobanks, and you're looking at tens of thousands of institutions. Each one formats their statements differently.
Chase uses a clean columnar layout.
Bank of America loves multi-page summaries before showing transactions.
Wells Fargo splits deposits and withdrawals into separate sections.
Capital One sometimes puts the date first, sometimes the description.
And that's just the big guys. Regional banks and credit unions often have PDF layouts that look like they were designed in 1998 using Microsoft Publisher.
Why Template Matching Fails
Our first approach was template matching. For each bank, we'd define:
- Where the date column lives
- The format of amounts (with or without dollar signs, parentheses for negatives)
- How to identify the transaction type
This worked for about 6 months. Then we hit three problems:
- Banks update their statements - Chase redesigned their PDF layout twice in one year
- The long tail is brutal - We'd get a statement from "First National Bank of Rural County" and have to build a new template
- Same bank, different products - A checking statement is laid out differently from a savings statement, which is laid out differently again from a business account statement
We were building 5-10 new templates per week. It wasn't sustainable.
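To give a sense of what each template had to encode, here is a minimal sketch of just the amount-format piece (dollar signs, thousands separators, parentheses or a minus sign for negatives). The helper name `parse_amount` and the exact formats it accepts are assumptions for illustration, not our production code.

```python
import re
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse amounts such as '$1,234.56', '-$47.99', '(47.99)' or '47.99-'."""
    s = text.strip()
    # Accounting notation: a value wrapped in parentheses is negative.
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    # Drop the currency symbol, thousands separators and stray whitespace.
    s = re.sub(r"[$,\s]", "", s)
    # A leading or trailing minus sign also marks a negative amount.
    if s.startswith("-") or s.endswith("-"):
        negative = True
        s = s.strip("-")
    if not re.fullmatch(r"\d+(\.\d{1,2})?", s):
        raise ValueError(f"unrecognized amount: {text!r}")
    value = Decimal(s)
    return -value if negative else value
```

Multiply this by every column, every date format, and every bank, and the maintenance cost becomes clear.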
The OCR Problem
Raw OCR gives you text, but bank statements are fundamentally about tables. The spatial relationship between columns matters.
Consider this line:
02/15 AMAZON MARKETPLACE -$47.99 $1,234.56
OCR sees the same characters, but as a flat string with the column boundaries gone: 02/15 AMAZON MARKETPLACE -$47.99 $1,234.56
But which number is the transaction amount and which is the running balance? In some formats, the balance comes first. In others, it's not shown at all.
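One heuristic that plain text does allow, sketched below, is to test both orderings against the previous running balance and keep the one that adds up. This assumes the previous balance is already known and that the line contains exactly two amounts; the function names and the regex are illustrative, and the heuristic simply gives up when no balance is printed.

```python
import re
from decimal import Decimal

# Matches US-style amounts such as -$47.99 or $1,234.56 (illustrative only).
AMOUNT_RE = re.compile(r"-?\$?\d[\d,]*\.\d{2}")


def to_decimal(token: str) -> Decimal:
    return Decimal(token.replace("$", "").replace(",", ""))


def split_amount_and_balance(line: str, previous_balance: Decimal):
    """Return (amount, balance) if exactly one ordering is consistent, else None."""
    tokens = AMOUNT_RE.findall(line)
    if len(tokens) != 2:
        return None  # no balance column, or too many numbers to decide
    first, second = (to_decimal(t) for t in tokens)
    candidates = [(first, second), (second, first)]
    consistent = [
        (amount, balance)
        for amount, balance in candidates
        if previous_balance + amount == balance
    ]
    return consistent[0] if len(consistent) == 1 else None
```

It works on the line above when the prior balance was $1,282.55, but it depends on state carried from row to row, which is exactly the kind of fragility that pushed us away from text-only parsing.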
The Breakthrough: Vision Models + Table Understanding
Modern vision LLMs don't just read text. They understand layout. They can look at a bank statement and recognize:
- This is a table structure
- These are column headers (even if implicit)
- This row is a transaction
- This is a summary/total row (skip it)
The architecture that works:
PDF → Image → Vision LLM → Table Extraction → Schema Validation → JSON
The schema is critical. We define exactly what we expect:
{
  "account": {
    "holder_name": "string",
    "account_number": "string",
    "routing_number": "string",
    "account_type": "checking|savings|business"
  },
  "period": {
    "start_date": "date",
    "end_date": "date"
  },
  "transactions": [{
    "date": "date",
    "description": "string",
    "amount": "number",
    "type": "credit|debit",
    "category": "string",
    "running_balance": "number|null"
  }],
  "summary": {
    "opening_balance": "number",
    "closing_balance": "number",
    "total_credits": "number",
    "total_debits": "number"
  }
}
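A schema like this is only useful if something enforces it. The sketch below checks a parsed document against a subset of the fields above. It assumes dates arrive as ISO 8601 strings (as in the sample response later in this post); `validate_statement` and its error messages are hypothetical names, and a real deployment would more likely use a JSON Schema library.

```python
from datetime import date

ACCOUNT_TYPES = {"checking", "savings", "business"}
TRANSACTION_TYPES = {"credit", "debit"}


def is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_iso_date(value) -> bool:
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def validate_statement(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passed."""
    errors = []
    if doc.get("account", {}).get("account_type") not in ACCOUNT_TYPES:
        errors.append("account.account_type is missing or invalid")
    for key in ("start_date", "end_date"):
        if not is_iso_date(doc.get("period", {}).get(key)):
            errors.append(f"period.{key} must be an ISO date")
    for i, txn in enumerate(doc.get("transactions", [])):
        if not is_iso_date(txn.get("date")):
            errors.append(f"transactions[{i}].date must be an ISO date")
        if not is_number(txn.get("amount")):
            errors.append(f"transactions[{i}].amount must be a number")
        if txn.get("type") not in TRANSACTION_TYPES:
            errors.append(f"transactions[{i}].type must be credit or debit")
        balance = txn.get("running_balance")
        if balance is not None and not is_number(balance):
            errors.append(f"transactions[{i}].running_balance must be a number or null")
    return errors
```

Returning a list of problems rather than raising on the first one makes it easier to feed every failure back into a retry.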
Edge Cases That Will Break You
Even with vision models, bank statements have edge cases:
- Multi-page transactions - A single transaction description can wrap across pages
- Pending vs. posted - Some statements show both, with different formatting
- Foreign currency - Amount in USD vs. original currency, exchange rates
- Interest calculations - Daily balance tables that aren't transactions
- Fees buried in descriptions - "Monthly Service Fee" as a line item vs. as a deduction footnote
We handle these with a combination of prompt engineering and post-processing validation. If the extracted transactions don't reconcile to the stated totals, we retry with more specific instructions.
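The reconcile-then-retry loop can be sketched roughly as follows. It assumes amounts are signed (debits negative, as in the sample response in this post) and that `extract` is a caller-supplied function wrapping the vision model; both function names and the hint text are illustrative, not our actual prompts.

```python
from decimal import Decimal


def reconciles(statement: dict, tolerance: Decimal = Decimal("0.00")) -> bool:
    """True if opening balance plus all signed amounts equals the closing balance."""
    summary = statement["summary"]
    # Go through str() so float inputs like 47.99 carry no binary rounding noise.
    opening = Decimal(str(summary["opening_balance"]))
    closing = Decimal(str(summary["closing_balance"]))
    total = sum(
        (Decimal(str(t["amount"])) for t in statement["transactions"]),
        Decimal("0"),
    )
    return abs(opening + total - closing) <= tolerance


def extract_with_retry(extract, page_images, max_attempts: int = 3) -> dict:
    """Call extract(page_images, hint) until the result reconciles."""
    hint = None  # first attempt runs with the default instructions
    for _ in range(max_attempts):
        statement = extract(page_images, hint)
        if reconciles(statement):
            return statement
        # Hypothetical follow-up instruction for the next attempt.
        hint = (
            "Transactions did not sum to the stated closing balance; "
            "re-check every row and skip summary rows."
        )
    raise ValueError(f"statement did not reconcile after {max_attempts} attempts")
```

The arithmetic check is cheap, so it runs on every extraction, and only the failures pay for a second model call.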
Results
After 8 months of iteration:
- 96% accuracy on transaction extraction
- 500+ bank formats supported without manual templates
- New formats work automatically (the vision model generalizes)
- Processing time: 2-5 seconds per page
The API
We wrapped this into an API. Upload a bank statement PDF, get structured JSON:
curl -X POST https://statementocr.com/api/parse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@statement.pdf"
Response:
{
  "account": {
    "holder_name": "John Smith",
    "account_number": "****4567"
  },
  "transactions": [
    {
      "date": "2024-02-01",
      "description": "DIRECT DEPOSIT - ACME CORP",
      "amount": 3500.00,
      "type": "credit"
    },
    {
      "date": "2024-02-03",
      "description": "AMAZON MARKETPLACE",
      "amount": -47.99,
      "type": "debit"
    }
  ],
  "summary": {
    "opening_balance": 1234.56,
    "closing_balance": 4686.57
  }
}
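On the consuming side, a response like this can be totaled without floating-point drift by parsing amounts as `Decimal`. The sketch below assumes the response body is available as a string and that the field names match the sample above; `summarize` is a hypothetical helper, not part of the API.

```python
import json
from decimal import Decimal


def summarize(response_text: str) -> dict:
    """Total credits and debits from a parse response, using exact decimals."""
    # parse_float=Decimal keeps amounts exact instead of using binary floats.
    data = json.loads(response_text, parse_float=Decimal)
    transactions = data["transactions"]
    credits = sum(
        (t["amount"] for t in transactions if t["type"] == "credit"),
        Decimal("0"),
    )
    debits = sum(
        (t["amount"] for t in transactions if t["type"] == "debit"),
        Decimal("0"),
    )
    return {
        "total_credits": credits,
        "total_debits": debits,  # negative, because debit amounts are signed
        "net_change": credits + debits,
    }
```

For the sample response, the net change added to the opening balance matches the stated closing balance.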
Who's Using This?
Three main use cases:
- Lending platforms - Income verification without Plaid/bank linking
- Accounting software - Auto-import statements for reconciliation
- Fraud detection - Analyze spending patterns at scale
The lending use case is huge. Not everyone wants to connect their bank account via OAuth. Some customers prefer uploading a PDF. And for businesses, bank statements are often the only option.
Try It
If you're building anything that needs to understand bank statements, Statement OCR has a free tier. Upload a few statements and see the output.
Works with most US banks out of the box. International support is improving.
Part 2 of a series on document parsing. Previously: EOB parsing. Next: tax documents.