I built a local-only PDF bank statement parser — here's how it works
Every month I'd open my credit card PDF, manually copy transactions into a spreadsheet, and think "there has to be a better way." So I built banksheet.
What it does
banksheet parses bank statement PDFs into CSV, Excel, or JSON — entirely on your machine. No cloud, no AI, no external APIs.
    npx banksheet parse statement.pdf
    npx banksheet parse statement.pdf -f excel -o output.xlsx
The plugin architecture
The interesting part is how new banks get added. Each bank is a single folder implementing two functions:
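    export const myBankParser: BankParser = {
      name: 'My Bank',
      country: 'US',
      // Return true when the extracted text looks like a statement from this bank.
      detect(text: string): boolean {
        return /My Bank Statement/i.test(text);
      },
      // Extract transactions from the raw PDF text.
      parse(text: string): Transaction[] {
        const transactions: Transaction[] = [];
        // bank-specific regex matching over `text` goes here
        return transactions;
      },
    };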
That's it. detect() tells the engine whether this PDF belongs to that bank. parse() extracts the transactions. Auto-detection tries all registered parsers and uses whichever one matches.
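To make the auto-detection step concrete, here is a minimal sketch of how a registry could pick a parser. It is not banksheet's actual code: the `Transaction` fields, the `findParser` and `parseStatement` helpers, and the "first match wins" rule are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real banksheet types may carry more fields.
interface Transaction {
  date: string; // e.g. "2024-03-15"
  description: string;
  amount: number;
}

interface BankParser {
  name: string;
  country: string;
  detect(text: string): boolean;
  parse(text: string): Transaction[];
}

// Return the first registered parser whose detect() accepts the text.
// "First match wins" is an assumption, not confirmed project behavior.
function findParser(parsers: BankParser[], text: string): BankParser | undefined {
  return parsers.find((parser) => parser.detect(text));
}

// Detect, then parse; fail loudly when no parser recognizes the statement.
function parseStatement(parsers: BankParser[], text: string): Transaction[] {
  const parser = findParser(parsers, text);
  if (!parser) {
    throw new Error('No registered parser recognized this statement');
  }
  return parser.parse(text);
}

// Illustrative parser that recognizes the statement but extracts nothing.
const demoParser: BankParser = {
  name: 'My Bank',
  country: 'US',
  detect: (text) => /My Bank Statement/i.test(text),
  parse: () => [],
};

const picked = findParser([demoParser], 'MY BANK STATEMENT - March');
console.log(picked?.name); // prints "My Bank"
```

Failing with an error when nothing matches keeps a silent empty export from being mistaken for a statement with no transactions.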
Why no AI?
I specifically wanted this to be regex-based, not LLM-based, for four reasons:
- Deterministic: same PDF always produces the same output
- Auditable: you can read exactly what the parser is doing
- Fast: no inference time, no API latency
- Private: your transaction data stays local
The tradeoff is that each bank needs a hand-written parser. But that's also why the plugin system exists — the community can cover banks I don't have statements for.
What's hard about parsing bank PDFs
PDF text extraction is messier than you'd expect. Banks don't use consistent formatting. Common issues:
- Words get merged: "R$1.234,56Compra" with no space between amount and description
- Dates split across lines depending on column layout
- Multi-line transaction descriptions that need to be concatenated
- Password-protected files (supported via pdf-parse options)
- Different layouts for the same bank depending on statement month or account type
Each plugin's README documents the quirks it handles.
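Two of the issues above, merged amounts and multi-line descriptions, can be sketched with plain regexes. This is an illustrative example, not code from any banksheet plugin: the helper names (`parseBrlAmount`, `splitMergedAmount`, `joinContinuationLines`) are hypothetical, and the rule that a new transaction starts with a DD/MM date is an assumption that will not hold for every bank.

```typescript
// Convert a Brazilian-formatted amount such as "1.234,56" to a number:
// "." is the thousands separator and "," is the decimal separator.
function parseBrlAmount(raw: string): number {
  return Number(raw.replace(/\./g, '').replace(',', '.'));
}

// Split a token where extraction merged the amount and the description,
// e.g. "R$1.234,56Compra". Returns null when the token has no leading amount.
function splitMergedAmount(
  token: string,
): { amount: number; description: string } | null {
  const match = /^R\$\s*(\d{1,3}(?:\.\d{3})*,\d{2})\s*(.*)$/.exec(token);
  if (!match) return null;
  const [, rawAmount = '', rest = ''] = match;
  return { amount: parseBrlAmount(rawAmount), description: rest.trim() };
}

// Join continuation lines onto the previous transaction line.
// Assumption: a transaction line starts with a DD/MM date; any other
// non-empty line continues the previous description.
function joinContinuationLines(lines: string[]): string[] {
  const startsWithDate = /^\d{2}\/\d{2}\b/;
  const joined: string[] = [];
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed === '') continue;
    if (joined.length === 0 || startsWithDate.test(trimmed)) {
      joined.push(trimmed);
    } else {
      const last = joined.length - 1;
      joined[last] = `${joined[last]} ${trimmed}`;
    }
  }
  return joined;
}

console.log(splitMergedAmount('R$1.234,56Compra'));
// prints { amount: 1234.56, description: 'Compra' }

console.log(joinContinuationLines(['15/03 UBER *TRIP', '  SAO PAULO BR', '16/03 PADARIA']));
// prints [ '15/03 UBER *TRIP SAO PAULO BR', '16/03 PADARIA' ]
```

Keeping each quirk in its own small function makes it easy to unit-test against real extracted lines, which fits the deterministic and auditable goals described earlier.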
Stack
TypeScript monorepo with three packages: core (parsing engine + plugins), cli, and web (Express + vanilla JS for drag-and-drop).
It currently supports credit card statements from four Brazilian banks: Nubank, Itaú, Bradesco, and Inter. Parsers for more banks are welcome.
Repo: https://github.com/tio-ze-rj/banksheet
Would love to hear if anyone adds a plugin for a bank outside Brazil — the architecture should work for any country.