DEV Community

TioZe

I built a local-only PDF bank statement parser with a plugin system — here's how it works

Every month I'd open my credit card PDF, manually copy transactions into a spreadsheet, and think "there has to be a better way." So I built banksheet.

What it does

banksheet parses bank statement PDFs into CSV, Excel, or JSON — entirely on your machine. No cloud, no AI, no external APIs.

npx banksheet parse statement.pdf
npx banksheet parse statement.pdf -f excel -o output.xlsx

The plugin architecture

The interesting part is how new banks get added. Each bank lives in its own folder and exports a parser object with two functions, detect() and parse():

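// BankParser and Transaction are banksheet's own types (import omitted here).
export const myBankParser: BankParser = {
  name: 'My Bank',
  country: 'US',

  // Returns true when the extracted PDF text looks like a My Bank statement.
  detect(text: string): boolean {
    return /My Bank Statement/i.test(text);
  },

  // Extracts transactions from the raw PDF text.
  parse(text: string): Transaction[] {
    const transactions: Transaction[] = [];
    // match each transaction line in `text` and push it onto `transactions`
    return transactions;
  },
};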

That's it. detect() tells the engine whether this PDF belongs to that bank. parse() extracts the transactions. Auto-detection tries all registered parsers and uses whichever one matches.
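
To make the auto-detection step concrete, here is a minimal sketch of how it could work. It is not banksheet's actual code: the Transaction and BankParser shapes, and the findParser and parseStatement helpers, are assumptions made for illustration, and picking the first parser whose detect() returns true is one possible reading of "whichever one matches."

```typescript
// Hypothetical shapes: the real type definitions are not shown in this post.
interface Transaction {
  date: string; // e.g. "2024-03-15"
  description: string;
  amount: number;
}

interface BankParser {
  name: string;
  country: string;
  detect(text: string): boolean;
  parse(text: string): Transaction[];
}

// Returns the first registered parser whose detect() accepts the text,
// or undefined when no parser recognizes the statement.
function findParser(parsers: BankParser[], text: string): BankParser | undefined {
  return parsers.find((p) => p.detect(text));
}

// Runs auto-detection, then delegates to the matching parser.
function parseStatement(parsers: BankParser[], text: string): Transaction[] {
  const parser = findParser(parsers, text);
  if (!parser) {
    throw new Error("No registered parser recognized this statement");
  }
  return parser.parse(text);
}
```

With this shape, adding a bank only means appending one more object to the parsers array. The order of that array matters only if two detect() functions can match the same text.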

Why no AI?

I specifically wanted this to be regex-based, not LLM-based.

  • Deterministic: same PDF always produces the same output
  • Auditable: you can read exactly what the parser is doing
  • Fast: no inference time, no API latency
  • Private: your transaction data stays local

The tradeoff is that each bank needs a hand-written parser. But that's also why the plugin system exists — the community can cover banks I don't have statements for.

What's hard about parsing bank PDFs

PDF text extraction is messier than you'd expect. Banks don't use consistent formatting. Common issues:

  • Words get merged: "R$1.234,56Compra" with no space between amount and description
  • Dates split across lines depending on column layout
  • Multi-line transaction descriptions that need to be concatenated
  • Password-protected files (supported via pdf-parse options)
  • Different layouts for the same bank depending on statement month or account type

Each plugin's README documents the quirks it handles.
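
Two of the issues above, merged words and multi-line descriptions, can be sketched in a few lines. This is an illustration under stated assumptions, not code from the repo: the helper names (parseBrlAmount, splitMergedAmount, joinContinuationLines) are hypothetical, amounts are assumed to use the Brazilian format with an "R$" prefix, and every transaction is assumed to start with a DD/MM date.

```typescript
// Parses a Brazilian-formatted amount such as "1.234,56" into a number.
function parseBrlAmount(raw: string): number {
  // Drop thousands separators (dots), then turn the decimal comma into a dot.
  return Number(raw.replace(/\./g, "").replace(",", "."));
}

// Splits a merged token like "R$1.234,56Compra" into amount and description.
// Returns null when the text does not start with an "R$" amount.
function splitMergedAmount(
  text: string
): { amount: number; description: string } | null {
  const match = /^R\$\s?(\d{1,3}(?:\.\d{3})*,\d{2})\s*(.*)$/.exec(text);
  if (!match) return null;
  return { amount: parseBrlAmount(match[1]), description: match[2].trim() };
}

// Joins continuation lines onto the transaction line that precedes them.
// Assumption: a line starting with DD/MM begins a new transaction.
function joinContinuationLines(lines: string[]): string[] {
  const startsWithDate = /^\d{2}\/\d{2}\b/;
  const joined: string[] = [];
  for (const raw of lines) {
    const line = raw.trim();
    if (line.length === 0) continue; // skip blank lines
    if (startsWithDate.test(line) || joined.length === 0) {
      joined.push(line);
    } else {
      joined[joined.length - 1] += " " + line;
    }
  }
  return joined;
}
```

Because the amount pattern ends at exactly two decimal digits, whatever follows is treated as the description even when the PDF dropped the space. A real plugin would need to adapt both patterns to the layout of the bank it targets.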

Stack

TypeScript monorepo with three packages: core (parsing engine + plugins), cli, and web (Express + vanilla JS for drag-and-drop).

Currently supports credit card statements from Nubank, Itaú, Bradesco, and Inter (all Brazilian banks). Plugins for more banks are welcome.

Repo: https://github.com/tio-ze-rj/banksheet

Would love to hear if anyone adds a plugin for a bank outside Brazil — the architecture should work for any country.
