DEV Community

FLOW by Vestelon

How I built a PDF bank statement analyzer in 8 languages (and what I learned)

I spent months building FLOW (vestelonflow.com) — a tool that analyzes bank statement PDFs and finds forgotten subscriptions, hidden fees, and recurring charges.

Here's what I learned building it in 8 languages.

The Problem

Most personal finance apps require you to connect your bank account. For many people (especially in Europe), that's a dealbreaker: they have GDPR concerns, privacy fears, or simply don't trust third-party apps with banking credentials.

My insight: the data people need is already in their PDF bank statements. Every bank generates them. Most people never look past the total.

The Tech Stack

The core flow:

  1. User uploads PDF bank statement
  2. PDF text extraction (pdfplumber + fallback OCR)
  3. Transaction parsing — this is the hard part
  4. LLM categorization pipeline
  5. Subscription detection (recurring charges with same merchant)
  6. Report generation
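
To make step 5 concrete, here is a minimal sketch of one way to flag recurring charges. It is not FLOW's actual detector. The `Transaction` shape, the `find_recurring` name, and the thresholds (at least three charges, gaps within four days of a 30-day cycle, amounts within 5% of the median) are all assumptions chosen for illustration, and it expects merchant names to be normalized already.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass(frozen=True)
class Transaction:
    booked_on: date
    merchant: str   # assumed to be normalized already, e.g. "NETFLIX"
    amount: float   # negative means money leaving the account


def find_recurring(transactions, min_hits=3, tolerance_days=4, amount_tolerance=0.05):
    """Return {merchant: typical charge} for roughly monthly, stable debits."""
    by_merchant = defaultdict(list)
    for tx in transactions:
        if tx.amount < 0:  # only outgoing payments can be subscriptions
            by_merchant[tx.merchant].append(tx)

    recurring = {}
    for merchant, txs in by_merchant.items():
        if len(txs) < min_hits:
            continue
        txs.sort(key=lambda t: t.booked_on)
        # Days between consecutive charges from the same merchant.
        gaps = [(b.booked_on - a.booked_on).days for a, b in zip(txs, txs[1:])]
        typical = median(abs(t.amount) for t in txs)
        monthly = all(abs(gap - 30) <= tolerance_days for gap in gaps)
        stable = all(
            abs(abs(t.amount) - typical) <= amount_tolerance * typical for t in txs
        )
        if monthly and stable:
            recurring[merchant] = typical
    return recurring
```

Summing the returned values gives a single monthly figure. Weekly or yearly subscriptions would need additional cycle lengths, which this sketch leaves out.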

The trickiest part was transaction parsing. Every bank formats its PDFs differently. German statements look nothing like Slovak ones. We ended up building bank-specific parsers for the most common formats and a fallback generic parser.
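
The "bank-specific parsers plus generic fallback" idea can be sketched as a small registry. Everything below is hypothetical: "EXAMPLE BANK", its pipe-separated layout, the `register` decorator, and the `GENERIC_LINE` regex are stand-ins for illustration, not the formats or code FLOW uses. The input is assumed to be text already extracted from the PDF.

```python
import re

# Hypothetical registry of (detector, parser) pairs, tried in order.
_PARSERS = []


def register(detector):
    """Attach a parser to a function that recognizes one bank's layout."""
    def wrap(parser):
        _PARSERS.append((detector, parser))
        return parser
    return wrap


# Generic fallback pattern: a date, a description, and a trailing amount.
GENERIC_LINE = re.compile(
    r"^(?P<date>\d{1,2}[./]\d{1,2}[./]\d{2,4})\s+(?P<desc>.+?)\s+(?P<amount>-?\d[\d.,]*)$"
)


@register(lambda text: "EXAMPLE BANK" in text)
def parse_example_bank(text):
    # Made-up layout: "2024-01-05 | NETFLIX.COM | -12.99"
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            rows.append({"date": parts[0], "description": parts[1], "amount": parts[2]})
    return rows


def parse_generic(text):
    rows = []
    for line in text.splitlines():
        m = GENERIC_LINE.match(line.strip())
        if m:
            rows.append({"date": m["date"], "description": m["desc"], "amount": m["amount"]})
    return rows


def parse_statement(text):
    """Use the first matching bank parser that yields rows, else the generic one."""
    for detects, parser in _PARSERS:
        if detects(text):
            rows = parser(text)
            if rows:
                return rows
    return parse_generic(text)
```

Falling through to the generic parser when a bank parser returns nothing means a layout change at one bank degrades results rather than producing an empty report.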

The 8-Language Challenge

Supporting Slovak, Czech, German, French, Spanish, Polish, Arabic, and Chinese wasn't just about translating the UI. The financial terminology varies significantly:

  • "Standing order" in English (literally "permanent order") = "Trvalý príkaz" in Slovak = "Dauerauftrag" in German
  • Subscription detection keywords differ by region
  • Date/amount formats are locale-specific
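
As an illustration of the last bullet, here is a minimal sketch of locale-aware amount and date parsing. The `LOCALE_FORMATS` table and both function names are assumptions for this example. The separators and date patterns follow the usual national conventions, but individual banks may print something different.

```python
from datetime import date, datetime
from decimal import Decimal

# Assumed conventions: (thousands separator, decimal separator, date format).
LOCALE_FORMATS = {
    "de": (".", ",", "%d.%m.%Y"),   # 1.234,56 and 05.01.2024
    "sk": (" ", ",", "%d.%m.%Y"),   # 1 234,56 and 05.01.2024
    "fr": (" ", ",", "%d/%m/%Y"),   # 1 234,56 and 05/01/2024
}


def parse_amount(raw: str, locale: str) -> Decimal:
    """Convert a locale-formatted amount string into a Decimal."""
    thousands, decimal_sep, _ = LOCALE_FORMATS[locale]
    # PDFs often contain non-breaking spaces and a Unicode minus sign.
    cleaned = raw.replace("\u00a0", " ").replace("\u2212", "-").strip()
    cleaned = cleaned.replace(thousands, "").replace(" ", "")
    cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


def parse_date(raw: str, locale: str) -> date:
    """Parse a date string using the locale's assumed date format."""
    _, _, date_format = LOCALE_FORMATS[locale]
    return datetime.strptime(raw.strip(), date_format).date()
```

Using `Decimal` rather than `float` avoids rounding surprises when amounts are later summed.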

We ended up with language-specific merchant dictionaries for common subscription services in each market.
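
A merchant dictionary of this kind could look roughly like the sketch below. The entries, the two-level layout (shared plus per-market), and the `canonical_merchant` helper are illustrative assumptions, not FLOW's real data or code.

```python
import re

# Illustrative entries only: keyword found in the statement -> display name.
SHARED_MERCHANTS = {"netflix": "Netflix", "spotify": "Spotify"}
MARKET_MERCHANTS = {
    "sk": {"voyo": "Voyo"},
    "de": {"rundfunk": "Rundfunkbeitrag"},
}


def canonical_merchant(description: str, market: str):
    """Map a raw statement description to a known subscription, or None."""
    # Lowercase and turn punctuation into spaces so "NETFLIX.COM" yields "netflix".
    words = re.sub(r"[\W_]+", " ", description.lower()).split()
    lookup = {**SHARED_MERCHANTS, **MARKET_MERCHANTS.get(market, {})}
    for keyword, name in lookup.items():
        if keyword in words:
            return name
    return None
```

For example, `canonical_merchant("NETFLIX.COM AMSTERDAM", "sk")` returns `"Netflix"`, while a German-only keyword is ignored for a Slovak statement.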

What Actually Matters

The biggest lesson: people don't want a budgeting dashboard. They want a specific, actionable number.

"You're spending €137/month on forgotten subscriptions" converts. "Your spending breakdown by category" does not.

The product is live at vestelonflow.com — first report is free, no card required, no bank connection needed.

Happy to answer questions about the PDF parsing approach, the LLM pipeline, or the localization challenges.

Top comments (1)

Alex Shev

PDF statements are a good case because the hard part is not the LLM call; it is confidence around messy extraction.

I would want the report to expose "why this transaction was classified this way" and where OCR/parsing confidence was low. In finance tooling, a useful uncertainty marker is better than a clean-looking but wrong category.