
MaxxMini


I Made My Finance App Understand 14 Different Bank Formats — Here's How

My finance app DonFlow has no backend. Everything runs in the browser. That was a conscious choice — I wrote about why I chose IndexedDB over a server.

But then came the hard part: users need to import their actual bank data.

The Problem Nobody Talks About

Every bank exports CSV differently. Column names, date formats, encoding, delimiters — nothing is standard. In Korea alone, there are 14 major card companies, and each one has its own export format.

I could've said "use our template" and made users reformat their data manually. But that's the #1 reason people abandon finance apps — the data entry friction.

My Approach: Universal Format Detection

Instead of maintaining 14 separate parsers, I built a detection pipeline:

  1. Encoding detection — Korean banks love EUC-KR. The browser expects UTF-8. Step one: detect and convert.
  2. Delimiter sniffing — Some use commas, others tabs, some even pipes. Read the first 5 lines, count candidates, pick the winner.
  3. Header matching — Map known column patterns (날짜, 금액, 거래처 → date, amount, merchant) with fuzzy matching.
  4. Date format inference — 2026-03-02? 03/02/2026? 20260302? Try multiple parsers, keep what works.
function detectFormat(rawBytes) {
  const encoding = detectEncoding(rawBytes);  // byte-level heuristic, see below
  const decoded = decode(rawBytes, encoding); // e.g. via TextDecoder
  const delimiter = sniffDelimiter(decoded);
  const rows = parseCSV(decoded, delimiter);
  const headerMap = matchHeaders(rows[0]);
  return { encoding, delimiter, headerMap };
}
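Step 2 above, delimiter sniffing, can be sketched like this. `sniffDelimiter` is the helper the pipeline calls; this body is my reconstruction of the described approach, not DonFlow's exact code:

```javascript
// Count each candidate delimiter across the first five lines and pick
// the one that appears most consistently. A true delimiter shows up on
// every sampled line, so we score by the minimum per-line count.
function sniffDelimiter(text) {
  const candidates = [',', '\t', '|', ';'];
  const lines = text.split(/\r?\n/).filter(Boolean).slice(0, 5);
  let best = ',';
  let bestScore = 0;
  for (const d of candidates) {
    const counts = lines.map((line) => line.split(d).length - 1);
    const score = Math.min(...counts);
    if (score > bestScore) {
      bestScore = score;
      best = d;
    }
  }
  return best;
}
```

Scoring by the minimum rather than the total keeps a stray comma inside one merchant name from outvoting a tab delimiter that appears on every line.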
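Step 4, date format inference, amounts to trying known patterns against a sample of values and keeping the first one that matches all of them. The pattern table here is illustrative, not DonFlow's actual list:

```javascript
// Candidate date layouts: a regex plus the field order its groups capture.
const DATE_PATTERNS = [
  { re: /^(\d{4})-(\d{2})-(\d{2})$/, order: ['y', 'm', 'd'] },   // 2026-03-02
  { re: /^(\d{2})\/(\d{2})\/(\d{4})$/, order: ['m', 'd', 'y'] }, // 03/02/2026
  { re: /^(\d{4})(\d{2})(\d{2})$/, order: ['y', 'm', 'd'] },     // 20260302
];

// Return the first pattern that matches every sampled value, or null
// if no known layout fits.
function inferDateFormat(samples) {
  return DATE_PATTERNS.find(
    (p) => samples.every((s) => p.re.test(s.trim()))
  ) || null;
}
```

Requiring every sample to match, instead of just the first row, guards against a header or footer row masquerading as a date.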

The Encoding Nightmare

This deserves its own section because it almost broke me.

Korean financial institutions overwhelmingly use EUC-KR encoding in their exports. When you try to read that in a modern browser expecting UTF-8, you get mojibake — garbled characters that look like °¡°£³»¿ª.

The fix: detect encoding before parsing. I used a byte-pattern heuristic. In EUC-KR, both bytes of a Hangul character fall in the 0xA1-0xFE range, while UTF-8 encodes Hangul as three-byte sequences. Score both patterns over the raw bytes and decode with whichever wins.

function detectEncoding(buffer) {
  let euckrScore = 0;
  let utf8Score = 0;
  for (let i = 0; i < buffer.length - 1; i++) {
    // EUC-KR Hangul: both bytes of the pair fall in 0xA1-0xFE
    if (buffer[i] >= 0xA1 && buffer[i] <= 0xFE &&
        buffer[i + 1] >= 0xA1 && buffer[i + 1] <= 0xFE) {
      euckrScore++;
    }
    // UTF-8 Hangul: a 0xE0-0xEF lead byte plus two 0x80-0xBF continuations
    if (i < buffer.length - 2 &&
        buffer[i] >= 0xE0 && buffer[i] <= 0xEF &&
        (buffer[i + 1] & 0xC0) === 0x80 &&
        (buffer[i + 2] & 0xC0) === 0x80) {
      utf8Score++;
      i += 2; // skip the continuation bytes so they aren't double-counted
    }
  }
  return euckrScore > utf8Score ? 'euc-kr' : 'utf-8';
}
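Once the winner is known, the browser's built-in TextDecoder does the actual conversion; the WHATWG Encoding Standard requires 'euc-kr' as a supported label, so no decoding library is needed. A minimal sketch of the decode step the pipeline relies on:

```javascript
// Convert raw bytes to a string using the detected encoding.
// TextDecoder accepts 'euc-kr' as a label in all modern browsers.
function decode(buffer, encoding) {
  return new TextDecoder(encoding).decode(buffer);
}

// Example: the EUC-KR byte pair 0xB0 0xA1 is the first Hangul
// character in KS X 1001, which decodes to '가'.
```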

💡 Building browser-only tools on $0? I documented every decision in my free playbook.

Fuzzy Header Matching

Banks rename columns without warning. "거래일자" becomes "거래일" becomes "날짜" becomes "이용일". Same data, different labels.

Instead of a strict mapping table, I built a fuzzy matcher with aliases:

const KNOWN_HEADERS = {
  date: ['거래일자', '거래일', '날짜', '이용일', '승인일자', 'date', 'transaction_date'],
  amount: ['거래금액', '금액', '이용금액', '결제금액', 'amount', 'transaction_amount'],
  merchant: ['거래처', '가맹점', '이용처', '상호명', 'merchant', 'description'],
};

function matchHeader(columnName) {
  const name = columnName.trim();
  for (const [field, aliases] of Object.entries(KNOWN_HEADERS)) {
    for (const alias of aliases) {
      // Exact match first. Fuzzy matching is reserved for longer labels:
      // any two 2-character strings are within Levenshtein distance 2,
      // so short Korean headers like 날짜 must match exactly.
      if (name === alias) return field;
      if (alias.length > 3 && levenshtein(name, alias) <= 2) return field;
    }
  }
  return null;
}

A Levenshtein distance of ≤2 catches typos and minor variations while avoiding false positives. This handles about 95% of Korean bank formats I've tested.
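The matcher leans on a levenshtein() helper that the post doesn't show. A standard dynamic-programming implementation is enough, and it works on Korean strings because JavaScript indexes them by UTF-16 code unit (Hangul syllables are one code unit each):

```javascript
// Classic edit-distance DP: dp[i][j] is the distance between the first
// i characters of a and the first j characters of b.
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}
```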

Preview Before Commit

The most important UX decision: never auto-save imported data.

After detection, the app shows a preview:

  • "Found 142 transactions from Jan 2026 to Feb 2026"
  • "Date column: 거래일자 ✓"
  • "Amount column: 이용금액 ✓"
  • "Merchant column: 가맹점명 ✓"
  • Sample of first 5 rows in a table

One "Confirm Import" button. If something looks wrong, users can re-upload or manually adjust the column mapping.

This prevents garbage data from corrupting months of careful tracking. Trust me — debugging "why is my February budget showing ₩-3,400,000" because a column was mismatched is not fun.
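For reference, the preview summary can be derived from the parsed rows in a few lines. buildPreview and its return shape are illustrative, not DonFlow's actual code; it assumes dates have already been normalized to ISO strings, which sort correctly as text:

```javascript
// Assemble the stats shown on the preview screen: transaction count,
// date range, and the first five rows for the sample table.
function buildPreview(transactions) {
  const dates = transactions.map((t) => t.date).sort();
  return {
    count: transactions.length,
    from: dates[0],
    to: dates[dates.length - 1],
    sample: transactions.slice(0, 5),
  };
}
```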

SheetJS: One Library, Two Formats

The other big win was using SheetJS instead of writing separate CSV and XLSX parsers.

import * as XLSX from 'xlsx';

function parseFile(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = (e) => {
      const workbook = XLSX.read(e.target.result, { type: 'array' });
      const sheet = workbook.Sheets[workbook.SheetNames[0]];
      const rows = XLSX.utils.sheet_to_json(sheet, { header: 1 });
      resolve(rows);
    };
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });
}

Same code handles .csv, .xlsx, .xls. SheetJS detects the format internally. One input path, all formats supported.

The Result

Users can drag-and-drop any Korean bank export — CSV or XLSX, EUC-KR or UTF-8, comma or tab-delimited — and DonFlow figures out the rest.

  • 14 bank formats tested and working
  • 200+ merchant patterns for auto-categorization
  • Zero manual mapping for known formats
  • All client-side — files never leave the browser

The whole pipeline runs in FileReader → SheetJS → detection → preview → IndexedDB. No server touched.


This is part of my Building a Finance App With No Server series. Previously: Why I Chose IndexedDB Over a Backend.
