My finance app DonFlow has no backend. Everything runs in the browser. That was a conscious choice — I wrote about why I chose IndexedDB over a server.
But then came the hard part: users need to import their actual bank data.
The Problem Nobody Talks About
Every bank exports CSV differently. Column names, date formats, encoding, delimiters — nothing is standard. In Korea alone, there are 14 major card companies, and each one has its own export format.
I could've said "use our template" and made users reformat their data manually. But that's the #1 reason people abandon finance apps — the data entry friction.
My Approach: Universal Format Detection
Instead of maintaining 14 separate parsers, I built a detection pipeline:
- Encoding detection — Korean banks love EUC-KR. The browser expects UTF-8. Step one: detect and convert.
- Delimiter sniffing — Some use commas, others tabs, some even pipes. Read the first 5 lines, count candidates, pick the winner.
- Header matching — Map known column patterns (날짜, 금액, 거래처 → date, amount, merchant) with fuzzy matching.
- Date format inference — 2026-03-02? 03/02/2026? 20260302? Try multiple parsers, keep what works.
```javascript
function detectFormat(rawBytes) {
  // rawBytes is a Uint8Array — encoding must be detected on the
  // raw bytes, before any conversion to a string
  const encoding = detectEncoding(rawBytes);
  const decoded = decode(rawBytes, encoding);
  const delimiter = sniffDelimiter(decoded);
  const rows = parseCSV(decoded, delimiter);
  const headerMap = matchHeaders(rows[0]);
  return { encoding, delimiter, headerMap };
}
```
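Steps 2 and 4 of the pipeline can be sketched roughly like this — the function bodies below are my illustration of the approach described above, not DonFlow's actual code:

```javascript
// Delimiter sniffing: count each candidate in the first 5 lines,
// prefer the one that appears consistently on every line.
function sniffDelimiter(text) {
  const candidates = [',', '\t', '|', ';'];
  const lines = text.split(/\r?\n/).slice(0, 5).filter((l) => l.length > 0);
  let best = ',';
  let bestScore = -1;
  for (const d of candidates) {
    const counts = lines.map((l) => l.split(d).length - 1);
    // only trust a delimiter that occurs the same number of times per line
    const consistent = counts.every((c) => c === counts[0] && c > 0);
    const score = consistent ? counts[0] : 0;
    if (score > bestScore) {
      best = d;
      bestScore = score;
    }
  }
  return best;
}

// Date format inference: try known patterns, keep the first that parses.
function inferDate(value) {
  const v = String(value).trim();
  let m;
  if ((m = v.match(/^(\d{4})-(\d{2})-(\d{2})$/)))   // 2026-03-02
    return new Date(+m[1], +m[2] - 1, +m[3]);
  if ((m = v.match(/^(\d{2})\/(\d{2})\/(\d{4})$/))) // 03/02/2026 (MM/DD/YYYY)
    return new Date(+m[3], +m[1] - 1, +m[2]);
  if ((m = v.match(/^(\d{4})(\d{2})(\d{2})$/)))     // 20260302
    return new Date(+m[1], +m[2] - 1, +m[3]);
  return null; // unrecognized — surface it in the preview instead of guessing
}
```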
The Encoding Nightmare
This deserves its own section because it almost broke me.
Korean financial institutions overwhelmingly use EUC-KR encoding in their exports. When you try to read that in a modern browser expecting UTF-8, you get mojibake — garbled characters that look like °¡°£³»¿ª.
The fix: detect encoding before parsing. I used a byte-pattern heuristic — EUC-KR characters fall in specific byte ranges (0xA1-0xFE for both bytes). If more than 30% of multi-byte sequences match EUC-KR patterns, decode accordingly.
```javascript
function detectEncoding(buffer) {
  let euckrScore = 0;
  let utf8Score = 0;
  for (let i = 0; i < buffer.length - 1; i++) {
    // EUC-KR Hangul: both bytes in 0xA1–0xFE
    if (buffer[i] >= 0xA1 && buffer[i] <= 0xFE &&
        buffer[i + 1] >= 0xA1 && buffer[i + 1] <= 0xFE) {
      euckrScore++;
    }
    // UTF-8 Hangul is a 3-byte sequence: lead byte 0xEA–0xED
    // followed by two continuation bytes in 0x80–0xBF
    if (buffer[i] >= 0xEA && buffer[i] <= 0xED &&
        i + 2 < buffer.length &&
        buffer[i + 1] >= 0x80 && buffer[i + 1] <= 0xBF &&
        buffer[i + 2] >= 0x80 && buffer[i + 2] <= 0xBF) {
      utf8Score++;
    }
  }
  // simplified: compares the two scores rather than applying
  // the 30% threshold described above
  return euckrScore > utf8Score ? 'euc-kr' : 'utf-8';
}
```
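Once the encoding is detected, the actual conversion doesn't need a library — the browser's built-in `TextDecoder` supports the `'euc-kr'` label per the WHATWG Encoding Standard. The `decode` helper used in the pipeline can be as small as this (my sketch, assuming the input is a byte array):

```javascript
// Decode raw bytes to a string using the detected encoding.
// TextDecoder handles 'euc-kr' natively in all modern browsers.
function decode(buffer, encoding) {
  return new TextDecoder(encoding).decode(buffer);
}
```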
💡 Building browser-only tools on $0? I documented every decision in my free playbook.
Fuzzy Header Matching
Banks rename columns without warning. "거래일자" becomes "거래일" becomes "날짜" becomes "이용일". Same data, different labels.
Instead of a strict mapping table, I built a fuzzy matcher with aliases:
```javascript
const KNOWN_HEADERS = {
  date: ['거래일자', '거래일', '날짜', '이용일', '승인일자', 'date', 'transaction_date'],
  amount: ['거래금액', '금액', '이용금액', '결제금액', 'amount', 'transaction_amount'],
  merchant: ['거래처', '가맹점', '이용처', '상호명', 'merchant', 'description'],
};
```
```javascript
function matchHeader(columnName) {
  const name = columnName.trim();
  for (const [field, aliases] of Object.entries(KNOWN_HEADERS)) {
    for (const alias of aliases) {
      // short labels like 날짜 need an exact match — at distance ≤2,
      // almost any two-character string would be a false positive
      const maxDist = alias.length <= 2 ? 0 : 2;
      if (levenshtein(name, alias) <= maxDist) {
        return field;
      }
    }
  }
  return null;
}
```
A Levenshtein distance of ≤2 catches typos and minor variations in longer labels; two-character labels like 날짜 need an exact match, since almost any short string falls within distance 2 of another. This handles about 95% of the Korean bank formats I've tested.
Preview Before Commit
The most important UX decision: never auto-save imported data.
After detection, the app shows a preview:
- "Found 142 transactions from Jan 2026 to Feb 2026"
- "Date column: 거래일자 ✓"
- "Amount column: 이용금액 ✓"
- "Merchant column: 가맹점명 ✓"
- Sample of first 5 rows in a table
One "Confirm Import" button. If something looks wrong, users can re-upload or manually adjust the column mapping.
This prevents garbage data from corrupting months of careful tracking. Trust me — debugging "why is my February budget showing ₩-3,400,000" because a column was mismatched is not fun.
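A helper that computes that preview summary might look like this — `buildPreview` is a hypothetical name of mine, and it assumes the rows are already decoded and the header map resolves field names to column indexes:

```javascript
// Build the preview shown before the user confirms an import.
function buildPreview(rows, headerMap) {
  const dataRows = rows.slice(1); // skip the header row
  const dates = dataRows
    .map((r) => new Date(r[headerMap.date]))
    .filter((d) => !isNaN(d));
  return {
    count: dataRows.length,
    from: new Date(Math.min(...dates)), // earliest transaction
    to: new Date(Math.max(...dates)),   // latest transaction
    sample: dataRows.slice(0, 5),       // first 5 rows for the table
  };
}
```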
SheetJS: One Library, Two Formats
The other big win was using SheetJS instead of writing separate CSV and XLSX parsers.
```javascript
import * as XLSX from 'xlsx';

function parseFile(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = (e) => {
      const workbook = XLSX.read(e.target.result, { type: 'array' });
      const sheet = workbook.Sheets[workbook.SheetNames[0]];
      const rows = XLSX.utils.sheet_to_json(sheet, { header: 1 });
      resolve(rows);
    };
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });
}
```
Same code handles .csv, .xlsx, .xls. SheetJS detects the format internally. One input path, all formats supported.
The Result
Users can drag-and-drop any Korean bank export — CSV or XLSX, EUC-KR or UTF-8, comma or tab-delimited — and DonFlow figures out the rest.
- 14 bank formats tested and working
- 200+ merchant patterns for auto-categorization
- Zero manual mapping for known formats
- All client-side — files never leave the browser
The whole pipeline runs as FileReader → SheetJS → detection → preview → IndexedDB. No server touched.
This is part of my Building a Finance App With No Server series. Previously: Why I Chose IndexedDB Over a Backend.