A CSV Editor With RFC 4180 Parsing, Auto Delimiter Detection, and Markdown Export
Parsing CSV correctly means handling quoted fields, doubled-quote escapes, embedded newlines inside quotes, and trailing newlines. All of that fits in roughly 50 lines of state-machine JavaScript. Once you have that, exporting to Markdown, JSON, HTML, or TSV is a few more functions on top.
Everyone underestimates CSV. line.split(',') works for 90% of files and fails badly on the other 10%. The closest thing to a spec is RFC 4180 (an informational RFC, not a formal standard), which allows fields to contain commas, quotes, and newlines if they're wrapped in double quotes, with doubled quotes as an escape.
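To make the failure concrete, here is a small sketch of the naive approach on one record whose second field contains both a comma and an escaped quote. The sample line is made up for illustration.

```javascript
// One record with three fields: id / Smith, "Bob" / 42
const line = 'id,"Smith, ""Bob""",42';

// Naive approach: splits on every comma, including the one inside quotes.
const naive = line.split(',');

console.log(naive.length); // 4
console.log(naive);        // [ 'id', '"Smith', ' ""Bob"""', '42' ]
```

The split returns four pieces instead of three, and the quote characters are left in the data.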
🔗 Live demo: https://sen.ltd/portfolio/csv-tool/
📦 GitHub: https://github.com/sen-ltd/csv-tool
Features:
- RFC 4180 parser (handles quotes, escapes, embedded newlines)
- Auto delimiter detection (comma, tab, semicolon, pipe)
- Editable table with sort, search, pagination
- Add / remove rows and columns
- Export: CSV, TSV, JSON, Markdown, HTML
- Per-column type detection (number / date / boolean / string)
- Japanese / English UI
- Zero dependencies, 76 tests
The state machine
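Conceptually, CSV parsing is a three-state machine: outside quotes, inside quotes, and inside quotes having just seen a quote. The last state is where doubled-quote escapes get resolved. The code below tracks the first two with an inQuotes flag and handles the third with a one-character lookahead at text[i + 1]: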
export function parseCSV(text, delimiter = ',') {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;
  let i = 0;
  while (i < text.length) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"') {
        if (text[i + 1] === '"') {
          // Escaped quote
          field += '"';
          i += 2;
        } else {
          // End of quoted field
          inQuotes = false;
          i++;
        }
      } else {
        field += c;
        i++;
      }
    } else {
      if (c === '"') {
        inQuotes = true;
        i++;
      } else if (c === delimiter) {
        row.push(field);
        field = '';
        i++;
      } else if (c === '\n' || c === '\r') {
        row.push(field);
        rows.push(row);
        row = [];
        field = '';
        if (c === '\r' && text[i + 1] === '\n') i++;
        i++;
      } else {
        field += c;
        i++;
      }
    }
  }
  // Flush last field/row
  if (field || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
The tricky cases:
- "foo,bar" → single field foo,bar (comma inside quotes is literal)
- "foo""bar" → single field foo"bar (doubled quote escapes)
- "line1\nline2" → single field with embedded newline
- Mixed \r\n and \n line endings → both work
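For comparison, here is a minimal sketch of the same parser with all three states spelled out instead of a flag plus lookahead. The name parseCSVStates, the state constants, and the pending flag are illustrative assumptions, not code from the repo. One deliberate difference: pending also flushes a final row that consists only of an empty quoted field.

```javascript
// Hypothetical explicit-state variant; names are illustrative.
const OUTSIDE = 0, INSIDE = 1, QUOTE_SEEN = 2;

function parseCSVStates(text, delimiter = ',') {
  const rows = [];
  let row = [];
  let field = '';
  let state = OUTSIDE;
  let pending = false; // true once the current row has something to flush

  const endField = () => { row.push(field); field = ''; };
  const endRow = () => { endField(); rows.push(row); row = []; pending = false; };

  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (state === INSIDE) {
      if (c === '"') state = QUOTE_SEEN; // either a closing quote or half of ""
      else field += c;
      continue;
    }
    if (state === QUOTE_SEEN) {
      if (c === '"') { field += '"'; state = INSIDE; continue; } // "" -> literal "
      state = OUTSIDE; // it was a closing quote; handle c as unquoted below
    }
    if (c === '"') { state = INSIDE; pending = true; }
    else if (c === delimiter) { endField(); pending = true; }
    else if (c === '\n' || c === '\r') {
      if (c === '\r' && text[i + 1] === '\n') i++; // treat \r\n as one break
      endRow();
    } else { field += c; pending = true; }
  }
  if (pending) endRow(); // flush a last row that has no trailing newline
  return rows;
}

console.log(parseCSVStates('"foo,bar"'));      // [ [ 'foo,bar' ] ]
console.log(parseCSVStates('"foo""bar"'));     // [ [ 'foo"bar' ] ]
console.log(parseCSVStates('a,b\r\nc,d\n'));   // [ [ 'a', 'b' ], [ 'c', 'd' ] ]
```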
Delimiter detection
If the user doesn't specify a delimiter, we guess. For each candidate, count its occurrences on each of the first five lines, then score it by its count on the first line, doubled when every sampled line has the same count:
export function detectDelimiter(text) {
  const candidates = [',', '\t', ';', '|'];
  // Skip blank lines so a trailing newline doesn't count as an inconsistent line
  const lines = text.split(/\r?\n/).filter(l => l !== '').slice(0, 5);
  let best = ',';
  let bestScore = -1;
  for (const delim of candidates) {
    // Occurrences per line; split() avoids having to regex-escape the delimiter
    const counts = lines.map(l => l.split(delim).length - 1);
    if (!counts[0]) continue; // no lines, or delimiter absent from the first line
    const consistent = counts.every(c => c === counts[0]);
    const score = counts[0] * (consistent ? 2 : 1);
    if (score > bestScore) { bestScore = score; best = delim; }
  }
  return best;
}
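Consistency doubles a candidate's score: "a,b,c" on one line and "d,e" on the next is suspicious, but the same count on every sampled line is strong evidence. Because the score still scales with the raw count, a frequent but inconsistent character can outscore a consistent one that appears less often.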
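One limitation of counting raw characters is that delimiters inside quoted fields are counted too. A line such as a;"x,y,z";b contains two commas and two semicolons, so the raw counts tie. A possible refinement, sketched here with a hypothetical helper named countOutsideQuotes, is to skip quoted sections while counting:

```javascript
// Hypothetical helper: count `delim` in one line, ignoring quoted sections.
function countOutsideQuotes(line, delim) {
  let count = 0;
  let inQuotes = false;
  for (const c of line) {
    // A doubled quote ("") toggles twice, so the quoted state is preserved.
    if (c === '"') inQuotes = !inQuotes;
    else if (c === delim && !inQuotes) count++;
  }
  return count;
}

console.log(countOutsideQuotes('a;"x,y,z";b', ',')); // 0
console.log(countOutsideQuotes('a;"x,y,z";b', ';')); // 2
```

This still works line by line, so a quoted field that spans several lines can throw the count off.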
Column type detection
Walk each column and check if all values match a type:
export function detectColumnType(values) {
  const nonEmpty = values.filter(v => v !== '' && v != null);
  if (nonEmpty.length === 0) return 'string';
  if (nonEmpty.every(isValidNumber)) return 'number';
  if (nonEmpty.every(isValidDate)) return 'date';
  if (nonEmpty.every(isValidBoolean)) return 'boolean';
  return 'string';
}
Order matters: check number before boolean, because "1" is a valid boolean but also a valid number — and "number" is the more useful classification.
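The three predicates are not shown in the post. As a rough sketch of what they might look like, here is one possible set. The accepted formats (ISO-style YYYY-MM-DD dates, and true / false / 1 / 0 booleans) are assumptions for illustration, not the project's actual rules.

```javascript
// Hypothetical helpers; the accepted formats are assumptions.

// Anything Number() parses to a finite value, excluding blank strings.
const isValidNumber = (v) => v.trim() !== '' && Number.isFinite(Number(v));

// Accept only YYYY-MM-DD, and reject impossible dates such as 2024-02-30
// by round-tripping through Date and comparing the parts.
const isValidDate = (v) => {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(v.trim());
  if (!m) return false;
  const [y, mo, d] = m.slice(1).map(Number);
  const dt = new Date(Date.UTC(y, mo - 1, d));
  return dt.getUTCFullYear() === y && dt.getUTCMonth() === mo - 1 && dt.getUTCDate() === d;
};

const isValidBoolean = (v) => ['true', 'false', '1', '0'].includes(v.trim().toLowerCase());

// A column of 1s and 0s passes both checks, so the order of the checks decides.
const col = ['1', '0', '1'];
console.log(col.every(isValidNumber), col.every(isValidBoolean)); // true true
```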
Markdown table output
Renderers don't require it, but padding each column to a common width keeps the raw Markdown table readable:
export function toMarkdown(rows, hasHeader = true) {
  if (rows.length === 0) return '';
  const widths = rows[0].map((_, colIdx) =>
    Math.max(...rows.map(r => (r[colIdx] || '').length))
  );
  const lines = [];
  const formatRow = (r) => '| ' + r.map((cell, i) => (cell || '').padEnd(widths[i])).join(' | ') + ' |';
  lines.push(formatRow(rows[0]));
  if (hasHeader) {
    lines.push('|' + widths.map(w => '-'.repeat(w + 2)).join('|') + '|');
  }
  for (let i = 1; i < rows.length; i++) {
    lines.push(formatRow(rows[i]));
  }
  return lines.join('\n');
}
Note the CJK caveat: padEnd counts UTF-16 code units, not display width. A monospace-rendered Japanese character is often 2 cells wide, so the output may look slightly misaligned in a terminal even though the Markdown is semantically correct.
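If terminal alignment matters, one option is to pad by an estimated display width instead of string length. The sketch below uses hypothetical helpers (isWide, displayWidth, padEndDisplay) and a simplified set of code point ranges. It is an approximation, not the full Unicode East Asian Width table, and it does not handle combining marks or emoji sequences.

```javascript
// Rough check for wide/fullwidth code points. The ranges are a simplified
// assumption and will misclassify some characters.
function isWide(cp) {
  return (
    (cp >= 0x1100 && cp <= 0x115f) ||  // Hangul Jamo
    (cp >= 0x2e80 && cp <= 0xa4cf) ||  // CJK radicals, kana, ideographs, Yi
    (cp >= 0xac00 && cp <= 0xd7a3) ||  // Hangul syllables
    (cp >= 0xf900 && cp <= 0xfaff) ||  // CJK compatibility ideographs
    (cp >= 0xfe30 && cp <= 0xfe4f) ||  // CJK compatibility forms
    (cp >= 0xff00 && cp <= 0xff60) ||  // Fullwidth forms
    (cp >= 0xffe0 && cp <= 0xffe6) ||  // Fullwidth signs
    (cp >= 0x20000 && cp <= 0x3fffd)   // Supplementary ideographs
  );
}

// Iterate by code point (for...of), counting wide characters as 2 cells.
function displayWidth(str) {
  let width = 0;
  for (const ch of str) width += isWide(ch.codePointAt(0)) ? 2 : 1;
  return width;
}

// Like padEnd, but pads to a target display width.
function padEndDisplay(str, target) {
  return str + ' '.repeat(Math.max(0, target - displayWidth(str)));
}

console.log('日本語'.length, displayWidth('日本語'));   // 3 6
console.log(JSON.stringify(padEndDisplay('日本', 6))); // "日本  "
```

In toMarkdown, this would mean computing widths with displayWidth and replacing padEnd with padEndDisplay.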
Series
This is entry #54 in my 100+ public portfolio series.
- 📦 Repo: https://github.com/sen-ltd/csv-tool
- 🌐 Live: https://sen.ltd/portfolio/csv-tool/
- 🏢 Company: https://sen.ltd/
