
Michael Lip

Posted on • Originally published at zovo.one

Converting CSV to JSON: The Edge Cases That Break Naive Implementations

Converting CSV to JSON seems like a trivial task. Split by newlines, split by commas, use the first row as keys. Ten lines of code. Ship it.

Then you encounter a value with a comma inside quotes, a field with an embedded newline, a header with spaces, or a file with inconsistent quoting. Your ten-line solution breaks, and you learn why proper CSV parsing is a solved but non-trivial problem.

The CSV specification

RFC 4180 defines the CSV format. The rules that trip people up:

Fields containing commas must be quoted: "New York, NY" is one field, not two.

Fields containing double quotes must be escaped: The value He said "hello" is encoded as "He said ""hello""". Double quotes inside a quoted field are escaped by doubling them.

Fields containing newlines must be quoted: A single field can span multiple lines if it is quoted. This breaks every implementation that splits on newlines first.

The header row is optional: Some CSV files have headers, some do not. There is no reliable way to auto-detect this.

Trailing commas: A row ending with a comma has an empty last field. a,b,c, has four fields, the last one empty.
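Putting those rules together, here is a small hypothetical file that exercises all of them: a quoted comma, a doubled quote, a field spanning two lines, and an empty trailing field.

```csv
name,location,quote,notes
Alice,"New York, NY","She said ""hi""","spans
two lines"
Bob,Boston,,
```

A correct parser reads this as exactly two data rows of four fields each.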

The naive implementation and its failures

// DO NOT USE THIS
function csvToJson(csv) {
  const lines = csv.split('\n');
  const headers = lines[0].split(',');
  return lines.slice(1).map(line => {
    const values = line.split(',');
    return headers.reduce((obj, h, i) => {
      obj[h.trim()] = values[i]?.trim();
      return obj;
    }, {});
  });
}

This fails on: quoted fields with commas, quoted fields with newlines, escaped quotes, empty fields, fields with leading/trailing whitespace that should be preserved, and Windows-style line endings (CRLF).
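To see the first failure concretely, here is the naive function run against a single quoted comma (the function is reproduced so the snippet is self-contained):

```javascript
// The naive splitter from above, reproduced for the demo. DO NOT USE.
function csvToJson(csv) {
  const lines = csv.split('\n');
  const headers = lines[0].split(',');
  return lines.slice(1).map(line => {
    const values = line.split(',');
    return headers.reduce((obj, h, i) => {
      obj[h.trim()] = values[i]?.trim();
      return obj;
    }, {});
  });
}

// A quoted comma is enough to break it: the field is split at the
// inner comma and the closing half (' NY"') is silently dropped.
const rows = csvToJson('name,location\nAlice,"New York, NY"');
console.log(rows[0].location); // '"New York'
```

The data is not just mangled; part of it vanishes, which is why this class of bug tends to surface late.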

A proper implementation approach

A correct CSV parser is a state machine. It reads character by character and tracks whether it is inside a quoted field or not:

function parseCSV(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const char = text[i];
    const next = text[i + 1];

    if (inQuotes) {
      if (char === '"' && next === '"') {
        field += '"';
        i++; // skip escaped quote
      } else if (char === '"') {
        inQuotes = false;
      } else {
        field += char;
      }
    } else {
      if (char === '"') {
        inQuotes = true;
      } else if (char === ',') {
        row.push(field);
        field = '';
      } else if (char === '\n' || (char === '\r' && next === '\n')) {
        row.push(field);
        rows.push(row);
        row = [];
        field = '';
        if (char === '\r') i++;
      } else {
        field += char;
      }
    }
  }
  // Flush the final field and row: many real files do not end with a
  // trailing newline, so the loop above never emits the last row.
  row.push(field);
  if (row.length > 1 || field) rows.push(row);

  return rows;
}

This handles quoted commas, escaped quotes, multiline fields, and CRLF line endings.

JSON structure decisions

When converting to JSON, you need to decide on the output structure:

Array of objects (most common): Each row becomes an object keyed by header names.

[{"name": "Alice", "age": "30"}, {"name": "Bob", "age": "25"}]
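The mapping step for this shape is simple once parsing is done correctly. A minimal sketch, assuming you already have rows as arrays of strings (for example from a parser like the one above); `rowsToObjects` is an illustrative name, not a standard API:

```javascript
// Hypothetical helper: map parsed rows (arrays of strings) to objects
// keyed by the header row. Missing trailing fields become empty strings.
function rowsToObjects(rows) {
  const [headers, ...data] = rows;
  return data.map(row =>
    Object.fromEntries(headers.map((h, i) => [h, row[i] ?? '']))
  );
}

const objects = rowsToObjects([['name', 'age'], ['Alice', '30'], ['Bob', '25']]);
// [{ name: 'Alice', age: '30' }, { name: 'Bob', age: '25' }]
```

Note that extra fields beyond the header count are dropped; depending on your data you may prefer to raise an error on ragged rows instead.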

Array of arrays: Preserves the tabular structure without header mapping.

[["name", "age"], ["Alice", "30"], ["Bob", "25"]]

Nested objects: If headers use dot notation or a separator, you can nest.

[{"name": "Alice", "address": {"city": "NYC", "state": "NY"}}]
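The nesting step can be sketched as a small transform over a flat row object; `nestKeys` is an illustrative name, and this sketch assumes `.` as the separator:

```javascript
// Hypothetical helper: expand dot-notation keys in a flat object into
// nested objects, e.g. "address.city" -> { address: { city: ... } }.
function nestKeys(flat) {
  const out = {};
  for (const [key, value] of Object.entries(flat)) {
    const parts = key.split('.');
    let node = out;
    for (let i = 0; i < parts.length - 1; i++) {
      node = node[parts[i]] ??= {}; // create intermediate objects as needed
    }
    node[parts[parts.length - 1]] = value;
  }
  return out;
}

nestKeys({ name: 'Alice', 'address.city': 'NYC', 'address.state': 'NY' });
// { name: 'Alice', address: { city: 'NYC', state: 'NY' } }
```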

Keyed by a column: Using a specific column as the object key for O(1) lookup.

{"alice": {"name": "Alice", "age": "30"}, "bob": {"name": "Bob", "age": "25"}}
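One thing to watch with this shape is key collisions: if two rows share a key value, a naive build silently overwrites the first. A minimal sketch that fails loudly instead (`keyByColumn` is an illustrative name):

```javascript
// Hypothetical helper: index an array of row objects by one column.
// Throws on a missing or duplicated key rather than silently dropping rows.
function keyByColumn(objects, column) {
  const out = {};
  for (const obj of objects) {
    const key = obj[column];
    if (key === undefined) throw new Error(`missing key column: ${column}`);
    if (key in out) throw new Error(`duplicate key: ${key}`);
    out[key] = obj;
  }
  return out;
}

const byName = keyByColumn(
  [{ name: 'alice', age: '30' }, { name: 'bob', age: '25' }],
  'name'
);
// byName.alice -> { name: 'alice', age: '30' }
```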

Type inference

CSV is inherently untyped: everything is a string. When converting to JSON, you may want to infer types (numbers, booleans, dates, nulls), but automatic inference is risky. Excel's infamous zip code problem, where "02134" is inferred as the number 2134 and the leading zero disappears, is exactly this kind of silent corruption.

The safe approach is to convert everything as strings and let the consumer handle type conversion with explicit rules.
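That consumer-side conversion can be as simple as an explicit per-column converter map. A minimal sketch, with `coerceColumns` as an illustrative name:

```javascript
// Hypothetical sketch: apply explicit, opt-in converters per column
// instead of guessing types. Unlisted columns stay strings.
function coerceColumns(objects, converters) {
  return objects.map(obj => {
    const out = { ...obj };
    for (const [column, convert] of Object.entries(converters)) {
      if (column in out) out[column] = convert(out[column]);
    }
    return out;
  });
}

const typed = coerceColumns(
  [{ name: 'Alice', age: '30', zip: '02134' }],
  { age: Number } // zip deliberately left as a string
);
// typed[0].age === 30, typed[0].zip === '02134'
```

Because every conversion is opt-in, a column like `zip` keeps its leading zero unless someone explicitly asks for a number.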

The tool

For quick conversions with proper parsing, I built a CSV to JSON converter that handles all the RFC 4180 edge cases, supports multiple output formats, and does not silently corrupt your data.


I'm Michael Lip. I build free developer tools at zovo.one. 500+ tools, all private, all free.
