DEV Community

SEN LLC

A YAML ↔ JSON Converter With a Handwritten YAML Subset Parser

The full YAML 1.2 spec is notoriously complex: far more so than JSON, and arguably more than XML. But the bulk of real YAML in the wild (configs, K8s manifests, GitHub Actions workflows) uses a small subset: key-value maps, lists, quoted strings, and block scalars. Implementing that subset takes about 300 lines and covers what a developer needs day to day.

YAML looks like "JSON with less punctuation", but the spec has surprises: the "Norway problem" (an unquoted no parses as a boolean), indentation sensitivity, anchor/alias references, multi-document streams separated by ---, and five scalar styles (plain, single-quoted, double-quoted, literal, folded). Writing a full parser is a project. Writing a useful subset is a weekend.

🔗 Live demo: https://sen.ltd/portfolio/yaml-json/
📦 GitHub: https://github.com/sen-ltd/yaml-json

Features:

  • YAML → JSON and JSON → YAML
  • Handwritten YAML subset parser
  • Live bidirectional conversion
  • Error display with line numbers
  • 4 built-in examples (K8s, GitHub Actions, config, nested)
  • Japanese / English UI
  • Zero dependencies, 74 tests

The indentation-based parser

YAML uses indentation to express structure. A recursive parser that tracks the "current indent level" and groups child lines works well:

export function parseYaml(text) {
  const lines = text.split('\n').filter(line => !isCommentOrEmpty(line));
  let i = 0;

  function parseBlock(baseIndent) {
    const node = {};
    let isList = false;
    while (i < lines.length) {
      const line = lines[i];
      const indent = getIndent(line);
      if (indent < baseIndent) break;

      if (line.trimStart().startsWith('- ')) {
        // List item
        if (!isList) { isList = true; /* replace node with array */ }
        // ...
      } else {
        // Key-value
        const { key, value } = parseKeyValue(line);
        if (value === null) {
          // Nested block: children must be indented at least two spaces deeper than the key
          i++;
          node[key] = parseBlock(indent + 2);
        } else {
          node[key] = parseValue(value);
          i++;
        }
      }
    }
    return node;
  }

  return parseBlock(0);
}

The tricky part is knowing when a key has a scalar value vs when it has a nested block. The heuristic: if the value after : is empty, the next indented lines form the block.
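
To make the heuristic concrete, here is a minimal, self-contained sketch that runs on its own. It is not the project's actual parser: the helper bodies (getIndent, isCommentOrEmpty, parseKeyValue) and the name parseYamlSketch are assumptions written for illustration. It only handles nested maps and indented lists of scalars, and it leaves every scalar as a string.

```javascript
// Hypothetical helpers: the names mirror the excerpt above, the bodies are assumptions.
function getIndent(line) {
  return line.length - line.trimStart().length;
}

function isCommentOrEmpty(line) {
  const t = line.trim();
  return t === '' || t.startsWith('#');
}

// Splits "key: value" at the first colon; value is null when nothing follows it.
function parseKeyValue(line) {
  const t = line.trim();
  const idx = t.indexOf(':');
  if (idx === -1) throw new Error(`Expected "key: value" but got: ${t}`);
  const rest = t.slice(idx + 1).trim();
  return { key: t.slice(0, idx).trim(), value: rest === '' ? null : rest };
}

function parseYamlSketch(text) {
  const lines = text.split('\n').filter(line => !isCommentOrEmpty(line));
  let i = 0;

  function parseBlock(baseIndent) {
    // The first line of a block decides whether it is a list or a map.
    const isList = lines[i].trimStart().startsWith('- ');
    const node = isList ? [] : {};
    while (i < lines.length) {
      const line = lines[i];
      const indent = getIndent(line);
      if (indent < baseIndent) break; // dedent: this block is finished
      if (isList) {
        if (!line.trimStart().startsWith('- ')) {
          throw new Error(`Expected a list item but got: ${line.trim()}`);
        }
        node.push(line.trimStart().slice(2).trim());
        i++;
        continue;
      }
      const { key, value } = parseKeyValue(line);
      i++;
      if (value !== null) {
        node[key] = value; // scalar on the same line
      } else if (i < lines.length && getIndent(lines[i]) > indent) {
        // Empty value followed by deeper lines: recurse at the child's own indent.
        node[key] = parseBlock(getIndent(lines[i]));
      } else {
        node[key] = null; // empty value with no nested block
      }
    }
    return node;
  }

  return lines.length === 0 ? null : parseBlock(0);
}

const doc = ['server:', '  host: localhost', '  ports:', '    - 80', '    - 443'].join('\n');
console.log(JSON.stringify(parseYamlSketch(doc)));
// prints {"server":{"host":"localhost","ports":["80","443"]}}
```

Recursing at the child's own indent rather than a fixed offset lets the sketch accept any consistent indentation width. Lists written at the same indent as their parent key, which YAML also allows, are not handled here.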

Type inference

YAML scalars are typed by pattern, not explicit annotation:

export function parseValue(str) {
  if (str === 'null' || str === '~' || str === '') return null;
  if (str === 'true' || str === 'yes' || str === 'on') return true;
  if (str === 'false' || str === 'no' || str === 'off') return false;
  if (/^-?\d+$/.test(str)) return parseInt(str, 10);
  if (/^-?\d+\.\d+$/.test(str)) return parseFloat(str);
  if (str.startsWith('"') && str.endsWith('"')) return unquoteDouble(str);
  if (str.startsWith("'") && str.endsWith("'")) return unquoteSingle(str);
  return str; // bare string
}

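The infamous "Norway problem": Norway's country code is NO, which YAML 1.1 resolves to boolean false when unquoted. The YAML 1.2 core schema only recognizes true and false (lowercase, capitalized, or uppercase) as booleans, but many widely used parsers still follow the 1.1 rules. Writing country: NO in a YAML list of country codes then gives you country: false after parsing. Note that the parseValue excerpt above compares against lowercase spellings only, so it converts no but leaves NO as a string; matching YAML 1.1 fully would also require the capitalized and uppercase variants. Either way, the fix on the authoring side is explicit quoting: country: "NO".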
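
The difference between the two schemas can be sketched as a pair of pattern tables. The function name resolveBool and the BOOL_PATTERNS table are hypothetical, introduced only for this illustration; the spellings themselves come from the YAML 1.1 bool type and the YAML 1.2 core schema.

```javascript
// Boolean spellings per schema. '1.1' follows the YAML 1.1 bool type,
// '1.2' follows the YAML 1.2 core schema.
const BOOL_PATTERNS = {
  '1.1': {
    t: /^(?:y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON)$/,
    f: /^(?:n|N|no|No|NO|false|False|FALSE|off|Off|OFF)$/,
  },
  '1.2': {
    t: /^(?:true|True|TRUE)$/,
    f: /^(?:false|False|FALSE)$/,
  },
};

// Returns true/false when the plain scalar is a boolean under the chosen
// schema, otherwise returns the string unchanged.
function resolveBool(str, version = '1.2') {
  const patterns = BOOL_PATTERNS[version];
  if (!patterns) throw new Error(`Unknown YAML version: ${version}`);
  if (patterns.t.test(str)) return true;
  if (patterns.f.test(str)) return false;
  return str;
}

console.log(resolveBool('NO', '1.1')); // prints false
console.log(resolveBool('NO', '1.2')); // prints NO
```

A converter that wants predictable output could resolve with the 1.2 table when reading and still quote 1.1-style spellings when writing, so that files stay safe for older parsers.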

Block scalars

For multi-line strings, YAML has | (literal) and > (folded):

description: |
  This is a multi-line string.
  Line breaks are preserved exactly.

summary: >
  This is also multi-line, but
  line breaks become spaces.

The parser detects | or > at the end of a key line, then reads indented lines until dedent:

if (value === '|' || value === '>') {
  const blockLines = [];
  i++;
  // Guard against a block indicator on the last line of the input.
  const blockIndent = i < lines.length ? getIndent(lines[i]) : 0;
  while (i < lines.length && (getIndent(lines[i]) >= blockIndent || lines[i].trim() === '')) {
    blockLines.push(lines[i].slice(blockIndent));
    i++;
  }
  node[key] = value === '|' ? blockLines.join('\n') : blockLines.join(' ');
}

Chomping modifiers (|-, |+, >-, >+) control trailing newline behavior. The parser handles the common |- (strip) and | (clip) cases.
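
As a rough illustration of how the chomping indicators differ, the sketch below applies strip, clip, and keep to the lines of a literal block. Per the YAML spec, clip keeps a single trailing newline, strip removes it, and keep preserves trailing blank lines as well. Both function names (parseBlockHeader, applyChomping) are hypothetical and not taken from the project; explicit indentation indicators such as |2 are ignored.

```javascript
// Splits a block scalar header such as "|", "|-", or ">+" into style and chomping mode.
function parseBlockHeader(header) {
  const m = /^([|>])([+-]?)$/.exec(header);
  if (!m) return null; // not a block scalar header this sketch understands
  const chomp = m[2] === '-' ? 'strip' : m[2] === '+' ? 'keep' : 'clip';
  return { style: m[1], chomp };
}

// blockLines: de-indented content lines of a literal block, including trailing blank lines.
function applyChomping(blockLines, chomp) {
  const lines = [...blockLines];
  let trailingBlank = 0;
  while (lines.length > 0 && lines[lines.length - 1].trim() === '') {
    lines.pop();
    trailingBlank++;
  }
  if (lines.length === 0) {
    // No content at all: only "keep" retains the blank lines.
    return chomp === 'keep' ? '\n'.repeat(trailingBlank) : '';
  }
  const body = lines.join('\n');
  if (chomp === 'strip') return body;       // no trailing newline
  if (chomp === 'clip') return body + '\n'; // exactly one trailing newline
  return body + '\n'.repeat(1 + trailingBlank); // keep: final newline plus blank lines
}

const content = ['first', 'second', ''];
console.log(JSON.stringify(applyChomping(content, 'strip'))); // prints "first\nsecond"
console.log(JSON.stringify(applyChomping(content, 'clip')));  // prints "first\nsecond\n"
console.log(JSON.stringify(applyChomping(content, 'keep')));  // prints "first\nsecond\n\n"
```

This only works if blank lines inside the block are still present when the block is read, so a parser that filters empty lines up front would need to treat block scalar bodies separately.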

The writer

Going the other way is easier — just walk the object tree and emit indented lines:

export function toYaml(obj, indent = 0) {
  const pad = '  '.repeat(indent);
  if (Array.isArray(obj)) {
    return obj.map(item => `${pad}- ${inlineOrBlock(item, indent + 1)}`).join('\n');
  }
  if (typeof obj === 'object' && obj !== null) {
    return Object.entries(obj).map(([k, v]) => {
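      // typeof null is 'object', so exclude null or it would be emitted as an unindented nested block.
      if (v !== null && typeof v === 'object') {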
        return `${pad}${k}:\n${toYaml(v, indent + 1)}`;
      }
      return `${pad}${k}: ${quoteIfNeeded(v)}`;
    }).join('\n');
  }
  return String(obj);
}

The quoteIfNeeded helper adds quotes when the value looks like it could be misinterpreted — strings that match boolean/null patterns, strings with colons, strings starting with -. This prevents round-trip data loss.
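
One possible shape for such a helper is sketched below. The body is an assumption, not the project's code: it quotes with JSON.stringify, which yields a double-quoted string whose escapes are also valid in YAML double-quoted scalars, and it errs on the side of quoting too much rather than too little.

```javascript
// Plain scalars that a YAML 1.1-style reader would resolve to a boolean or null.
const AMBIGUOUS = /^(?:null|~|true|false|yes|no|on|off)$/i;
// Strings that would be read back as an integer or float.
const NUMERIC = /^-?\d+(?:\.\d+)?$/;
// Leading characters that carry syntactic meaning in YAML, plus leading whitespace.
const RISKY_START = /^[-?"'\[\]{}&*!|>%@`,\s]/;

function quoteIfNeeded(value) {
  // Numbers, booleans, and null are written as-is; only strings can be misread.
  if (typeof value !== 'string') return String(value);
  const risky =
    value === '' ||
    AMBIGUOUS.test(value) ||
    NUMERIC.test(value) ||
    RISKY_START.test(value) ||
    value.includes(':') ||
    value.includes('#') ||
    value.includes('\n') ||
    /\s$/.test(value);
  return risky ? JSON.stringify(value) : value;
}

console.log(quoteIfNeeded('NO'));        // prints "NO"
console.log(quoteIfNeeded('localhost')); // prints localhost
console.log(quoteIfNeeded('42'));        // prints "42"
console.log(quoteIfNeeded(42));          // prints 42
```

The distinction between the string '42' and the number 42 is the round-trip case that matters: without the quotes, the string would come back from the parser as a number.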

Series

This is entry #94 in my 100+ public portfolio series.
