DEV Community

Tech Labs

Why Your Diff Tool is Failing on JSONL Files

The Problem

You're working on a 20,000-line JSONL (JSON Lines) dataset with carefully curated training data. You make changes, but need to verify what actually changed between versions.

The lines are too long to fit on your screen, and each one is a dense, unformatted JSON object.

You reach for your favorite diff tool. And it fails.

Or worse, it shows you a meaningless blob of changes because it tries to treat your entire JSONL file as a single JSON document.

This shouldn't happen. But it does, constantly, to software engineers and data engineers everywhere.
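To make the failure concrete, here is a minimal Python sketch using only the standard-library `json` module. It shows what happens when a whole JSONL file is handed to a single-document JSON parser. The two sample records and the helper name `parses_as_single_document` are illustrative assumptions, not taken from any particular diff tool.

```python
import json

# Two JSONL records joined by newlines, as they would appear in a file.
# (Sample data is made up for illustration.)
jsonl_text = '{"id":1,"name":"Tom"}\n{"id":2,"name":"Maria"}\n'


def parses_as_single_document(text: str) -> bool:
    """Return True if the whole text is one valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The file as a whole is not valid JSON: the parser finds extra data
# after the first object.
whole_file_ok = parses_as_single_document(jsonl_text)

# Every individual line, however, is valid JSON on its own.
each_line_ok = all(
    parses_as_single_document(line)
    for line in jsonl_text.splitlines()
    if line.strip()
)

print(whole_file_ok, each_line_ok)
```

A tool that assumes "one file, one JSON document" has nothing sensible to fall back on here, which is why the comparison has to happen line by line.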

What is JSONL (and Why It Matters)

JSONL (JSON Lines) is deceptively simple: one valid JSON value per line, in practice almost always an object.

{"id":1,"name":"Tom","age":35}
{"id":2,"name":"Maria","age":32}
{"id":3,"name":"Alex","age":28}

It's not the same as pretty-printed JSON with newlines. Each line is completely independent. Parse it, process it, forget it. Next line.
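The "parse it, process it, forget it" pattern might look like the following Python sketch. The function name `iter_jsonl` is hypothetical, and skipping blank lines is an assumption made here for convenience; a stricter reader could reject them instead.

```python
import io
import json
from typing import Any, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed value per non-blank line.

    Only the current line is held in memory, so this works on an open
    file handle as well as on a list of strings.
    """
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are tolerated and skipped
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line is broken; the other lines are unaffected.
            raise ValueError(f"line {line_number}: {exc.msg}") from exc


# io.StringIO stands in for an open file here.
sample = io.StringIO(
    '{"id":1,"name":"Tom","age":35}\n'
    '{"id":2,"name":"Maria","age":32}\n'
    '{"id":3,"name":"Alex","age":28}\n'
)
names = [record["name"] for record in iter_jsonl(sample)]
print(names)
```

With a real file, `iter_jsonl(open(path, encoding="utf-8"))` would stream the records in the same way, where `path` is whatever file you are reading.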

This format is everywhere:

  • ChatGPT fine-tuning datasets (OpenAI's required format)
  • ML training pipelines (streaming data without loading everything into memory)
  • Structured logs (each log entry is a JSON object)

How It Works

  1. Parse each JSONL line independently (validates JSON syntax)
  2. Align by line number (line 1 vs. line 1, line 2 vs. line 2)
  3. Transform to pretty-printed JSON arrays (with 2-space indentation)
  4. Show side-by-side diff using Monaco Editor (the editor component that powers VS Code)

Result:

  • Readable JSON instead of compact one-liners
  • Clear visual diffs with syntax highlighting
  • Handles files of different lengths (pads the shorter side with null)
  • Client-side only (your data never leaves your browser)
  • Drag & drop files or paste directly
  • Free, no signup, no tracking
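The four steps listed under "How It Works" can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the tool's actual implementation: the function names are hypothetical, and the standard-library `difflib.unified_diff` stands in for Monaco's side-by-side view. It requires Python 3.9 or later for the type hints.

```python
import difflib
import json
from itertools import zip_longest


def parse_jsonl(text: str) -> list:
    """Step 1: parse each non-blank line on its own (raises on invalid JSON)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def align(left: list, right: list) -> tuple[list, list]:
    """Step 2: pair records by position, padding the shorter side with None."""
    pairs = list(zip_longest(left, right, fillvalue=None))
    return [l for l, _ in pairs], [r for _, r in pairs]


def pretty(records: list) -> list[str]:
    """Step 3: render the records as one JSON array with 2-space indentation."""
    return json.dumps(records, indent=2).splitlines()


def diff_jsonl(left_text: str, right_text: str) -> list[str]:
    """Step 4 (stand-in): a unified text diff instead of a side-by-side view."""
    left, right = align(parse_jsonl(left_text), parse_jsonl(right_text))
    return list(
        difflib.unified_diff(
            pretty(left), pretty(right),
            fromfile="left", tofile="right", lineterm="",
        )
    )


# Illustrative input: the right side changes one value and drops a record.
left_text = '{"id":1,"age":32}\n{"id":2,"age":28}\n'
right_text = '{"id":1,"age":33}\n'
diff = diff_jsonl(left_text, right_text)
print("\n".join(diff))
```

Padding with `None` (serialized as `null`) keeps both arrays the same length, so every later record stays lined up with its counterpart. One limitation of this sketch: a line that is literally `null` in the input cannot be told apart from padding.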

Real-World Example

Let's say you're comparing two versions of a training dataset. Here's what you paste:

Left (original):

{"id":1,"name":"Tom","age":35,"score":92.5}
{"id":2,"name":"Maria","age":32,"score":88.3}
{"id":3,"name":"Alex","age":28,"score":95.1}

Right (modified):

{"id":1,"name":"Tommy","age":35,"score":92.5}
{"id":2,"name":"Maria","age":33,"score":88.3}

Notice line 3 is missing on the right, and there are changes in lines 1 and 2.

The tool shows:

  • Side-by-side pretty-printed JSON arrays
  • Line 1: "name": "Tom" → "name": "Tommy" (highlighted in red/green)
  • Line 2: "age": 32 → "age": 33 (highlighted)
  • Line 3: present on left, null on right (shows missing data)

No squinting. No character-by-character comparison. Just clear diffs.
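If you want to check the same example programmatically, a field-level comparison is one option. The Python sketch below is a different technique from the tool's text diff: it compares the parsed objects key by key. It assumes every line is a JSON object, and the names `load` and `changed_fields` are hypothetical.

```python
import json
from itertools import zip_longest

# The left and right inputs from the example above.
left_text = """\
{"id":1,"name":"Tom","age":35,"score":92.5}
{"id":2,"name":"Maria","age":32,"score":88.3}
{"id":3,"name":"Alex","age":28,"score":95.1}
"""
right_text = """\
{"id":1,"name":"Tommy","age":35,"score":92.5}
{"id":2,"name":"Maria","age":33,"score":88.3}
"""


def load(text):
    """Parse each non-blank line as its own JSON value."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def changed_fields(left_text, right_text):
    """Map 1-based line numbers to {field: (old, new)}, or to a note
    when the line exists on only one side. Unchanged lines are omitted."""
    report = {}
    pairs = zip_longest(load(left_text), load(right_text), fillvalue=None)
    for line_number, (old, new) in enumerate(pairs, start=1):
        if old is None or new is None:
            report[line_number] = (
                "missing on left" if old is None else "missing on right"
            )
            continue
        fields = {
            key: (old.get(key), new.get(key))
            for key in old.keys() | new.keys()
            if old.get(key) != new.get(key)
        }
        if fields:
            report[line_number] = fields
    return report


report = changed_fields(left_text, right_text)
for line_number, change in report.items():
    print(line_number, change)
```

The report flags the name change on line 1, the age change on line 2, and the record missing on the right at line 3, which matches what the side-by-side view highlights.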

Available here: https://www.jsonlify.com/compare-jsonlines

#MachineLearning #DataEngineering #MLOps #JSONL #OpenAI #GPT #DataScience #WebDev