JSONL Explained: The Line-by-Line Format Powering AI Datasets

You're trying to load a 500,000-record dataset into your script. You reach for JSON — it's universal, readable, everyone knows it. But the moment you call JSON.parse() on a 2 GB file, your process runs out of memory and crashes.

This is the problem JSONL (JSON Lines) was built to solve. And if you're working with AI training data, log pipelines, or any large-scale data processing, understanding JSONL will save you from real production pain.

What Is JSONL?

JSONL (also written .jsonl or called "JSON Lines") is a text format where each line is a self-contained, valid JSON object. There's no wrapping array, no commas between records — just one JSON object per line, separated by newlines.

Here's the key distinction:

Standard JSON array:

[
  {"id": 1, "name": "Alice", "role": "admin"},
  {"id": 2, "name": "Bob", "role": "editor"},
  {"id": 3, "name": "Carol", "role": "viewer"}
]

JSONL equivalent:

{"id": 1, "name": "Alice", "role": "admin"}
{"id": 2, "name": "Bob", "role": "editor"}
{"id": 3, "name": "Carol", "role": "viewer"}

The difference looks small. The impact at scale is enormous.

With a JSON array, the entire file must be parsed into memory before you can read a single record. With JSONL, you can stream the file one line at a time — processing millions of records with constant memory usage.

Processing JSONL Line by Line

Here's the practical difference in Node.js. First, the approach that breaks on large files:

// ❌ Loads the entire file into memory before processing a single record
import { readFileSync } from 'fs';

const data = JSON.parse(readFileSync('users.json', 'utf8'));
data.forEach(record => handleRecord(record)); // handleRecord: your per-record logic

Now the JSONL equivalent, which handles files of any size:

// ✅ Streams one line at a time — constant memory usage
import { createReadStream } from 'fs';
import { createInterface } from 'readline';

const rl = createInterface({
  input: createReadStream('users.jsonl'),
  crlfDelay: Infinity, // treat \r\n as a single line break
});

rl.on('line', (line) => {
  if (line.trim()) {
    const record = JSON.parse(line);
    handleRecord(record);
  }
});

The second approach works equally well on a 1,000-record file and a 50-million-record file. That's the core value of JSONL.
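
If you prefer async iteration, readline interfaces are also async iterable in modern Node (v11+), which is convenient when the per-record work is itself async. A minimal sketch, reusing the same hypothetical handleRecord:

// Same streaming behaviour, written with for await (Node 11+, ESM)
import { createReadStream } from 'fs';
import { createInterface } from 'readline';

const rl = createInterface({
  input: createReadStream('users.jsonl'),
  crlfDelay: Infinity, // treat \r\n as a single line break
});

for await (const line of rl) {
  if (line.trim()) {
    await handleRecord(JSON.parse(line)); // handleRecord: your (possibly async) per-record logic
  }
}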

Why AI and LLMs Love JSONL

If you've worked with OpenAI's fine-tuning API, you've already encountered JSONL. The required format for training data is a .jsonl file where each line is one complete training conversation:

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}, {"role": "assistant", "content": "Paris."}]}
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "How do I reverse a string in Python?"}, {"role": "assistant", "content": "You can use slicing: s[::-1]"}]}

Each line = one training example. You can add or remove examples without touching any other line in the file. You can run wc -l on the file to instantly count your training examples. You can head -n 10 to preview the first 10 records. These are the small ergonomic wins that matter when you're curating thousands of examples.
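
Before kicking off a fine-tuning job, a programmatic sanity check is worth the thirty seconds: stream the file and confirm every line parses and carries the messages array the API expects. A minimal sketch (validateFineTuneFile and training.jsonl are illustrative names, not part of any SDK):

// Pre-upload check: every non-blank line must parse and contain a "messages" array
import { createReadStream } from 'fs';
import { createInterface } from 'readline';

async function validateFineTuneFile(path) {
  const rl = createInterface({ input: createReadStream(path), crlfDelay: Infinity });
  let lineNo = 0;
  let examples = 0;
  for await (const line of rl) {
    lineNo += 1;
    if (!line.trim()) continue; // tolerate blank lines
    try {
      const record = JSON.parse(line);
      if (!Array.isArray(record.messages)) {
        console.error(`Line ${lineNo}: missing "messages" array`);
        continue;
      }
      examples += 1;
    } catch (err) {
      console.error(`Line ${lineNo}: invalid JSON (${err.message})`);
    }
  }
  console.log(`${examples} valid training examples`);
}

await validateFineTuneFile('training.jsonl');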

Log aggregation systems like Elasticsearch, Datadog, and Loki also ingest newline-delimited JSON natively — each log entry is a self-contained object that can be appended to a file without rewriting or re-parsing anything that came before.

Working With JSONL Files

When you have a messy JSONL file — maybe it came from an export, maybe lines got mangled in transit — validation can get frustrating fast. Standard JSON validators reject the entire file because, as a whole, it isn't valid JSON.

The JSONL Formatter on JSON Indenter handles this correctly: it validates and formats each line independently, highlights which specific lines have errors, and lets you fix them without losing the rest of the file. No sign-up, nothing leaves your browser.

If you need to inspect a single record in detail, paste it into the JSON Beautifier for a properly indented, readable view — especially handy for deeply nested objects crammed into a single JSONL line.

For production pipelines where you're writing JSONL to disk or streaming it over the wire, run each record through the JSON Minifier first. Stripping whitespace from each record before appending it to your .jsonl file keeps sizes lean and ingestion fast.
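
In application code you get that minification for free: JSON.stringify with no spacing arguments already emits compact JSON, so writing one stringified record plus a newline per append is all JSONL asks for. A minimal sketch (events.jsonl and the record shapes are made up for illustration):

// Append compact, newline-terminated records — the JSONL contract
import { createWriteStream } from 'fs';

const out = createWriteStream('events.jsonl', { flags: 'a' }); // 'a' opens in append mode

function appendRecord(record) {
  // No spacing arguments to JSON.stringify means no whitespace in the output
  out.write(JSON.stringify(record) + '\n');
}

appendRecord({ ts: Date.now(), level: 'info', msg: 'user signed in' });
appendRecord({ ts: Date.now(), level: 'error', msg: 'payment failed' });
out.end();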

For a deeper look at edge cases — empty lines, Unicode handling, NDJSON compatibility — the JSONL Format Explained guide covers the full picture.

When to Use JSONL vs Regular JSON

JSONL wins when:

  • Your dataset has more than a few thousand records
  • You need to append records incrementally (logs, event streams, AI training data)
  • You're feeding data to an LLM fine-tuning job or a RAG pipeline
  • You want line-level git diffs that are actually readable

Stick with standard JSON when:

  • The data is a config file or small lookup table
  • You need deep, document-level structure where the whole object matters
  • You're calling an API that expects a JSON array in the request body
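
And if you hit that last case with data that already lives in JSONL, converting a file that fits in memory into the array the API wants takes only a few lines:

// Convert a small JSONL file into a standard JSON array
// (this loads everything into memory, so only for files that comfortably fit)
import { readFileSync } from 'fs';

const records = readFileSync('users.jsonl', 'utf8')
  .split('\n')
  .filter(line => line.trim()) // drops blank lines, including the trailing newline
  .map(line => JSON.parse(line));

const body = JSON.stringify(records); // ready for a request body expecting an array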

A Common Gotcha

Empty lines in JSONL files will cause JSON.parse('') to throw a SyntaxError. Always guard against them:

rl.on('line', (line) => {
  if (line.trim() === '') return; // skip blank lines
  const record = JSON.parse(line);
  // ...
});

To be precise about where this bites: the readline pattern above won't emit a phantom empty line just because the file ends with a trailing newline, but if you read the whole file and split('\n') it yourself, that trailing newline — which nearly every export tool writes — leaves an empty final element. Blank lines also sneak in when files are hand-edited or concatenated, so the guard is cheap insurance either way.


Are you using JSONL in your stack — for logs, datasets, or AI pipelines? And have you run into any tools or patterns that make working with large JSONL files smoother? Drop a note in the comments.


Free tools used in this post:

  • JSONL Formatter — validates and formats each line in a JSONL file independently
  • JSON Beautifier — pretty-prints any JSON object with proper indentation
  • JSON Minifier — strips whitespace from JSON to shrink file sizes
  • All tools — client-side, no sign-up, nothing leaves your browser
