sonka

How I Built a JSON Repair Library for LLM Streaming (and Made it 1.7x Faster)

When you stream responses from OpenAI or Anthropic, JSON often arrives incomplete:

{"message": "I'm currently generating your resp

JSON.parse() throws. Your app crashes. Users see errors.

I ran into this problem repeatedly while building AI features, so I wrote a library to fix it: repair-json-stream.

The Problem

LLM APIs stream tokens one at a time. If you're expecting JSON, you have two choices:

  1. Wait until the entire response is complete (slow, defeats the purpose of streaming)
  2. Parse incrementally and handle broken JSON (hard)

Existing solutions like jsonrepair work, but they're designed for batch processing, not streaming chunks.
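
Without a repair step, option 2 usually degenerates into re-parsing the whole buffer on every chunk and swallowing the errors. A minimal sketch of that pattern, where llmStream and render are placeholders:

// Naive incremental handling: accumulate chunks and retry JSON.parse,
// ignoring failures until the document finally completes.
declare const llmStream: AsyncIterable<string> // placeholder token stream
declare function render(data: unknown): void   // placeholder UI callback

let buffer = ''
for await (const chunk of llmStream) {
  buffer += chunk
  try {
    render(JSON.parse(buffer)) // succeeds only once the JSON is complete
  } catch {
    // still truncated: nothing new to show yet
  }
}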

The Solution

I built a single-pass state machine that:

  • Repairs truncated strings and unclosed brackets
  • Completes partial literals (tru → true, fals → false, nul → null)
  • Handles Python-style constants (None, True, False)
  • Strips LLM "chatter" like "Here's your JSON:" and thinking blocks
  • Works with the Web Streams API (Deno, Bun, Cloudflare Workers)

Basic usage:

import { repairJson } from 'repair-json-stream'

const broken = '{"users": [{"name": "Alice'
const fixed = repairJson(broken)
// → '{"users": [{"name": "Alice"}]}'

JSON.parse(fixed) // Works!

Architecture

The parser uses a stack-based state machine with O(n) single-pass processing:

  • No regex - Avoids ReDoS vulnerabilities
  • Character classification bitmask - O(1) lookups for whitespace, quotes, digits
  • Minimal allocations - Reuses buffers where possible
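
The bitmask mentioned above can be as simple as a lookup table with one bit per character class. A hypothetical sketch (the post doesn't show the real table's layout):

// One Uint8Array indexed by char code, one bit per class,
// so every classification is a single array read.
const WS    = 1 << 0 // whitespace
const QUOTE = 1 << 1 // string delimiters
const DIGIT = 1 << 2 // 0-9

const CLASS = new Uint8Array(128)
for (const c of ' \t\n\r') CLASS[c.charCodeAt(0)] |= WS
for (const c of '"\'')     CLASS[c.charCodeAt(0)] |= QUOTE
for (let i = 0x30; i <= 0x39; i++) CLASS[i] |= DIGIT

const is = (code: number, mask: number) =>
  code < 128 && (CLASS[code] & mask) !== 0

is('7'.charCodeAt(0), DIGIT) // true
is(' '.charCodeAt(0), WS)    // true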

Key components:

Input → Preprocessor → State Machine → Output
         ↓                ↓
    Strip wrappers    Track: inString, 
    (JSONP, markdown)  escaped, stack depth
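
The repair itself falls out of that tracked state: if the input ends while a string or brackets are still open, close them in stack order. A minimal sketch of that one piece (not the library's actual code):

function closeBrackets(input: string): string {
  const stack: string[] = []   // closers for every open { and [
  let inString = false
  let escaped = false
  for (const ch of input) {
    if (inString) {
      if (escaped) escaped = false
      else if (ch === '\\') escaped = true
      else if (ch === '"') inString = false
      continue
    }
    if (ch === '"') inString = true
    else if (ch === '{') stack.push('}')
    else if (ch === '[') stack.push(']')
    else if (ch === '}' || ch === ']') stack.pop()
  }
  // Close a dangling string first, then unwind the bracket stack.
  let out = input + (inString ? '"' : '')
  while (stack.length) out += stack.pop()
  return out
}

closeBrackets('{"users": [{"name": "Alice')
// → '{"users": [{"name": "Alice"}]}'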

What It Fixes

Issue              Input                           Output
Truncated strings  {"text": "Hello                 {"text": "Hello"}
Missing brackets   {"a": [1, 2                     {"a": [1, 2]}
Unquoted keys      {name: "John"}                  {"name": "John"}
Single quotes      {'key': 'val'}                  {"key": "val"}
Python constants   {"x": None}                     {"x": null}
Trailing commas    [1, 2, 3,]                      [1, 2, 3]
Comments           {"a": 1} // note                {"a": 1}
JSONP wrappers     callback({"a": 1})              {"a": 1}
MongoDB types      NumberLong(123)                 123
Thinking blocks    <thought>...</thought>{"a":1}   {"a":1}
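
All of these go through the same repairJson call shown earlier, for example:

import { repairJson } from 'repair-json-stream'

repairJson("{'key': 'val'}")     // → '{"key": "val"}'
repairJson('{"x": None}')        // → '{"x": null}'
repairJson('callback({"a": 1})') // → '{"a": 1}'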

Advanced Features

Incremental Mode

For real-time UI updates, use the stateful incremental repairer:

import { IncrementalJsonRepair } from 'repair-json-stream/incremental'

const repairer = new IncrementalJsonRepair()

// As chunks arrive from LLM...
let output = ''
for await (const chunk of llmStream) {
  output += repairer.push(chunk)
  updateUI(output) // Live update!
}
output += repairer.end()

LLM Garbage Extraction

Strip prose and extract JSON from messy LLM outputs:

import { extractJson } from 'repair-json-stream/extract'

const messy = 'Sure! Here is the data: {"name": "John"} Hope this helps!'
const clean = extractJson(messy)
// → '{"name": "John"}'

Web Streams API

Works natively with TransformStream for edge runtimes:

import { jsonRepairStream } from 'repair-json-stream/web-stream'

const response = await fetch('/api/llm')
const repaired = response.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(jsonRepairStream())
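
From there, the repaired body reads like any other stream. For example, collecting it and parsing once at the end (this assumes the transform emits text chunks, as the TextDecoderStream step suggests):

const reader = repaired.getReader()
let json = ''
for (;;) {
  const { done, value } = await reader.read()
  if (done) break
  json += value
}
JSON.parse(json) // complete, valid JSON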

Performance

Benchmarked against jsonrepair on Node.js 22:

Scenario               repair-json-stream   jsonrepair   Speedup
Small (15KB)           1.16ms               3.10ms       2.7x
Large (3.9MB)          306ms                400ms        1.3x
Streaming (1K chunks)  371ms                638ms        1.7x

The streaming speedup comes from avoiding repeated full-document parsing.
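
To see why, compare the two shapes of work (illustrative only, not the benchmark harness):

import { repairJson } from 'repair-json-stream'
import { IncrementalJsonRepair } from 'repair-json-stream/incremental'

declare const chunks: string[] // placeholder: the 1K streamed chunks

// Batch-style: re-repair the whole accumulated buffer on every chunk,
// so early characters get rescanned again and again (quadratic overall).
let buffer = ''
for (const chunk of chunks) {
  buffer += chunk
  repairJson(buffer)
}

// Stateful: each character is processed exactly once across the stream.
const repairer = new IncrementalJsonRepair()
for (const chunk of chunks) repairer.push(chunk)
repairer.end()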

Stats

  • Zero dependencies
  • 7KB minified
  • 97 tests (including property-based testing with fast-check)
  • TypeScript-first with full type definitions
  • Works in Node.js, Deno, Bun, browsers, Cloudflare Workers


Feedback and contributions welcome!
