mtdevworks
5 Ways LLMs Break JSON in Production (And How to Fix It)

You've wired up GPT function calling or hooked LangChain into your app. Everything works in testing, until you deploy. Suddenly you're seeing `Unexpected token in JSON at position 42`, or your schema validator rejects half the responses. Sound familiar?

LLMs are great at meaning, but they’re surprisingly bad at syntax. Training data is full of inconsistent JSON, and models often mix it with JavaScript, YAML, or plain prose. The result is broken JSON that breaks your app.

Here are the five most common ways LLMs break JSON—and practical ways to fix them, including an API that handles all of these automatically.


1. Trailing commas

What you get:

```json
{"name": "John", "age": 30,}
```

That comma after 30 is invalid in JSON. In JavaScript it’s fine; in JSON it’s not. JSON.parse() throws.

Why it happens:

Models see both valid JSON and JavaScript in training data. They don’t always distinguish. Trailing commas also appear in arrays: [1, 2, 3,].

How to fix:

Strip trailing commas before the closing } or ], or run the string through a repair step that normalizes this. If you use a validation layer, choose one that can repair as well as validate—so you get valid JSON out instead of just an error.
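As a minimal sketch, the stripping step can be a one-line regex. Note this is naive: it would also mangle a string value that literally contains `,}` or `,]`, which is exactly why a real repair layer tracks string boundaries.

```javascript
// Remove trailing commas before a closing } or ].
// Naive sketch: does not respect string literals, so a value like
// "a,}" would be corrupted — use a proper repairer in production.
function stripTrailingCommas(s) {
  return s.replace(/,\s*([}\]])/g, "$1");
}

const raw = '{"name": "John", "age": 30,}';
console.log(JSON.parse(stripTrailingCommas(raw))); // { name: 'John', age: 30 }
```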


2. Unquoted keys

What you get:

```json
{name: "John", age: 30}
```

Valid in JavaScript; invalid in JSON. Keys must be double-quoted strings.

Why it happens:

LLMs are heavily trained on JavaScript/TypeScript. Object literals with unquoted keys are everywhere. The model reproduces that style.

How to fix:

A repair step can wrap unquoted keys in double quotes: `name` becomes `"name"`. Regex can handle simple cases; for nested structures and edge cases, a dedicated parser/repairer is safer.
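For the simple cases, a regex that quotes identifiers appearing after `{` or `,` is enough. This is illustration only: it will misfire if a colon-preceded identifier pattern appears inside a string value, which is where a tokenizing repairer earns its keep.

```javascript
// Wrap unquoted object keys in double quotes.
// Sketch only: matches an identifier followed by ":" after { or , —
// it does not understand string literals, so edge cases need a real parser.
function quoteKeys(s) {
  return s.replace(/([{,]\s*)([A-Za-z_$][\w$]*)\s*:/g, '$1"$2":');
}

console.log(quoteKeys('{name: "John", age: 30}'));
// {"name": "John", "age": 30}
```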


3. Missing required fields

What you get:

Your schema says name and age are required. The LLM returns:

```json
{"name": "John"}
```

No age. Your validator fails, and your app doesn’t know whether to retry, default, or show an error.

Why it happens:

Context limits, vague instructions, or the model “forgetting” part of the schema. It’s a semantic/schema problem, not just syntax.

How to fix:

Two approaches: (1) Strict validation — reject and retry or show a clear error. (2) Enforcement — fix what you can (e.g. repair syntax), then enforce the schema with defaults for missing required fields. Enforcement is useful when you’d rather have a best-effort result than a hard failure.
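The enforcement approach can be sketched in a few lines. The schema shape below (`required` plus a `defaults` map) is hypothetical, a tiny stand-in for real JSON Schema, just to show the fill-in-defaults idea:

```javascript
// Enforce required fields by filling defaults for anything the model dropped.
// The schema format here is a made-up minimal subset, not real JSON Schema.
function enforceDefaults(obj, schema) {
  const out = { ...obj };
  for (const key of schema.required) {
    if (!(key in out)) out[key] = schema.defaults[key];
  }
  return out;
}

const schema = { required: ["name", "age"], defaults: { name: "", age: 0 } };
console.log(enforceDefaults({ name: "John" }, schema));
// { name: 'John', age: 0 }
```

Whether a default like `age: 0` is acceptable is a product decision; strict validation with a retry is the safer choice when a wrong value is worse than no value.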


4. Mixed or single quotes

What you get:

```json
{'name': 'John'}
```

or

```json
{"name": 'John'}
```

JSON allows only double quotes for strings and keys. Single quotes are invalid.

Why it happens:

Training data includes Python dicts, shell-style strings, and other formats. The model mixes quote styles.

How to fix:

Normalize to double quotes. Be careful with apostrophes inside strings (e.g. "John's car") so you don’t break them when converting. A repair layer that understands string boundaries handles this correctly.
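A regex is not enough here precisely because of those apostrophes; you need to walk the string and track whether you are inside a quoted region. A rough character-walk sketch (handles the common `\'` escape, not every edge case):

```javascript
// Convert single-quoted keys/strings to double quotes without
// breaking apostrophes inside values. Sketch only.
function normalizeQuotes(s) {
  let out = "", inSingle = false, inDouble = false;
  for (let i = 0; i < s.length; i++) {
    const c = s[i];
    if (c === "\\" && (inSingle || inDouble)) {
      const next = s[i + 1];
      // \' inside a single-quoted string becomes a plain apostrophe
      out += inSingle && next === "'" ? "'" : c + next;
      i++;
    } else if (c === '"' && !inSingle) {
      inDouble = !inDouble;
      out += c;
    } else if (c === "'" && !inDouble) {
      inSingle = !inSingle;
      out += '"'; // delimiter: swap single quote for double
    } else if (c === '"' && inSingle) {
      out += '\\"'; // literal double quote inside a single-quoted string
    } else {
      out += c;
    }
  }
  return out;
}

console.log(JSON.parse(normalizeQuotes("{'name': 'John\\'s car'}")).name);
// John's car
```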


5. JSON buried in prose or markdown

What you get:

````
Sure! Here's the data you asked for:

```json
{"name": "John", "age": 30}
```

Hope that helps!
````

Your code expects a raw JSON string. Instead you get a paragraph, markdown fences, and maybe extra text before or after. JSON.parse() on the whole thing fails.

Why it happens:

LLMs are conversational. They explain, wrap code in markdown, and add pleasantries. That’s helpful for readability and terrible for parsing.

How to fix:

Extract the JSON first: strip markdown code fences, find the first { or [, then parse to the matching } or ], or use a dedicated “extract JSON from prose” step. Only then validate or repair. Doing extraction before validation keeps your pipeline robust.
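That extraction step can be sketched as: drop any code fences, find the first `{` or `[`, then scan to its matching close while skipping brackets that appear inside string literals. A minimal version:

```javascript
// Extract the first JSON object/array from prose or markdown.
// Sketch: strips code fences, then brace-matches from the first { or [,
// ignoring braces that occur inside double-quoted strings.
function extractJson(text) {
  const cleaned = text.replace(/```[a-z]*\n?/gi, "");
  const start = cleaned.search(/[{[]/);
  if (start === -1) return null;
  const open = cleaned[start];
  const close = open === "{" ? "}" : "]";
  let depth = 0, inStr = false;
  for (let i = start; i < cleaned.length; i++) {
    const c = cleaned[i];
    if (inStr) {
      if (c === "\\") i++;            // skip escaped character
      else if (c === '"') inStr = false;
    } else if (c === '"') inStr = true;
    else if (c === open) depth++;
    else if (c === close && --depth === 0) {
      return cleaned.slice(start, i + 1);
    }
  }
  return null; // no balanced JSON found
}

const reply = 'Sure! Here you go:\n```json\n{"name": "John", "age": 30}\n```\nHope that helps!';
console.log(JSON.parse(extractJson(reply)));
// { name: 'John', age: 30 }
```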


A single layer that handles all five

Fixing each of these by hand (regex, custom parsers, retry logic) gets messy fast. A cleaner approach is to put a small validation-and-repair layer between your LLM and your app.

JSON Guardian is an API built for exactly this: it validates, repairs, and enforces JSON from LLM outputs in under 10ms.

  • Trailing commas / unquoted keys / single quotes: POST /api/v1/repair returns valid JSON.
  • Missing required fields: POST /api/v1/enforce with your JSON Schema repairs syntax, then enforces the schema (including defaults for missing required fields when applicable).
  • JSON in prose or markdown: POST /api/v1/extract strips fences and surrounding text and returns the extracted JSON.

You send the raw LLM response; you get back something you can safely parse and pass to the rest of your app. Built in Rust, so latency stays low—important when you’re calling it on every LLM response.

Quick example — repair:

```bash
curl -X POST https://api.jsonguardian.com/api/v1/repair \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{"data": "{\"name\": \"John\", \"age\": 30,}"}'
```

The response includes a `repaired` string and a `repaired_data` object, ready to use.

Quick example — extract from markdown:

```bash
curl -X POST https://api.jsonguardian.com/api/v1/extract \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{"data": "Here is the result: ```json\n{\"name\": \"John\"}\n``` Hope that helps!"}'
```

You get `extracted` and `extracted_data` without the surrounding text.

Free tier: 10,000 requests/month. No credit card required. You can try it at jsonguardian.com or via RapidAPI.


Summary

Handling these in one place—between the LLM and your business logic—keeps your app stable and your code simple. If you’re tired of debugging JSON parse errors in production, give a validation layer a try.

What JSON issues are you running into with your LLM projects? I’d love to hear in the comments.


Try JSON Guardian: jsonguardian.com · RapidAPI · Free tier: 10k requests/month

Top comments (1)

mtdevworks
This really resonates. One of the most frustrating parts of working with LLMs in production is how unpredictable JSON output can be — especially when the structure slightly deviates and breaks downstream logic.
I’ve personally run into issues like missing fields, type mismatches, and occasionally completely malformed responses even with strict prompting. It becomes less of an AI problem and more of a reliability engineering problem.
Having a dedicated validation and correction layer between the LLM and business logic makes a lot of sense. It not only improves stability but also reduces the need for defensive coding everywhere else in the stack.
Curious to know — does JSON Guardian also handle schema evolution gracefully when the expected structure changes over time?