Why JSON.parse() Fails Silently on Truncated LLM Responses (And What I Did About It)
If you've shipped anything that asks an LLM to return JSON, you've already hit this bug. You just may not have noticed.
The LLM returns a response. Your code parses it. Most of the time it works. Sometimes it returns {} and you assume the LLM didn't find anything. The reality is darker: the JSON was truncated mid-object, your parser silently failed, and your downstream code is now operating on an empty dictionary instead of the partial result the LLM actually produced.
I lost six weeks to this bug. Here's what I learned.
The setup
I run code review with multiple LLMs in parallel. Each one returns a JSON array of issues found:
```json
[
  {"file": "main.py", "line": 47, "type": "security", "severity": "high", "description": "..."},
  {"file": "main.py", "line": 89, "type": "smell", "severity": "low", "description": "..."}
]
```
When the LLM hits its max_tokens limit mid-response, the response gets cut off. You receive something like:
```json
[
  {"file": "main.py", "line": 47, "type": "security", "severity": "high", "description": "..."},
  {"file": "main.py", "line": 89, "type": "smell", "seve
```
json.loads() raises JSONDecodeError. Most code catches the exception and returns []. Every issue the LLM did emit in full before the cutoff is lost along with the rest.
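In sketch form, the failure path usually looks something like this (the function name and variable are illustrative, not from my codebase):

```python
import json

def parse_issues(raw_response: str) -> list:
    try:
        return json.loads(raw_response)
    except json.JSONDecodeError:
        # A truncated response ends up here, and everything the LLM
        # did manage to emit before the cutoff is thrown away.
        return []
```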
The dumb solution that actually works
You don't need a streaming JSON parser. You need a small repair function that backtracks to the last complete object and re-closes the array:
```python
import json

def guard_truncation(text: str, provider_id: str, file_path: str) -> str:
    stripped = text.strip()
    if not stripped.startswith("["):
        return text
    try:
        json.loads(stripped)
        return text  # already valid
    except json.JSONDecodeError:
        pass
    # find the last complete object
    last_close = stripped.rfind("}")
    if last_close == -1:
        return "[]"
    # cut the array off right after the last complete object and re-close it
    repaired = stripped[: last_close + 1] + "\n]"
    try:
        json.loads(repaired)
        return repaired
    except json.JSONDecodeError:
        return "[]"
```
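Here's a small usage sketch against the truncated response from earlier; the literal string is just that example pasted in, and the provider id is a placeholder:

```python
# The truncated response from earlier, pasted in as a literal for illustration.
raw_response = (
    '[\n'
    '{"file": "main.py", "line": 47, "type": "security", "severity": "high", "description": "..."},\n'
    '{"file": "main.py", "line": 89, "type": "smell", "seve'
)

repaired = guard_truncation(raw_response, provider_id="provider-a", file_path="main.py")
issues = json.loads(repaired)
print(len(issues))  # 1 -- the first issue survives instead of the whole response being lost
```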
It’s not elegant. It works. You recover 80-90% of the partial result instead of 0%.
The second bug this revealed
Here’s where it gets worse.
My downstream code assumed every entry in the parsed list was a dictionary. Most of the time it was. But occasionally an LLM would return a string entry in the middle of the array:
```json
[
  {"file": "main.py", "line": 47, ...},
  "I noticed there might be an issue here but I'm not sure",
  {"file": "main.py", "line": 89, ...}
]
```
My code did entry.get("file") on every entry. When it hit the string, AttributeError: 'str' object has no attribute 'get'. The exception was caught by a try/except too wide to be useful. The entire scan silently produced empty results for that file.
Six weeks. No error log. The only signal was “the report has fewer issues than usual for this codebase”.
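Reconstructed from memory rather than copied verbatim, the over-broad handler looked roughly like this (raw_issues is the list json.loads returned):

```python
results = []
try:
    for entry in raw_issues:
        # entry.get(...) raises AttributeError as soon as entry is a string
        results.append((entry.get("file"), entry.get("line")))
except Exception:
    # Everything raised inside the loop -- including that AttributeError --
    # lands here, and the file's scan quietly becomes "no issues found".
    results = []
```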
The fix:
```python
for entry in raw_issues:
    if not isinstance(entry, dict):
        continue
    # safe to call entry.get(...) here
```
Three lines. That’s it.
The bigger lesson
I don’t think LLM output should ever be trusted to match a schema. Even when you tell it “return valid JSON only”, you’ll get:
- Truncated JSON when you hit token limits
- Strings injected mid-array as informal commentary
- Wrong types in correct keys (line: "approximately 50" instead of line: 50)
- Extra keys not in your schema
- Missing required keys
The temptation is to use Pydantic or a JSON schema validator and reject malformed responses entirely. That’s the worst possible choice — you lose all the partial work the LLM did. The better choice is to repair what you can, type-check defensively at every step, and log what you couldn’t recover so you can iterate.
Three patterns that have saved me from similar bugs (a combined sketch follows the list):
1. Always isinstance(x, dict) before .get() on LLM-derived data. Always.
2. Bracket-repair truncated JSON before declaring failure. 80% recovery beats 0%.
3. Log what you discarded. If you silently filter bad entries, you'll never know how often it happens. I now log every malformed entry with the provider name and file path.
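Put together, the parsing path comes out roughly like the sketch below. guard_truncation is the function from above; normalize_issues, the logger, and the messages are illustrative stand-ins, not the exact code I run:

```python
import json
import logging

logger = logging.getLogger("scan")

def normalize_issues(raw_text: str, provider_id: str, file_path: str) -> list:
    # Pattern 2: repair truncation before declaring failure.
    repaired = guard_truncation(raw_text, provider_id, file_path)
    try:
        raw_issues = json.loads(repaired)
    except json.JSONDecodeError:
        logger.warning("unparseable response from %s for %s", provider_id, file_path)
        return []

    if not isinstance(raw_issues, list):
        logger.warning("non-list response from %s for %s: %r", provider_id, file_path, raw_issues)
        return []

    issues = []
    for entry in raw_issues:
        # Pattern 1: type-check before touching .get().
        if not isinstance(entry, dict):
            # Pattern 3: log what you discard instead of filtering it silently.
            logger.warning("discarded entry from %s for %s: %r", provider_id, file_path, entry)
            continue
        issues.append(entry)
    return issues
```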
Why this matters in 2026
Most teams treat LLM output as “either it works or it doesn’t”. The reality is closer to “it partially works most of the time, and the partial-failure modes are silent”. Production code that consumes LLM output needs to be more paranoid than code that talks to a normal API, because LLMs don't have HTTP status codes: they have a single channel that mixes intent, format, and content.
I built my entire scanning workflow around the assumption that any single LLM response will be 5-10% broken. That assumption has been a better friend than any prompt engineering trick.
What’s your experience? Anyone else burned by silent truncation, or am I the last one to notice?