Why JSON.parse() Fails Silently on Truncated LLM Responses (And What I Did About It)
If you've shipped anything that asks an LLM to return JSON, you've already hit this bug. You just may not have noticed.
The LLM returns a response. Your code parses it. Most of the time it works. Sometimes it returns {} and you assume the LLM didn't find anything. The reality is darker: the JSON was truncated mid-object, your parser silently failed, and your downstream code is now operating on an empty dictionary instead of the partial result the LLM actually produced.
I lost six weeks to this bug. Here's what I learned.
The setup
I run code review with multiple LLMs in parallel. Each one returns a JSON array of issues found:
```json
[
  {"file": "main.py", "line": 47, "type": "security", "severity": "high", "description": "..."},
  {"file": "main.py", "line": 89, "type": "smell", "severity": "low", "description": "..."}
]
```
When the LLM hits its max_tokens limit mid-response, the response gets cut off. You receive something like:
```json
[
  {"file": "main.py", "line": 47, "type": "security", "severity": "high", "description": "..."},
  {"file": "main.py", "line": 89, "type": "smell", "seve
```
json.loads() raises JSONDecodeError. Most code catches the exception and returns []. Every issue the LLM did emit in full before the cutoff is lost along with the rest.
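In sketch form, the failure path usually looks something like this (the function name and variable are illustrative, not from my codebase):

```python
import json

def parse_issues(raw_response: str) -> list:
    try:
        return json.loads(raw_response)
    except json.JSONDecodeError:
        # A truncated response ends up here, and everything the LLM
        # did manage to emit before the cutoff is thrown away.
        return []
```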
The dumb solution that actually works
You don't need a streaming JSON parser. You need a small repair function that backtracks to the last complete object and re-closes the array:
```python
import json

def guard_truncation(text: str, provider_id: str, file_path: str) -> str:
    stripped = text.strip()
    if not stripped.startswith("["):
        return text
    try:
        json.loads(stripped)
        return text  # already valid
    except json.JSONDecodeError:
        pass
    # find the last complete object
    last_close = stripped.rfind("}")
    if last_close == -1:
        return "[]"
    # cut the array off right after the last complete object and re-close it
    repaired = stripped[: last_close + 1] + "\n]"
    try:
        json.loads(repaired)
        return repaired
    except json.JSONDecodeError:
        return "[]"
```
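Here's a small usage sketch against the truncated response from earlier; the literal string is just that example pasted in, and the provider id is a placeholder:

```python
# The truncated response from earlier, pasted in as a literal for illustration.
raw_response = (
    '[\n'
    '{"file": "main.py", "line": 47, "type": "security", "severity": "high", "description": "..."},\n'
    '{"file": "main.py", "line": 89, "type": "smell", "seve'
)

repaired = guard_truncation(raw_response, provider_id="provider-a", file_path="main.py")
issues = json.loads(repaired)
print(len(issues))  # 1 -- the first issue survives instead of the whole response being lost
```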
It’s not elegant. It works. You recover 80-90% of the partial result instead of 0%.
The second bug this revealed
Here’s where it gets worse.
My downstream code assumed every entry in the parsed list was a dictionary. Most of the time it was. But occasionally an LLM would return a string entry in the middle of the array:
```json
[
  {"file": "main.py", "line": 47, ...},
  "I noticed there might be an issue here but I'm not sure",
  {"file": "main.py", "line": 89, ...}
]
```
My code did entry.get("file") on every entry. When it hit the string, AttributeError: 'str' object has no attribute 'get'. The exception was caught by a try/except too wide to be useful. The entire scan silently produced empty results for that file.
Six weeks. No error log. The only signal was “the report has fewer issues than usual for this codebase”.
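Reconstructed from memory rather than copied verbatim, the over-broad handler looked roughly like this (raw_issues is the list json.loads returned):

```python
results = []
try:
    for entry in raw_issues:
        # entry.get(...) raises AttributeError as soon as entry is a string
        results.append((entry.get("file"), entry.get("line")))
except Exception:
    # Everything raised inside the loop -- including that AttributeError --
    # lands here, and the file's scan quietly becomes "no issues found".
    results = []
```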
The fix:
```python
for entry in raw_issues:
    if not isinstance(entry, dict):
        continue
    # safe to call entry.get(...) here
```
Three lines. That’s it.
The bigger lesson
I don’t think LLM output should ever be trusted to match a schema. Even when you tell it “return valid JSON only”, you’ll get:
- Truncated JSON when you hit token limits
- Strings injected mid-array as informal commentary
- Wrong types in correct keys (line: "approximately 50" instead of line: 50)
- Extra keys not in your schema
- Missing required keys
The temptation is to use Pydantic or a JSON schema validator and reject malformed responses entirely. That’s the worst possible choice — you lose all the partial work the LLM did. The better choice is to repair what you can, type-check defensively at every step, and log what you couldn’t recover so you can iterate.
Three patterns that have saved me from similar bugs (a combined sketch follows the list):
1. Always isinstance(x, dict) before .get() on LLM-derived data. Always.
2. Bracket-repair truncated JSON before declaring failure. 80% recovery beats 0%.
3. Log what you discarded. If you silently filter bad entries, you'll never know how often it happens. I now log every malformed entry with the provider name and file path.
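Put together, the parsing path comes out roughly like the sketch below. guard_truncation is the function from above; normalize_issues, the logger, and the messages are illustrative stand-ins, not the exact code I run:

```python
import json
import logging

logger = logging.getLogger("scan")

def normalize_issues(raw_text: str, provider_id: str, file_path: str) -> list:
    # Pattern 2: repair truncation before declaring failure.
    repaired = guard_truncation(raw_text, provider_id, file_path)
    try:
        raw_issues = json.loads(repaired)
    except json.JSONDecodeError:
        logger.warning("unparseable response from %s for %s", provider_id, file_path)
        return []

    if not isinstance(raw_issues, list):
        logger.warning("non-list response from %s for %s: %r", provider_id, file_path, raw_issues)
        return []

    issues = []
    for entry in raw_issues:
        # Pattern 1: type-check before touching .get().
        if not isinstance(entry, dict):
            # Pattern 3: log what you discard instead of filtering it silently.
            logger.warning("discarded entry from %s for %s: %r", provider_id, file_path, entry)
            continue
        issues.append(entry)
    return issues
```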
Why this matters in 2026
Most teams treat LLM output as “either it works or it doesn’t”. The reality is closer to “it partially works most of the time, and the partial-failure modes are silent”. Production code that consumes LLM output needs to be more paranoid than code that talks to a normal API, because LLMs don't have HTTP status codes: they have a single channel that mixes intent, format, and content.
I built my entire scanning workflow around the assumption that any single LLM response will be 5-10% broken. That assumption has been a better friend than any prompt engineering trick.
What’s your experience? Anyone else burned by silent truncation, or am I the last one to notice?