DEV Community

The BookMaster
The JSON Parsing Problem That's Killing Your AI Agent Reliability

Every AI agent operator hits this wall: you prompt your LLM to return structured data, and it gives you something almost right—but not quite. Maybe it added a trailing comma. Maybe it wrapped a string in quotes when you needed a number. Whatever the specific failure mode, your agent pipeline breaks at the parse step.

I've been building AI agent tools for the past year, and this was my single biggest source of runtime failures. Here's the pattern I landed on that solved it permanently.

The Problem with Pure Prompting

You can spend hours crafting the perfect system prompt. You can add examples. You can beg the model to "always return valid JSON". But probabilistic outputs mean probabilistic results—eventually you'll get back something that breaks your parser.

The real fix isn't a better prompt. It's moving the contract enforcement out of the prompt and into your code.

The Solution: Schema-Guided Extraction

Instead of trusting the LLM to emit perfectly valid JSON, treat its output as untrusted text and parse it into your schema through a validation layer with fallbacks. Here's the pattern:

import json
from typing import Type
from pydantic import BaseModel, ValidationError

def extract_structured(model_output: str, schema: Type[BaseModel]) -> BaseModel | None:
    """Try to parse model output into the target schema."""
    try:
        data = json.loads(model_output)
        return schema(**data)
    except (json.JSONDecodeError, ValidationError, TypeError):  # TypeError: top-level JSON wasn't a dict
        return None

def extract_with_fallback(model_output: str, schema: Type[BaseModel]) -> BaseModel:
    """Attempt extraction, raise with context on failure."""
    result = extract_structured(model_output, schema)
    if result is not None:
        return result

    # Fallback: try stripping markdown code blocks
    cleaned = model_output.strip()
    if cleaned.startswith('```'):
        cleaned = '\n'.join(cleaned.split('\n')[1:])  # Drop the opening fence line (e.g. ```json)
        cleaned = cleaned.removesuffix('```').strip()

    result = extract_structured(cleaned, schema)
    if result is not None:
        return result

    raise ValueError(f"Could not parse output into {schema.__name__}")

The key insight: your extraction layer owns the formatting edge cases. The version above strips markdown code fences; you can extend it to handle trailing commas, quote-style variations, and whatever else your models produce. Your prompt just asks for the data, and it doesn't need to police the format.
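As one example of extending the fallback chain: trailing commas, which `json.loads` rejects, can be stripped with a small regex pass before re-parsing. This is a sketch; a naive regex like this can also touch commas inside string values, so keep it as a last-resort fallback rather than a general sanitizer.

```python
import json
import re

def strip_trailing_commas(text: str) -> str:
    """Remove commas that sit directly before a closing brace or bracket."""
    return re.sub(r',\s*([}\]])', r'\1', text)

# A typical LLM slip: trailing commas before the closing delimiters.
messy = '{"title": "Hello", "tags": ["ai", "json",],}'
data = json.loads(strip_trailing_commas(messy))
# data == {"title": "Hello", "tags": ["ai", "json"]}
```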

Apply This to Your Agent Pipeline

from pydantic import BaseModel

class ArticleMetadata(BaseModel):
    title: str
    tags: list[str]
    reading_time_minutes: int

def parse_article_suggestion(raw_output: str) -> ArticleMetadata:
    return extract_with_fallback(raw_output, ArticleMetadata)

# In your agent loop:
response = llm.complete(prompt)
metadata = parse_article_suggestion(response)
# metadata is a validated ArticleMetadata, or extract_with_fallback raised a ValueError

This pattern has eliminated parse failures in my agent pipelines. The LLM focuses on understanding. The extraction layer focuses on correctness. Each layer has one job.
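When a failure does slip past both attempts, re-prompting is usually cheaper than crashing the pipeline. Here's a minimal retry sketch: `llm_complete` is a stand-in for whatever client you use, and `parse` can be the `extract_with_fallback` function from above.

```python
def parse_with_retry(llm_complete, parse, prompt, schema, max_attempts=3):
    """Call the model up to max_attempts times, re-prompting on parse failures."""
    last_error = None
    for _ in range(max_attempts):
        raw = llm_complete(prompt)
        try:
            return parse(raw, schema)
        except ValueError as err:
            last_error = err
            # Feed the failure back so the model can correct itself.
            prompt = prompt + "\n\nYour previous reply could not be parsed. Respond with valid JSON only."
    raise last_error
```

The design choice here is to keep the retry loop outside the extraction layer: extraction stays a pure function of text, and the loop decides how many model calls a failure is worth.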


Full catalog of my AI agent tools at https://thebookmaster.zo.space/bolt/market
