Arnold Cartagena

Why Your LLM Returns “Sure! Here’s the JSON” and How to Fix It

You ask for JSON. The LLM returns:

Sure! Here's the JSON you requested:

{"name": "test", "value": 42}

Let me know if you need anything else!

Your parser crashes. Your RAG or agentic pipeline fails (or worse: the failure gets swallowed by a generic retry handler that loops forever). You add more prompt engineering. It works 90% of the time. The other 10%? You're debugging endlessly, wondering which of your 12 nodes broke. You never asked for "Sure! Here's the JSON you requested" or "Let me know if you need anything else!" You just wanted JSON.
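The usual first stopgap is a regex that grabs the first {...} span out of the chatty reply. A minimal sketch of that workaround (brittle by design, which is the point of this post; the function name is mine, not a library API):

```python
import json
import re

def naive_extract(text: str) -> dict:
    """Pull the first {...} span out of a chatty reply and parse it.

    Breaks on nested code fences, multiple objects, or truncation;
    shown here only to illustrate the usual workaround.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in LLM reply")
    return json.loads(match.group(0))

reply = (
    "Sure! Here's the JSON you requested:\n\n"
    '{"name": "test", "value": 42}\n\n'
    "Let me know if you need anything else!"
)
data = naive_extract(reply)  # happy path only
```

It works until the model nests a code fence, emits two objects, or runs out of tokens mid-string.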

I ran into this constantly while trying to get consistent output from LLMs.

I thought this was just me.


The Pattern

Most teams shipping/testing LLM features run into some version of this:

  1. You ask for JSON, you get "Sure! Here's the JSON you requested:"
  2. The JSON has trailing commas, single quotes, or gets truncated
  3. json.loads() fails with "line 1 column 47" — low-context at best
  4. You retry, but the LLM makes the same mistake
  5. You add prompt engineering. It works 90% of the time. The other 10%...

That prompt-engineering step is the real pain: you end up maintaining multiple versions of the same prompt when the problem can be solved in other ways.

Search any LLM framework's issues for "JSON" or "ValidationError". The problem shows up across models and frameworks. The solutions are scattered across docs, GitHub issues, and custom workarounds.

There are really two failures here: parsing (turning text into JSON) and validation (ensuring the JSON matches what your pipeline expects). handoff-guard handles both, plus retries with feedback.
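The distinction matters because each failure needs a different fix. Plain stdlib calls make the split visible:

```python
import json

broken = '{"name": "test",'                   # syntactically invalid (truncated)
mismatch = '{"name": "test", "value": "42"}'  # valid syntax, wrong type

# Failure 1: parsing. The text is not JSON at all.
try:
    json.loads(broken)
    parse_ok = True
except json.JSONDecodeError:
    parse_ok = False

# Failure 2: validation. It parses fine, but the shape is wrong.
data = json.loads(mismatch)
value_is_int = isinstance(data["value"], int)
```

The first string never reaches your schema check; the second sails through `json.loads` and fails later, somewhere deep in your pipeline.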


Why LLMs Do This

LLMs are trained to be helpful. When you ask for JSON, they want to:

  1. Acknowledge your request ("Sure!")
  2. Explain what they're giving you ("Here's the JSON:")
  3. Format it nicely (markdown code blocks)
  4. Offer follow-up help ("Let me know if...")

This is great for chat. It's terrible for parsing.

And it gets worse:

  • Truncation: Hit the token limit? Your JSON ends mid-string: {"draft": "This is a long article about...
  • Malformed syntax: Trailing commas, single quotes, unquoted keys. All common LLM outputs
  • Nested code blocks: JSON containing ``` characters breaks regex-based parsers
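All of these syntax issues (plus truncation) are fatal to a strict parser. A quick check with the stdlib:

```python
import json

samples = [
    '{"a": 1,}',              # trailing comma
    "{'a': 1}",               # single quotes
    '{a: 1}',                 # unquoted key
    '{"draft": "This is a ',  # truncated mid-string
]

rejected = []
for s in samples:
    try:
        json.loads(s)
    except json.JSONDecodeError:
        rejected.append(s)
# every sample raises JSONDecodeError under strict parsing
```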

Common Approaches (and Their Tradeoffs)

"Just use JSON mode" — JSON/structured-output modes help when available, but they guarantee syntax, not schema. You still get validation errors, truncation, and no framework-level context like "which node failed."

"Use OutputFixingParser" — LangChain's output-fixing pattern repairs by calling the LLM again—adding latency and cost for every error. Its recommended usage has also shifted across LangChain versions.

"Use Instructor" — Powerful for structured generation across many providers. When it fixes errors, it usually does so by re-prompting the LLM. If you want fast, local repair without burning more tokens, you need a post-processor.

"Use Outlines" — Great for constrained decoding, but requires control over the inference server (e.g., vLLM). It doesn't help if you're calling a closed API like OpenAI or Anthropic.

"Add more prompt engineering" — You're playing whack-a-mole. Fix one edge case, another appears.


What I Built Instead

I needed something that:

  • Works with raw text output from any provider (post-hoc, not constrained generation)
  • Identifies which node failed (not just "validation error")
  • Retries with feedback (tells the LLM what went wrong)
  • Repairs common syntax issues locally (without calling the LLM again)
  • Stays lightweight (no embeddings, no ML, just parsing)

So I built handoff-guard.

Before


```python
def writer_agent(state: dict) -> dict:
    response = call_llm("Return JSON with: draft, word_count, tone")

    # Hope it's valid JSON
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        # Which node? What failed? Can the agent retry?
        raise

    # Hope it matches the schema
    try:
        validated = WriterOutput(**data)
    except ValidationError:
        # "1 validation error for WriterOutput" — thanks for nothing
        raise

    return data
```

After


```python
from handoff import guard, retry, parse_json  # PyPI: handoff-guard
from pydantic import BaseModel, Field

class WriterOutput(BaseModel):
    draft: str = Field(min_length=100)
    word_count: int = Field(ge=50)
    tone: str

@guard(output=WriterOutput, node_name="writer", max_attempts=3)
def writer_agent(state: dict) -> dict:
    prompt = "Return JSON with: draft, word_count, tone"

    if retry.is_retry:
        prompt += f"\n\nPrevious attempt failed:\n{retry.feedback()}"

    response = call_llm(prompt)
    return parse_json(response)  # Strips wrappers, repairs syntax
```

If it fails after 3 attempts:



```
HandoffViolation in 'writer':
  Contract: output
  Field: draft
  Expected: String should have at least 100 characters
  Received: 'Too short...' (str)
  Suggestion: Increase the length of 'draft'
```

For logs/telemetry, access e.total_attempts, e.history, or e.to_dict().


What parse_json Actually Does


```python
from handoff import parse_json

# Strips conversational wrappers
obj = parse_json('Sure! Here\'s the JSON:\n{"key": "value"}\nLet me know!')
# -> Python dict/list (parsed JSON), not a JSON string

# Handles common syntax issues (via json-repair)
parse_json('{"a": 1,}')            # trailing comma → {"a": 1}
parse_json("{'a': 1}")             # single quotes → {"a": 1}
parse_json('{a: 1}')               # unquoted keys → {"a": 1}
parse_json('{"a": 1 // comment}')  # JS comments → {"a": 1}

# Detects truncation (v0.2.1)
result = parse_json('{"draft": "long text...', detailed=True)
# -> ParseResult with .data (dict), .truncated (bool), .repaired (bool)
result.truncated  # True — best-effort signal (unmatched braces detected)
result.repaired   # True — json-repair path was used successfully
```

No LLM calls. No embeddings. Deterministic parsing with best-effort repair. I haven't published benchmarks; this was built from real failure modes in my own graphs.
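To make the truncation signal concrete, here is a naive stdlib sketch of the unmatched-delimiter idea (not handoff-guard's actual implementation, just the shape of the heuristic):

```python
def looks_truncated(text: str) -> bool:
    """Heuristic: an unclosed string or unmatched braces suggest truncation."""
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if escaped:
            escaped = False          # skip the escaped character
        elif ch == "\\" and in_string:
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string and ch in "{[":
            depth += 1
        elif not in_string and ch in "}]":
            depth -= 1
    return in_string or depth > 0

looks_truncated('{"draft": "This is a long article about')  # cut off mid-string
looks_truncated('{"a": 1}')                                 # complete object
```

It is deliberately best-effort: a reply that happens to balance its braces after being cut off will slip through, which is why the library reports truncation as a signal rather than a guarantee.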


Why Not Instructor/Outlines?

|  | Instructor | Outlines | handoff-guard |
|---|---|---|---|
| Approach | Generation-time validation | Constrained generation | Post-hoc validation & repair |
| Works with | OpenAI, Anthropic, etc. | vLLM, Transformers | Any string output |
| LangGraph compatible | Yes (manual) | No | Yes (adapter: guarded_node) |
| Identifies failed node | No | N/A | Yes |
| Retries with feedback | Yes | N/A | Yes |
| Repairs malformed JSON | Yes (via re-prompt) | N/A | Yes (local, no tokens) |
| Dependencies | Pydantic + provider SDKs | Transformers/vLLM stack | Pydantic + json-repair |

Instructor and Outlines are excellent tools. The difference is when and how they work:

  • Instructor validates at generation time and fixes errors by re-prompting—effective but costs tokens
  • Outlines constrains generation at the model level—powerful but requires inference server control
  • handoff-guard validates after the LLM responds and repairs locally—no extra tokens, works with any provider

The Problems This Actually Solves

handoff-guard doesn't fix framework bugs. It helps when you control the code that receives LLM output:

| Problem | Example | How handoff-guard helps |
|---|---|---|
| LLM wraps JSON in conversation | "Sure! Here's the JSON: {...}" | parse_json() strips wrappers |
| Malformed JSON syntax | Trailing commas, single quotes, unquoted keys | parse_json() repairs common issues |
| Truncated output at token limit | {"draft": "long text... | parse_json(detailed=True) detects truncation |
| "ValidationError" with no context | 1 validation error for State | @guard(node_name="writer") tells you which node |
| No retry on validation failure | Agent fails once, stays failed | @guard(max_attempts=3) retries automatically |
| LLM doesn't know why it failed | Retry happens but same error repeats | retry.feedback() tells the LLM what went wrong |

Limits

What this won't magically fix:

  • Missing or hallucinated data — If the model omits required fields or invents values, deterministic repair can't invent correct data. Retries are still needed.
  • Ambiguous repairs — "Repair" is sometimes a best-effort guess (e.g., unquoted keys, stray punctuation). Always validate the result.
  • Severe truncation — You can detect it, but you can't recover missing content without another generation.
  • Adversarial or multi-JSON outputs — parse_json extracts the first JSON object/array boundary it finds. Complex tool traces or multiple embedded objects may need custom handling.

Security note: If you're parsing untrusted model output, treat "repaired JSON" as untrusted input. Validate types and ranges.
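In practice that means re-validating after repair. A minimal stdlib sketch (the field names come from this post's example schema; the function is mine, not a handoff-guard API):

```python
import json

def load_untrusted(raw: str) -> dict:
    """Parse, then enforce types and ranges; repaired JSON gets no free pass."""
    data = json.loads(raw)  # in real code: parse_json(raw) after repair
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    wc = data.get("word_count")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(wc, int) or isinstance(wc, bool) or wc < 0:
        raise ValueError("word_count must be a non-negative integer")
    return data

ok = load_untrusted('{"word_count": 120, "tone": "neutral"}')
```

A Pydantic model with Field constraints (as in the WriterOutput example above) does the same job declaratively.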


Get Started


```bash
pip install handoff-guard
```

The package is handoff-guard, the import namespace is handoff:


```python
from handoff import guard, retry, parse_json
```

That's it. No config files. No API keys. No Docker.

GitHub: github.com/acartag7/handoff-guard
PyPI: pypi.org/project/handoff-guard


What's Next

The library does what it set out to do. I'm not planning major features, just bug fixes and edge-case handling as users report them; it already covers my current needs.

If you hit something it doesn't handle, open an issue.


Built because "ValidationError: 1 validation error" tells you nothing useful.
