You ask for JSON. The LLM returns:
```
Sure! Here's the JSON you requested:
{"name": "test", "value": 42}Let me know if you need anything else!
```
Your parser crashes. Your RAG or agentic pipeline fails (or worse: the failure gets swallowed by a generic retry handler that loops forever). You add more prompt engineering. It works 90% of the time. The other 10%? You're stuck debugging, wondering which of your 12 nodes broke. You didn't want the "Sure! Here's the JSON you requested" or the "Let me know if you need anything else!". You just wanted JSON.
I hit this constantly while trying to get consistent output from LLMs. I thought it was just me.
## The Pattern
Most teams shipping/testing LLM features run into some version of this:
- You ask for JSON, you get "Sure! Here's the JSON you requested:"
- The JSON has trailing commas, single quotes, or gets truncated
- json.loads() fails with "line 1 column 47" — low-context at best
- You retry, but the LLM makes the same mistake
- You add prompt engineering. It works 90% of the time. The other 10%...
Iterating on prompts like this is painful: you end up maintaining multiple prompt versions for a problem that can be solved in other ways.
Search any LLM framework's issues for "JSON" or "ValidationError". The problem shows up across models and frameworks. The solutions are scattered across docs, GitHub issues, and custom workarounds.
There are really two failures here: parsing (turning text into JSON) and validation (ensuring the JSON matches what your pipeline expects). handoff-guard handles both, plus retries with feedback.
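To make the distinction concrete, here is a minimal reproduction of both failure modes. The WriterOutput schema is just an illustration, not part of any library:

```python
import json
from pydantic import BaseModel, ValidationError

class WriterOutput(BaseModel):  # illustrative schema, not from the library
    draft: str
    word_count: int

# Failure 1: parsing. Conversational text around the JSON breaks json.loads.
raw = 'Sure! Here is the JSON:\n{"draft": "hello", "word_count": 2}'
try:
    json.loads(raw)
except json.JSONDecodeError:
    parse_failed = True

# Failure 2: validation. Syntactically valid JSON, wrong shape for the schema.
data = json.loads('{"draft": "hello", "word_count": "not a number"}')
try:
    WriterOutput(**data)
except ValidationError as e:
    validation_errors = len(e.errors())  # 1
```

Both raise, but they are different problems with different fixes: the first needs extraction/repair, the second needs schema feedback.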
## Why LLMs Do This
LLMs are trained to be helpful. When you ask for JSON, they want to:
- Acknowledge your request ("Sure!")
- Explain what they're giving you ("Here's the JSON:")
- Format it nicely (markdown code blocks)
- Offer follow-up help ("Let me know if...")
This is great for chat. It's terrible for parsing.
And it gets worse:
- Truncation: Hit the token limit? Your JSON ends mid-string: {"draft": "This is a long article about...
- Malformed syntax: Trailing commas, single quotes, unquoted keys. All common LLM outputs
- Nested code blocks: JSON containing ``` characters breaks regex-based parsers
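Two of these are easy to reproduce without any model in the loop. In particular, a naive greedy fence regex swallows everything between the first and last fence:

```python
import json
import re

# Truncation: JSON cut off mid-string is unparseable as-is
truncated = '{"draft": "This is a long article about'
try:
    json.loads(truncated)
except json.JSONDecodeError:
    truncation_fails = True

# Nested code blocks: a greedy regex captures across BOTH fences
fence = "`" * 3
response = (
    f"Sure:\n{fence}json\n" '{"analysis": "see below"}\n' f"{fence}\n"
    f"Example:\n{fence}python\nprint(1)\n{fence}"
)
pattern = fence + r"(?:json)?\n(.*)\n" + fence  # greedy .* is the bug
captured = re.search(pattern, response, re.DOTALL).group(1)
try:
    json.loads(captured)  # captured spans both blocks, so this fails too
except json.JSONDecodeError:
    regex_fails = True
```

The greedy `.*` grabs from the first fence to the last one, so `captured` includes the prose and the second code block, which is not valid JSON.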
## Common Approaches (and Their Tradeoffs)
"Just use JSON mode" — JSON/structured-output modes help when available, but they guarantee syntax, not schema. You still get validation errors, truncation, and no framework-level context like "which node failed."
"Use OutputFixingParser" — LangChain's output-fixing pattern repairs by calling the LLM again—adding latency and cost for every error. Its recommended usage has also shifted across LangChain versions.
"Use Instructor" — Powerful for structured generation across many providers. When it fixes errors, it usually does so by re-prompting the LLM. If you want fast, local repair without burning more tokens, you need a post-processor.
"Use Outlines" — Great for constrained decoding, but requires control over the inference server (e.g., vLLM). It doesn't help if you're calling a closed API like OpenAI or Anthropic.
"Add more prompt engineering" — You're playing whack-a-mole. Fix one edge case, another appears.
## What I Built Instead
I needed something that:
- Works with raw text output from any provider (post-hoc, not constrained generation)
- Identifies which node failed (not just "validation error")
- Retries with feedback (tells the LLM what went wrong)
- Repairs common syntax issues locally (without calling the LLM again)
- Stays lightweight (no embeddings, no ML, just parsing)
So I built handoff-guard.
### Before

```python
def writer_agent(state: dict) -> dict:
    response = call_llm("Return JSON with: draft, word_count, tone")

    # Hope it's valid JSON
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        # Which node? What failed? Can the agent retry?
        raise

    # Hope it matches the schema
    try:
        validated = WriterOutput(**data)
    except ValidationError:
        # "1 validation error for WriterOutput" — thanks for nothing
        raise

    return data
```
### After

```python
from handoff import guard, retry, parse_json  # PyPI: handoff-guard
from pydantic import BaseModel, Field

class WriterOutput(BaseModel):
    draft: str = Field(min_length=100)
    word_count: int = Field(ge=50)
    tone: str

@guard(output=WriterOutput, node_name="writer", max_attempts=3)
def writer_agent(state: dict) -> dict:
    prompt = "Return JSON with: draft, word_count, tone"
    if retry.is_retry:
        prompt += f"\n\nPrevious attempt failed:\n{retry.feedback()}"
    response = call_llm(prompt)
    return parse_json(response)  # Strips wrappers, repairs syntax
```
If it fails after 3 attempts:
```
HandoffViolation in 'writer':
Contract: output
Field: draft
Expected: String should have at least 100 characters
Received: 'Too short...' (str)
Suggestion: Increase the length of 'draft'
```
For logs/telemetry, access e.total_attempts, e.history, or e.to_dict().
## What parse_json Actually Does

```python
from handoff import parse_json

# Strips conversational wrappers
obj = parse_json('Sure! Here\'s the JSON:\n{"key": "value"}\nLet me know!')
# -> Python dict/list (parsed JSON), not a JSON string

# Handles common syntax issues (via json-repair)
parse_json('{"a": 1,}')            # trailing comma → {"a": 1}
parse_json("{'a': 1}")             # single quotes → {"a": 1}
parse_json('{a: 1}')               # unquoted keys → {"a": 1}
parse_json('{"a": 1 // comment}')  # JS comments → {"a": 1}

# Detects truncation (v0.2.1)
result = parse_json('{"draft": "long text...', detailed=True)
# -> ParseResult with .data (dict), .truncated (bool), .repaired (bool)
result.truncated  # True — best-effort signal (unmatched braces detected)
result.repaired   # True — json-repair path was used successfully
```
No LLM calls. No embeddings. Deterministic parsing with best-effort repair. I haven't published benchmarks; this was built from real failure modes in my own graphs.
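For intuition, the wrapper-stripping part can be approximated in a few lines: scan for the first balanced object or array and hand it to json.loads. This is a simplified sketch of the idea, not handoff-guard's actual implementation (which also repairs syntax via json-repair):

```python
import json

def extract_first_json(text):
    """Find the first balanced {...} or [...] span and parse it.
    Sketch only: no syntax repair, unlike the real parse_json."""
    start = next((i for i, c in enumerate(text) if c in "{["), None)
    if start is None:
        raise ValueError("no JSON object/array found")
    depth, in_str, esc = 0, False, False
    for i in range(start, len(text)):
        c = text[i]
        if esc:
            esc = False           # previous char was a backslash in a string
        elif c == "\\":
            esc = in_str          # only escapes inside strings
        elif c == '"':
            in_str = not in_str
        elif not in_str:
            if c in "{[":
                depth += 1
            elif c in "}]":
                depth -= 1
                if depth == 0:    # balanced span found
                    return json.loads(text[start:i + 1])
    raise ValueError("unbalanced: output may be truncated")

extract_first_json('Sure! Here\'s the JSON:\n{"key": "value"}\nLet me know!')
# → {'key': 'value'}
```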
## Why Not Instructor/Outlines?
| | Instructor | Outlines | handoff-guard |
|---|---|---|---|
| Approach | Generation-time validation | Constrained generation | Post-hoc validation & repair |
| Works with | OpenAI, Anthropic, etc. | vLLM, Transformers | Any string output |
| LangGraph compatible | Yes (manual) | No | Yes (adapter: guarded_node) |
| Identifies failed node | No | N/A | Yes |
| Retries with feedback | Yes | N/A | Yes |
| Repairs malformed JSON | Yes (via re-prompt) | N/A | Yes (local, no tokens) |
| Dependencies | Pydantic + provider SDKs | Transformers/vLLM stack | Pydantic + json-repair |
Instructor and Outlines are excellent tools. The difference is when and how they work:
- Instructor validates at generation time and fixes errors by re-prompting—effective but costs tokens
- Outlines constrains generation at the model level—powerful but requires inference server control
- handoff-guard validates after the LLM responds and repairs locally—no extra tokens, works with any provider
## The Problems This Actually Solves
handoff-guard doesn't fix framework bugs. It helps when you control the code that receives LLM output:
| Problem | Example | How handoff-guard helps |
|---|---|---|
| LLM wraps JSON in conversation | "Sure! Here's the JSON: {...}" | parse_json() strips wrappers |
| Malformed JSON syntax | Trailing commas, single quotes, unquoted keys | parse_json() repairs common issues |
| Truncated output at token limit | {"draft": "long text... | parse_json(detailed=True) detects truncation |
| "ValidationError" with no context | 1 validation error for State | @guard(node_name="writer") tells you which node |
| No retry on validation failure | Agent fails once, stays failed | @guard(max_attempts=3) retries automatically |
| LLM doesn't know why it failed | Retry happens but same error repeats | retry.feedback() tells the LLM what went wrong |
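The retry-with-feedback pattern in the last two rows is worth seeing in isolation. Here is a stripped-down version of the pattern, independent of handoff-guard; call_llm, fake_llm, and the Summary schema are stand-ins for illustration:

```python
import json
from pydantic import BaseModel, Field, ValidationError

class Summary(BaseModel):  # stand-in schema for illustration
    text: str = Field(min_length=10)

def retry_with_feedback(call_llm, schema, max_attempts=3):
    """Sketch of the pattern, not handoff-guard's internals: each retry
    tells the model exactly what went wrong with the previous attempt."""
    feedback = ""
    for _ in range(max_attempts):
        raw = call_llm("Return JSON with: text" + feedback)
        try:
            return schema(**json.loads(raw))
        except (json.JSONDecodeError, ValidationError) as err:
            feedback = f"\n\nPrevious attempt failed:\n{err}"
    raise RuntimeError(f"still failing after {max_attempts} attempts")

# Stub model: produces a valid answer only once it sees failure feedback
def fake_llm(prompt):
    if "failed" in prompt:
        return '{"text": "a sufficiently long summary"}'
    return '{"text": "short"}'

result = retry_with_feedback(fake_llm, Summary)
# result.text == "a sufficiently long summary"
```

Without the feedback string, the stub (like a real model) would repeat the same mistake on every attempt.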
## Limits
What this won't magically fix:
- Missing or hallucinated data — If the model omits required fields or invents values, deterministic repair can't invent correct data. Retries are still needed.
- Ambiguous repairs — "Repair" is sometimes a best-effort guess (e.g., unquoted keys, stray punctuation). Always validate the result.
- Severe truncation — You can detect it, but you can't recover missing content without another generation.
- Adversarial or multi-JSON outputs — parse_json extracts the first JSON object/array boundary it finds. Complex tool traces or multiple embedded objects may need custom handling.
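On the truncation point: detection can be as simple as tracking brace depth and open strings. This is a sketch of the idea, not the library's exact heuristic:

```python
def looks_truncated(text):
    """Best-effort signal: unbalanced braces/brackets or an unterminated
    string usually mean the output hit the token limit. Sketch only."""
    depth = 0
    in_str = esc = False
    for c in text:
        if esc:
            esc = False        # previous char was a backslash in a string
        elif c == "\\":
            esc = in_str       # only escapes inside strings
        elif c == '"':
            in_str = not in_str
        elif not in_str and c in "{[":
            depth += 1
        elif not in_str and c in "}]":
            depth -= 1
    return depth > 0 or in_str

looks_truncated('{"draft": "This is a long article about')  # → True
looks_truncated('{"draft": "complete"}')                    # → False
```

It tells you the output was cut, but the missing content still needs another generation.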
Security note: If you're parsing untrusted model output, treat "repaired JSON" as untrusted input. Validate types and ranges.
## Get Started

```bash
pip install handoff-guard
```
The package is handoff-guard, the import namespace is handoff:
```python
from handoff import guard, retry, parse_json
```
That's it. No config files. No API keys. No Docker.
GitHub: github.com/acartag7/handoff-guard
PyPI: pypi.org/project/handoff-guard
## What's Next
The library does what it set out to do. I'm not planning major features, just bug fixes and edge cases as users report them; it already covers my current needs.
If you hit something it doesn't handle, open an issue.
Built because "ValidationError: 1 validation error" tells you nothing useful.