Fix Bad Structured Output by Feeding the Error Back to the Model


You asked the model to return JSON. It returned almost-JSON with a trailing comma. Your parser crashed. You want to retry, but just resending the same prompt will probably produce the same bad output.

The better approach: tell the model what went wrong. Append the parse error as a follow-up message. Ask for a corrected response.

llm-structured-retry implements this pattern.

The Shape of the Fix

import json

from llm_structured_retry import StructuredRetry, StructuredRetryExhausted

def call_json(prompt: str) -> dict:
    retry = StructuredRetry(
        max_attempts=3,
        parser=json.loads,  # your parser
        error_formatter=lambda e: f"Parse failed: {e}. Return valid JSON only.",
    )

    # call_llm is your own provider call; extract_text pulls the raw text
    # out of whatever response object it returns.
    return retry.call(
        fn=lambda: call_llm(prompt),
        extract_text=lambda r: r.content[0].text,
    )

On parse failure, StructuredRetry appends the error message as a new user turn and calls the model again. The model sees what it returned, sees what broke, and gets a chance to fix it.
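To make the follow-up turn concrete, here is a minimal sketch of what the conversation might look like after one failed attempt. The helper name build_correction_messages and the role/content message shape are assumptions for illustration (the shape follows the common chat-completion convention). They are not taken from the library's API.

```python
import json


def build_correction_messages(prompt: str, bad_output: str, error: Exception) -> list[dict]:
    """Return the chat history to send on the retry (hypothetical helper)."""
    return [
        {"role": "user", "content": prompt},           # original request
        {"role": "assistant", "content": bad_output},  # what the model returned
        {"role": "user", "content": f"Parse failed: {error}. Return valid JSON only."},
    ]


bad_output = '{"name": "Ada" "age": 36}'  # missing comma between members
try:
    json.loads(bad_output)
except json.JSONDecodeError as exc:
    messages = build_correction_messages("Describe Ada as JSON.", bad_output, exc)
    for message in messages:
        print(f'{message["role"]}: {message["content"]}')
```

The last user turn carries the parser's own message, so the model gets the location of the problem rather than a bare "try again".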

What It Does NOT Do

llm-structured-retry does not fix the output automatically. It asks the model to fix it. Whether the model succeeds depends on the model's understanding of the error.

It does not implement a general retry for all errors. It is specifically for cases where parsing the model's output fails and you want to use the parse error to guide a correction.

For rate limit retries and provider availability retries, use llm-retry-py. That is a different problem.

Inside the Library

The retry loop builds a conversation that includes the previous failed attempt:

def call(self, fn: Callable[[], T], extract_text: Callable[[T], str]) -> Any:
    first_call = fn  # keep the original callable; every retry wraps this one
    messages = []

    for attempt in range(self._max_attempts):
        response = fn()
        text = extract_text(response)

        try:
            return self._parser(text)
        except Exception as e:
            if attempt == self._max_attempts - 1:
                raise StructuredRetryExhausted(
                    f"Failed after {self._max_attempts} attempts",
                    last_output=text,
                    last_error=str(e),
                ) from e

            error_msg = self._error_formatter(e)
            messages = [
                {"role": "assistant", "content": text},  # what model returned
                {"role": "user", "content": error_msg},  # what broke
            ]

            # Next call includes the error conversation. Wrap the original
            # callable, not the previous wrapper: a closure over a variable
            # that is reassigned each iteration would end up calling itself.
            fn = lambda: call_llm_with_history(messages, first_call)

The key: the model sees its own previous output alongside the error. This is the minimal context it needs to understand what went wrong and produce a corrected version.
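The same loop can be sketched as a standalone function. This is an illustration under stated assumptions, not the library's code: retry_with_feedback, the call_model signature (a callable that takes the message history and returns text), and fake_model are all hypothetical names. Unlike the excerpt above, this sketch keeps every failed attempt in the history rather than only the latest one.

```python
import json
from typing import Any, Callable

Message = dict[str, str]


def retry_with_feedback(
    call_model: Callable[[list[Message]], str],
    prompt: str,
    parser: Callable[[str], Any] = json.loads,
    max_attempts: int = 3,
) -> Any:
    """Call the model, feeding each parse error back as a new user turn."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    messages: list[Message] = [{"role": "user", "content": prompt}]
    for attempt in range(1, max_attempts + 1):
        text = call_model(messages)
        try:
            return parser(text)
        except Exception as exc:
            if attempt == max_attempts:
                raise RuntimeError(f"Failed after {max_attempts} attempts") from exc
            # Append the failed output and the error so the next call sees both.
            messages.append({"role": "assistant", "content": text})
            messages.append(
                {"role": "user", "content": f"Parse failed: {exc}. Return valid JSON only."}
            )


def fake_model(messages: list[Message]) -> str:
    """Scripted stand-in for a provider call: bad JSON first, then fixed."""
    already_corrected = any(m["role"] == "assistant" for m in messages)
    return '{"ok": true}' if already_corrected else '{"ok": true,}'


result = retry_with_feedback(fake_model, "Return a JSON object with ok=true.")
print(result)
```

Passing the history into the model callable avoids rebinding a closure on every iteration, at the cost of requiring the caller's function to accept a message list.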

StructuredRetryExhausted carries last_output and last_error so you can log both when all attempts fail. Logged over time, these records help you distinguish a model that keeps producing the same malformed output (likely a prompt issue) from one that gets close but never quite produces valid output (likely a model capability issue).
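Logging both fields might look like the sketch below. So that it runs without the package installed, it defines a stand-in exception class that mirrors only the two attributes named above; the constructor signature is an assumption based on the excerpt, and summarize_failure is a hypothetical helper.

```python
import logging

logger = logging.getLogger("structured_retry_demo")


class StructuredRetryExhausted(Exception):
    """Stand-in for the library's exception; mirrors only the attributes named above."""

    def __init__(self, message: str, last_output: str, last_error: str) -> None:
        super().__init__(message)
        self.last_output = last_output
        self.last_error = last_error


def summarize_failure(exc: StructuredRetryExhausted, max_chars: int = 200) -> dict[str, str]:
    """Build a log-friendly record, truncating long model output."""
    output = exc.last_output
    if len(output) > max_chars:
        output = output[:max_chars] + "...[truncated]"
    return {"reason": str(exc), "last_error": exc.last_error, "last_output": output}


try:
    raise StructuredRetryExhausted(
        "Failed after 3 attempts",
        last_output='{"ok": true,}',
        last_error="Expecting property name enclosed in double quotes",
    )
except StructuredRetryExhausted as exc:
    record = summarize_failure(exc)
    logger.error("structured output failed: %s", record)
```

With the real library, the try body would be the retry.call(...) invocation instead of the explicit raise.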

When to Use It

Use it when you need structured output (JSON, YAML, or another specific format) from a model and cannot rely on provider-native structured output modes: for example, when JSON mode is unavailable for your model or tool configuration, or when you need YAML or another format the provider does not support natively.

The error feedback pattern works best when the error message is specific. "Parse failed: Expecting property name enclosed in double quotes: line 3 column 1 (char 45)" tells the model where the problem is. "Parse failed" tells it nothing useful.
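As a sketch of what a specific message can contain, the formatter below uses the msg, lineno, colno, and doc attributes that json.JSONDecodeError exposes to quote the offending line back to the model. The function name format_json_error is hypothetical, and using it as error_formatter assumes the formatter receives the original exception, as the examples in this post suggest.

```python
import json


def format_json_error(exc: json.JSONDecodeError) -> str:
    """Turn a JSONDecodeError into feedback that points at the offending line."""
    lines = exc.doc.splitlines()
    # Guard against errors reported past the last line (e.g. truncated output).
    offending = lines[exc.lineno - 1] if 0 < exc.lineno <= len(lines) else ""
    return (
        f"Parse failed: {exc.msg} at line {exc.lineno}, column {exc.colno}.\n"
        f"Offending line: {offending!r}\n"
        "Return valid JSON only."
    )


bad_output = '{\n  "a": 1,\n  b: 2\n}'  # the key b is not quoted
try:
    json.loads(bad_output)
except json.JSONDecodeError as exc:
    feedback = format_json_error(exc)
    print(feedback)
```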

Install

pip install git+https://github.com/MukundaKatta/llm-structured-retry

import json

import yaml  # third-party: pip install pyyaml

from llm_structured_retry import StructuredRetry, StructuredRetryExhausted

# JSON extraction
json_retry = StructuredRetry(
    max_attempts=3,
    parser=json.loads,
    error_formatter=lambda e: (
        f"Your response was not valid JSON. Error: {e}\n"
        "Return ONLY the JSON object, no explanation, no markdown code blocks."
    ),
)

# YAML extraction
yaml_retry = StructuredRetry(
    max_attempts=3,
    parser=yaml.safe_load,
    error_formatter=lambda e: (
        f"Your response was not valid YAML. Error: {e}\n"
        "Return ONLY the YAML content, properly indented."
    ),
)

# Custom parser
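def parse_numbered_list(text: str) -> list[str]:
    items = []
    for line in text.strip().splitlines():
        # Accept only lines shaped like "<digits>. <item>"; a line that starts
        # with a digit but has no "." is skipped instead of raising IndexError.
        number, dot, rest = line.strip().partition(".")
        if dot and number.isdigit() and rest.strip():
            items.append(rest.strip())
    if not items:
        raise ValueError("No numbered items found in response")
    return items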

list_retry = StructuredRetry(
    max_attempts=2,
    parser=parse_numbered_list,
    error_formatter=lambda e: (
        f"Your response did not contain a numbered list. Error: {e}\n"
        "Return items as a numbered list: 1. Item one\n2. Item two"
    ),
)

Sibling Libraries

Library	What it solves
`llm-retry-py`	Retry on rate limits, timeouts, provider errors
`llm-output-validator`	Rule-based validation of output shape
`tool-arg-coerce-py`	Coerce parsed output to expected types
`agentvet`	Validate tool arguments before execution
`llm-fallback-chain`	Fall through to backup provider on persistent failure

The structured output pipeline: llm-structured-retry for parse-error-guided correction, llm-output-validator for shape validation after parsing, tool-arg-coerce-py for type coercion on parsed values.

What's Next

Schema-aware error messages: if you pass a JSON schema alongside the parser, the error formatter could generate a schema-specific error message ("Field 'priority' is required but missing" instead of "KeyError: 'priority'"). This would make correction more targeted.
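Until something like that exists, a similar effect can be approximated by putting the checks inside the parser itself, since a parser can raise any exception with any message. The sketch below is one possible shape, not a planned API: REQUIRED_FIELDS and parse_ticket are hypothetical, and the "schema" is just a dict of field names to Python types.

```python
import json
from typing import Any

REQUIRED_FIELDS = {"title": str, "priority": int}  # hypothetical schema for illustration


def parse_ticket(text: str) -> dict[str, Any]:
    """Parse JSON, then raise field-level errors the model can act on."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object but got {type(data).__name__}")
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"Field '{field}' is required but missing")
        elif not isinstance(data[field], expected):
            # Note: bool passes an int check because bool subclasses int.
            problems.append(
                f"Field '{field}' must be {expected.__name__}, "
                f"got {type(data[field]).__name__}"
            )
    if problems:
        raise ValueError("; ".join(problems))
    return data


print(parse_ticket('{"title": "Fix login", "priority": 2}'))
```

Reporting all problems in one message lets the model fix them in a single retry instead of one field per attempt.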

Partial repair before retry: for JSON specifically, try json5 or demjson to parse leniently before giving up and retrying with error feedback. Some models produce consistently fixable JSON (missing quotes, trailing commas) that a lenient parser can handle without retry overhead.
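A standard-library approximation of this idea is sketched below. It is deliberately naive and swaps in two regular expressions where the paragraph above suggests a lenient parser such as json5: it strips a surrounding markdown code fence and removes trailing commas, then parses strictly. The name parse_json_leniently is hypothetical.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # markdown code fence marker
_FENCED = re.compile(rf"^{FENCE}\w*\s*\n(.*?)\n?{FENCE}\s*$", re.DOTALL)
_TRAILING_COMMA = re.compile(r",(\s*[}\]])")


def parse_json_leniently(text: str) -> Any:
    """Try strict parsing first, then one naive repair pass."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    candidate = text.strip()
    fenced = _FENCED.match(candidate)
    if fenced:
        candidate = fenced.group(1)  # keep only the fenced body
    # Naive: this also rewrites a ",}" that appears inside a string value.
    candidate = _TRAILING_COMMA.sub(r"\1", candidate)
    return json.loads(candidate)  # still raises JSONDecodeError if unrepaired


print(parse_json_leniently('{"a": 1, "b": [1, 2,],}'))
```

Because it still raises json.JSONDecodeError when repair fails, a function like this could be passed as the parser so that the retry only happens for output the repair pass cannot handle.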

Streaming retry: for streaming responses, detect parse failure on the complete response and retry. Streaming complicates the history accumulation because you need to collect the full text before parsing.
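The buffering step could be as small as the sketch below. It assumes the provider's stream can be treated as an iterable of text chunks; collect_stream and the on_chunk callback are hypothetical names, and the list of strings stands in for a real streaming iterator.

```python
import json
from typing import Callable, Iterable, Optional


def collect_stream(
    chunks: Iterable[str],
    on_chunk: Optional[Callable[[str], None]] = None,
) -> str:
    """Buffer a text stream, optionally forwarding each chunk as it arrives."""
    parts: list[str] = []
    for chunk in chunks:
        if on_chunk is not None:
            on_chunk(chunk)  # e.g. show partial output to the user
        parts.append(chunk)
    return "".join(parts)


stream = iter(['{"status": ', '"ok"', "}"])  # stand-in for a provider's streaming iterator
full_text = collect_stream(stream)
parsed = json.loads(full_text)  # parse only once the stream is complete
print(parsed)
```

The full text is what would go into the assistant turn of the retry history if parsing fails.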

Built as part of the agent-stack family: composable Python primitives for production LLM agents.