If you ask an LLM for structured output and validate it against a schema, you already know the failure
mode: most of the time it is fine, and every so often it hands back something that does not parse or
misses a required field. The usual reflex is to wrap the call in a retry and move on.

The problem is that a plain retry is the same prompt, the same temperature, roughly the same odds. You
are paying for another round and hoping the dice land differently.

There is a better move, and it is almost free to add: when validation fails, put the validation error
and the model's own bad output back into the next prompt, and ask it to fix that specific thing.

Here is the core of the loop I used for this on a RAG platform I built (trimmed to the essential path):

    while attempts < max_attempts:
        try:
            msgs = [messages] if isinstance(messages, str) else messages
            if error_message:
                msgs = [*msgs, error_message]  # last attempt's error rides along
            response = await make_completion_request(..., messages=msgs)
            if validator:
                validator(response)  # raises ValidationError on bad output
            return response
        except ValidationError as e:
            attempts += 1
            error_message = f"""
            The last response from the API failed validation due to the following error:
            <error>{format_error_for_llm(e)}</error>
            Your task is to fix the error and return the corrected response data:
            <data>{serialize(response).decode()}</data>
            """
            response = None

Two details do the work:
- The error is described for the model, not for a log. format_error_for_llm turns the raw validation exception into a plain instruction ("field X must be an integer, you sent a string"). The model is good at patching a concrete, named mistake; it is bad at guessing why an opaque retry keeps failing.
- You hand back its own previous output as the thing to correct. It is not regenerating from scratch, it is editing. That keeps the parts that were already right and usually fixes the one field that was wrong on the first pass.
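
To make the first point concrete, here is a minimal sketch of what a formatter like format_error_for_llm could look like. It is not the version from the platform above. It assumes the validator can hand over its errors as a list of dicts with `loc`, `msg`, and optionally `input` keys (the shape Pydantic's `ValidationError.errors()` returns); if your validator exposes something else, adapt the field access.

```python
from typing import Any


def format_error_for_llm(errors: list[dict[str, Any]]) -> str:
    """Turn structured validation errors into short, named instructions.

    Assumption: each error is a dict with "loc" (path to the field) and
    "msg", plus an optional "input" holding the offending value.
    """
    lines = []
    for err in errors:
        # Join the path into something like "items.0.price".
        field = ".".join(str(part) for part in err.get("loc", ())) or "(root)"
        line = f"- field `{field}`: {err['msg']}"
        if "input" in err:
            # Show the model exactly what it sent for this field.
            line += f" (you sent {err['input']!r})"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical errors, made up for illustration.
example = [
    {"loc": ("items", 0, "price"), "msg": "must be an integer", "input": "12.50"},
    {"loc": ("currency",), "msg": "field required"},
]
print(format_error_for_llm(example))
```

Each line names one field and one mistake, which is the form the model patches most reliably. A stack trace or a raw exception repr gives it nothing specific to act on.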
The tradeoffs, because there always are some:
- It costs an extra call on a bad response, and the follow-up prompt is longer (it carries the error plus the prior payload). On a schema that fails often, that adds up. Cap the attempts.
- It only works when the bad response is parseable enough to serialize back into the prompt. Truly empty or truncated output has nothing to correct, so you still need a normal retry underneath.
- Prompt semantics do not always transfer if you also fail over between providers mid-loop. If you do that, do not count a provider swap as a real attempt.
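
The first two tradeoffs can be sketched in a self-contained form. Everything here is a stand-in: `validate`, `complete_with_feedback`, and `fake_model` are hypothetical names, the schema is a toy one, and the "model" is a canned list of replies. The sketch caps attempts and falls back to a plain retry when the output is empty. It does not try to detect truncated output or handle provider failover, which you would need to add for your own stack.

```python
import asyncio
import json


class ValidationError(Exception):
    """Stand-in for whatever your schema validator raises."""


def validate(raw: str) -> dict:
    # Toy schema: a JSON object whose "count" field is an integer.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValidationError(f"response is not valid JSON: {e.msg}") from e
    if not isinstance(data, dict) or not isinstance(data.get("count"), int):
        raise ValidationError("field 'count' must be an integer")
    return data


async def complete_with_feedback(call_model, prompt: str, max_attempts: int = 3) -> dict:
    messages = [prompt]
    last_error = None
    for _ in range(max_attempts):  # hard cap on extra calls
        raw = await call_model(messages)
        try:
            return validate(raw)
        except ValidationError as e:
            last_error = e
            if raw.strip():
                # There is something to edit: send back the error and the payload.
                feedback = (
                    "The last response failed validation:\n"
                    f"<error>{e}</error>\n"
                    "Fix the error and return the corrected data:\n"
                    f"<data>{raw}</data>"
                )
                messages = [prompt, feedback]
            else:
                # Nothing to correct: plain retry with the original prompt.
                messages = [prompt]
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last_error}")


# Fake model: empty output, then a wrong type, then a corrected payload.
calls = []
replies = iter(["", '{"count": "3"}', '{"count": 3}'])


async def fake_model(messages):
    calls.append(list(messages))  # record what each call was sent
    return next(replies)


result = asyncio.run(
    complete_with_feedback(fake_model, "Return JSON with an integer count.")
)
print(result)
```

In this run the empty first reply triggers a plain retry, the second reply fails on the type of `count`, and only the third call carries the error and the bad payload along.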
That is the whole idea. It is not a framework, it is about ten lines around a call you already have. If
you are generating structured output at any volume, it turns a chunk of your "model was flaky" retries
into first-try-after-feedback successes.