LLMs are probabilistic text generators. In a notebook demo, that's fine. In production, it means your pipeline will occasionally receive a Python dict where you expected JSON, a 900-word paragraph where you asked for three bullet points, or a hallucinated field name that breaks your downstream schema. This post is not about theory — it's about five concrete patterns, each with working code, that handle these failures reliably.
The core problem
You're calling an LLM API expecting structured output. The model has been prompted carefully. But over thousands of calls, you'll see:
- Malformed JSON (trailing commas, unquoted keys, markdown code fences wrapping the payload)
- Responses that exceed or fall short of length constraints
- Fields that exist in the schema but contain garbage ("confidence": "I am quite certain")
- Duplicate entries in batch completions
- The model evaluating its own output charitably when asked to self-check
Each pattern below addresses one failure mode.
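To make the first failure mode concrete, here is a small sketch that feeds a few hand-written payloads to `json.loads`. The sample strings and the `try_parse` helper are illustrative assumptions, not captured model output.

```python
import json

FENCE = "`" * 3  # built at runtime so the sample can hold a markdown fence

# Hypothetical payloads, one per malformed-JSON variant listed above.
SAMPLES = {
    "valid": '{"title": "ok"}',
    "trailing_comma": '{"title": "ok",}',
    "unquoted_key": '{title: "ok"}',
    "python_dict": "{'title': 'ok'}",
    "fenced": FENCE + 'json\n{"title": "ok"}\n' + FENCE,
}

def try_parse(payload: str) -> str:
    """Return 'ok' if json.loads accepts the payload, else the error class name."""
    try:
        json.loads(payload)
        return "ok"
    except json.JSONDecodeError as exc:
        return type(exc).__name__

results = {name: try_parse(payload) for name, payload in SAMPLES.items()}
for name, outcome in results.items():
    print(f"{name:15} -> {outcome}")
```

Only the first sample parses. The other four look almost right to a human reader, which is why they tend to slip past manual spot checks.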
import json
import re
import time
import hashlib
from openai import OpenAI

llm_client = OpenAI(
    api_key="your_api_key",
    base_url="https://api.your-llm-provider.com/v1",
)

def call_llm(messages: list[dict], model: str = "gpt-4o-mini",
             temperature: float = 0.3, max_tokens: int = 1000) -> str:
    response = llm_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    # content is optional in the SDK's response type, so guard before strip()
    return (response.choices[0].message.content or "").strip()
Pattern 1: JSON schema validation with retry
Problem: The model returns valid JSON 98% of the time and something subtly broken the other 2%. Your parser crashes and you lose the request.
Bad solution: json.loads() with a bare except that returns None. You swallow errors silently and downstream code explodes later.
Good solution: Parse, validate against a schema, and retry with an error hint that tells the model exactly what went wrong.
import jsonschema
ARTICLE_SCHEMA = {
    "type": "object",
    "required": ["title", "summary", "tags", "difficulty"],
    "properties": {
        "title": {"type": "string", "minLength": 10, "maxLength": 120},
        "summary": {"type": "string", "minLength": 50},
        "tags": {"type": "array", "items": {"type": "string"}, "minItems": 1},
        "difficulty": {"type": "string", "enum": ["beginner", "intermediate", "advanced"]},
    },
    "additionalProperties": False,
}
def extract_json_from_response(text: str) -> str:
    """Strip markdown code fences if present."""
    match = re.search(r"```(?:json)?\s*([\s\S]*?)```", text)
    if match:
        return match.group(1).strip()
    # Try to find raw JSON object
    match = re.search(r"\{[\s\S]*\}", text)
    if match:
        return match.group(0)
    return text
def call_with_json_schema(prompt: str, schema: dict,
                          max_retries: int = 3) -> dict:
    messages = [
        {"role": "system", "content": (
            "You are a data extraction assistant. "
            "Always respond with valid JSON matching the requested schema. "
            "No prose, no markdown fences, just the JSON object."
        )},
        {"role": "user", "content": prompt},
    ]
    last_error = None
    for attempt in range(max_retries):
        raw = call_llm(messages)
        json_str = extract_json_from_response(raw)
        try:
            data = json.loads(json_str)
            jsonschema.validate(instance=data, schema=schema)
            return data
        except json.JSONDecodeError as e:
            last_error = f"JSON parse error: {e}. Raw output was: {json_str[:200]}"
        except jsonschema.ValidationError as e:
            last_error = f"Schema validation failed: {e.message}"
        if attempt == max_retries - 1:
            break  # out of attempts: skip the feedback message and the sleep
        # Append error feedback and retry
        messages.append({"role": "assistant", "content": raw})
        messages.append({"role": "user", "content": (
            f"That response had an error: {last_error}\n"
            "Please fix it and return only the corrected JSON."
        )})
        time.sleep(0.5 * (attempt + 1))  # back off slightly
    raise ValueError(f"Failed after {max_retries} attempts. Last error: {last_error}")
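To see the feedback loop without spending API calls, you can dry-run it against a scripted model. The sketch below is an assumption-laden stand-in: `SCRIPTED_REPLIES`, `check`, and `run_with_feedback` are hypothetical names, and `check` is a hand-written substitute for `jsonschema.validate` so the example has no third-party dependencies.

```python
import json
from typing import Optional

# Hypothetical scripted "model": a reply with a bad enum value, then a fixed one.
SCRIPTED_REPLIES = [
    '{"title": "Short", "difficulty": "expert"}',
    '{"title": "A longer title", "difficulty": "advanced"}',
]

def check(data) -> Optional[str]:
    """Tiny stand-in for jsonschema.validate: return an error string or None."""
    if not isinstance(data, dict):
        return "top-level value must be a JSON object"
    allowed = {"beginner", "intermediate", "advanced"}
    if data.get("difficulty") not in allowed:
        return f"'difficulty' must be one of {sorted(allowed)}, got {data.get('difficulty')!r}"
    return None

def run_with_feedback(replies: list, max_retries: int = 3):
    messages = [{"role": "user", "content": "Describe this article as JSON."}]
    replies_iter = iter(replies)
    for _ in range(max_retries):
        raw = next(replies_iter)  # stands in for call_llm(messages)
        try:
            data = json.loads(raw)
            error = check(data)
        except json.JSONDecodeError as exc:
            data, error = None, f"JSON parse error: {exc}"
        if error is None:
            return data, messages
        # Same shape as the real loop: echo the bad reply, then explain the error.
        messages.append({"role": "assistant", "content": raw})
        messages.append({"role": "user", "content": f"That response had an error: {error}"})
    raise ValueError("no valid reply within the retry budget")

data, transcript = run_with_feedback(SCRIPTED_REPLIES)
print(data["difficulty"])
for message in transcript:
    print(message["role"], "->", message["content"][:70])
```

The transcript shows what the model sees on the second attempt: its own broken reply followed by a specific complaint about the `difficulty` value. That specificity is the point of the pattern; a generic "try again" gives the model nothing to correct.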
Pattern 2: Length constraint enforcement
Problem: You ask for a 2-sentence summary and get a paragraph. Or you ask for 500 words and get 80. Downstream rendering breaks.
Bad solution: Truncate with response[:500]. You cut mid-sentence and produce garbage.
Good solution: Measure, then retry with a correction hint that quantifies the delta.
def count_words(text: str) -> int:
    return len(text.split())

def call_with_length_constraint(prompt: str, min_words: int, max_words: int,
                                max_retries: int = 3) -> str:
    messages = [
        {"role": "system", "content": (
            f"Write responses between {min_words} and {max_words} words. "
            "Count carefully before submitting."
        )},
        {"role": "user", "content": prompt},
    ]
    for attempt in range(max_retries):
        response = call_llm(messages, max_tokens=max_words * 2)
        word_count = count_words(response)
        if min_words <= word_count <= max_words:
            return response
        # delta is always positive: how far outside the target range we landed
        delta = word_count - max_words if word_count > max_words else min_words - word_count
        direction = "shorter" if word_count > max_words else "longer"
        hint = (
            f"Your response was {word_count} words. "
            f"It needs to be at least {delta} words {direction}. "
            f"Target: {min_words}–{max_words} words. Rewrite it."
        )
        messages.append({"role": "assistant", "content": response})
        messages.append({"role": "user", "content": hint})
    # Last resort: one more call, then hard-truncate on a word boundary if the
    # result is still too long. A too-short result is returned unchanged.
    final = call_llm(messages, max_tokens=max_words * 2)
    words = final.split()
    if len(words) > max_words:
        return " ".join(words[:max_words])
    return final
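The last-resort branch cuts on a word boundary, which can still end mid-sentence. If that matters for your rendering, one option is to drop whole trailing sentences instead. This is a minimal sketch, assuming a naive splitter that breaks after `.`, `!`, or `?`; `truncate_at_sentence` is a hypothetical helper, and abbreviations such as "e.g." will confuse it.

```python
import re

def truncate_at_sentence(text: str, max_words: int) -> str:
    """Keep whole sentences until adding the next one would exceed max_words."""
    # Naive splitter: break after ., !, or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    total = 0
    for sentence in sentences:
        n = len(sentence.split())
        if total + n > max_words:
            break
        kept.append(sentence)
        total += n
    if not kept:
        # The first sentence alone is too long: fall back to a word-level cut.
        return " ".join(text.split()[:max_words])
    return " ".join(kept)

sample = (
    "Retries cost money. Truncation is free but blunt. "
    "Prefer whole sentences when you must cut."
)
print(truncate_at_sentence(sample, 10))
print(truncate_at_sentence(sample, 2))
```

With a 10-word budget the first two sentences survive intact; with a 2-word budget the helper falls back to the same word-level cut as the original code.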
Pattern 3: Regex-based field extraction as fallback
Problem: The model consistently wraps values in prose ("The severity is: HIGH") instead of returning a clean value. JSON parsing fails; you can't proceed.
Good solution: Regex extraction as a structured fallback — not a replacement for JSON, but a recovery layer when JSON fails.
FIELD_PATTERNS = {
    "severity": r"\b(LOW|MEDIUM|HIGH|CRITICAL)\b",
    "score": r"\b(\d+(?:\.\d+)?)\s*(?:/\s*10)?",
    "category": r"\b(spam|phishing|legitimate|malware|unknown)\b",
    "confidence": r"confidence[:\s]+(\d+(?:\.\d+)?)%?",
}

def extract_fields_with_regex(text: str,
                              fields: list[str]) -> dict:
    """
    Attempt to extract structured fields from prose output using regex.
    Returns None for fields that cannot be extracted.
    """
    result = {}
    for field in fields:
        pattern = FIELD_PATTERNS.get(field)
        if not pattern:
            result[field] = None
            continue
        # re.IGNORECASE already handles case, so no upper-cased copy is needed
        match = re.search(pattern, text, re.IGNORECASE)
        value = match.group(1) if match else None
        if value is not None:
            if field == "severity":
                value = value.upper()  # match the enum casing
            elif field == "category":
                value = value.lower()
            elif field in ("score", "confidence"):
                value = float(value)  # numeric fields come back as numbers
        result[field] = value
    return result
def classify_with_fallback(text_to_classify: str) -> dict:
    prompt = (
        f'Classify this text:\n\n"{text_to_classify}"\n\n'
        'Return JSON: {"category": "spam|phishing|legitimate", '
        '"severity": "LOW|MEDIUM|HIGH|CRITICAL", "confidence": 0-100}'
    )
    messages = [{"role": "user", "content": prompt}]
    raw = call_llm(messages, temperature=0.1)
    try:
        json_str = extract_json_from_response(raw)
        data = json.loads(json_str)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        return data
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Fallback: extract fields with regex
        extracted = extract_fields_with_regex(raw, ["category", "severity", "confidence"])
        extracted["_extraction_method"] = "regex_fallback"
        return extracted
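Here is what the regex layer recovers from a typical prose reply. The sketch copies three of the patterns from `FIELD_PATTERNS` so it runs on its own; the `extract` helper and both input strings are made up for illustration.

```python
import re

# Subset of FIELD_PATTERNS, copied so the example is self-contained.
PATTERNS = {
    "severity": r"\b(LOW|MEDIUM|HIGH|CRITICAL)\b",
    "category": r"\b(spam|phishing|legitimate|malware|unknown)\b",
    "confidence": r"confidence[:\s]+(\d+(?:\.\d+)?)%?",
}

def extract(text: str) -> dict:
    """Return the first raw match for each field, or None when nothing matches."""
    out = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.IGNORECASE)
        out[field] = match.group(1) if match else None
    return out

prose = "This looks like Phishing. The severity is: HIGH. Confidence: 85%"
print(extract(prose))

# "highly" must not match HIGH: the \b word boundaries prevent it.
print(extract("I would rate this fairly highly."))
```

Two things are worth noticing. The word boundaries keep "highly" from being read as a severity, and the raw matches keep the model's casing and come back as strings, so cast and normalize them before comparing against enums or numeric thresholds.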
Pattern 4: Confidence scoring via self-evaluation
Problem: The model answers confidently even when it's guessing. You need a signal to route low-confidence answers to human review.
Key insight: Ask the model to evaluate its own answer in a separate call. Self-evaluation in the same call is biased upward.
def get_answer_with_confidence(question: str, context: str) -> dict:
    # Step 1: Generate answer
    answer_messages = [
        {"role": "system", "content": "Answer based strictly on the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    answer = call_llm(answer_messages, temperature=0.2)
    # Step 2: Evaluate in a separate call
    eval_messages = [
        {"role": "system", "content": (
            "You are an impartial evaluator. Assess answer quality strictly. "
            "Return JSON: {\"confidence\": 0-100, \"issues\": [list of concerns], "
            "\"grounded\": true/false}"
        )},
        {"role": "user", "content": (
            f"Question: {question}\n\n"
            f"Context provided:\n{context}\n\n"
            f"Answer given:\n{answer}\n\n"
            "Evaluate: Is this answer fully supported by the context? "
            "Are there unsupported claims? Score 0-100."
        )},
    ]
    eval_raw = call_llm(eval_messages, temperature=0.0)
    try:
        eval_data = json.loads(extract_json_from_response(eval_raw))
        if not isinstance(eval_data, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # also covers json.JSONDecodeError
        eval_data = {"confidence": 50, "issues": ["evaluation_parse_failed"], "grounded": None}
    confidence = eval_data.get("confidence", 50)
    if not isinstance(confidence, (int, float)):
        confidence = 50  # non-numeric score (e.g. prose): treat as unknown
    return {
        "answer": answer,
        "confidence": confidence,
        "issues": eval_data.get("issues", []),
        "grounded": eval_data.get("grounded"),
        "needs_review": confidence < 70,
    }
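The evaluator is itself an LLM, so its `confidence` field can come back as `85`, `"85%"`, or `"I am quite certain"`. A minimal sketch of a routing step that tolerates all three follows; `coerce_confidence`, `route`, and the queue names are hypothetical, and the 70 threshold simply mirrors the one used above.

```python
def coerce_confidence(value, default: float = 50.0) -> float:
    """Turn 85, "85", or "85%" into a float in [0, 100]; otherwise use default."""
    if isinstance(value, bool):
        return default  # True/False is not a score
    if isinstance(value, str):
        value = value.strip().rstrip("%")
    try:
        number = float(value)
    except (TypeError, ValueError):
        return default
    if number != number:  # NaN never equals itself
        return default
    return min(100.0, max(0.0, number))

def route(evaluation: dict, threshold: float = 70.0) -> str:
    """Send ungrounded or low-confidence answers to a human."""
    confidence = coerce_confidence(evaluation.get("confidence"))
    if evaluation.get("grounded") is False or confidence < threshold:
        return "human_review"
    return "auto_accept"

print(route({"confidence": 90, "grounded": True}))
print(route({"confidence": "85%", "grounded": False}))
print(route({"confidence": "I am quite certain"}))
```

Falling back to a mid-range default means an unparseable score lands below the threshold and goes to review, which is usually the safer direction to fail in.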
Pattern 5: Deduplication across batch outputs
Problem: You process 50 documents in batch and ask the model to extract key entities from each. You get overlapping, near-duplicate entries that pollute your downstream data.
Good solution: Hash-based exact dedup combined with a lightweight similarity check for near-duplicates.
from difflib import SequenceMatcher
def deduplicate_outputs(items: list[str],
                        similarity_threshold: float = 0.85) -> list[str]:
    """
    Remove exact duplicates (hash) and near-duplicates (sequence similarity).
    Keeps the first occurrence of each unique item.
    """
    seen_hashes: set[str] = set()
    unique_items: list[str] = []
    for item in items:
        normalized = item.strip().lower()
        item_hash = hashlib.md5(normalized.encode()).hexdigest()
        if item_hash in seen_hashes:
            continue  # exact duplicate
        # Check near-duplicate against existing unique items
        is_near_dup = any(
            SequenceMatcher(None, normalized, existing.strip().lower()).ratio()
            >= similarity_threshold
            for existing in unique_items
        )
        if not is_near_dup:
            unique_items.append(item)
            seen_hashes.add(item_hash)
    return unique_items
def batch_extract_entities(documents: list[str], entity_type: str) -> list[str]:
    all_entities = []
    for doc in documents:
        messages = [
            {"role": "system", "content": (
                f"Extract all {entity_type} from the text. "
                "Return a JSON array of strings. Nothing else."
            )},
            {"role": "user", "content": doc},
        ]
        raw = call_llm(messages, temperature=0.1)
        try:
            entities = json.loads(extract_json_from_response(raw))
            if isinstance(entities, list):
                # Keep strings only: deduplicate_outputs calls .strip() on each item
                all_entities.extend(e for e in entities if isinstance(e, str))
        except ValueError:  # also covers json.JSONDecodeError
            pass  # log and continue; one bad doc shouldn't stop the batch
    return deduplicate_outputs(all_entities)
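Before trusting the 0.85 default, it helps to look at the scores `SequenceMatcher` actually produces on your own data. A quick sketch with made-up entity pairs, where `similarity` is a hypothetical helper applying the same normalization as `deduplicate_outputs`:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] after the same strip/lower normalization used above."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

# Hypothetical entity pairs of the kind a batch extraction might return.
pairs = [
    ("OpenAI Inc.", "openai inc"),
    ("Acme Corp", "Acme Corporation"),
    ("Google", "Microsoft"),
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

The punctuation variant clears the threshold, but the "Acme Corp" / "Acme Corporation" pair does not, so abbreviation-style duplicates survive at 0.85. If those matter, lower the threshold for short strings or add an alias table in front of the similarity check.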
Putting it all together
These patterns compose. A production pipeline for classifying user-submitted content might chain them:
def robust_classify(text: str) -> dict:
    try:
        result = call_with_json_schema(
            prompt=f'Classify this text: "{text}"',
            schema={
                "type": "object",
                "required": ["category", "severity", "confidence"],
                "properties": {
                    "category": {"type": "string", "enum": ["spam", "phishing", "legitimate", "toxic"]},
                    "severity": {"type": "string", "enum": ["LOW", "MEDIUM", "HIGH", "CRITICAL"]},
                    "confidence": {"type": "number", "minimum": 0, "maximum": 100},
                },
            },
            max_retries=3,
        )
    except ValueError:
        # Pattern 3 fallback
        result = classify_with_fallback(text)
    # Pattern 4-style routing: flag for human review if uncertain.
    # The fallback path can yield None or a string, so coerce before comparing.
    try:
        confidence = float(result.get("confidence"))
    except (TypeError, ValueError):
        confidence = 0.0  # unknown confidence always goes to review
    result["needs_review"] = confidence < 65
    return result
These five patterns cover the vast majority of production failures. Start with Pattern 1 (JSON schema + retry) and Pattern 3 (regex fallback) — they handle 80% of output issues. Add Pattern 4 (self-evaluation) when you have a human review queue and need to route intelligently. For content pipelines like the moderation system described in practical security guides, Patterns 1 and 5 together eliminate most of the noise from batch LLM processing.