The biggest source of subtle bugs in AI applications isn't the model — it's the gap between what you asked for and what you got.
You prompt for {"score": 8, "issues": ["missing error handling"]} and you get {"score": "8/10", "issues": "missing error handling"}. Both are technically valid JSON. One breaks your downstream code. Neither triggers an exception until hours later when you're wondering why the aggregation is wrong.
Pydantic v2 catches this class of bugs at parse time. Here's how to structure your LLM outputs so type errors surface at the boundary instead of being buried in production.
The problem with freeform JSON parsing
Most developers start here:
    import json
    from anthropic import Anthropic

    client = Anthropic()

    def analyze_code(code: str) -> dict:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": f"Analyze this code and return JSON with: severity (int 1-10), issues (list of strings), has_security_risk (bool).\n\n{code}"
            }]
        )
        # No validation: whatever the model returned is passed straight through
        return json.loads(response.content[0].text)
This fails in three ways you won't notice until production:
- Wrong types slip through. The model returns "severity": "8" instead of 8, and json.loads parses it as a string. In Python 3, a downstream severity > 7 comparison then raises TypeError, while an equality check such as severity == 8 silently evaluates to False.
- Missing fields. The model occasionally omits has_security_risk when it seems obvious from context. The result is a KeyError three calls in, two hours into a batch job.
- Schema drift. You update the prompt. The model starts returning an extra field. Your downstream code ignores it. A week later you realize the data you've been storing is inconsistent.
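To make the first two failure modes concrete, here is a minimal sketch using only the standard library. The raw string is a made-up stand-in for a model response, not real API output.

```python
import json

# Hypothetical model output: severity arrives as a string, has_security_risk is missing
raw = '{"severity": "8", "issues": ["missing error handling"]}'
data = json.loads(raw)

# json.loads preserves whatever types the model emitted
severity_type = type(data["severity"]).__name__

# Equality against an int is silently False
equals_eight = data["severity"] == 8

# An ordering comparison between str and int raises TypeError in Python 3
try:
    data["severity"] > 7
    comparison_error = None
except TypeError as exc:
    comparison_error = type(exc).__name__

# The omitted field only surfaces when something finally reads it
try:
    data["has_security_risk"]
    missing_error = None
except KeyError as exc:
    missing_error = type(exc).__name__

print(severity_type, equals_eight, comparison_error, missing_error)
```

Nothing fails at json.loads itself. Every problem shows up later, wherever the value is first used.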
The Pydantic v2 fix
Define your output schema first:
    from pydantic import BaseModel, Field, field_validator
    from typing import Annotated

    class CodeAnalysis(BaseModel):
        severity: Annotated[int, Field(ge=1, le=10)]
        issues: list[str]
        has_security_risk: bool
        summary: str = ""  # optional with default

        @field_validator("issues")
        @classmethod
        def issues_not_empty_strings(cls, v: list[str]) -> list[str]:
            # Trim whitespace and drop entries that are empty after trimming
            return [issue.strip() for issue in v if issue.strip()]
Now parse with validation:
    def analyze_code(code: str) -> CodeAnalysis:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": f"""Analyze this code. Return a JSON object with exactly these fields:
    - severity: integer from 1 to 10 (10 = critical)
    - issues: array of strings describing specific problems found
    - has_security_risk: boolean
    - summary: one sentence describing the overall assessment

    Code:
    {code}"""
            }]
        )
        # extract_json is defined in the next section
        raw = extract_json(response.content[0].text)
        return CodeAnalysis.model_validate(raw)
The model_validate call coerces "8" to 8, raises ValidationError on missing required fields, and runs your custom validators. The error surfaces at the boundary, not downstream.
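The behavior described above can be checked without calling any API. The sketch below feeds hand-written dictionaries (assumed stand-ins for parsed model output) to the model and collects Pydantic's error type codes. The error_types helper and the StrictCodeAnalysis subclass are illustrative additions, not part of the pattern above. The subclass is there because Pydantic ignores unknown keys by default, so the extra-field drift described earlier passes silently unless you opt in to rejecting it.

```python
from typing import Annotated
from pydantic import BaseModel, ConfigDict, Field, ValidationError

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""

class StrictCodeAnalysis(CodeAnalysis):
    # Assumption: opt in to rejecting unknown keys (the default is to ignore them)
    model_config = ConfigDict(extra="forbid")

def error_types(model: type[BaseModel], payload: dict) -> list[str]:
    """Return Pydantic's error type codes for a payload, or [] if it validates."""
    try:
        model.model_validate(payload)
        return []
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]

# A numeric string is coerced to int in Pydantic's default (lax) mode
good = CodeAnalysis.model_validate(
    {"severity": "8", "issues": ["missing error handling"], "has_security_risk": False}
)
print(good.severity, type(good.severity).__name__)

# "8/10" cannot be parsed as an int, 11 is out of range, and a required field is absent
print(error_types(CodeAnalysis, {"severity": "8/10", "issues": [], "has_security_risk": True}))
print(error_types(CodeAnalysis, {"severity": 11, "issues": [], "has_security_risk": True}))
print(error_types(CodeAnalysis, {"severity": 5, "issues": []}))

# An unexpected key is ignored by default and rejected by the strict subclass
extra = {"severity": 5, "issues": [], "has_security_risk": False, "confidence": 0.9}
print(error_types(CodeAnalysis, extra))
print(error_types(StrictCodeAnalysis, extra))
```

Whether to forbid extra fields is a judgment call: it surfaces drift immediately, at the cost of rejecting responses that are otherwise usable.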
Extracting JSON from model responses
Models don't always return clean JSON — they sometimes wrap it in markdown code blocks or add explanation text. A reliable extractor:
    import json
    import re

    def extract_json(text: str) -> dict:
        """Extract JSON from model response, handling markdown code blocks."""
        # Try markdown code block first
        match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
        if match:
            return json.loads(match.group(1))
        # Try raw JSON object (greedy: first "{" through last "}")
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise ValueError(f"No JSON found in response: {text[:200]}")
This handles the three most common response formats:
- Raw JSON: {"key": "value"}
- Markdown json block: ```json\n{"key": "value"}\n```
- Unlabeled code block: ```\n{"key": "value"}\n```
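As a quick check of the three formats, the sketch below runs the same two-step strategy as the extractor above against hand-written sample responses. The samples are invented for illustration. The fence marker is built programmatically only so that this example contains no literal triple backticks. The last case shows a limitation of the greedy fallback that is worth knowing about.

```python
import json
import re

FENCE = "`" * 3  # three backticks, assembled at runtime

def extract_json(text: str) -> dict:
    """Same strategy as the extractor above: fenced block first, then raw braces."""
    fenced = re.search(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, text, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    raw = re.search(r"\{.*\}", text, re.DOTALL)
    if raw:
        return json.loads(raw.group(0))
    raise ValueError(f"No JSON found in response: {text[:200]}")

payload = '{"key": "value"}'
samples = {
    "raw": payload,
    "json_block": f"Here you go:\n{FENCE}json\n{payload}\n{FENCE}",
    "plain_block": f"{FENCE}\n{payload}\n{FENCE}\nLet me know if you need more.",
}
results = {name: extract_json(text) for name, text in samples.items()}
print(results)

# Limitation: the greedy fallback spans from the first "{" to the last "}",
# so braces in surrounding prose produce a string that is not valid JSON
try:
    extract_json('Use {braces} carefully. {"key": "value"}')
    prose_braces_ok = True
except json.JSONDecodeError:
    prose_braces_ok = False
print(prose_braces_ok)
```

If responses with braces in the explanation text turn out to be common, tightening the prompt ("No markdown, no commentary") is usually simpler than making the regex smarter.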
Prompt patterns that produce consistent schema adherence
The prompt matters as much as the parser. Patterns that reduce schema drift:
Explicit field types in the prompt:
Return JSON with exactly:
- score: integer (1-100, NOT a string, NOT "X/100")
- tags: array of strings (NOT a comma-separated string)
- confident: boolean (true/false, NOT "yes"/"no")
Spelling out "NOT a string" sounds redundant. It cuts type coercion errors by ~80% in practice.
Repeat the schema in the system prompt:
    system_prompt = """You analyze Python code and return structured assessments.
    ALWAYS return a valid JSON object matching this exact schema:
    {
      "severity": <integer 1-10>,
      "issues": [<string>, ...],
      "has_security_risk": <boolean>,
      "summary": <string>
    }
    Never include markdown formatting. Never add extra fields. Never omit required fields."""
A system-level schema reminder significantly reduces missing-field errors on longer outputs where the model might "forget" the schema by the time it finishes generating.
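One way to keep the schema reminder from drifting away from the parser is to generate it from the Pydantic model. The sketch below is one possible approach, not the only one: schema_reminder is a hypothetical helper, and it only handles the flat field types used in this article.

```python
from typing import Annotated
from pydantic import BaseModel, Field

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""

def schema_reminder(model: type[BaseModel]) -> str:
    """Build a field-by-field reminder from the model's JSON schema (hypothetical helper)."""
    schema = model.model_json_schema()
    required = set(schema.get("required", []))
    lines = []
    for name, spec in schema["properties"].items():
        kind = spec.get("type", "any")
        bounds = ""
        if "minimum" in spec and "maximum" in spec:
            bounds = f" from {spec['minimum']} to {spec['maximum']}"
        flag = "required" if name in required else "optional"
        lines.append(f"- {name}: {kind}{bounds} ({flag})")
    return "Return a JSON object with exactly these fields:\n" + "\n".join(lines)

system_prompt = schema_reminder(CodeAnalysis)
print(system_prompt)
```

When a field is added or a bound changes, the prompt text and the validator then update together.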
Temperature for structured outputs:
For strict schema adherence, use a lower temperature (0.2-0.4). The default temperature trades consistency for variety, which is fine for prose but wrong for structured data.
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        temperature=0.3,  # low enough for more consistent JSON; not fully deterministic
        ...
    )
Handling validation errors gracefully
Validation errors are expected in production — the model occasionally hallucinates out-of-range values or mis-types a field. Don't let them crash your application:
    from pydantic import ValidationError
    import logging

    logger = logging.getLogger(__name__)

    def analyze_code_safe(code: str) -> CodeAnalysis | None:
        try:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                temperature=0.3,
                messages=[...],
            )
            raw = extract_json(response.content[0].text)
            return CodeAnalysis.model_validate(raw)
        except ValidationError as e:
            # Valid JSON, wrong shape: log the structured errors for later analysis
            logger.warning(
                "Schema validation failed",
                extra={"errors": e.errors(), "code_snippet": code[:100]}
            )
            return None
        except (ValueError, json.JSONDecodeError) as e:
            # No JSON found, or the extracted text was not parseable
            logger.error("JSON extraction failed", extra={"error": str(e)})
            return None
Log the validation errors. e.errors() returns a list of structured entries (error type, field location, message, and the offending input) that tells you when your schema is drifting from what the model produces. Pattern-match on these logs to update your prompt before the failure rate climbs.
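As a rough illustration of pattern-matching on those errors, the sketch below tallies failures by field path and error type across a batch. The batch is invented data standing in for logged responses, and drift_counts is a hypothetical helper.

```python
from collections import Counter
from typing import Annotated
from pydantic import BaseModel, Field, ValidationError

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""

def drift_counts(payloads: list[dict]) -> Counter:
    """Count (field path, error type) pairs across a batch of parsed model outputs."""
    counts: Counter = Counter()
    for payload in payloads:
        try:
            CodeAnalysis.model_validate(payload)
        except ValidationError as exc:
            for err in exc.errors():
                # err["loc"] is a tuple of keys and indexes leading to the bad value
                path = ".".join(str(part) for part in err["loc"])
                counts[(path, err["type"])] += 1
    return counts

# Made-up payloads standing in for logged responses
batch = [
    {"severity": "8/10", "issues": [], "has_security_risk": True},
    {"severity": 12, "issues": [], "has_security_risk": False},
    {"severity": "high", "issues": [], "has_security_risk": False},
    {"severity": 3, "issues": []},
]
counts = drift_counts(batch)
for (path, kind), n in counts.most_common():
    print(path, kind, n)
```

A field that dominates the tally is the one whose prompt description most needs tightening.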
Nested schemas
For complex outputs, compose Pydantic models:
    from typing import Annotated, Literal
    from pydantic import BaseModel, Field

    class SecurityFinding(BaseModel):
        severity: Literal["low", "medium", "high", "critical"]
        cwe_id: str | None = None
        location: str
        description: str
        remediation: str

    class CodeReview(BaseModel):
        overall_score: Annotated[int, Field(ge=1, le=10)]
        security_findings: list[SecurityFinding] = []
        style_issues: list[str] = []
        performance_notes: list[str] = []
        approved: bool
        reviewer_summary: str
Pydantic v2 validates nested models recursively. If security_findings contains an item that doesn't match SecurityFinding, you get a validation error whose loc points to the exact path, for example ('security_findings', 2, 'severity').
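A small sketch of that path reporting, using trimmed-down versions of the two models (several fields are dropped here purely for brevity) and an invented payload whose second finding has an invalid severity:

```python
from typing import Annotated, Literal
from pydantic import BaseModel, Field, ValidationError

class SecurityFinding(BaseModel):
    severity: Literal["low", "medium", "high", "critical"]
    location: str
    description: str

class CodeReview(BaseModel):
    overall_score: Annotated[int, Field(ge=1, le=10)]
    security_findings: list[SecurityFinding] = []
    approved: bool

payload = {
    "overall_score": 6,
    "approved": False,
    "security_findings": [
        {"severity": "high", "location": "get_user", "description": "SQL injection"},
        # "severe" is not one of the allowed literals
        {"severity": "severe", "location": "login", "description": "Weak hashing"},
    ],
}

try:
    CodeReview.model_validate(payload)
    locations = []
except ValidationError as exc:
    # Each entry pairs the path to the bad value with Pydantic's error type code
    locations = [(err["loc"], err["type"]) for err in exc.errors()]

print(locations)
```

The list index in the path is zero-based, so the second finding is reported at index 1.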
For the model prompt, represent nested schemas as a JSON example rather than a description:
    schema_example = """{
      "overall_score": 7,
      "security_findings": [
        {
          "severity": "high",
          "cwe_id": "CWE-89",
          "location": "function get_user, line 45",
          "description": "Unsanitized user input in SQL query",
          "remediation": "Use parameterized queries"
        }
      ],
      "style_issues": ["Line 12: variable name too short"],
      "performance_notes": [],
      "approved": false,
      "reviewer_summary": "Significant security issue requires remediation before merge."
    }"""
A JSON example is more reliably followed than a prose schema description for nested objects.
Streaming with structured outputs
For long outputs where you want to stream but still validate:
    import json

    def analyze_code_streaming(code: str) -> CodeAnalysis:
        chunks = []
        with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            temperature=0.3,
            messages=[...],
        ) as stream:
            for text in stream.text_stream:
                chunks.append(text)
                # optionally yield chunks to caller here
        # Validate only once the full response has arrived
        full_response = "".join(chunks)
        raw = extract_json(full_response)
        return CodeAnalysis.model_validate(raw)
Validate on the complete response, not mid-stream — partial JSON won't validate and you'll get false errors. Stream for latency perception; validate at the end for correctness.
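The reason mid-stream validation produces false errors is easy to see with a simulated stream. The chunks below are invented and only mimic how a streaming client might split the text. No API call is involved.

```python
import json

# Simulated text chunks, standing in for what a streaming client would yield
chunks = [
    '{"severity": 8, ',
    '"issues": ["missing error ',
    'handling"], ',
    '"has_security_risk": false}',
]

buffer = ""
parse_failures = 0
for chunk in chunks:
    buffer += chunk
    try:
        json.loads(buffer)
    except json.JSONDecodeError:
        # Expected for every prefix except the complete document
        parse_failures += 1

# Only the fully accumulated text is valid JSON
final = json.loads(buffer)
print(parse_failures, final)
```

Every intermediate parse fails even though the final document is fine, which is why the accumulated text is the only thing worth validating.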
A complete working pattern
Here's the full pattern assembled, ready to adapt:
    import json
    import re
    import logging
    from typing import Annotated

    from pydantic import BaseModel, Field, ValidationError, field_validator
    from anthropic import Anthropic

    logger = logging.getLogger(__name__)
    client = Anthropic()

    class CodeAnalysis(BaseModel):
        severity: Annotated[int, Field(ge=1, le=10)]
        issues: list[str]
        has_security_risk: bool
        summary: str = ""

        @field_validator("issues")
        @classmethod
        def clean_issues(cls, v: list[str]) -> list[str]:
            # Trim whitespace and drop entries that are empty after trimming
            return [issue.strip() for issue in v if issue.strip()]

    def extract_json(text: str) -> dict:
        # Fenced markdown block first, then the first "{" through the last "}"
        match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
        if match:
            return json.loads(match.group(1))
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise ValueError("No JSON found in response")

    def analyze_code(code: str) -> CodeAnalysis | None:
        try:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                temperature=0.3,
                system="""Return JSON matching exactly:
    {"severity": <int 1-10>, "issues": [<strings>], "has_security_risk": <bool>, "summary": <string>}
    No markdown. No extra fields.""",
                messages=[{"role": "user", "content": f"Analyze:\n\n{code}"}],
            )
            raw = extract_json(response.content[0].text)
            return CodeAnalysis.model_validate(raw)
        except ValidationError as e:
            logger.warning("Validation failed", extra={"errors": e.errors()})
            return None
        except Exception as e:
            # Catch-all for extraction, network, and API errors
            logger.error("Analysis failed", extra={"error": str(e)})
            return None
What this gives you that freeform parsing doesn't
- Type safety end-to-end. analysis.severity is always int. Your type checker knows it. Your IDE autocompletes it.
- Validation at the boundary. Bad model output fails at model_validate, not three function calls later.
- Structured error logging. ValidationError.errors() tells you which field, which constraint, which value. Useful for monitoring model drift over time.
- Schema as documentation. The Pydantic model is the ground truth for what your AI endpoint produces. CodeAnalysis.model_json_schema() generates the JSON schema automatically for documentation or an OpenAPI spec.
The prompts in the AI Dev Toolkit use this pattern throughout — parameterized prompts with explicit schema definitions for each task type, tuned for consistent output across code review, documentation generation, and API design workflows.
If structured AI output patterns are a repeated part of your Python workflow, the AI Dev Toolkit includes 80+ parameterized prompts for code review, documentation, API design, and debugging — each built around consistent schema output rather than freeform responses.