Peyton Green

Posted on May 19

Structured LLM Outputs with Pydantic v2: Stop Parsing Freeform JSON and Start Typing Your AI

#python #ai #tutorial #pydantic

The biggest source of subtle bugs in AI applications isn't the model — it's the gap between what you asked for and what you got.

You prompt for {"score": 8, "issues": ["missing error handling"]} and you get {"score": "8/10", "issues": "missing error handling"}. Both are technically valid JSON. One breaks your downstream code. Neither triggers an exception until hours later when you're wondering why the aggregation is wrong.

Pydantic v2 eliminates this class of bugs. Here's how to structure your LLM outputs so type errors are caught at the boundary, not buried in production.

The problem with freeform JSON parsing

Most developers start here:

import json
from anthropic import Anthropic

client = Anthropic()

def analyze_code(code: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Analyze this code and return JSON with: severity (int 1-10), issues (list of strings), has_security_risk (bool).\n\n{code}"
        }]
    )
    return json.loads(response.content[0].text)

This fails in three ways you won't notice until production:

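Type coercion silently wrong. The model returns "severity": "8" instead of 8. json.loads parses it as a string. In Python 3 your downstream severity > 7 comparison then raises TypeError far from the call that produced the value, and an equality check such as severity == 8 quietly evaluates to False for every input.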
Missing fields. The model occasionally omits has_security_risk when it seems obvious from context. KeyError three calls in, two hours into a batch job.
Schema drift. You update the prompt. The model starts returning an extra field. Your downstream code ignores it. A week later you realize the data you've been storing is inconsistent.
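
The first two failure modes are easy to reproduce without calling a model at all. A minimal sketch, assuming a hypothetical response string (raw_text) standing in for what the model returned:

```python
import json

# Hypothetical model output: severity arrives as a string and
# has_security_risk is missing entirely.
raw_text = '{"severity": "8", "issues": ["missing error handling"]}'
result = json.loads(raw_text)

# json.loads keeps whatever types the model emitted.
severity_type = type(result["severity"]).__name__

# Equality against an int is silently False ...
equals_eight = result["severity"] == 8

# ... while an ordering comparison raises TypeError in Python 3.
try:
    result["severity"] > 7
    comparison_error = None
except TypeError as exc:
    comparison_error = type(exc).__name__

# The omitted field only surfaces when something finally reads it.
try:
    result["has_security_risk"]
    missing_error = None
except KeyError as exc:
    missing_error = type(exc).__name__

print(severity_type, equals_eight, comparison_error, missing_error)
# prints: str False TypeError KeyError
```

Nothing fails at json.loads itself, which is why these problems tend to show up far from the call that caused them.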

The Pydantic v2 fix

Define your output schema first:

from pydantic import BaseModel, Field, field_validator
from typing import Annotated

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""  # optional with default

    @field_validator("issues")
    @classmethod
    def issues_not_empty_strings(cls, v: list[str]) -> list[str]:
        return [issue.strip() for issue in v if issue.strip()]

Now parse with validation:

def analyze_code(code: str) -> CodeAnalysis:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Analyze this code. Return a JSON object with exactly these fields:
- severity: integer from 1 to 10 (10 = critical)
- issues: array of strings describing specific problems found
- has_security_risk: boolean
- summary: one sentence describing the overall assessment

Code:
{code}"""
        }]
    )

    raw = extract_json(response.content[0].text)
    return CodeAnalysis.model_validate(raw)

The model_validate call coerces "8" to 8, raises ValidationError on missing required fields and on values it cannot coerce (such as "8/10"), and runs your custom validators. The error surfaces at the boundary, not downstream.
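
To see the boundary behavior without an API call, here is a small sketch that feeds hand-written dicts to the same CodeAnalysis model. The payloads are made up for illustration; the second one mirrors the bad output from the introduction.

```python
from typing import Annotated
from pydantic import BaseModel, Field, ValidationError, field_validator

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""

    @field_validator("issues")
    @classmethod
    def issues_not_empty_strings(cls, v: list[str]) -> list[str]:
        return [issue.strip() for issue in v if issue.strip()]

# A numeric string is coerced to int, and the validator drops blank issues.
ok = CodeAnalysis.model_validate(
    {"severity": "8", "issues": ["  missing error handling ", ""], "has_security_risk": False}
)

# "8/10" is not an int, issues is a bare string rather than a list,
# and has_security_risk is absent: all three are reported together.
try:
    CodeAnalysis.model_validate({"severity": "8/10", "issues": "missing error handling"})
    error_types = {}
except ValidationError as e:
    error_types = {err["loc"][0]: err["type"] for err in e.errors()}

print(ok.severity, ok.issues)
print(error_types)
```

Pydantic collects every failing field into one ValidationError, so a single bad response tells you about all of its problems at once.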

Extracting JSON from model responses

Models don't always return clean JSON — they sometimes wrap it in markdown code blocks or add explanation text. A reliable extractor:

import re

def extract_json(text: str) -> dict:
    """Extract JSON from model response, handling markdown code blocks."""
    # Try markdown code block first
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if match:
        return json.loads(match.group(1))

    # Try raw JSON object
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))

    raise ValueError(f"No JSON found in response: {text[:200]}")

This handles the three most common response formats:

{"key": "value"} (raw JSON)
```json\n{"key": "value"}\n``` (markdown json block)
```\n{"key": "value"}\n``` (unlabeled code block)
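
One case the greedy fallback pattern can trip on is a response with more braces after the JSON object, since the match then runs to the last closing brace. A possible alternative, sketched here with the standard library's json.JSONDecoder.raw_decode, is to let the JSON parser decide where the object ends. The function name extract_first_json_object is hypothetical, and this is a different technique from the regex approach above, not a drop-in from the original code.

```python
import json

def extract_first_json_object(text: str) -> dict:
    """Return the first parseable JSON object found in text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError(f"No JSON found in response: {text[:200]}")

FENCE = "`" * 3  # built this way only to keep the example readable

plain = extract_first_json_object('{"key": "value"}')
fenced = extract_first_json_object(FENCE + 'json\n{"key": "value"}\n' + FENCE)
chatty = extract_first_json_object('Result: {"a": {"b": 1}} Hope that helps {smile}')
```

Because parsing always starts at an opening brace, anything it returns is a JSON object, and trailing prose or a closing fence is simply left unread.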

Prompt patterns that produce consistent schema adherence

The prompt matters as much as the parser. Patterns that reduce schema drift:

Explicit field types in the prompt:

Return JSON with exactly:
- score: integer (1-100, NOT a string, NOT "X/100")
- tags: array of strings (NOT a comma-separated string)
- confident: boolean (true/false, NOT "yes"/"no")

Spelling out "NOT a string" sounds redundant. It cuts type coercion errors by ~80% in practice.

Repeat the schema in the system prompt:

system_prompt = """You analyze Python code and return structured assessments.

ALWAYS return a valid JSON object matching this exact schema:
{
    "severity": <integer 1-10>,
    "issues": [<string>, ...],
    "has_security_risk": <boolean>,
    "summary": <string>
}

Never include markdown formatting. Never add extra fields. Never omit required fields."""

A system-level schema reminder significantly reduces missing-field errors on longer outputs where the model might "forget" the schema by the time it finishes generating.
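
If keeping the prompt schema and the Pydantic model in sync by hand becomes a chore, one option is to generate the schema text from the model itself. The sketch below uses model_json_schema(); the build_system_prompt helper is hypothetical, and whether a model follows formal JSON Schema as well as the hand-written template above is an assumption you would need to check against your own outputs.

```python
import json
from typing import Annotated
from pydantic import BaseModel, Field

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""

def build_system_prompt(model: type[BaseModel]) -> str:
    """Embed the model's JSON Schema in a system prompt (hypothetical helper)."""
    schema_text = json.dumps(model.model_json_schema(), indent=2)
    return (
        "You analyze Python code and return structured assessments.\n\n"
        "ALWAYS return a valid JSON object that conforms to this JSON Schema:\n"
        f"{schema_text}\n\n"
        "Never include markdown formatting. Never add extra fields. "
        "Never omit required fields."
    )

schema = CodeAnalysis.model_json_schema()
system_prompt = build_system_prompt(CodeAnalysis)
print(system_prompt)
```

The generated schema carries the ge/le bounds as minimum and maximum, so a constraint added to the model shows up in the prompt without a second edit.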

Temperature for structured outputs:

For strict schema adherence, use a lower temperature (0.2-0.4). The default temperature trades consistency for creativity: fine for prose, wrong for structured data.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    temperature=0.3,  # deterministic enough for reliable JSON
    ...
)

Handling validation errors gracefully

Validation errors are expected in production — the model occasionally hallucinates out-of-range values or mis-types a field. Don't let them crash your application:

from pydantic import ValidationError
import logging

logger = logging.getLogger(__name__)

def analyze_code_safe(code: str) -> CodeAnalysis | None:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            temperature=0.3,
            messages=[...],
        )
        raw = extract_json(response.content[0].text)
        return CodeAnalysis.model_validate(raw)

    except ValidationError as e:
        logger.warning(
            "Schema validation failed",
            extra={"errors": e.errors(), "code_snippet": code[:100]}
        )
        return None

    except (ValueError, json.JSONDecodeError) as e:
        logger.error("JSON extraction failed", extra={"error": str(e)})
        return None

Log the validation errors: e.errors() returns a list of dicts, each with the error type, the field location (loc), a message, and the offending input. Together they tell you when your schema is drifting from what the model produces. Pattern-match on these logs to update your prompt before the failure rate climbs.
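
As a rough illustration of that pattern-matching, the sketch below counts failures by field and error type across a batch of already-parsed outputs. The tally_failures helper and the sample batch are made up for the example; in practice the input would come from your logs.

```python
from collections import Counter
from typing import Annotated
from pydantic import BaseModel, Field, ValidationError

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""

def tally_failures(raw_outputs: list[dict]) -> Counter:
    """Count validation failures keyed by (field path, error type)."""
    counts: Counter = Counter()
    for raw in raw_outputs:
        try:
            CodeAnalysis.model_validate(raw)
        except ValidationError as e:
            for err in e.errors():
                path = ".".join(str(part) for part in err["loc"])
                counts[(path, err["type"])] += 1
    return counts

# Hypothetical batch: one valid output and three flawed ones.
batch = [
    {"severity": 8, "issues": ["x"], "has_security_risk": False},
    {"severity": "8/10", "issues": ["x"], "has_security_risk": False},
    {"severity": 11, "issues": ["x"], "has_security_risk": True},
    {"severity": 3, "issues": "x"},
]
counts = tally_failures(batch)
for (path, error_type), n in counts.most_common():
    print(path, error_type, n)
```

A sudden rise in one (field, type) pair, say severity with int_parsing, points at the specific prompt instruction that needs tightening.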

Nested schemas

For complex outputs, compose Pydantic models:

from typing import Annotated, Literal
from pydantic import BaseModel, Field

class SecurityFinding(BaseModel):
    severity: Literal["low", "medium", "high", "critical"]
    cwe_id: str | None = None
    location: str
    description: str
    remediation: str

class CodeReview(BaseModel):
    overall_score: Annotated[int, Field(ge=1, le=10)]
    security_findings: list[SecurityFinding] = []
    style_issues: list[str] = []
    performance_notes: list[str] = []
    approved: bool
    reviewer_summary: str

Pydantic v2 validates nested models recursively: if security_findings contains an item that doesn't match SecurityFinding, you get a validation error pointing to the exact path, shown as security_findings.2.severity in the error message and as the tuple ("security_findings", 2, "severity") in e.errors().

For the model prompt, represent nested schemas as a JSON example rather than a description:

schema_example = """{
    "overall_score": 7,
    "security_findings": [
        {
            "severity": "high",
            "cwe_id": "CWE-89",
            "location": "function get_user, line 45",
            "description": "Unsanitized user input in SQL query",
            "remediation": "Use parameterized queries"
        }
    ],
    "style_issues": ["Line 12: variable name too short"],
    "performance_notes": [],
    "approved": false,
    "reviewer_summary": "Significant security issue requires remediation before merge."
}"""

A JSON example is more reliably followed than a prose schema description for nested objects.

Streaming with structured outputs

For long outputs where you want to stream but still validate:

import json

def analyze_code_streaming(code: str) -> CodeAnalysis:
    chunks = []

    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        temperature=0.3,
        messages=[...],
    ) as stream:
        for text in stream.text_stream:
            chunks.append(text)
            # optionally yield chunks to caller here

    full_response = "".join(chunks)
    raw = extract_json(full_response)
    return CodeAnalysis.model_validate(raw)

Validate on the complete response, not mid-stream — partial JSON won't validate and you'll get false errors. Stream for latency perception; validate at the end for correctness.
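
The reason is easy to demonstrate with a simulated stream. In this sketch the chunks list is a hypothetical stand-in for what stream.text_stream would yield; no API call is made.

```python
import json

# Hypothetical chunks, split the way a streamed response might arrive.
chunks = [
    '{"severity": 8, "iss',
    'ues": ["missing error handling"], ',
    '"has_security_risk": false}',
]

def parses(text: str) -> bool:
    """Report whether text is complete, valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

buffer = ""
mid_stream_results = []
for chunk in chunks:
    buffer += chunk
    # Checking after every chunk fails until the final brace arrives.
    mid_stream_results.append(parses(buffer))

final = json.loads(buffer)
print(mid_stream_results)  # [False, False, True]
```

Every intermediate buffer is a parse failure even though the model's output is perfectly fine, which is exactly the kind of false alarm the advice above avoids.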

A complete working pattern

Here's the full pattern assembled, ready to adapt:

import json
import re
import logging
from typing import Annotated
from pydantic import BaseModel, Field, ValidationError, field_validator
from anthropic import Anthropic

logger = logging.getLogger(__name__)
client = Anthropic()

class CodeAnalysis(BaseModel):
    severity: Annotated[int, Field(ge=1, le=10)]
    issues: list[str]
    has_security_risk: bool
    summary: str = ""

    @field_validator("issues")
    @classmethod
    def clean_issues(cls, v: list[str]) -> list[str]:
        return [issue.strip() for issue in v if issue.strip()]

def extract_json(text: str) -> dict:
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if match:
        return json.loads(match.group(1))
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError("No JSON found in response")

def analyze_code(code: str) -> CodeAnalysis | None:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            temperature=0.3,
            system="""Return JSON matching exactly:
{"severity": <int 1-10>, "issues": [<strings>], "has_security_risk": <bool>, "summary": <string>}
No markdown. No extra fields.""",
            messages=[{"role": "user", "content": f"Analyze:\n\n{code}"}],
        )
        raw = extract_json(response.content[0].text)
        return CodeAnalysis.model_validate(raw)

    except ValidationError as e:
        logger.warning("Validation failed", extra={"errors": e.errors()})
        return None
    except Exception as e:
        logger.error("Analysis failed", extra={"error": str(e)})
        return None

What this gives you that freeform parsing doesn't

Type safety end-to-end. analysis.severity is always int. Your type checker knows it. Your IDE autocompletes it.
Validation at the boundary. Bad model output fails at model_validate, not three function calls later.
Structured error logging. ValidationError.errors() tells you which field, which constraint, which value. Useful for monitoring model drift over time.
Schema as documentation. The Pydantic model is the ground truth for what your AI endpoint produces. CodeAnalysis.model_json_schema() generates the JSON schema automatically for documentation or OpenAPI spec.

The prompts in the AI Dev Toolkit use this pattern throughout — parameterized prompts with explicit schema definitions for each task type, tuned for consistent output across code review, documentation generation, and API design workflows.
