Most LLM tutorials end the same way: you get a string back, you write a regex, and you pray.
We spent three months building production AI agents. The single change that eliminated the most bugs was not prompt engineering, not model upgrades, not retry logic. It was making every LLM call return a Pydantic model instead of raw text.
This article covers 4 working approaches to structured LLM outputs in Python — from direct SDK calls to framework-level abstractions. Every code example is verified against official documentation as of February 2026.
Why Strings Break Production Systems
Here is what happens when you parse LLM output manually:
# The fragile approach
import re

response = call_llm("Extract the user's name and email from: ...")
# response = "The user's name is John and email is john@example.com"

name_match = re.search(r"name is (\w+)", response)
email_match = re.search(r"email is ([\w@.]+)", response)
name = name_match.group(1) if name_match else None
email = email_match.group(1) if email_match else None

# What if the model says "Name: John" instead? Broken.
# What if it adds a comma after the email? Broken.
# What if it returns JSON sometimes and text other times? Broken.
The failure modes multiply: missing fields, wrong types, inconsistent formats across calls, and silent data corruption when the model rephrases its output.
Structured outputs solve this at the protocol level. The model is constrained to produce valid JSON matching your schema. No parsing. No regex. No prayer.
Approach 1: OpenAI — client.chat.completions.parse()
OpenAI's Python SDK (v1.x+) has a .parse() method that accepts a Pydantic model directly and returns a typed object.
from pydantic import BaseModel
from openai import OpenAI
class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

client = OpenAI()

completion = client.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {
            "role": "user",
            "content": "Alice and Bob are going to a science fair on Friday.",
        },
    ],
    response_format=CalendarEvent,
)
event = completion.choices[0].message.parsed
print(event.name) # "Science Fair"
print(event.participants) # ["Alice", "Bob"]
print(type(event)) # <class 'CalendarEvent'>
What happens under the hood: The SDK converts your Pydantic model to a JSON schema, sends it as response_format, and deserializes the response back into your model class. The model uses constrained decoding — it physically cannot produce tokens that violate your schema.
Compatibility note: Structured outputs work with gpt-4o-2024-08-06 and later models. The first request with a new schema has additional latency (typically under 10 seconds) while the schema is compiled. Subsequent requests use a cached grammar.
What to watch for: OpenAI's structured outputs support a subset of JSON Schema. Constraints like minimum, maximum, minLength, and maxLength are stripped before sending. The SDK adds these constraints to field descriptions instead, so the model sees them as instructions rather than hard constraints. Pydantic still validates them on the response.
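You can see this split between soft and hard constraints locally, with no API call. A minimal sketch using only Pydantic; the Score model is illustrative, not one of this article's examples:

```python
from pydantic import BaseModel, Field, ValidationError

class Score(BaseModel):
    label: str
    confidence: float = Field(ge=0.0, le=1.0)

# The JSON schema the SDK derives -- ge/le become minimum/maximum,
# which OpenAI strips before sending:
schema = Score.model_json_schema()
props = schema["properties"]["confidence"]
print(props["minimum"], props["maximum"])  # 0.0 1.0

# Pydantic still enforces the range when the response is deserialized:
try:
    Score(label="positive", confidence=1.7)
except ValidationError:
    print("out-of-range value rejected client-side")
```

So an out-of-range value can still come back from the API, but it fails loudly at parse time instead of silently corrupting your data.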
Approach 2: Anthropic — client.messages.parse()
As of late 2025, Claude supports structured outputs natively. The Anthropic Python SDK provides a .parse() method similar to OpenAI's.
from pydantic import BaseModel
from anthropic import Anthropic
class ContactInfo(BaseModel):
    name: str
    email: str
    plan_interest: str
    demo_requested: bool

client = Anthropic()

response = client.messages.parse(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": (
                "Extract the key information from this email: "
                "John Smith (john@example.com) is interested in our "
                "Enterprise plan and wants to schedule a demo."
            ),
        }
    ],
    output_format=ContactInfo,
)
contact = response.parsed_output
print(contact.name) # "John Smith"
print(contact.demo_requested) # True
print(type(contact)) # <class 'ContactInfo'>
Key difference from OpenAI: The parameter is called output_format (not response_format), and the parsed result lives at response.parsed_output (not response.choices[0].message.parsed). The underlying mechanism uses output_config.format with json_schema type.
Supported models (as of February 2026): Claude Opus 4.6, Claude Sonnet 4.6, Claude Sonnet 4.5, Claude Opus 4.5, and Claude Haiku 4.5. The feature is generally available on the Anthropic API and Amazon Bedrock.
Schema limitations are similar to OpenAI: No recursive schemas, no complex enum types, and additionalProperties must be false on all objects. The SDK handles schema transformation automatically — it strips unsupported constraints and moves them into field descriptions.
Approach 3: Instructor — Provider-Agnostic Structured Outputs
Instructor is a library built on top of Pydantic that works across 15+ LLM providers with a unified API.
pip install instructor
import instructor
from pydantic import BaseModel, Field
class Person(BaseModel):
    name: str = Field(description="Full legal name")
    age: int = Field(ge=0, le=150)
    occupation: str

# Works with any provider
client = instructor.from_provider("openai/gpt-4o-mini")

person = client.create(
    response_model=Person,
    messages=[
        {
            "role": "user",
            "content": "Extract: John is a 30-year-old software engineer",
        }
    ],
)
print(person) # Person(name='John', age=30, occupation='software engineer')
Why use Instructor over raw SDK calls:
- Automatic retries on validation failure. If the model returns age: -5 and your Pydantic model has ge=0, Instructor catches the validation error, feeds it back to the model, and retries. The raw SDKs don't do this.
- Provider switching with zero code changes. Swap "openai/gpt-4o-mini" for "anthropic/claude-sonnet-4-5" and nothing else changes.
- Nested and complex models work out of the box. The library handles schema translation edge cases across providers.
When Instructor is overkill: If you use a single provider and your schemas are simple, the native SDK .parse() methods are simpler and have fewer dependencies.
Approach 4: LangChain — with_structured_output()
LangChain provides structured output support through its chat model interface.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
class MovieReview(BaseModel):
    """A structured movie review."""

    title: str = Field(description="Movie title")
    rating: float = Field(description="Rating out of 10")
    pros: list[str] = Field(description="Positive aspects")
    cons: list[str] = Field(description="Negative aspects")
    recommendation: bool = Field(description="Would recommend")

llm = ChatOpenAI(model="gpt-4o-mini")
structured_llm = llm.with_structured_output(MovieReview)

review = structured_llm.invoke(
    "Review the movie 'Interstellar' focusing on scientific accuracy"
)
print(review.title) # "Interstellar"
print(review.rating) # 8.5
print(review.recommendation) # True
print(type(review)) # <class 'MovieReview'>
How it works: with_structured_output() returns a new runnable that wraps the model. Under the hood, it uses the provider's native structured output support when available (OpenAI, Anthropic, Gemini) or falls back to tool-calling-based extraction.
LangChain also supports TypedDict and raw JSON schemas in addition to Pydantic. With Pydantic, you get back a Pydantic instance. With TypedDict or JSON schema, you get a plain dict.
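For illustration, here is what the dict-shaped handoff looks like. No API call is made; the raw dict stands in for what .invoke() would return with a TypedDict schema, and lifting it into a Pydantic model is one way to recover attribute access downstream:

```python
from typing import TypedDict
from pydantic import BaseModel

class MovieReviewDict(TypedDict):
    title: str
    rating: float
    recommendation: bool

# Stand-in for a structured_llm.invoke(...) result with a TypedDict schema:
raw: MovieReviewDict = {"title": "Interstellar", "rating": 8.5, "recommendation": True}

# If downstream code wants attribute access and validation anyway,
# lift the plain dict into a Pydantic model:
class MovieReview(BaseModel):
    title: str
    rating: float
    recommendation: bool

review = MovieReview.model_validate(raw)
print(review.title)  # Interstellar
```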
When to use LangChain's approach: When you're already in a LangChain pipeline and want structured output to compose with other runnables (chains, agents, tools). The .with_structured_output() method returns a standard LangChain Runnable, so it works with .pipe(), .batch(), and streaming.
Pydantic Patterns That Work Well With LLMs
The Pydantic model you define is not just a schema — it's a prompt. Field names, descriptions, and types all guide the model's output.
Pattern 1: Use Field(description=...) Everywhere
class Analysis(BaseModel):
    sentiment: str = Field(
        description="One of: positive, negative, neutral"
    )
    confidence: float = Field(
        description="Confidence score between 0.0 and 1.0"
    )
    key_phrases: list[str] = Field(
        description="Top 3 most important phrases from the text"
    )
Without descriptions, the model guesses what confidence means. With them, it follows your specification. This is the single highest-ROI change you can make to any structured output schema.
Pattern 2: Use Literal for Constrained Choices
from typing import Literal
class TicketClassification(BaseModel):
    priority: Literal["low", "medium", "high", "critical"]
    category: Literal["bug", "feature", "docs", "security"]
    assigned_team: Literal["backend", "frontend", "infra", "ml"]
Literal types become enum constraints in the JSON schema. The model can only pick from your defined options. No "Medium" vs "medium" vs "MEDIUM" inconsistencies.
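You can verify the enum translation locally (Ticket here is a trimmed version of the model above):

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    priority: Literal["low", "medium", "high", "critical"]

# The Literal becomes an enum constraint in the JSON schema:
schema = Ticket.model_json_schema()
print(schema["properties"]["priority"]["enum"])
# ['low', 'medium', 'high', 'critical']

# Case variants are rejected outright at parse time:
try:
    Ticket(priority="Medium")
except ValidationError:
    print("rejected: 'Medium' is not an allowed value")
```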
Pattern 3: Nested Models for Complex Structures
class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str

class Company(BaseModel):
    name: str
    industry: str
    headquarters: Address
    founded_year: int
    employee_count: int = Field(description="Approximate number of employees")
Nesting works across all 4 approaches. The schema is flattened into JSON Schema $ref definitions automatically.
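A quick local check of that flattening, using trimmed versions of the models above (no API call needed):

```python
from pydantic import BaseModel

class Address(BaseModel):
    city: str
    country: str

class Company(BaseModel):
    name: str
    headquarters: Address

# Nested models land in $defs; the headquarters property
# points at the Address definition via a $ref.
schema = Company.model_json_schema()
print(sorted(schema["$defs"]))  # ['Address']
```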
Pattern 4: Optional Fields With Defaults
from typing import Optional
class ExtractedEntity(BaseModel):
    name: str
    entity_type: str
    confidence: float
    source_url: Optional[str] = None
    aliases: list[str] = Field(default_factory=list)
Optional fields let the model skip data it can't find, rather than hallucinating a value. This matters more than most teams realize — forced required fields on uncertain data produce plausible-looking garbage.
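A quick sketch of why this matters at parse time: a response that omits the uncertain fields still validates cleanly (trimmed model, no API call):

```python
from typing import Optional
from pydantic import BaseModel, Field

class ExtractedEntity(BaseModel):
    name: str
    source_url: Optional[str] = None
    aliases: list[str] = Field(default_factory=list)

# The model skipped the fields it couldn't find -- defaults fill the gaps:
entity = ExtractedEntity.model_validate_json('{"name": "Acme Corp"}')
print(entity.source_url)  # None
print(entity.aliases)  # []
```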
The Decision Matrix
| Approach | Best For | Retries | Multi-Provider | Dependencies |
|---|---|---|---|---|
| OpenAI SDK | OpenAI-only projects | No | No | openai |
| Anthropic SDK | Claude-only projects | No | No | anthropic |
| Instructor | Multi-provider, validation-heavy | Yes | Yes | instructor |
| LangChain | Chain/agent pipelines | Via chains | Yes | langchain-* |
Start with the native SDK for your provider. Move to Instructor when you need retries or multi-provider support. Use LangChain when structured output is one step in a larger pipeline.
Common Pitfalls
1. Schema too complex. Both OpenAI and Anthropic limit JSON Schema features. No recursive types. No minimum/maximum as hard constraints (they become hints in descriptions). Keep models flat when possible.
2. Missing additionalProperties: false. Both providers require this on every object in the schema. Pydantic sets this automatically when using model_config = ConfigDict(extra='forbid'), but the SDKs also handle it during schema transformation.
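A minimal sketch of the Pydantic side (the Strict model is illustrative):

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Strict(BaseModel):
    model_config = ConfigDict(extra="forbid")

    name: str

# extra='forbid' emits additionalProperties: false in the JSON schema:
print(Strict.model_json_schema()["additionalProperties"])  # False

# ...and rejects unexpected keys when validating a response:
try:
    Strict.model_validate({"name": "x", "surprise": 1})
except ValidationError:
    print("extra field rejected")
```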
3. First-request latency. The first call with a new schema compiles a grammar on the provider side. OpenAI: under 10 seconds. Anthropic: similar. Subsequent requests with the same schema are fast.
4. Refusals bypass the schema. If the model refuses a request for safety reasons, the response will not match your schema. Always check stop_reason (Anthropic) or finish_reason (OpenAI) before accessing parsed output.
5. Token limits truncate output. If max_tokens is too low, the JSON output gets cut off mid-field. Set max_tokens high enough for your expected response size, and check for max_tokens stop reason.
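A small local demonstration of the failure mode (the truncated string is a made-up example of a response cut off mid-field):

```python
import json

# What a response cut off by max_tokens looks like -- the JSON never closes:
truncated = '{"title": "Interstellar", "pros": ["visuals", "sco'

try:
    json.loads(truncated)
except json.JSONDecodeError:
    print("truncated output is unparseable")
```

The SDK-level .parse() helpers fail the same way, which is why checking the stop reason before touching the parsed output matters.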
What We Use in Production
We run 80+ AI agents in production. Every agent that calls an LLM uses Pydantic models for input and output schemas. The pattern is always the same:
- Define a Pydantic model for the expected output.
- Call the LLM with structured output enabled.
- Get back a typed Python object.
- Pass it to the next function with full type safety.
No regex. No json.loads() with try/except. No "parse the response and hope." The schema is the contract between the LLM and your code.
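Sketched end to end: the parsed object below is built directly from JSON to stand in for a structured-output LLM call, and schedule_demo is a hypothetical downstream function, not part of any SDK:

```python
from pydantic import BaseModel

class Contact(BaseModel):
    name: str
    email: str

def schedule_demo(contact: Contact) -> str:
    # Downstream code works with typed attributes, not string parsing.
    return f"Demo scheduled for {contact.name} <{contact.email}>"

# In production this object comes back from a structured-output call;
# here it is built directly from JSON to show the handoff.
contact = Contact.model_validate_json(
    '{"name": "John Smith", "email": "john@example.com"}'
)
print(schedule_demo(contact))  # Demo scheduled for John Smith <john@example.com>
```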
Structured outputs turned our agent debugging from "why did it return that string?" to "why did it populate this field with that value?" The second question is always easier to answer.
Follow @klement_gunndu for more Python and AI engineering content. We're building in public.