TL;DR: For most projects, start with Instructor (Python) or Zod + zodResponseFormat (TypeScript) -- they're the fastest path to reliable structured output from any cloud LLM. If you run local models and need guaranteed schema compliance with zero retries, go with Outlines. For Python agent workflows with tool calling, PydanticAI is the best bet.
Why Structured Output Libraries Matter
Every production LLM application hits the same wall: you ask the model for JSON, and it wraps the response in a markdown code block, adds a friendly preamble, or returns "score": "high" when you needed a float.
Structured output libraries solve this by sitting between your code and the LLM API, ensuring every response matches your schema -- either by validating after generation and retrying on failure, or by constraining the model's token generation so invalid output is physically impossible.
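The first strategy, validate-then-retry, can be sketched in a few lines of plain Python. The `call_llm` function below is a hypothetical stand-in for any chat-completion call; the first attempt simulates a common failure mode (a quoted string where a float was expected):

```python
from pydantic import BaseModel, ValidationError

class Sentiment(BaseModel):
    label: str
    confidence: float

def call_llm(prompt: str, attempt: int) -> str:
    # Hypothetical stand-in for a real chat-completion call.
    # Attempt 0 simulates a typical failure: a string where a float belongs.
    if attempt == 0:
        return '{"label": "positive", "confidence": "high"}'
    return '{"label": "positive", "confidence": 0.95}'

def extract(prompt: str, max_retries: int = 3) -> Sentiment:
    last_error = None
    for attempt in range(max_retries):
        raw = call_llm(prompt, attempt)
        try:
            return Sentiment.model_validate_json(raw)
        except ValidationError as e:
            # Real libraries feed the error back into the next prompt.
            last_error = e
    raise RuntimeError(f"No valid output after {max_retries} attempts: {last_error}")

print(extract("Analyze: great product!"))  # label='positive' confidence=0.95
```

This is essentially what Instructor and PydanticAI automate for you, including injecting the validation error into the retry prompt so the model can self-correct.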
Agent platforms like Nebula.gg depend on structured outputs to ensure every tool call returns validated data -- because when your agent runs autonomously, a malformed JSON response doesn't just break a UI, it breaks the entire workflow.
Here are the 5 libraries that matter in 2026, compared side-by-side.
Comparison Table
| Feature | Instructor | PydanticAI | Outlines | Guidance | Zod (OpenAI) |
|---|---|---|---|---|---|
| Approach | Post-gen validation | Post-gen validation | Pre-gen constraint | Pre-gen constraint | Post-gen (provider API) |
| Language | Python, TS, Go, Ruby | Python | Python | Python | TypeScript |
| Schema | Pydantic | Pydantic | Pydantic / JSON Schema | Custom DSL + Pydantic | Zod |
| LLM providers | Any (OpenAI, Anthropic, Ollama) | Any (OpenAI, Anthropic, Gemini) | Local models, vLLM, TGI | Local (Transformers, llama.cpp) | OpenAI, Gemini (via adapter) |
| 100% schema guarantee | No (retries) | No (retries) | Yes | Yes | Depends on provider |
| Auto-retry | Yes (built-in) | Yes (built-in) | N/A (always valid) | N/A (always valid) | No |
| Agent/tool support | No | Yes | No | Yes (branching) | No |
| GitHub stars | 12.3k | 14.5k | 13.3k | 19k | N/A (part of Zod) |
| Best for | Quick extraction, any provider | Python agents with tools | Guaranteed compliance | Complex branching logic | TypeScript projects |
1. Instructor -- Simplest Integration
Instructor wraps any LLM client with Pydantic validation and automatic retry. It's the most language-flexible option, with official SDKs for Python, TypeScript, Go, and Ruby.
The API is dead simple: define a Pydantic model, patch your client, and pass response_model:
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel
from typing import Literal

class Sentiment(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

client = instructor.from_openai(OpenAI())

result = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze: This product is amazing!"}],
    response_model=Sentiment,
    max_retries=3,
)
print(result)  # label='positive' confidence=0.95
```
Strength: Works with any LLM provider. Minimal API surface. Multi-language.
Weakness: No 100% guarantee -- retries cost extra API calls on edge cases.
Best for: Teams that want structured output with the least code changes.
Pricing: Free, open-source (MIT).
2. PydanticAI -- Type-Safe Python Agents
PydanticAI is built by the Pydantic team and brings FastAPI-style developer experience to AI agents. It goes beyond extraction -- agents can call tools and receive injected dependencies.
```python
from pydantic_ai import Agent
from pydantic import BaseModel
from typing import Literal

class TicketClassification(BaseModel):
    category: Literal["bug", "feature", "question"]
    priority: Literal["low", "medium", "high"]

agent = Agent("openai:gpt-4o", output_type=TicketClassification)

result = agent.run_sync("Classify: Login page crashes on Safari")
print(result.output)  # category='bug' priority='high'
```
Strength: Tool support with dependency injection. Type-safe end-to-end. Backed by the Pydantic team.
Weakness: Python-only. Heavier than Instructor if you just need extraction.
Best for: Python developers building agents that need structured output AND tool calling.
Pricing: Free, open-source (MIT).
3. Outlines -- Guaranteed Valid Output
Outlines takes a fundamentally different approach. Instead of validating output after the fact, it uses a finite state machine (FSM) to mask invalid tokens during generation. The model can only produce schema-compliant output.
```python
import outlines
from transformers import AutoModelForCausalLM, AutoTokenizer
from pydantic import BaseModel
from typing import Literal

class Sentiment(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B"),
    AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B"),
)

result = model("Analyze sentiment: This is terrible!", Sentiment)
print(result)  # label='negative' confidence=0.92
```
Strength: 100% schema compliance. Zero retries, zero wasted API calls.
Weakness: Requires local models or compatible inference servers (vLLM, TGI). More infrastructure setup.
Best for: High-throughput pipelines where every failed retry costs money.
Pricing: Free, open-source (Apache 2.0).
4. Guidance -- Branching During Generation
Guidance lets you run Python control flow while the model generates tokens. You can branch into different schemas based on intermediate output -- something none of the other libraries in this comparison support.
```python
from guidance import models, select, guidance
from guidance import json as gen_json
from pydantic import BaseModel
from typing import Literal

class BugReport(BaseModel):
    title: str
    severity: Literal["minor", "major", "critical"]

class FeatureRequest(BaseModel):
    title: str
    priority: Literal["low", "medium", "high"]

lm = models.Transformers("Qwen/Qwen2.5-1.5B")

@guidance
def classify_and_extract(lm, text):
    lm += f"Classify this ticket: {text}\n"
    lm += f"Type: {select(['bug', 'feature'], name='type')}\n"
    if lm["type"] == "bug":
        lm += gen_json(name="result", schema=BugReport)
    else:
        lm += gen_json(name="result", schema=FeatureRequest)
    return lm

result = lm + classify_and_extract("Login crashes on Safari")
print(result["type"])    # 'bug'
print(result["result"])  # {"title": "Login crash on Safari", "severity": "major"}
```
Strength: Conditional schemas and branching logic during generation. Most powerful control flow.
Weakness: Highest learning curve. Local models only. Custom DSL to learn.
Best for: Complex multi-schema workflows where the output structure depends on intermediate decisions.
Pricing: Free, open-source (MIT).
5. Zod + zodResponseFormat -- TypeScript Native
Zod combined with OpenAI's zodResponseFormat helper is the de facto standard for structured output in TypeScript. If you're in the TS ecosystem, this is the path of least resistance.
```typescript
import OpenAI from 'openai';
import { z } from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';

const SentimentSchema = z.object({
  label: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
});

const client = new OpenAI();

const response = await client.beta.chat.completions.parse({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Analyze: This product is amazing!' },
  ],
  response_format: zodResponseFormat(SentimentSchema, 'sentiment'),
});

const result = response.choices[0].message.parsed;
console.log(result); // { label: 'positive', confidence: 0.95 }
```
Strength: Zero-config with OpenAI. Full TypeScript type inference. Massive ecosystem.
Weakness: Tied to OpenAI (or providers that support their structured output API). Not universal.
Best for: TypeScript/Node.js projects, especially those already using OpenAI.
Pricing: Free, open-source (MIT).
How to Choose
Here's the decision tree:
- Cloud API + Python, just need extraction? Start with Instructor. Least code, most provider flexibility.
- Python agents that call tools? Use PydanticAI. Built for agent workflows with dependency injection.
- Local models + need guaranteed compliance? Go with Outlines. Zero retries, 100% valid output.
- Complex branching logic during generation? Guidance is the only option that supports conditional schemas.
- TypeScript project? Zod + zodResponseFormat is the obvious choice.
A practical rule: start with post-generation validation (Instructor or Zod). It works with cloud APIs, requires zero infrastructure, and handles 95%+ of use cases. Escalate to pre-generation constraints (Outlines) when retry costs or compliance requirements justify running local models.
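The retry-cost trade-off is easy to quantify. With a per-request failure rate p and up to n attempts, each attempt only happens if all previous ones failed, so the expected number of API calls per request is the partial geometric sum 1 + p + p^2 + … The numbers below are illustrative, not benchmarks:

```python
def expected_calls(failure_rate: float, max_attempts: int) -> float:
    """Expected API calls per request when each retry fires only
    after all previous attempts produced invalid output."""
    return sum(failure_rate ** k for k in range(max_attempts))

# Illustrative: 5% invalid-output rate, up to 3 attempts.
print(round(expected_calls(0.05, 3), 4))  # 1.0525
```

At a 5% failure rate the retry overhead is only about 5%, which is why post-generation validation is usually cheap enough; the math changes at very high throughput or with stricter latency budgets.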
Verdict
Instructor is the safest default for most projects starting today -- it works everywhere, requires almost no learning curve, and covers the 80% case.
PydanticAI is the forward-looking choice for Python agent developers who need more than just extraction.
Outlines wins for high-throughput production pipelines where every failed retry costs real money.
All five libraries are free and open-source. Pick the one that matches your language, provider, and complexity needs -- then ship.
Previously in this series: We've compared AI agent frameworks, hosting platforms, and code sandboxes. Next up: AI guardrails tools for production LLMs.