Nebula

Top 5 Structured Output Libraries for LLMs in 2026

TL;DR: For most projects, start with Instructor (Python) or Zod + zodResponseFormat (TypeScript) -- they're the fastest path to reliable structured output from any cloud LLM. If you run local models and need guaranteed schema compliance with zero retries, go with Outlines. For Python agent workflows with tool calling, PydanticAI is the best bet.


Why Structured Output Libraries Matter

Every production LLM application hits the same wall: you ask the model for JSON, and it wraps the response in a markdown code block, adds a friendly preamble, or returns "score": "high" when you needed a float.

Structured output libraries solve this by sitting between your code and the LLM API, ensuring every response matches your schema -- either by validating after generation and retrying on failure, or by constraining the model's token generation so invalid output is physically impossible.
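The first approach, post-generation validation with retry, can be sketched in a few lines of plain Python. Everything here is illustrative: `call_llm` is a stub standing in for a real API call, and the hand-rolled `validate` plays the role a Pydantic model would play in the libraries below:

```python
import json

def call_llm(prompt: str, attempt: int) -> str:
    # Stub standing in for a real LLM API call. On the first attempt it
    # returns the kind of malformed output described above (a markdown
    # code block with a string where a float belongs); on retry it
    # returns valid JSON.
    if attempt == 0:
        return '```json\n{"label": "positive", "confidence": "high"}\n```'
    return '{"label": "positive", "confidence": 0.95}'

def validate(data: dict) -> dict:
    # Minimal schema check: label must be a known string, confidence a float.
    if data.get("label") not in {"positive", "negative", "neutral"}:
        raise ValueError("bad label")
    if not isinstance(data.get("confidence"), float):
        raise ValueError("confidence must be a float")
    return data

def extract(prompt: str, max_retries: int = 3) -> dict:
    last_error = None
    for attempt in range(max_retries):
        raw = call_llm(prompt, attempt)
        try:
            return validate(json.loads(raw))
        except (json.JSONDecodeError, ValueError) as e:
            # A real library would feed this error back into the next prompt
            # so the model can correct itself.
            last_error = e
    raise RuntimeError(f"no valid output after {max_retries} attempts: {last_error}")

result = extract("Analyze: This product is amazing!")
print(result)  # {'label': 'positive', 'confidence': 0.95}
```

The libraries in the first camp (Instructor, PydanticAI, Zod) are essentially production-hardened versions of this loop; the second camp (Outlines, Guidance) removes the loop entirely by making invalid output unreachable.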

Agent platforms like Nebula.gg depend on structured outputs to ensure every tool call returns validated data -- because when your agent runs autonomously, a malformed JSON response doesn't just break a UI, it breaks the entire workflow.

Here are the 5 libraries that matter in 2026, compared side-by-side.


Comparison Table

| Feature | Instructor | PydanticAI | Outlines | Guidance | Zod (OpenAI) |
|---|---|---|---|---|---|
| Approach | Post-gen validation | Post-gen validation | Pre-gen constraint | Pre-gen constraint | Post-gen (provider API) |
| Language | Python, TS, Go, Ruby | Python | Python | Python | TypeScript |
| Schema | Pydantic | Pydantic | Pydantic / JSON Schema | Custom DSL + Pydantic | Zod |
| LLM providers | Any (OpenAI, Anthropic, Ollama) | Any (OpenAI, Anthropic, Gemini) | Local models, vLLM, TGI | Local (Transformers, llama.cpp) | OpenAI, Gemini (via adapter) |
| 100% schema guarantee | No (retries) | No (retries) | Yes | Yes | Depends on provider |
| Auto-retry | Yes (built-in) | Yes (built-in) | N/A (always valid) | N/A (always valid) | No |
| Agent/tool support | No | Yes | No | Yes (branching) | No |
| GitHub stars | 12.3k | 14.5k | 13.3k | 19k | N/A (part of Zod) |
| Best for | Quick extraction, any provider | Python agents with tools | Guaranteed compliance | Complex branching logic | TypeScript projects |

1. Instructor -- Simplest Integration

Instructor wraps any LLM client with Pydantic validation and automatic retry. It's the most language-flexible option, with official SDKs for Python, TypeScript, Go, and Ruby.

The API is dead simple: define a Pydantic model, patch your client, and pass response_model:

import instructor
from openai import OpenAI
from pydantic import BaseModel
from typing import Literal

class Sentiment(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

client = instructor.from_openai(OpenAI())

result = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze: This product is amazing!"}],
    response_model=Sentiment,
    max_retries=3
)
print(result)  # label='positive' confidence=0.95

Strength: Works with any LLM provider. Minimal API surface. Multi-language.
Weakness: No 100% guarantee -- retries cost extra API calls on edge cases.
Best for: Teams that want structured output with the least code changes.
Pricing: Free, open-source (MIT).
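The retry weakness is easy to quantify. If each attempt independently fails validation with probability p and you allow up to n attempts, the expected number of API calls per request is (1 - p^n) / (1 - p). A back-of-envelope calculation, with purely illustrative failure rates:

```python
def expected_calls(p: float, max_attempts: int) -> float:
    # Expected API calls per request when each attempt independently fails
    # validation with probability p (p < 1) and we stop after max_attempts:
    # E = sum_{k=1}^{n} p^(k-1) = (1 - p^n) / (1 - p)
    return (1 - p ** max_attempts) / (1 - p)

for p in (0.01, 0.05, 0.20):
    print(f"failure rate {p:.0%}: {expected_calls(p, 3):.2f} calls/request")
```

At a 1% failure rate the overhead is negligible (~1.01 calls per request); at 20% you pay roughly a quarter more API calls, which is the point where Outlines-style constrained generation starts to look attractive.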


2. PydanticAI -- Type-Safe Python Agents

PydanticAI is built by the Pydantic team and brings FastAPI-style developer experience to AI agents. It goes beyond extraction -- agents can call tools and receive injected dependencies.

from pydantic_ai import Agent
from pydantic import BaseModel
from typing import Literal

class TicketClassification(BaseModel):
    category: Literal["bug", "feature", "question"]
    priority: Literal["low", "medium", "high"]

agent = Agent("openai:gpt-4o", output_type=TicketClassification)
result = agent.run_sync("Classify: Login page crashes on Safari")
print(result.output)  # category='bug' priority='high'

Strength: Tool support with dependency injection. Type-safe end-to-end. Backed by the Pydantic team.
Weakness: Python-only. Heavier than Instructor if you just need extraction.
Best for: Python developers building agents that need structured output AND tool calling.
Pricing: Free, open-source (MIT).


3. Outlines -- Guaranteed Valid Output

Outlines takes a fundamentally different approach. Instead of validating output after the fact, it uses a finite state machine (FSM) to mask invalid tokens during generation. The model can only produce schema-compliant output.
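The masking idea can be illustrated with a toy example. Assume a tiny hand-written vocabulary and state machine (a real library compiles the FSM from a JSON Schema over the model's actual tokenizer, but the principle is the same):

```python
import json

# Toy vocabulary: a few schema tokens plus "chatty" tokens a model might prefer.
VOCAB = ['{"label": "', "positive", "negative", "neutral", '"}', "Sure,", "hello"]

# Hand-written FSM for the pattern {"label": "<positive|negative|neutral>"}
TRANSITIONS = {
    "start": {'{"label": "': "label"},
    "label": {"positive": "close", "negative": "close", "neutral": "close"},
    "close": {'"}': "done"},
}

def allowed_tokens(state: str) -> list[str]:
    # The mask step: logits for every token outside this set would be
    # set to -inf, so the model cannot sample them.
    return list(TRANSITIONS.get(state, {}))

def generate(pick) -> str:
    # `pick` stands in for sampling from the masked model distribution.
    state, out = "start", []
    while state != "done":
        options = allowed_tokens(state)
        token = pick(options)  # tokens like "Sure," are never options
        out.append(token)
        state = TRANSITIONS[state][token]
    return "".join(out)

# Even a "model" that blindly takes the first allowed token emits valid JSON:
result = generate(lambda options: options[0])
print(result)                        # {"label": "positive"}
print(json.loads(result)["label"])   # positive
```

Because invalid tokens are masked before sampling, there is nothing to validate afterward, which is why the comparison table lists "N/A (always valid)" for auto-retry.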

import outlines
from transformers import AutoModelForCausalLM, AutoTokenizer
from pydantic import BaseModel
from typing import Literal

class Sentiment(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B"),
    AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")
)

result = model("Analyze sentiment: This is terrible!", Sentiment)
print(result)  # label='negative' confidence=0.92

Strength: 100% schema compliance. Zero retries, zero wasted API calls.
Weakness: Requires local models or compatible inference servers (vLLM, TGI). More infrastructure setup.
Best for: High-throughput pipelines where every failed retry costs money.
Pricing: Free, open-source (Apache 2.0).


4. Guidance -- Branching During Generation

Guidance lets you run Python control flow while the model generates tokens. You can branch into different schemas based on intermediate output -- something no other library can do.

from guidance import models, select, guidance
from guidance import json as gen_json
from pydantic import BaseModel
from typing import Literal

class BugReport(BaseModel):
    title: str
    severity: Literal["minor", "major", "critical"]

class FeatureRequest(BaseModel):
    title: str
    priority: Literal["low", "medium", "high"]

lm = models.Transformers("Qwen/Qwen2.5-1.5B")

@guidance
def classify_and_extract(lm, text):
    lm += f"Classify this ticket: {text}\n"
    lm += f"Type: {select(['bug', 'feature'], name='type')}\n"
    if lm["type"] == "bug":
        lm += gen_json(name="result", schema=BugReport)
    else:
        lm += gen_json(name="result", schema=FeatureRequest)
    return lm

result = lm + classify_and_extract("Login crashes on Safari")
print(result["type"])    # 'bug'
print(result["result"])  # {"title": "Login crash on Safari", "severity": "major"}

Strength: Conditional schemas and branching logic during generation. Most powerful control flow.
Weakness: Highest learning curve. Local models only. Custom DSL to learn.
Best for: Complex multi-schema workflows where the output structure depends on intermediate decisions.
Pricing: Free, open-source (MIT).


5. Zod + zodResponseFormat -- TypeScript Native

Zod combined with OpenAI's zodResponseFormat helper is the de facto standard for structured output in TypeScript. If you're in the TS ecosystem, this is the path of least resistance.

import OpenAI from 'openai';
import { z } from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';

const SentimentSchema = z.object({
  label: z.enum(['positive', 'negative', 'neutral']),
  confidence: z.number().min(0).max(1),
});

const client = new OpenAI();

const response = await client.chat.completions.parse({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Analyze: This product is amazing!' },
  ],
  response_format: zodResponseFormat(SentimentSchema, 'sentiment'),
});

const result = response.choices[0].message.parsed;
console.log(result); // { label: 'positive', confidence: 0.95 }

Strength: Zero-config with OpenAI. Full TypeScript type inference. Massive ecosystem.
Weakness: Tied to OpenAI (or providers that support their structured output API). Not universal.
Best for: TypeScript/Node.js projects, especially those already using OpenAI.
Pricing: Free, open-source (MIT).


How to Choose

Here's the decision tree:

  • Cloud API + Python, just need extraction? Start with Instructor. Least code, most provider flexibility.
  • Python agents that call tools? Use PydanticAI. Built for agent workflows with dependency injection.
  • Local models + need guaranteed compliance? Go with Outlines. Zero retries, 100% valid output.
  • Complex branching logic during generation? Guidance is the only option that supports conditional schemas.
  • TypeScript project? Zod + zodResponseFormat is the obvious choice.

A practical rule: start with post-generation validation (Instructor or Zod). It works with cloud APIs, requires zero infrastructure, and handles 95%+ of use cases. Escalate to pre-generation constraints (Outlines) when retry costs or compliance requirements justify running local models.


Verdict

Instructor is the safest default for most projects starting today -- it works everywhere, requires almost no learning curve, and covers the 80% case.

PydanticAI is the forward-looking choice for Python agent developers who need more than just extraction.

Outlines wins for high-throughput production pipelines where every failed retry costs real money.

All five libraries are free and open-source. Pick the one that matches your language, provider, and complexity needs -- then ship.


Previously in this series: We've compared AI agent frameworks, hosting platforms, and code sandboxes. Next up: AI guardrails tools for production LLMs.
