Ayi NEDJIMI

Posted on May 30

Pydantic AI vs LangChain vs instructor: structured LLM outputs compared

#ai #python #tutorial #llm

Getting structured data out of a language model reliably is harder than it looks. The model might return JSON that's almost valid, skip required fields, or wrap the object in a markdown block. Three Python libraries take different approaches to this problem: instructor, LangChain's structured output, and PydanticAI. This article compares them based on hands-on use rather than on their documentation.

What "structured output" actually means

When you call a language model you get back a string. If you want a Python object — a typed dict, a Pydantic model, a dataclass — you need something to bridge the gap. There are two broad approaches:

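JSON mode / function calling: You send the schema to the provider, which steers the model toward JSON matching it. Plain JSON mode only guarantees syntactically valid JSON. Strict tool or structured-output modes, where the provider supports them, also enforce the schema.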
Parse-and-retry: You ask for JSON, validate with Pydantic, and if validation fails you send the error back and ask the model to fix it.

All three libraries use some mix of these approaches, but they differ in how much control you retain and how much complexity they hide.
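The parse-and-retry approach fits in a few lines. The sketch below is stdlib-only and library-agnostic. `call_model` is a hypothetical stand-in for any chat API that takes a message list and returns text. The hand-rolled `validate()` plays the role Pydantic plays in the real libraries, checking a hypothetical two-field schema.

```python
import json
from typing import Callable

Message = dict[str, str]

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS: dict[str, type] = {"title": str, "severity": str}


def validate(reply: str) -> dict:
    """Parse the reply as JSON and check required fields, raising ValueError on failure."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing required field {name!r}")
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name!r} must be {expected.__name__}")
    return data


def extract_with_retry(
    call_model: Callable[[list[Message]], str], prompt: str, max_retries: int = 2
) -> dict:
    """Ask for JSON; on a validation error, send the error back and ask again."""
    messages: list[Message] = [{"role": "user", "content": prompt}]
    for _ in range(max_retries + 1):
        reply = call_model(messages)
        try:
            return validate(reply)
        except ValueError as err:
            # Keep the bad answer in context and tell the model what was wrong.
            messages.append({"role": "assistant", "content": reply})
            messages.append(
                {"role": "user", "content": f"Validation failed: {err}. Return only corrected JSON."}
            )
    raise RuntimeError(f"no valid output after {max_retries + 1} attempts")
```

A markdown-wrapped first answer fails `json.loads`, so the model sees its own reply plus the error on the second call. The libraries below automate this loop, with Pydantic doing the validation.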

instructor: minimal surface area, maximum control

instructor patches an OpenAI (or compatible) client to add a response_model parameter. That's basically the entire API surface.

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

client = instructor.from_openai(OpenAI())

class ExtractedIssue(BaseModel):
    title: str = Field(description="Short issue title, max 10 words")
    severity: str = Field(description="One of: critical, high, medium, low")
    cve_id: str | None = Field(default=None, description="CVE identifier if present")

issue = client.chat.completions.create(
    model="gpt-4o",
    response_model=ExtractedIssue,
    messages=[
        {"role": "user", "content": "CVE-2024-1234 is a critical RCE in OpenSSL affecting all versions before 3.3.0"}
    ]
)

print(issue.severity)   # "critical"
print(issue.cve_id)     # "CVE-2024-1234"

What instructor does under the hood: it converts your Pydantic model to a function/tool schema and sends that schema to the model. If the response fails Pydantic validation (malformed JSON, a missing field, or a violated constraint), instructor retries automatically with the validation error appended to the conversation, up to a configurable max_retries.
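The "converts your Pydantic model to a tool schema" step can be sketched with the standard library. The sketch below shows the general request shape for OpenAI Chat Completions tool calling. The schema is a simplified, hand-written approximation of what `ExtractedIssue.model_json_schema()` would produce. `as_openai_tool()` is a hypothetical helper, and instructor's actual payload depends on its mode and version.

```python
import json

# Simplified, hand-written approximation of the JSON schema Pydantic generates
# for ExtractedIssue (the real one also includes "title" keys, among others).
EXTRACTED_ISSUE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Short issue title, max 10 words"},
        "severity": {"type": "string", "description": "One of: critical, high, medium, low"},
        "cve_id": {
            "anyOf": [{"type": "string"}, {"type": "null"}],
            "default": None,
            "description": "CVE identifier if present",
        },
    },
    "required": ["title", "severity"],
}


def as_openai_tool(name: str, schema: dict, description: str = "") -> tuple[dict, dict]:
    """Hypothetical helper: wrap a schema as a Chat Completions tool and force its use."""
    tool = {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": schema},
    }
    tool_choice = {"type": "function", "function": {"name": name}}
    return tool, tool_choice


tool, tool_choice = as_openai_tool("ExtractedIssue", EXTRACTED_ISSUE_SCHEMA, "Extract one issue")
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "CVE-2024-1234 is a critical RCE in OpenSSL affecting all versions before 3.3.0"}],
    "tools": [tool],
    "tool_choice": tool_choice,
}

# The model replies with a tool call whose "arguments" field is a JSON string;
# this literal stands in for such a reply.
arguments = '{"title": "Critical OpenSSL RCE", "severity": "critical", "cve_id": "CVE-2024-1234"}'
issue = json.loads(arguments)
```

The forced `tool_choice` makes the model answer through the schema instead of in free text. The returned arguments string is what gets validated, and a validation failure feeds the retry loop.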

Strengths: Tiny API, works with any OpenAI-compatible endpoint (Ollama, Mistral, Groq), full access to the underlying client for cost and latency inspection.

Weaknesses: Tied to OpenAI-compatible APIs by default. Adding a native Anthropic or Gemini client requires specific patches (instructor.from_anthropic(), etc.), which occasionally lag behind the main SDKs.

LangChain structured output: convenient but layered

LangChain's .with_structured_output() works across many provider integrations:

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from typing import Literal

class SecurityFinding(BaseModel):
    vulnerability_type: Literal["injection", "auth", "crypto", "config", "other"]
    affected_component: str
    remediation: str = Field(description="Concrete fix, 1-2 sentences")

llm = ChatOpenAI(model="gpt-4o")
structured_llm = llm.with_structured_output(SecurityFinding)

finding = structured_llm.invoke(
    "The application stores passwords in plain text in the users table."
)
print(finding.vulnerability_type)   # "crypto"
print(finding.remediation)          # "Hash passwords with bcrypt or Argon2..."

Each integration picks a default strategy (tool/function calling, JSON mode, or the provider's native JSON-schema mode), and you can override it with the method argument where the integration supports it. The integration layer is LangChain's biggest advantage: the same .with_structured_output() call works with ChatAnthropic, ChatGoogleGenerativeAI, ChatOllama, and others without changing your extraction logic.

Strengths: Provider-agnostic, plugs cleanly into chains and graphs (LCEL, LangGraph), good for pipelines that already use LangChain.

Weaknesses: The abstraction cost is real. Debugging a failed extraction means stepping through several layers to find out what was actually sent to the model. The library is also large; importing it purely for structured output is like using a web framework to serve a single file. Finally, with_structured_output() does not send validation errors back to the model. A failed parse raises, or, with include_raw=True, is returned in a parsing_error key for you to handle.
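If you want instructor-style error feedback on top of LangChain, you have to write it yourself. LangChain's generic `Runnable.with_retry()` re-runs the same input on an exception, so it does not tell the model what went wrong. This sketch assumes a runnable built with `with_structured_output(..., include_raw=True)`, whose output is a dict with `raw`, `parsed` and `parsing_error` keys. `invoke` is a hypothetical stand-in for its `invoke` method, so the example runs without a model.

```python
from typing import Any, Callable

# Stand-in type for structured_llm.invoke when structured_llm was built with
# with_structured_output(..., include_raw=True): the result is a dict with
# "raw", "parsed" and "parsing_error" keys.
StructuredInvoke = Callable[[str], dict[str, Any]]


def invoke_with_feedback(invoke: StructuredInvoke, prompt: str, max_attempts: int = 3) -> Any:
    """Re-ask with the parse error appended, since with_structured_output() will not."""
    current = prompt
    last_error: Any = None
    for _ in range(max_attempts):
        result = invoke(current)
        if result["parsing_error"] is None and result["parsed"] is not None:
            return result["parsed"]
        # Treat "no error but nothing parsed" as a failure too.
        last_error = result["parsing_error"] or "no structured output returned"
        current = f"{prompt}\n\nYour previous answer was rejected: {last_error}. Try again."
    raise RuntimeError(f"structured output failed after {max_attempts} attempts: {last_error}")
```

Appending the error to the original prompt keeps this independent of how the runnable formats messages. A chat-history version would append the raw message and the error as separate turns instead.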

PydanticAI: agents-first, structured output built in

PydanticAI is the newest of the three. It frames everything as agents, with the structured result type declared through output_type (called result_type in early releases):

from pydantic_ai import Agent
from pydantic import BaseModel, Field

class ThreatSummary(BaseModel):
    threat_actor: str | None
    attack_vector: str
    confidence: float = Field(ge=0.0, le=1.0)
    recommended_action: str

agent = Agent(
    "openai:gpt-4o",
    output_type=ThreatSummary,
    system_prompt=(
        "You are a threat intelligence analyst. Extract structured threat data. "
        "Set confidence based on how specific and verifiable the information is."
    ),
)

result = agent.run_sync(
    "Lazarus Group was observed exploiting CVE-2025-0987 via spear-phishing in March 2026, "
    "targeting South Korean financial institutions."
)

print(result.output.threat_actor)    # "Lazarus Group"
print(result.output.confidence)      # 0.85

PydanticAI also has first-class support for multi-step agents, tool use, dependency injection, and streaming. If your use case goes beyond extraction — if you need the model to call tools and produce a structured final output — PydanticAI is the most ergonomic of the three.
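The tool-then-structured-answer pattern reduces to a loop. This is a generic, stdlib-only sketch of that loop, not PydanticAI's implementation. The step format (`{"tool": ..., "args": ...}` or `{"final": ...}`), the `lookup_cve` tool and its toy data are all hypothetical.

```python
from typing import Callable


# Hypothetical tool with toy data; a real agent would query an actual CVE source.
def lookup_cve(cve_id: str) -> str:
    toy_db = {"CVE-2025-0987": "remote code execution via crafted document"}
    return toy_db.get(cve_id, "unknown")


TOOLS: dict[str, Callable[..., str]] = {"lookup_cve": lookup_cve}
REQUIRED = {"attack_vector", "confidence", "recommended_action"}


def run_agent(call_model: Callable[[list[dict]], dict], prompt: str, max_steps: int = 5) -> dict:
    """Each step the model either requests a tool or returns a final structured answer."""
    history: list[dict] = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        step = call_model(history)
        if "final" in step:
            missing = REQUIRED - set(step["final"])
            if not missing:
                return step["final"]
            # Incomplete final answer: report what is missing and let the model try again.
            history.append({"role": "user", "content": f"Missing fields: {sorted(missing)}"})
            continue
        # Otherwise run the requested tool and feed its result back.
        name, args = step["tool"], step["args"]
        history.append({"role": "tool", "name": name, "content": TOOLS[name](**args)})
    raise RuntimeError(f"no valid final answer within {max_steps} steps")
```

The final answer is validated the same way a one-shot extraction would be. The loop just gives the model a chance to gather facts first.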

Strengths: Clean async-native design, proper dependency injection for testing, streaming structured output, multi-agent patterns built in.

Weaknesses: Younger project with real API churn (result_type and result.data have already been renamed to output_type and result.output), fewer community examples, heavier than instructor for pure extraction tasks.

Performance and reliability in practice

For production extraction pipelines, the practical differences come down to a few axes:

Feature	instructor	LangChain	PydanticAI
Retry on validation failure	Automatic, configurable (max_retries)	Manual (no built-in re-ask)	Automatic, configurable (retries)
Provider support	OpenAI-compatible + provider-specific constructors	Widest	OpenAI, Anthropic, Gemini, Groq, Mistral, OpenAI-compatible (e.g. Ollama)
Install footprint	Small	Large	Medium
Streaming structured output	Partial objects (create_partial)	Limited	Yes
Multi-agent / tool use	No	Via LangGraph	Built-in

For a pure extraction task (document in, typed Python object out), instructor is the default to reach for. It adds the smallest possible surface over the model API, its validation errors are transparent, and it works against any OpenAI-compatible endpoint, including local models.

If you're building a full pipeline with multiple steps, conditional logic, or external tool calls and you're already in the LangChain ecosystem, .with_structured_output() is the natural choice. You get cross-provider compatibility without maintaining separate extraction code per provider.

PydanticAI shines when you need agents that reason, call tools, and return a structured result at the end. It also makes testing straightforward. Dependency injection (deps_type and RunContext) lets you hand an agent fake databases or HTTP clients, and Agent.override(model=TestModel()) swaps the real LLM for a deterministic stub without changing agent code.
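The testing benefit comes from injecting the model rather than constructing it inside the code under test. PydanticAI packages this idea for you. The stdlib sketch below shows the bare pattern in isolation. `ModelClient`, `Extractor` and `StubModel` are hypothetical names, not PydanticAI classes.

```python
import json
from dataclasses import dataclass, field
from typing import Protocol


class ModelClient(Protocol):
    """Anything with a complete(prompt) -> str method can act as the model."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class Extractor:
    model: ModelClient  # injected: a real client in production, a stub in tests

    def severity(self, text: str) -> str:
        reply = self.model.complete(f"Return JSON with a 'severity' key for: {text}")
        return json.loads(reply)["severity"]


@dataclass
class StubModel:
    """Deterministic test double: returns a canned reply and records prompts."""

    canned_reply: str
    prompts: list[str] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.canned_reply
```

Because `Extractor` never builds its own client, a test can assert on both the parsed result and the exact prompt sent, with no network calls.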

The takeaway

There's no universal winner. Choose based on what surrounds the extraction step:

Isolated extraction from documents or raw text? Use instructor.
Pipeline with routing, memory, or tool calls already using LangChain? Use .with_structured_output().
Agent that reasons and produces typed final output? Use PydanticAI.

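One thing all three share: the quality of your Pydantic model definition directly affects extraction accuracy. Field(description=...) annotations end up in the schema the model sees, so spelling out that severity is "one of: critical, high, medium, low" gives the model guidance that a bare str field does not. Where a field has a closed set of values, go further and encode it in the type (Literal[...], as in the LangChain example). A description only guides the model, while a Literal lets Pydantic reject an out-of-range value such as "severe" instead of passing it downstream.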
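The difference between describing a constraint and enforcing it can be sketched without any dependencies. The two hypothetical checkers below mimic a `severity: str` field with only a description and a `severity: Literal[...]` field. In real code, Pydantic performs the second check for you.

```python
from typing import Literal, get_args

Severity = Literal["critical", "high", "medium", "low"]


def check_described(value: object) -> str:
    """Like `severity: str` plus a description: any string is accepted."""
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    return value


def check_literal(value: object) -> str:
    """Like `severity: Literal[...]`: values outside the allowed set are rejected."""
    allowed = get_args(Severity)  # ('critical', 'high', 'medium', 'low')
    if value not in allowed:
        raise ValueError(f"expected one of {allowed}, got {value!r}")
    return str(value)
```

With the first checker, "severe" flows silently into whatever consumes the result. With the second, it becomes a validation error that a retry loop can send back to the model.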

For security-adjacent pipelines in particular, typed outputs are non-negotiable. When extraction feeds into alerting or risk scoring, silent type coercion or missing fields create hard-to-trace downstream failures. Our security hardening checklists include guidance on where LLM-assisted analysis fits in a defensible pipeline architecture — worth a look if you're building this into production workflows.

I run AYI NEDJIMI Consultants, a cybersecurity consulting firm. We publish free security hardening checklists — PDF and Excel.
