You asked an LLM to extract contact info from an email. It returned a wall of text instead of clean data. Now you're writing regex to parse a response that changes format every time.
There's a better way. PydanticAI's `output_type` parameter forces any LLM to return typed, validated data -- no parsing required.
## The Code
```python
from pydantic import BaseModel, Field
from pydantic_ai import Agent


class ContactInfo(BaseModel):
    """Structured contact details extracted from text."""

    name: str = Field(description="Full name of the person")
    email: str = Field(description="Email address")
    company: str = Field(description="Company or organization")
    role: str = Field(description="Job title or role")


agent = Agent(
    'openai:gpt-4o',
    output_type=ContactInfo,
    instructions='Extract contact information from the provided text.',
)

raw_text = """
Hey, just met Sarah Chen at the DevTools Summit.
She's the VP of Engineering at Acme Corp.
Her email is sarah.chen@acmecorp.io -- said she's
interested in our API. Follow up next week.
"""

result = agent.run_sync(raw_text)
print(result.output)
#> name='Sarah Chen' email='sarah.chen@acmecorp.io' company='Acme Corp' role='VP of Engineering'

print(result.output.name)     # Sarah Chen
print(result.output.email)    # sarah.chen@acmecorp.io
print(result.output.company)  # Acme Corp
```
That's it. No regex. No JSON parsing. No retry loops for malformed output.
## How It Works
**Define your schema.** `ContactInfo` is a standard Pydantic `BaseModel`. The `Field(description=...)` hints tell the LLM what each field should contain. Pydantic validates the response automatically -- if the LLM returns garbage, you get a clear validation error instead of silent corruption.
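You can see the validation layer in isolation by running the model against hand-written data -- this is plain Pydantic, no agent or API key needed:

```python
from pydantic import BaseModel, Field, ValidationError


class ContactInfo(BaseModel):
    """Structured contact details extracted from text."""

    name: str = Field(description="Full name of the person")
    email: str = Field(description="Email address")
    company: str = Field(description="Company or organization")
    role: str = Field(description="Job title or role")


# Well-formed data validates into a typed instance.
contact = ContactInfo.model_validate({
    "name": "Sarah Chen",
    "email": "sarah.chen@acmecorp.io",
    "company": "Acme Corp",
    "role": "VP of Engineering",
})
print(contact.name)  # Sarah Chen

# Missing fields fail loudly with field-level errors, not silent corruption.
try:
    ContactInfo.model_validate({"name": "Sarah Chen"})
    missing = 0
except ValidationError as exc:
    missing = exc.error_count()
print(missing)  # 3 -- email, company, and role are all required
```

This is exactly what PydanticAI runs against the LLM's response before handing it back to you.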
**Set `output_type` on the `Agent`.** This is the key line. When you pass `output_type=ContactInfo`, PydanticAI registers a tool with the LLM whose parameters match your model's JSON schema. The LLM is forced to call that tool, so it can't return plain text.
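The tool registration happens inside PydanticAI, but the schema it derives is just the model's own JSON schema, which you can inspect with plain Pydantic:

```python
import json

from pydantic import BaseModel, Field


class ContactInfo(BaseModel):
    """Structured contact details extracted from text."""

    name: str = Field(description="Full name of the person")
    email: str = Field(description="Email address")
    company: str = Field(description="Company or organization")
    role: str = Field(description="Job title or role")


# The tool's parameters: four required string properties, each carrying
# its Field description as a hint to the LLM.
schema = ContactInfo.model_json_schema()
print(json.dumps(schema, indent=2))
```

Every required field and every `description` string ends up in front of the model, which is why good field descriptions measurably improve extraction quality.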
**Access typed fields directly.** `result.output` isn't a dict or a string -- it's a `ContactInfo` instance. Your IDE gives you autocomplete. Your type checker catches mistakes. Your downstream code gets clean data every time.
## Handling Multiple Output Types
Sometimes the LLM can't extract the data you need. Instead of letting it hallucinate, give it an escape hatch:
```python
class ExtractionFailed(BaseModel):
    """Use when contact info cannot be extracted."""

    reason: str


agent = Agent(
    'openai:gpt-4o',
    output_type=[ContactInfo, ExtractionFailed],
    instructions='Extract contact info. If the text has no contact details, explain why.',
)

result = agent.run_sync('The weather in Tokyo is sunny today.')
print(result.output)
#> reason='The text contains weather information but no contact details such as name, email, company, or role.'
```
Pass a list of types to `output_type` and PydanticAI registers each as a separate tool. The LLM picks the right one. You check `isinstance(result.output, ContactInfo)` in your code and handle each case.
## Why This Matters
Structured output is the bridge between "cool LLM demo" and "production agent." Every multi-step agent workflow depends on it -- one agent extracts data, the next agent acts on it. If the first agent returns unstructured text, the whole pipeline breaks.
PydanticAI handles the hard parts: schema generation, tool registration, response validation, and automatic retries when the model returns invalid data. You just define a BaseModel and go.
If you're building agents that chain structured outputs across multiple steps, platforms like Nebula handle tool orchestration and output routing so you can focus on the agent logic.
## Quick Reference
| What you need | How to do it |
|---|---|
| Single structured type | `output_type=MyModel` |
| Multiple possible types | `output_type=[TypeA, TypeB]` |
| Structured + plain text fallback | `output_type=[MyModel, str]` |
| Custom tool names | `output_type=ToolOutput(MyModel, name='...')` |
Install and try it now:
```bash
pip install pydantic-ai
```
For more AI agent patterns, check out the other articles in the AI Agent Quick Tips series -- including LLM fallbacks, guardrails, and agent memory.