Nebula
How to Get Structured Output from Any LLM in 5 Min

You asked an LLM to extract contact info from an email. It returned a wall of text instead of clean data. Now you're writing regex to parse a response that changes format every time.

There's a better way. PydanticAI's output_type parameter forces any LLM to return typed, validated data -- no parsing required.

The Code

from pydantic import BaseModel, Field
from pydantic_ai import Agent


class ContactInfo(BaseModel):
    """Structured contact details extracted from text."""
    name: str = Field(description="Full name of the person")
    email: str = Field(description="Email address")
    company: str = Field(description="Company or organization")
    role: str = Field(description="Job title or role")


agent = Agent(
    'openai:gpt-4o',
    output_type=ContactInfo,
    instructions='Extract contact information from the provided text.',
)

raw_text = """
Hey, just met Sarah Chen at the DevTools Summit.
She's the VP of Engineering at Acme Corp.
Her email is sarah.chen@acmecorp.io -- said she's
interested in our API. Follow up next week.
"""

result = agent.run_sync(raw_text)

print(result.output)
#> name='Sarah Chen' email='sarah.chen@acmecorp.io' company='Acme Corp' role='VP of Engineering'

print(result.output.name)    # Sarah Chen
print(result.output.email)   # sarah.chen@acmecorp.io
print(result.output.company) # Acme Corp

That's it. No regex. No JSON parsing. No retry loops for malformed output.

How It Works

Define your schema. ContactInfo is a standard Pydantic BaseModel. The Field(description=...) hints tell the LLM what each field should contain. Pydantic validates the response automatically -- if the LLM returns garbage, you get a clear validation error instead of silent corruption.
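You can see that validation behavior with plain Pydantic, no LLM call needed. The model below is copied from above, and the malformed dict stands in for a garbage LLM response:

```python
from pydantic import BaseModel, Field, ValidationError


class ContactInfo(BaseModel):
    """Structured contact details extracted from text."""
    name: str = Field(description="Full name of the person")
    email: str = Field(description="Email address")
    company: str = Field(description="Company or organization")
    role: str = Field(description="Job title or role")


# A well-formed response validates into a typed instance.
ok = ContactInfo.model_validate({
    "name": "Sarah Chen",
    "email": "sarah.chen@acmecorp.io",
    "company": "Acme Corp",
    "role": "VP of Engineering",
})
print(ok.name)  # Sarah Chen

# A malformed response fails loudly instead of silently corrupting data.
errors = []
try:
    ContactInfo.model_validate({"name": "Sarah Chen"})
except ValidationError as e:
    errors = e.errors()
print(len(errors))  # 3 -- email, company, and role are missing
```

This is the same check PydanticAI runs on the model's tool call before handing you `result.output`.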

Set output_type on the Agent. This is the key line. When you pass output_type=ContactInfo, PydanticAI registers a tool with the LLM whose parameters match your model's JSON schema. The LLM is forced to call that tool, so it can't return plain text.
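You can inspect the schema side of this yourself with Pydantic's `model_json_schema()` (PydanticAI's actual tool definition is derived from this, though it may differ in minor details). Assuming the `ContactInfo` model from above:

```python
from pydantic import BaseModel, Field


class ContactInfo(BaseModel):
    """Structured contact details extracted from text."""
    name: str = Field(description="Full name of the person")
    email: str = Field(description="Email address")
    company: str = Field(description="Company or organization")
    role: str = Field(description="Job title or role")


# The JSON schema that becomes the tool's parameter definition.
schema = ContactInfo.model_json_schema()
print(schema["required"])  # ['name', 'email', 'company', 'role']
print(schema["properties"]["name"]["description"])  # Full name of the person
```

The `Field` descriptions survive into the schema, which is why they work as per-field prompts for the LLM.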

Access typed fields directly. result.output isn't a dict or a string -- it's a ContactInfo instance. Your IDE gives you autocomplete. Your type checker catches mistakes. Your downstream code gets clean data every time.

Handling Multiple Output Types

Sometimes the LLM can't extract the data you need. Instead of letting it hallucinate, give it an escape hatch:

class ExtractionFailed(BaseModel):
    """Use when contact info cannot be extracted."""
    reason: str

agent = Agent(
    'openai:gpt-4o',
    output_type=[ContactInfo, ExtractionFailed],
    instructions='Extract contact info. If the text has no contact details, explain why.',
)

result = agent.run_sync('The weather in Tokyo is sunny today.')
print(result.output)
#> reason='The text contains weather information but no contact details such as name, email, company, or role.'

Pass a list of types to output_type and PydanticAI registers each as a separate tool. The LLM picks the right one. You check isinstance(result.output, ContactInfo) in your code and handle each case.

Why This Matters

Structured output is the bridge between "cool LLM demo" and "production agent." Every multi-step agent workflow depends on it -- one agent extracts data, the next agent acts on it. If the first agent returns unstructured text, the whole pipeline breaks.

PydanticAI handles the hard parts: schema generation, tool registration, response validation, and automatic retries when the model returns invalid data. You just define a BaseModel and go.

If you're building agents that chain structured outputs across multiple steps, platforms like Nebula handle tool orchestration and output routing so you can focus on the agent logic.

Quick Reference

| What you need | How to do it |
| --- | --- |
| Single structured type | output_type=MyModel |
| Multiple possible types | output_type=[TypeA, TypeB] |
| Structured + plain text fallback | output_type=[MyModel, str] |
| Custom tool names | output_type=ToolOutput(MyModel, name='...') |

Install and try it now:

pip install pydantic-ai

For more AI agent patterns, check out the other articles in the AI Agent Quick Tips series -- including LLM fallbacks, guardrails, and agent memory.
