Table of Contents
- Introduction
- What is Pydantic and Why Should AI Developers Care?
- How Pydantic Makes AI Agents More Reliable
- Real-World Use Cases
- Best Practices for Using Pydantic with AI Agents
- A Simple Tutorial: Getting Started with Pydantic and AI
- Advanced Tips and Tricks
- Conclusion
Introduction
Have you ever built an AI agent that sometimes returns unpredictable data structures? Or perhaps you've dealt with the frustration of parsing JSON from a language model only to have your application crash because a field was missing or had the wrong type?
I've been there too! That's why today I want to talk about one of my favorite tools for taming the wild outputs of AI agents: Pydantic!
In this guide, I'll show you how Pydantic can transform your AI agent development from a game of chance into a reliable, robust process. Let's dive in!
What is Pydantic and Why Should AI Developers Care?
Pydantic is a Python library for data validation that uses type annotations. Think of it as a bouncer for your data - it checks IDs (types) at the door and makes sure only the right data gets into your application.
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    is_active: bool
But why is this particularly useful for AI agents? Here's the thing:
AI models (especially LLMs) are amazing at generating content, but they're not always precise about formatting. They might return JSON where a number is accidentally a string ("42" instead of 42), or they might forget a field entirely. Without validation, these small inconsistencies can cause big problems downstream in your application.
The AI Agent's Output Problem
Imagine asking an AI agent to return information about a product:
{
  "name": "Super Widget",
  "price": "29.99",
  "in_stock": "true",
  "features": ["durable", "lightweight"]
}
Notice the issues? price is a string instead of a float, and in_stock is a string instead of a boolean. Your application might crash when it tries to do math with that price or make a decision based on the stock status.
Pydantic to the rescue! It can automatically convert these types for you, so "29.99" becomes 29.99 (float) and "true" becomes True (boolean).
How Pydantic Makes AI Agents More Reliable
1. Automatic Type Conversion
Pydantic doesn't just validate - it tries to convert data to the right type when possible. This is perfect for AI outputs that are almost correct.
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    features: list[str]

# Even though price and in_stock are strings, Pydantic will convert them
product = Product(
    name="Super Widget",
    price="29.99",
    in_stock="true",
    features=["durable", "lightweight"]
)

print(product)
# name='Super Widget' price=29.99 in_stock=True features=['durable', 'lightweight']
2. Clear Error Messages
When validation fails, Pydantic tells you exactly what went wrong:
try:
    Product(name="Broken Widget", price="expensive", in_stock=True, features="strong")
except Exception as e:
    print(f"Error: {e}")

# Error: 2 validation errors for Product
# price
#   Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='expensive', input_type=str]
# features
#   Input should be a valid list [type=list_type, input_value='strong', input_type=str]
These detailed errors make debugging so much easier when your AI agent returns unexpected data.
3. Schema Generation for Guiding AI Outputs
One of my favorite Pydantic features for AI development is its ability to generate JSON schemas:
print(Product.model_json_schema())
# {
#   "title": "Product",
#   "type": "object",
#   "properties": {
#     "name": {"title": "Name", "type": "string"},
#     "price": {"title": "Price", "type": "number"},
#     "in_stock": {"title": "In Stock", "type": "boolean"},
#     "features": {
#       "title": "Features",
#       "type": "array",
#       "items": {"type": "string"}
#     }
#   },
#   "required": ["name", "price", "in_stock", "features"]
# }
You can use this schema to guide your AI model, especially if you're using function calling with OpenAI or similar features with other providers. This dramatically improves the chances of getting correctly formatted responses!
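Even if you aren't using function calling, you can embed the generated schema directly in your prompt. Here's a minimal sketch, assuming the Product model from above and a generic text-completion LLM (the prompt wording and the llm_reply variable are placeholders, not a specific API):
import json

# Turn the Pydantic model into a JSON schema string the LLM can read
schema = json.dumps(Product.model_json_schema(), indent=2)

prompt = f"""
Describe the product 'Super Widget' as a JSON object.
The JSON must conform to this schema:

{schema}

Return only the JSON, with no extra commentary.
"""

# Send `prompt` to your LLM of choice, then validate the reply:
# product = Product.model_validate_json(llm_reply)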
Real-World Use Cases
Let's look at how developers are using Pydantic with AI agents in the wild:
OpenAI Function Calling + Pydantic
from openai import OpenAI
from pydantic import BaseModel, Field

class WeatherInfo(BaseModel):
    location: str = Field(..., description="The city and state")
    temperature: float = Field(..., description="Current temperature in Celsius")
    condition: str = Field(..., description="Weather condition (sunny, cloudy, etc.)")

# Get the JSON schema for OpenAI
function_def = {
    "name": "get_weather",
    "description": "Get the current weather in a location",
    "parameters": WeatherInfo.model_json_schema()
}

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "user", "content": "What's the weather like in Boston?"}
    ],
    functions=[function_def],
    function_call={"name": "get_weather"}
)

# Extract and validate the response
function_response = response.choices[0].message.function_call.arguments
weather = WeatherInfo.model_validate_json(function_response)
print(f"It's {weather.temperature}°C and {weather.condition} in {weather.location}")
This approach has two benefits:
- The schema guides the AI to produce properly structured output
- Pydantic validates that output as an extra safety measure
Multi-Agent Systems (CrewAI)
In systems where multiple AI agents collaborate, consistent data structures are critical. Frameworks like CrewAI use Pydantic to ensure agents communicate properly:
from pydantic import BaseModel
from crewai import Agent, Task, Crew

class ResearchReport(BaseModel):
    topic: str
    key_findings: list[str]
    sources: list[str]

# Define a task with a Pydantic output schema
research_task = Task(
    description="Research the latest advancements in quantum computing",
    expected_output="A structured research report with key findings and sources",
    output_pydantic=ResearchReport
)

# When the agent completes the task, CrewAI validates the output against the model
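To complete the picture, here's a rough sketch of wiring that task into an agent and crew. Treat it as illustrative only: the Agent/Crew constructor arguments and the .pydantic accessor follow common CrewAI usage, but exact names can vary between versions.
# Illustrative sketch; check your CrewAI version's docs for exact arguments
researcher = Agent(
    role="Research Analyst",
    goal="Summarize recent quantum computing breakthroughs",
    backstory="An analyst who produces concise, well-sourced reports."
)

# In practice the task also needs an agent assigned
research_task.agent = researcher

crew = Crew(agents=[researcher], tasks=[research_task])
result = crew.kickoff()

# If validation succeeded, the structured output is available as a ResearchReport
report = research_task.output.pydantic
print(report.topic)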
Best Practices for Using Pydantic with AI Agents
After working with numerous AI projects, here are my top recommendations:
1. Start with a Clear Data Model
Define Pydantic models that capture exactly what you need from your AI agent. Be specific about types and constraints:
from typing import Optional, Literal
from pydantic import BaseModel, Field, conint

class ProductRecommendation(BaseModel):
    product_name: str
    price_range: str = Field(..., pattern=r"^\$\d+-\$\d+$")  # Ensure format like "$10-$20"
    rating: conint(ge=1, le=5)  # Integer between 1 and 5
    category: Literal["electronics", "clothing", "home", "books", "other"]
    features: list[str] = Field(..., min_length=2, max_length=5)  # Pydantic v2 names (min_items/max_items in v1)
    in_stock: bool
    shipping_days: Optional[int] = None
This detailed model serves as documentation and validation in one package.
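To see those constraints doing real work, here's a quick check with a deliberately malformed payload (the values are made up for illustration):
from pydantic import ValidationError

bad_payload = {
    "product_name": "Gadget",
    "price_range": "10 to 20 dollars",  # violates the $X-$Y pattern
    "rating": 7,                        # outside the 1-5 range
    "category": "electronics",
    "features": ["compact", "fast"],
    "in_stock": True,
}

try:
    ProductRecommendation.model_validate(bad_payload)
except ValidationError as e:
    print(e)  # Reports both the pattern mismatch and the out-of-range rating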
2. Handle Validation Errors Gracefully
Always wrap your Pydantic validation in try/except blocks:
from pydantic import ValidationError

# ai_response holds the raw text returned by your model
try:
    recommendation = ProductRecommendation.model_validate_json(ai_response)
    # Use the structured data
except ValidationError as e:
    # Log the error
    print(f"AI output validation failed: {e}")
    # Possible strategies:
    # 1. Use a fallback approach
    # 2. Re-prompt the AI with the error details
    # 3. Apply some fixes and retry validation
3. Consider Re-Prompting When Validation Fails
One powerful approach is to tell the AI exactly what went wrong and ask it to fix the response:
def get_validated_response(prompt):
    for attempt in range(3):  # Try up to 3 times
        # ai_model is a placeholder for your LLM client
        response = ai_model.generate(prompt)
        try:
            result = ProductRecommendation.model_validate_json(response)
            return result
        except Exception as e:
            if attempt < 2:  # Don't update prompt on the last attempt
                prompt += f"\nYour previous response had validation errors: {e}. Please fix them and try again."
    # If we get here, all attempts failed
    raise ValueError("Could not get valid response after multiple attempts")
This feedback loop helps the AI learn from its mistakes!
A Simple Tutorial: Getting Started with Pydantic and AI
Let's bring everything together with a simple tutorial. We'll create a movie recommendation agent that returns properly structured data:
Step 1: Define Your Pydantic Model
from pydantic import BaseModel, Field
from typing import List, Optional

class MovieRecommendation(BaseModel):
    title: str
    year: int = Field(..., ge=1900, le=2030)
    genres: List[str] = Field(..., min_length=1)  # min_items in Pydantic v1
    rating: float = Field(..., ge=0.0, le=10.0)
    director: str
    streaming_on: Optional[List[str]] = None
    description: str = Field(..., max_length=500)
Step 2: Create a Function to Get Recommendations from an AI
def get_movie_recommendation(genre_preference, mood, decade_preference=None):
    # Construct a prompt for the AI
    prompt = f"""
    Suggest a movie based on the following:
    Genre preference: {genre_preference}
    Mood: {mood}
    Decade preference: {decade_preference or 'any'}

    Return the recommendation as a JSON object with the following fields:
    - title: the movie title
    - year: the release year (1900-2030)
    - genres: list of genres
    - rating: rating out of 10
    - director: the director's name
    - streaming_on: list of streaming platforms (if known) or null
    - description: brief description (max 500 chars)
    """

    # In a real application, you'd call your AI model here
    # For demonstration, let's pretend we got this response:
    ai_response = """
    {
        "title": "The Grand Budapest Hotel",
        "year": 2014,
        "genres": ["Comedy", "Drama", "Adventure"],
        "rating": 8.1,
        "director": "Wes Anderson",
        "streaming_on": ["HBO Max", "Disney+"],
        "description": "A writer encounters the owner of an aging high-class hotel, who tells him of his early years serving as a lobby boy in the hotel's glorious years under an exceptional concierge."
    }
    """

    try:
        # Parse and validate the AI response
        recommendation = MovieRecommendation.model_validate_json(ai_response)
        return recommendation
    except Exception as e:
        print(f"Error validating AI response: {e}")
        # In a real application, you might implement retry logic here
        return None
Step 3: Use the Recommendation in Your Application
def display_recommendation(recommendation):
    if not recommendation:
        return "Sorry, couldn't generate a valid recommendation."

    return f"""
    🎬 {recommendation.title} ({recommendation.year}) - {recommendation.rating}/10
    Directed by: {recommendation.director}
    Genres: {', '.join(recommendation.genres)}

    {recommendation.description}

    {f"Available on: {', '.join(recommendation.streaming_on)}" if recommendation.streaming_on else "Streaming info not available"}
    """

# Get and display a recommendation
user_genre = "sci-fi"
user_mood = "thoughtful"
user_decade = "2010s"

movie = get_movie_recommendation(user_genre, user_mood, user_decade)
print(display_recommendation(movie))
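Running this with the mocked response above prints something like:
# 🎬 The Grand Budapest Hotel (2014) - 8.1/10
# Directed by: Wes Anderson
# Genres: Comedy, Drama, Adventure
#
# A writer encounters the owner of an aging high-class hotel...
#
# Available on: HBO Max, Disney+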
Advanced Tips and Tricks
Want to take your Pydantic + AI game to the next level? Here are some advanced techniques:
Custom Validators for Domain-Specific Rules
from pydantic import BaseModel, Field, field_validator

class TravelRecommendation(BaseModel):
    destination: str
    best_months: list[str]
    budget_usd: int = Field(..., gt=0)

    @field_validator('best_months')  # `validator` in Pydantic v1
    @classmethod
    def check_valid_months(cls, months):
        valid_months = ["January", "February", "March", "April", "May", "June",
                        "July", "August", "September", "October", "November", "December"]
        for month in months:
            if month not in valid_months:
                raise ValueError(f"Invalid month: {month}")
        return months
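A quick illustrative check shows the validator catching a month the model invented:
from pydantic import ValidationError

try:
    TravelRecommendation(
        destination="Kyoto",
        best_months=["April", "Maytober"],  # invalid month from a hallucinating model
        budget_usd=2500
    )
except ValidationError as e:
    print(e)  # Value error, Invalid month: Maytober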
Nested Models for Complex Data
from typing import Optional
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    state: str
    country: str
    postal_code: str

class Contact(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    address: Address

class BusinessListing(BaseModel):
    name: str
    category: str
    rating: float
    contact: Contact
    hours: dict[str, str]
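Nested models validate the whole tree in a single call, so one model_validate catches problems at any depth. The sample data below is made up purely for illustration:
listing = BusinessListing.model_validate({
    "name": "Blue Bottle Coffee",
    "category": "cafe",
    "rating": 4.6,
    "contact": {
        "name": "Front Desk",
        "email": "hello@example.com",
        "address": {
            "street": "1 Ferry Building",
            "city": "San Francisco",
            "state": "CA",
            "country": "USA",
            "postal_code": "94111"
        }
    },
    "hours": {"mon": "7-19", "sat": "8-18"}
})

print(listing.contact.address.city)  # San Francisco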
Working with LangChain's Pydantic Output Parser
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from pydantic import BaseModel, Field

class AmazonProduct(BaseModel):
    name: str = Field(description="The product name")
    price: float = Field(description="The product price in USD")
    rating: float = Field(description="Rating from 1-5")
    reviews: int = Field(description="Number of reviews")

parser = PydanticOutputParser(pydantic_object=AmazonProduct)

prompt = PromptTemplate(
    template="Extract product information from this text:\n{text}\n{format_instructions}",
    input_variables=["text"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

model = OpenAI()

input_text = """
This amazing laptop is the MacBook Pro 16-inch, priced at $2,399. It has received
excellent feedback from customers, with a 4.8 star rating based on 3,842 reviews.
"""

output = model(prompt.format(text=input_text))
product = parser.parse(output)
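If the model follows the format instructions, parser.parse returns a fully validated AmazonProduct, so downstream code can rely on the types (the printed values are whatever the model extracted, shown here as an illustration):
print(product.name)     # e.g. "MacBook Pro 16-inch"
print(product.price)    # guaranteed to be a float, safe for arithmetic
print(product.reviews)  # guaranteed to be an int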
Conclusion
Pydantic is more than just a validation library - it's your AI agent's best friend! By defining clear data models and validating inputs and outputs, you can:
- Make your AI applications more reliable
- Catch errors early before they cascade into bigger problems
- Guide your models to produce better-structured outputs
- Create self-documenting code that clearly specifies what data you expect
The next time you're building an AI agent, take the time to define your data models with Pydantic. Your future self (and your users) will thank you!
Have you used Pydantic with AI projects? Feel free to share your experiences in the comments!