Table of Contents
- Introduction
- What is Pydantic and Why Should AI Developers Care?
- How Pydantic Makes AI Agents More Reliable
- Real-World Use Cases
- Best Practices for Using Pydantic with AI Agents
- A Simple Tutorial: Getting Started with Pydantic and AI
- Advanced Tips and Tricks
- Conclusion
Introduction
Have you ever built an AI agent that sometimes returns unpredictable data structures? Or perhaps you've dealt with the frustration of parsing JSON from a language model only to have your application crash because a field was missing or had the wrong type?
I've been there too! That's why today I want to talk about one of my favorite tools for taming the wild outputs of AI agents: Pydantic!
In this guide, I'll show you how Pydantic can transform your AI agent development from a game of chance into a reliable, robust process. Let's dive in!
What is Pydantic and Why Should AI Developers Care?
Pydantic is a Python library for data validation that uses type annotations. Think of it as a bouncer for your data - it checks IDs (types) at the door and makes sure only the right data gets into your application.
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    is_active: bool
But why is this particularly useful for AI agents? Here's the thing:
AI models (especially LLMs) are amazing at generating content, but they're not always precise about formatting. They might return JSON where a number is accidentally a string ("42" instead of 42), or they might forget a field entirely. Without validation, these small inconsistencies can cause big problems downstream in your application.
The AI Agent's Output Problem
Imagine asking an AI agent to return information about a product:
{
  "name": "Super Widget",
  "price": "29.99",
  "in_stock": "true",
  "features": ["durable", "lightweight"]
}
Notice the issues? price is a string instead of a float, and in_stock is a string instead of a boolean. Your application might crash when it tries to do math with that price or make a decision based on the stock status.
Pydantic to the rescue! It can automatically convert these types for you, so "29.99" becomes 29.99 (float) and "true" becomes True (boolean).
How Pydantic Makes AI Agents More Reliable
1. Automatic Type Conversion
Pydantic doesn't just validate - it tries to convert data to the right type when possible. This is perfect for AI outputs that are almost correct.
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    features: list[str]

# Even though price and in_stock are strings, Pydantic will convert them
product = Product(
    name="Super Widget",
    price="29.99",
    in_stock="true",
    features=["durable", "lightweight"]
)

print(product)
# name='Super Widget' price=29.99 in_stock=True features=['durable', 'lightweight']
2. Clear Error Messages
When validation fails, Pydantic tells you exactly what went wrong:
try:
    Product(name="Broken Widget", price="expensive", in_stock=True, features="strong")
except Exception as e:
    print(f"Error: {e}")

# Error: 2 validation errors for Product
# price
#   Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='expensive', input_type=str]
# features
#   Input should be a valid list [type=list_type, input_value='strong', input_type=str]
These detailed errors make debugging so much easier when your AI agent returns unexpected data.
3. Schema Generation for Guiding AI Outputs
One of my favorite Pydantic features for AI development is its ability to generate JSON schemas:
print(Product.model_json_schema())
# {
#   "title": "Product",
#   "type": "object",
#   "properties": {
#     "name": {"title": "Name", "type": "string"},
#     "price": {"title": "Price", "type": "number"},
#     "in_stock": {"title": "In Stock", "type": "boolean"},
#     "features": {
#       "title": "Features",
#       "type": "array",
#       "items": {"type": "string"}
#     }
#   },
#   "required": ["name", "price", "in_stock", "features"]
# }
You can use this schema to guide your AI model, especially if you're using function calling with OpenAI or similar features with other providers. This dramatically improves the chances of getting correctly formatted responses!
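Even if you aren't using function calling, you can embed the generated schema directly in your prompt. Here's a minimal sketch, assuming the Product model from above and a generic text-completion LLM (the prompt wording and the llm_reply variable are placeholders, not a specific API):
import json

# Turn the Pydantic model into a JSON schema string the LLM can read
schema = json.dumps(Product.model_json_schema(), indent=2)

prompt = f"""
Describe the product 'Super Widget' as a JSON object.
The JSON must conform to this schema:

{schema}

Return only the JSON, with no extra commentary.
"""

# Send `prompt` to your LLM of choice, then validate the reply:
# product = Product.model_validate_json(llm_reply)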
Real-World Use Cases
Let's look at how developers are using Pydantic with AI agents in the wild:
OpenAI Function Calling + Pydantic
from openai import OpenAI
from pydantic import BaseModel, Field

class WeatherInfo(BaseModel):
    location: str = Field(..., description="The city and state")
    temperature: float = Field(..., description="Current temperature in Celsius")
    condition: str = Field(..., description="Weather condition (sunny, cloudy, etc.)")

# Get the JSON schema for OpenAI
function_def = {
    "name": "get_weather",
    "description": "Get the current weather in a location",
    "parameters": WeatherInfo.model_json_schema()
}

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "user", "content": "What's the weather like in Boston?"}
    ],
    functions=[function_def],
    function_call={"name": "get_weather"}
)

# Extract and validate the response
function_response = response.choices[0].message.function_call.arguments
weather = WeatherInfo.model_validate_json(function_response)
print(f"It's {weather.temperature}°C and {weather.condition} in {weather.location}")
This approach has two benefits:
- The schema guides the AI to produce properly structured output
- Pydantic validates that output as an extra safety measure
Multi-Agent Systems (CrewAI)
In systems where multiple AI agents collaborate, consistent data structures are critical. Frameworks like CrewAI use Pydantic to ensure agents communicate properly:
from pydantic import BaseModel
from crewai import Agent, Task, Crew

class ResearchReport(BaseModel):
    topic: str
    key_findings: list[str]
    sources: list[str]

# Define a task with a Pydantic output schema
research_task = Task(
    description="Research the latest advancements in quantum computing",
    expected_output="A structured research report with key findings and sources",
    output_pydantic=ResearchReport
)

# When the agent completes the task, CrewAI validates the output against the model
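To complete the picture, here's a rough sketch of wiring that task into an agent and crew. Treat it as illustrative only: the Agent/Crew constructor arguments and the .pydantic accessor follow common CrewAI usage, but exact names can vary between versions.
# Illustrative sketch; check your CrewAI version's docs for exact arguments
researcher = Agent(
    role="Research Analyst",
    goal="Summarize recent quantum computing breakthroughs",
    backstory="An analyst who produces concise, well-sourced reports."
)

# In practice the task also needs an agent assigned
research_task.agent = researcher

crew = Crew(agents=[researcher], tasks=[research_task])
result = crew.kickoff()

# If validation succeeded, the structured output is available as a ResearchReport
report = research_task.output.pydantic
print(report.topic)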
Best Practices for Using Pydantic with AI Agents
After working with numerous AI projects, here are my top recommendations:
1. Start with a Clear Data Model
Define Pydantic models that capture exactly what you need from your AI agent. Be specific about types and constraints:
from typing import Optional, Literal
from pydantic import BaseModel, Field, conint

class ProductRecommendation(BaseModel):
    product_name: str
    price_range: str = Field(..., pattern=r"^\$\d+-\$\d+$")  # Ensure format like "$10-$20"
    rating: conint(ge=1, le=5)  # Integer between 1 and 5
    category: Literal["electronics", "clothing", "home", "books", "other"]
    features: list[str] = Field(..., min_length=2, max_length=5)  # Pydantic v2 names (min_items/max_items in v1)
    in_stock: bool
    shipping_days: Optional[int] = None
This detailed model serves as documentation and validation in one package.
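To see those constraints doing real work, here's a quick check with a deliberately malformed payload (the values are made up for illustration):
from pydantic import ValidationError

bad_payload = {
    "product_name": "Gadget",
    "price_range": "10 to 20 dollars",  # violates the $X-$Y pattern
    "rating": 7,                        # outside the 1-5 range
    "category": "electronics",
    "features": ["compact", "fast"],
    "in_stock": True,
}

try:
    ProductRecommendation.model_validate(bad_payload)
except ValidationError as e:
    print(e)  # Reports both the pattern mismatch and the out-of-range rating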
2. Handle Validation Errors Gracefully
Always wrap your Pydantic validation in try/except blocks:
from pydantic import ValidationError

# ai_response holds the raw text returned by your model
try:
    recommendation = ProductRecommendation.model_validate_json(ai_response)
    # Use the structured data
except ValidationError as e:
    # Log the error
    print(f"AI output validation failed: {e}")
    # Possible strategies:
    # 1. Use a fallback approach
    # 2. Re-prompt the AI with the error details
    # 3. Apply some fixes and retry validation
3. Consider Re-Prompting When Validation Fails
One powerful approach is to tell the AI exactly what went wrong and ask it to fix the response:
def get_validated_response(prompt):
    for attempt in range(3):  # Try up to 3 times
        # ai_model is a placeholder for your LLM client
        response = ai_model.generate(prompt)
        try:
            result = ProductRecommendation.model_validate_json(response)
            return result
        except Exception as e:
            if attempt < 2:  # Don't update prompt on the last attempt
                prompt += f"\nYour previous response had validation errors: {e}. Please fix them and try again."
    # If we get here, all attempts failed
    raise ValueError("Could not get valid response after multiple attempts")
This feedback loop helps the AI learn from its mistakes!
A Simple Tutorial: Getting Started with Pydantic and AI
Let's bring everything together with a simple tutorial. We'll create a movie recommendation agent that returns properly structured data:
Step 1: Define Your Pydantic Model
from pydantic import BaseModel, Field
from typing import List, Optional

class MovieRecommendation(BaseModel):
    title: str
    year: int = Field(..., ge=1900, le=2030)
    genres: List[str] = Field(..., min_length=1)  # min_items in Pydantic v1
    rating: float = Field(..., ge=0.0, le=10.0)
    director: str
    streaming_on: Optional[List[str]] = None
    description: str = Field(..., max_length=500)
Step 2: Create a Function to Get Recommendations from an AI
def get_movie_recommendation(genre_preference, mood, decade_preference=None):
    # Construct a prompt for the AI
    prompt = f"""
    Suggest a movie based on the following:
    Genre preference: {genre_preference}
    Mood: {mood}
    Decade preference: {decade_preference or 'any'}

    Return the recommendation as a JSON object with the following fields:
    - title: the movie title
    - year: the release year (1900-2030)
    - genres: list of genres
    - rating: rating out of 10
    - director: the director's name
    - streaming_on: list of streaming platforms (if known) or null
    - description: brief description (max 500 chars)
    """

    # In a real application, you'd call your AI model here
    # For demonstration, let's pretend we got this response:
    ai_response = """
    {
        "title": "The Grand Budapest Hotel",
        "year": 2014,
        "genres": ["Comedy", "Drama", "Adventure"],
        "rating": 8.1,
        "director": "Wes Anderson",
        "streaming_on": ["HBO Max", "Disney+"],
        "description": "A writer encounters the owner of an aging high-class hotel, who tells him of his early years serving as a lobby boy in the hotel's glorious years under an exceptional concierge."
    }
    """

    try:
        # Parse and validate the AI response
        recommendation = MovieRecommendation.model_validate_json(ai_response)
        return recommendation
    except Exception as e:
        print(f"Error validating AI response: {e}")
        # In a real application, you might implement retry logic here
        return None
Step 3: Use the Recommendation in Your Application
def display_recommendation(recommendation):
    if not recommendation:
        return "Sorry, couldn't generate a valid recommendation."

    return f"""
    🎬 {recommendation.title} ({recommendation.year}) - {recommendation.rating}/10
    Directed by: {recommendation.director}
    Genres: {', '.join(recommendation.genres)}

    {recommendation.description}

    {f"Available on: {', '.join(recommendation.streaming_on)}" if recommendation.streaming_on else "Streaming info not available"}
    """

# Get and display a recommendation
user_genre = "sci-fi"
user_mood = "thoughtful"
user_decade = "2010s"

movie = get_movie_recommendation(user_genre, user_mood, user_decade)
print(display_recommendation(movie))
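Running this with the mocked response above prints something like:
# 🎬 The Grand Budapest Hotel (2014) - 8.1/10
# Directed by: Wes Anderson
# Genres: Comedy, Drama, Adventure
#
# A writer encounters the owner of an aging high-class hotel...
#
# Available on: HBO Max, Disney+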
Advanced Tips and Tricks
Want to take your Pydantic + AI game to the next level? Here are some advanced techniques:
Custom Validators for Domain-Specific Rules
from pydantic import BaseModel, Field, field_validator

class TravelRecommendation(BaseModel):
    destination: str
    best_months: list[str]
    budget_usd: int = Field(..., gt=0)

    @field_validator('best_months')  # `validator` in Pydantic v1
    @classmethod
    def check_valid_months(cls, months):
        valid_months = ["January", "February", "March", "April", "May", "June",
                        "July", "August", "September", "October", "November", "December"]
        for month in months:
            if month not in valid_months:
                raise ValueError(f"Invalid month: {month}")
        return months
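A quick illustrative check shows the validator catching a month the model invented:
from pydantic import ValidationError

try:
    TravelRecommendation(
        destination="Kyoto",
        best_months=["April", "Maytober"],  # invalid month from a hallucinating model
        budget_usd=2500
    )
except ValidationError as e:
    print(e)  # Value error, Invalid month: Maytober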
Nested Models for Complex Data
from typing import Optional
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    state: str
    country: str
    postal_code: str

class Contact(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    address: Address

class BusinessListing(BaseModel):
    name: str
    category: str
    rating: float
    contact: Contact
    hours: dict[str, str]
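Nested models validate the whole tree in a single call, so one model_validate catches problems at any depth. The sample data below is made up purely for illustration:
listing = BusinessListing.model_validate({
    "name": "Blue Bottle Coffee",
    "category": "cafe",
    "rating": 4.6,
    "contact": {
        "name": "Front Desk",
        "email": "hello@example.com",
        "address": {
            "street": "1 Ferry Building",
            "city": "San Francisco",
            "state": "CA",
            "country": "USA",
            "postal_code": "94111"
        }
    },
    "hours": {"mon": "7-19", "sat": "8-18"}
})

print(listing.contact.address.city)  # San Francisco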
Working with LangChain's Pydantic Output Parser
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from pydantic import BaseModel, Field

class AmazonProduct(BaseModel):
    name: str = Field(description="The product name")
    price: float = Field(description="The product price in USD")
    rating: float = Field(description="Rating from 1-5")
    reviews: int = Field(description="Number of reviews")

parser = PydanticOutputParser(pydantic_object=AmazonProduct)

prompt = PromptTemplate(
    template="Extract product information from this text:\n{text}\n{format_instructions}",
    input_variables=["text"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

model = OpenAI()

input_text = """
This amazing laptop is the MacBook Pro 16-inch, priced at $2,399. It has received
excellent feedback from customers, with a 4.8 star rating based on 3,842 reviews.
"""

output = model(prompt.format(text=input_text))
product = parser.parse(output)
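If the model follows the format instructions, parser.parse returns a fully validated AmazonProduct, so downstream code can rely on the types (the printed values are whatever the model extracted, shown here as an illustration):
print(product.name)     # e.g. "MacBook Pro 16-inch"
print(product.price)    # guaranteed to be a float, safe for arithmetic
print(product.reviews)  # guaranteed to be an int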
Conclusion
Pydantic is more than just a validation library - it's your AI agent's best friend! By defining clear data models and validating inputs and outputs, you can:
- Make your AI applications more reliable
- Catch errors early before they cascade into bigger problems
- Guide your models to produce better-structured outputs
- Create self-documenting code that clearly specifies what data you expect
The next time you're building an AI agent, take the time to define your data models with Pydantic. Your future self (and your users) will thank you!
Have you used Pydantic with AI projects? Feel free to share your experiences in the comments!