Forrester Terry

Supercharging Your AI Agents with Pydantic: A Developer's Guide πŸš€

Introduction πŸ‘‹

Have you ever built an AI agent that sometimes returns unpredictable data structures? Or perhaps you've dealt with the frustration of parsing JSON from a language model only to have your application crash because a field was missing or had the wrong type?

I've been there too! That's why today I want to talk about one of my favorite tools for taming the wild outputs of AI agents: Pydantic!

In this guide, I'll show you how Pydantic can transform your AI agent development from a game of chance into a reliable, robust process. Let's dive in! πŸŠβ€β™‚οΈ

What is Pydantic and Why Should AI Developers Care? πŸ€”

Pydantic is a Python library for data validation that uses type annotations. Think of it as a bouncer for your data - it checks IDs (types) at the door and makes sure only the right data gets into your application.

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int
    is_active: bool

But why is this particularly useful for AI agents? Here's the thing:

AI models (especially LLMs) are amazing at generating content, but they're not always precise about formatting. They might return JSON where a number is accidentally a string ("42" instead of 42), or they might forget a field entirely. Without validation, these small inconsistencies can cause big problems downstream in your application.

The AI Agent's Output Problem πŸ€–πŸ’¬

Imagine asking an AI agent to return information about a product:

{
  "name": "Super Widget",
  "price": "29.99",
  "in_stock": "true",
  "features": ["durable", "lightweight"]
}

Notice the issues? price is a string instead of a float, and in_stock is a string instead of a boolean. Your application might crash when it tries to do math with that price or make a decision based on the stock status.

Pydantic to the rescue! It can automatically convert these types for you, so "29.99" becomes 29.99 (float) and "true" becomes True (boolean).

How Pydantic Makes AI Agents More Reliable πŸ›‘οΈ

1. Automatic Type Conversion

Pydantic doesn't just validate - it tries to convert data to the right type when possible. This is perfect for AI outputs that are almost correct.

from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool
    features: list[str]

# Even though price and in_stock are strings, Pydantic will convert them
product = Product(
    name="Super Widget", 
    price="29.99", 
    in_stock="true",
    features=["durable", "lightweight"]
)

print(product)
# Product(name='Super Widget', price=29.99, in_stock=True, features=['durable', 'lightweight'])

2. Clear Error Messages

When validation fails, Pydantic tells you exactly what went wrong:

try:
    Product(name="Broken Widget", price="expensive", in_stock=True, features="strong")
except Exception as e:
    print(f"Error: {e}")

# Error: 2 validation errors for Product
# price
#   Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='expensive', input_type=str]
# features
#   Input should be a valid list [type=list_type, input_value='strong', input_type=str]

These detailed errors make debugging so much easier when your AI agent returns unexpected data.
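
If you want to handle those failures programmatically rather than just print them, catch Pydantic's ValidationError and walk its errors() list - each entry tells you which field failed and why. A minimal sketch using the Product model from above:

from pydantic import ValidationError

try:
    Product(name="Broken Widget", price="expensive", in_stock=True, features="strong")
except ValidationError as e:
    # errors() returns one dict per problem, with the field location, message, and offending input
    for error in e.errors():
        print(error["loc"], error["msg"])

# ('price',) Input should be a valid number, unable to parse string as a number
# ('features',) Input should be a valid list

This structured form is exactly what you'd feed back to the model in a retry prompt (more on that below).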

3. Schema Generation for Guiding AI Outputs

One of my favorite Pydantic features for AI development is its ability to generate JSON schemas:

print(Product.model_json_schema())
# {
#   "title": "Product",
#   "type": "object",
#   "properties": {
#     "name": {"title": "Name", "type": "string"},
#     "price": {"title": "Price", "type": "number"},
#     "in_stock": {"title": "In Stock", "type": "boolean"},
#     "features": {
#       "title": "Features",
#       "type": "array",
#       "items": {"type": "string"}
#     }
#   },
#   "required": ["name", "price", "in_stock", "features"]
# }

You can use this schema to guide your AI model, especially if you're using function calling with OpenAI or similar features with other providers. This dramatically improves the chances of getting correctly formatted responses!

Real-World Use Cases 🌍

Let's look at how developers are using Pydantic with AI agents in the wild:

OpenAI Function Calling + Pydantic

from openai import OpenAI
from pydantic import BaseModel, Field

class WeatherInfo(BaseModel):
    location: str = Field(..., description="The city and state")
    temperature: float = Field(..., description="Current temperature in Celsius")
    condition: str = Field(..., description="Weather condition (sunny, cloudy, etc.)")

# Get the JSON schema for OpenAI
function_def = {
    "name": "get_weather",
    "description": "Get the current weather in a location",
    "parameters": WeatherInfo.model_json_schema()
}

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "user", "content": "What's the weather like in Boston?"}
    ],
    functions=[function_def],
    function_call={"name": "get_weather"}
)

# Extract and validate the response
function_response = response.choices[0].message.function_call.arguments
weather = WeatherInfo.model_validate_json(function_response)
print(f"It's {weather.temperature}Β°C and {weather.condition} in {weather.location}")

This approach has two benefits:

  1. The schema guides the AI to produce properly structured output
  2. Pydantic validates that output as an extra safety measure
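
A note on the snippet above: functions / function_call is OpenAI's legacy interface and still works, but if you're on the newer tools API, the same Pydantic-generated schema slots in with a slightly different shape. Here's a rough sketch reusing client, function_def, and WeatherInfo from above - the arguments just move under tool_calls:

# Rough sketch with the newer tools API; the schema is the same one Pydantic generated
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "user", "content": "What's the weather like in Boston?"}
    ],
    tools=[{"type": "function", "function": function_def}],
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)

arguments = response.choices[0].message.tool_calls[0].function.arguments
weather = WeatherInfo.model_validate_json(arguments)

Either way, the validation step at the end stays the same.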

Multi-Agent Systems (CrewAI)

In systems where multiple AI agents collaborate, consistent data structures are critical. Frameworks like CrewAI use Pydantic to ensure agents communicate properly:

from pydantic import BaseModel
from crewai import Agent, Task, Crew

class ResearchReport(BaseModel):
    topic: str
    key_findings: list[str]
    sources: list[str]

# Define a task with a Pydantic output schema
research_task = Task(
    description="Research the latest advancements in quantum computing",
    expected_output="A structured research report with key findings and sources",
    output_pydantic=ResearchReport
)

# When the agent completes the task, CrewAI validates the output against the model

Best Practices for Using Pydantic with AI Agents βœ…

After working with numerous AI projects, here are my top recommendations:

1. Start with a Clear Data Model

Define Pydantic models that capture exactly what you need from your AI agent. Be specific about types and constraints:

from typing import Optional, Literal
from pydantic import BaseModel, Field, conint

class ProductRecommendation(BaseModel):
    product_name: str
    price_range: str = Field(..., pattern=r"^\$\d+-\$\d+$")  # Ensure format like "$10-$20"
    rating: conint(ge=1, le=5)  # Integer between 1-5
    category: Literal["electronics", "clothing", "home", "books", "other"]
    features: list[str] = Field(..., min_length=2, max_length=5)
    in_stock: bool
    shipping_days: Optional[int] = None

This detailed model serves as documentation and validation in one package.
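
As a quick sanity check (with made-up values, purely for illustration), these constraints reject output that merely looks plausible:

from pydantic import ValidationError

# Passes: satisfies the pattern, literal, and length constraints
ProductRecommendation(
    product_name="Noise-Cancelling Headphones",
    price_range="$150-$250",
    rating=4,
    category="electronics",
    features=["wireless", "30-hour battery"],
    in_stock=True
)

# Fails on four fields at once
try:
    ProductRecommendation(
        product_name="Mystery Gadget",
        price_range="around fifty bucks",  # doesn't match the $X-$Y pattern
        rating=7,                          # outside 1-5
        category="gadgets",                # not an allowed category
        features=["shiny"],                # fewer than 2 items
        in_stock=True
    )
except ValidationError as e:
    print(f"{len(e.errors())} validation errors caught before they hit your app")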

2. Handle Validation Errors Gracefully

Always wrap your Pydantic validation in a try/except block - catching ValidationError specifically keeps unrelated bugs from being silently swallowed:

from pydantic import ValidationError

try:
    recommendation = ProductRecommendation.model_validate_json(ai_response)
    # Use the structured data
except ValidationError as e:
    # Log the error
    print(f"AI output validation failed: {e}")

    # Possible strategies:
    # 1. Use a fallback approach
    # 2. Re-prompt the AI with the error details
    # 3. Apply some fixes and retry validation

3. Consider Re-Prompting When Validation Fails

One powerful approach is to tell the AI exactly what went wrong and ask it to fix the response:

def get_validated_response(prompt):
    for attempt in range(3):  # Try up to 3 times
        response = ai_model.generate(prompt)

        try:
            result = ProductRecommendation.model_validate_json(response)
            return result
        except Exception as e:
            if attempt < 2:  # Don't update prompt on the last attempt
                prompt += f"\nYour previous response had validation errors: {e}. Please fix them and try again."

    # If we get here, all attempts failed
    raise ValueError("Could not get valid response after multiple attempts")

This feedback loop helps the AI learn from its mistakes!

A Simple Tutorial: Getting Started with Pydantic and AI πŸ—οΈ

Let's bring everything together with a simple tutorial. We'll create a movie recommendation agent that returns properly structured data:

Step 1: Define Your Pydantic Model

from pydantic import BaseModel, Field
from typing import List, Optional

class MovieRecommendation(BaseModel):
    title: str
    year: int = Field(..., ge=1900, le=2030)
    genres: List[str] = Field(..., min_length=1)
    rating: float = Field(..., ge=0.0, le=10.0)
    director: str
    streaming_on: Optional[List[str]] = None
    description: str = Field(..., max_length=500)

Step 2: Create a Function to Get Recommendations from an AI

def get_movie_recommendation(genre_preference, mood, decade_preference=None):
    # Construct a prompt for the AI
    prompt = f"""
    Suggest a movie based on the following:
    Genre preference: {genre_preference}
    Mood: {mood}
    Decade preference: {decade_preference or 'any'}

    Return the recommendation as a JSON object with the following fields:
    - title: the movie title
    - year: the release year (1900-2030)
    - genres: list of genres
    - rating: rating out of 10
    - director: the director's name
    - streaming_on: list of streaming platforms (if known) or null
    - description: brief description (max 500 chars)
    """

    # In a real application, you'd call your AI model here
    # For demonstration, let's pretend we got this response:
    ai_response = """
    {
      "title": "The Grand Budapest Hotel",
      "year": 2014,
      "genres": ["Comedy", "Drama", "Adventure"],
      "rating": 8.1,
      "director": "Wes Anderson",
      "streaming_on": ["HBO Max", "Disney+"],
      "description": "A writer encounters the owner of an aging high-class hotel, who tells him of his early years serving as a lobby boy in the hotel's glorious years under an exceptional concierge."
    }
    """

    try:
        # Parse and validate the AI response
        recommendation = MovieRecommendation.model_validate_json(ai_response)
        return recommendation
    except Exception as e:
        print(f"Error validating AI response: {e}")
        # In a real application, you might implement retry logic here
        return None

Step 3: Use the Recommendation in Your Application

def display_recommendation(recommendation):
    if not recommendation:
        return "Sorry, couldn't generate a valid recommendation."

    return f"""
    🎬 {recommendation.title} ({recommendation.year}) - {recommendation.rating}/10

    Directed by: {recommendation.director}
    Genres: {', '.join(recommendation.genres)}

    {recommendation.description}

    {f"Available on: {', '.join(recommendation.streaming_on)}" if recommendation.streaming_on else "Streaming info not available"}
    """

# Get and display a recommendation
user_genre = "sci-fi"
user_mood = "thoughtful"
user_decade = "2010s"

movie = get_movie_recommendation(user_genre, user_mood, user_decade)
print(display_recommendation(movie))
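
With the hard-coded sample response above, this prints something like:

🎬 The Grand Budapest Hotel (2014) - 8.1/10

Directed by: Wes Anderson
Genres: Comedy, Drama, Adventure

A writer encounters the owner of an aging high-class hotel, who tells him of his early years serving as a lobby boy in the hotel's glorious years under an exceptional concierge.

Available on: HBO Max, Disney+

In a real application the AI's response won't be hard-coded, of course, so the except branch (and ideally a retry, as covered earlier) is what keeps a malformed reply from taking the whole feature down.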

Advanced Tips and Tricks 🧠

Want to take your Pydantic + AI game to the next level? Here are some advanced techniques:

Custom Validators for Domain-Specific Rules

from pydantic import BaseModel, Field, field_validator

class TravelRecommendation(BaseModel):
    destination: str
    best_months: list[str]
    budget_usd: int = Field(..., gt=0)

    @field_validator('best_months')
    @classmethod
    def check_valid_months(cls, months):
        valid_months = ["January", "February", "March", "April", "May", "June", 
                      "July", "August", "September", "October", "November", "December"]

        for month in months:
            if month not in valid_months:
                raise ValueError(f"Invalid month: {month}")
        return months
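
Now, if the model abbreviates a month name (again, made-up values for illustration), the validator flags it instead of letting the bad value through:

from pydantic import ValidationError

try:
    TravelRecommendation(
        destination="Kyoto",
        best_months=["March", "Apr"],  # "Apr" is not a full month name
        budget_usd=2500
    )
except ValidationError as e:
    print(e)
    # 1 validation error for TravelRecommendation
    # best_months
    #   Value error, Invalid month: Apr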

Nested Models for Complex Data

from typing import Optional
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    state: str
    country: str
    postal_code: str

class Contact(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    address: Address

class BusinessListing(BaseModel):
    name: str
    category: str
    rating: float
    contact: Contact
    hours: dict[str, str]
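
Validation recurses through nested models, so a single model_validate call on the outer model checks the entire tree - handy when an agent returns one big nested JSON blob. A quick sketch with made-up data:

listing = BusinessListing.model_validate({
    "name": "Blue Bottle Coffee",
    "category": "cafe",
    "rating": "4.6",  # string from the LLM, converted to float
    "contact": {
        "name": "Front Desk",
        "email": "hello@example.com",
        "address": {
            "street": "300 Webster St",
            "city": "Oakland",
            "state": "CA",
            "country": "USA",
            "postal_code": "94607"
        }
    },
    "hours": {"mon-fri": "7am - 6pm", "sat-sun": "8am - 5pm"}
})

print(listing.contact.address.city)  # Oakland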

Working with LangChain's Pydantic Output Parser

from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from pydantic import BaseModel, Field

class AmazonProduct(BaseModel):
    name: str = Field(description="The product name")
    price: float = Field(description="The product price in USD")
    rating: float = Field(description="Rating from 1-5")
    reviews: int = Field(description="Number of reviews")

parser = PydanticOutputParser(pydantic_object=AmazonProduct)

prompt = PromptTemplate(
    template="Extract product information from this text:\n{text}\n{format_instructions}",
    input_variables=["text"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

model = OpenAI()

input_text = """
This amazing laptop is the MacBook Pro 16-inch, priced at $2,399. It has received 
excellent feedback from customers, with a 4.8 star rating based on 3,842 reviews.
"""

output = model(prompt.format(text=input_text))
product = parser.parse(output)

Conclusion 🎯

Pydantic is more than just a validation library - it's your AI agent's best friend! By defining clear data models and validating inputs and outputs, you can:

  • Make your AI applications more reliable
  • Catch errors early before they cascade into bigger problems
  • Guide your models to produce better-structured outputs
  • Create self-documenting code that clearly specifies what data you expect

The next time you're building an AI agent, take the time to define your data models with Pydantic. Your future self (and your users) will thank you!

Have you used Pydantic with AI projects? Feel free to share your experiences in the comments!
