The fourth time I had to debug a LangChain agent that silently returned malformed JSON and crashed a client's order processing pipeline, I decided I was done patching type errors at midnight. That was eight months ago. Since then I've built 14 production systems on Pydantic AI, and not one of them has broken in the same way.
Pydantic AI is a Python agent framework built by the Pydantic team — the same people behind the library that OpenAI, Google, and Anthropic use for data validation inside their own SDKs. It launched in late 2024, hit 16,000 GitHub stars by early 2026, and releases new versions almost weekly. The core idea is simple: if you're going to build agents that run real business logic, they need the same type safety and validation guarantees you'd expect from any other production Python code.
This isn't a beginner's hello-world guide. I'm going to walk you through the patterns I actually use across client deployments — structured outputs, dependency injection, async agents, Bedrock integration, and how to test all of it without burning through API credits. I'll also tell you exactly when I reach for LangGraph instead, because the two aren't competitors so much as complements.
Key Takeaways

- Pydantic AI brings FastAPI-style type safety to AI agent development: your agent's output is a validated Pydantic model, not a string you hope parses correctly
- Dependency injection lets you pass database connections, API clients, and user context into tools without global state or environment variable hacks
- The `@agent.tool` decorator is all you need for function calling — Pydantic validates arguments automatically before your tool code even runs
- Async support is first-class: `agent.run()` is async and handles concurrent requests without blocking your event loop
- Pydantic AI works natively with AWS Bedrock via `BedrockConverseModel` — though structured streaming has a known limitation with Claude models (data arrives as one chunk)
- Use Pydantic AI when you need type-safe single agents or small agent graphs. Add LangGraph when you need complex conditional branching, checkpointing, or human-in-the-loop across many steps
What Is Pydantic AI and Why I Started Using It
Most agent frameworks treat the LLM's output as a string you then parse. You prompt engineer your way to something that looks like JSON, then write a parser, then add error handling for when the JSON is broken, then add retry logic for when the retry produces equally broken JSON. I've done this. It's terrible.
Pydantic AI flips this. You define what you want the agent to return — a Pydantic model, a list, a typed dict, even a primitive — and the framework handles the validation loop automatically. If the model returns something that doesn't match your schema, Pydantic AI sends the validation error back to the LLM as feedback and retries. You get a validated result or an exception. No silent failures.
The second reason I use it is the developer experience. Because everything is typed, your IDE gives you autocomplete on result.data, catches type errors before runtime, and makes refactoring safe. After spending too much time hunting down attribute access bugs in dynamically typed agent chains, this matters more than any benchmark number.
*Type-annotated Python code is the foundation of Pydantic AI's safety model*
Installing Pydantic AI and Building Your First Agent
Installation is a single command:

```bash
pip install pydantic-ai
```

For AWS Bedrock specifically, you need the extras:

```bash
pip install "pydantic-ai[bedrock]"
```
A minimal agent looks like this:
```python
from pydantic_ai import Agent

agent = Agent(
    'anthropic:claude-haiku-4-5-20251001',
    instructions='You are a concise assistant. Answer in 2 sentences max.'
)

result = agent.run_sync('What is dependency injection?')
print(result.data)
# Dependency injection is a pattern where an object receives its dependencies
# from outside rather than creating them itself.
```
That's it. The result.data field holds the output. For a plain string output type (the default), it's just a string. But the real power comes when you tell the agent what shape you want back.
Structured Output: The Feature That Changes Everything
Here's where Pydantic AI earns its name. Instead of parsing strings, you define a Pydantic model and pass it as result_type:
```python
from pydantic import BaseModel
from pydantic_ai import Agent

class CompetitorAnalysis(BaseModel):
    company_name: str
    main_strength: str
    main_weakness: str
    pricing_tier: str  # 'budget', 'mid', 'enterprise'
    verdict: str

agent = Agent(
    'anthropic:claude-haiku-4-5-20251001',
    result_type=CompetitorAnalysis,
    instructions='Analyze the company described. Be specific and honest.'
)

result = agent.run_sync(
    'Analyze Zapier as a competitor to a custom n8n deployment for enterprise clients.'
)

analysis = result.data  # type: CompetitorAnalysis
print(analysis.pricing_tier)  # 'enterprise'
print(analysis.verdict)       # Full string, validated
```
I used this exact pattern for a B2B SaaS client building a competitive intelligence tool. Their previous implementation used GPT-4 with a JSON prompt and a custom parser. It worked about 80% of the time. With Pydantic AI it works 100% of the time or raises a clear exception you can handle explicitly.
The validation loop is automatic. If Claude returns pricing_tier: "mid-market" instead of one of your allowed values, Pydantic raises a ValidationError, Pydantic AI sends that error message back to the LLM as a correction prompt, and the LLM tries again. You can configure retries on the agent to control how many times this happens before raising to the caller.
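You can reproduce that correction loop locally with plain Pydantic — no agent or API call involved. A minimal sketch, reusing the model above but with `pricing_tier` tightened to a `Literal` so the allowed values are enforced by the schema instead of a comment:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

class CompetitorAnalysis(BaseModel):
    company_name: str
    pricing_tier: Literal['budget', 'mid', 'enterprise']  # enforced, not just documented
    verdict: str

# A well-formed payload validates cleanly.
ok = CompetitorAnalysis(
    company_name='Zapier', pricing_tier='enterprise', verdict='Strong at breadth.'
)

# An out-of-vocabulary tier fails with a message that names the allowed values —
# this is the kind of feedback Pydantic AI sends back to the LLM before retrying.
try:
    CompetitorAnalysis(
        company_name='Zapier', pricing_tier='mid-market', verdict='...'
    )
except ValidationError as e:
    print(e.errors()[0]['msg'])
```

The error message names the permitted values, which is exactly the feedback that makes the retry converge instead of flailing.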
Complex Nested Models
You're not limited to flat models. Nested structures work exactly as you'd expect:
```python
from typing import List
from pydantic import BaseModel

class Action(BaseModel):
    step: int
    description: str
    owner: str  # 'human', 'agent', 'system'
    estimated_minutes: int

class ProjectPlan(BaseModel):
    title: str
    total_hours: float
    risk_level: str  # 'low', 'medium', 'high'
    actions: List[Action]
    blockers: List[str]
```
I've deployed this kind of multi-level output for project scoping agents where clients input a brief description of what they want to build and the agent returns a structured work breakdown. The type safety means the downstream code that reads plan.actions never has to guess whether it's a list or a string.
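To see the nesting validate end to end without an LLM call, you can push a raw dict — the shape the agent's output would have — through the model. The models are repeated here so the snippet runs standalone; the sample values are illustrative:

```python
from typing import List

from pydantic import BaseModel

class Action(BaseModel):
    step: int
    description: str
    owner: str
    estimated_minutes: int

class ProjectPlan(BaseModel):
    title: str
    total_hours: float
    risk_level: str
    actions: List[Action]
    blockers: List[str]

# The same shape the agent's validated output would have:
raw = {
    'title': 'CRM enrichment rollout',
    'total_hours': 12.5,
    'risk_level': 'medium',
    'actions': [
        {'step': 1, 'description': 'Audit CRM fields', 'owner': 'human', 'estimated_minutes': 90},
        {'step': 2, 'description': 'Wire property API', 'owner': 'agent', 'estimated_minutes': 45},
    ],
    'blockers': [],
}

plan = ProjectPlan.model_validate(raw)
print(plan.actions[0].estimated_minutes)  # nested items are Action instances, not dicts
```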
Dependency Injection: Production-Grade Context Passing
This is the feature most tutorials gloss over, and it's the one that makes Pydantic AI actually usable in real systems. The problem it solves: your tools need context. They need database connections, API clients, the current user's ID, rate limiter instances. The wrong way to handle this is global variables or environment lookups inside tool functions. The right way is dependency injection.
Pydantic AI's DI system uses a Deps dataclass (any Python dataclass or TypedDict) that you pass into agent.run(). Tools receive it via RunContext:
```python
import asyncio
from dataclasses import dataclass
from pydantic_ai import Agent, RunContext

@dataclass
class Deps:
    user_id: str
    db_client: object  # your actual DB client
    api_key: str

agent = Agent(
    'anthropic:claude-haiku-4-5-20251001',
    deps_type=Deps,
    instructions='Help users look up their account information.'
)

@agent.tool
async def get_account_balance(ctx: RunContext[Deps], account_type: str) -> dict:
    """Get the account balance for a specific account type."""
    balance = await ctx.deps.db_client.query(
        'SELECT balance FROM accounts WHERE user_id = ? AND type = ?',
        ctx.deps.user_id,
        account_type
    )
    return {'balance': balance, 'currency': 'USD'}

async def main():
    deps = Deps(
        user_id='usr_abc123',
        db_client=your_db_client,  # wire in your real client here
        api_key='sk-...'
    )
    result = await agent.run(
        'What is my checking account balance?',
        deps=deps
    )
    print(result.data)

asyncio.run(main())
```
*Dependency injection keeps database connections and API clients out of global state*
The reason this matters in production: you can create Deps from your request context. User ID from JWT. DB connection from your connection pool. Rate limiter for that specific user. The agent and its tools get exactly what they need with no global state, no threading issues, and no test pollution. In unit tests, you swap in mock clients without monkey patching anything.
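A minimal sketch of that wiring, with hypothetical names — `request_ctx` and `pool` stand in for whatever your JWT middleware and connection-pool setup actually provide:

```python
from dataclasses import dataclass

@dataclass
class Deps:
    user_id: str
    db_client: object
    api_key: str

def build_deps(request_ctx: dict, pool: object, api_key: str) -> Deps:
    """Assemble per-request dependencies. Hypothetical sketch: in a real app,
    request_ctx comes from your decoded JWT and pool from your DB setup."""
    return Deps(
        user_id=request_ctx['sub'],  # subject claim from the decoded JWT
        db_client=pool,
        api_key=api_key,
    )

# In a test, the same constructor takes mocks with zero monkey patching:
deps = build_deps({'sub': 'usr_test'}, pool=object(), api_key='test-key')
print(deps.user_id)
```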
Real Client Example: CRM Enrichment Agent
One of my real estate clients needed an agent that looks up a lead in their CRM, enriches it with publicly available property data, and writes a personalized follow-up draft. The Deps object carries the CRM client, the property data API client, and the user's email for tone calibration:
```python
@dataclass
class CRMDeps:
    crm: CRMClient
    property_api: PropertyAPIClient
    agent_email: str
    agent_name: str

# crm_agent is an Agent(..., deps_type=CRMDeps) built like the example above

@crm_agent.tool
async def lookup_lead(ctx: RunContext[CRMDeps], lead_id: str) -> dict:
    """Look up lead details from the CRM."""
    lead = await ctx.deps.crm.get_lead(lead_id)
    return {
        'name': lead.full_name,
        'property_interest': lead.property_type,
        'budget': lead.budget_range,
        'last_contact': lead.last_contact_date
    }

@crm_agent.tool
async def get_market_data(
    ctx: RunContext[CRMDeps],
    zip_code: str,
    property_type: str
) -> dict:
    """Get current market data for a specific location and property type."""
    data = await ctx.deps.property_api.market_summary(zip_code, property_type)
    return {
        'median_price': data.median,
        'days_on_market': data.avg_dom,
        'inventory': data.active_listings
    }
```
This agent runs on every new CRM lead. The output is a FollowUpDraft Pydantic model with subject, body, and recommended_call_time. Zero global state, fully testable, and the type system means nobody on the team can accidentally pass the wrong client type.
Tool Definition: Giving Your Agent Real Capabilities
The @agent.tool decorator turns a regular Python function into an LLM-callable tool. Pydantic validates the arguments the LLM passes before your function code ever runs. This is huge — it means you don't need to write argument validation inside your tools.
```python
@agent.tool
async def search_knowledge_base(
    ctx: RunContext[Deps],
    query: str,
    max_results: int = 5,
    category: str | None = None
) -> list[dict]:
    """
    Search the internal knowledge base for relevant articles.

    Args:
        query: The search query string
        max_results: Maximum number of results to return (1-20)
        category: Optional category filter ('support', 'billing', 'technical')
    """
    results = await ctx.deps.search_client.query(
        query,
        limit=min(max_results, 20),  # still good to cap on our side
        category=category
    )
    return [{'title': r.title, 'excerpt': r.excerpt, 'url': r.url} for r in results]
```
The docstring is automatically extracted and sent to the LLM as the tool description. The parameter docstring descriptions become the JSON schema descriptions. This means your documentation and your tool contract are the same thing — change the docstring, the LLM's understanding changes too.
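You can demo the argument-validation behavior with plain Pydantic's `validate_call` decorator — this is not Pydantic AI's internal mechanism, just the same contract applied to an ordinary function, so you can see arguments coerced and rejected before the body runs:

```python
from pydantic import ValidationError, validate_call

@validate_call
def search_knowledge_base(query: str, max_results: int = 5) -> list:
    """Same signature style as the tool above (body is a stand-in)."""
    return [query] * min(max_results, 20)

# The string '3' is coerced to the int 3 before the function body runs.
print(search_knowledge_base('refunds', max_results='3'))

try:
    search_knowledge_base('refunds', max_results='lots')  # not coercible -> rejected
except ValidationError:
    print('rejected before the function body ran')
```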
Error Handling Inside Tools
When a tool raises an exception, Pydantic AI catches it and passes the error message back to the LLM as tool output. The agent can then try a different approach or explain the error to the user. You can also use ModelRetry to signal the LLM should retry with different parameters:
```python
from pydantic_ai import ModelRetry

@agent.tool
async def get_weather(ctx: RunContext[Deps], city: str) -> dict:
    """Get current weather for a city."""
    try:
        data = await ctx.deps.weather_api.current(city)
        return {'temp_c': data.temperature, 'conditions': data.conditions}
    except CityNotFoundError:
        raise ModelRetry(f"City '{city}' not found. Try a more specific name or add the country code.")
    except RateLimitError:
        # Don't retry on rate limits — surface the real error
        raise ValueError("Weather API rate limit reached. Please wait before requesting more data.")
```
Async Agents: Handling Production Load
In production, you're almost never running a single agent synchronously. You're handling concurrent user requests, batch processing, or parallel sub-agent calls. Pydantic AI is async-first — agent.run() returns a coroutine, and agent.run_sync() is just a thin wrapper for scripts and REPL use.
```python
import asyncio
from pydantic_ai import Agent

async def process_leads(lead_ids: list[str], deps: CRMDeps) -> list[FollowUpDraft]:
    """Process multiple leads concurrently."""
    tasks = [
        crm_agent.run(
            f'Generate a follow-up for lead {lead_id}',
            deps=deps
        )
        for lead_id in lead_ids
    ]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    drafts = []
    for lead_id, result in zip(lead_ids, results):
        if isinstance(result, Exception):
            print(f"Lead {lead_id} failed: {result}")
            continue
        drafts.append(result.data)
    return drafts
```
For a logistics client I work with, this pattern processes 40 to 60 inbound shipment notifications concurrently. Each agent call checks carrier APIs, validates delivery windows, and generates exception reports. The whole batch runs in under 8 seconds because the I/O waits overlap instead of stacking.
Message History and Multi-Turn Conversations
If you need conversational context across turns — a support chat, an interview flow, a wizard-style form — you pass the previous run's messages into the next call:
```python
async def chat_session(agent: Agent, deps: Deps):
    messages = []
    while True:
        user_input = input("You: ")
        if user_input.lower() == 'quit':
            break
        result = await agent.run(
            user_input,
            message_history=messages,
            deps=deps
        )
        # Extend message history with this turn
        messages = result.all_messages()
        print(f"Agent: {result.data}")
```
The result.all_messages() call returns the full conversation including tool calls and results, formatted correctly for the next run. No manual message formatting needed.
Pydantic AI with AWS Bedrock
Most tutorials use OpenAI or Google Gemini. My production deployments almost all run on AWS Bedrock because my clients are already in AWS and the spend goes against existing Enterprise Discount Program commitments. Pydantic AI's Bedrock support works well, with one important gotcha.
*AWS Bedrock handles IAM, region routing, and enterprise compliance automatically*
Setting Up BedrockConverseModel
```python
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel
from pydantic_ai.providers.bedrock import BedrockProvider

# Option 1: String shorthand (uses default credentials)
agent = Agent('bedrock:anthropic.claude-haiku-4-5-20251001')

# Option 2: Explicit model with region (my standard setup)
model = BedrockConverseModel(
    'anthropic.claude-haiku-4-5-20251001',
    provider=BedrockProvider(region_name='us-east-1')
)
agent = Agent(model, instructions='...')
```
Credentials come from the standard boto3 chain: environment variables, instance profile, assumed role. In Lambda or ECS, this just works. Locally you need AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION in your environment.
The Streaming Gotcha
Structured streaming (getting partial validated output as the model generates) does not work with Claude models via Bedrock. When you use agent.run_stream() with a Bedrock Claude model and a structured result type, the data still arrives as a single chunk at the end rather than progressively. For text output (no result_type), streaming works fine.
In practice this hasn't blocked any of my deployments. Structured output tasks are usually fast enough that streaming the structure itself isn't useful — you want the complete validated object, not partial JSON. For cases where I need real-time feedback, I separate the streaming UI from the structured processing step.
Cross-Region Inference
If you want to use cross-region inference profiles (for higher throughput limits), just use the inference profile ARN as the model name:
```python
model = BedrockConverseModel(
    'us.anthropic.claude-haiku-4-5-20251001',  # US cross-region profile
    provider=BedrockProvider(region_name='us-east-1')
)
```
Pydantic AI vs LangGraph: When to Use Which
I use both. The choice isn't about which is better — it's about what your workflow actually needs.
| Need | Use Pydantic AI | Use LangGraph |
|---|---|---|
| Single agent with tools | Yes | Overkill |
| Structured validated output | Yes, native | Needs extra wiring |
| Dependency injection | First-class | Via LangChain context |
| Complex branching logic | Gets messy | Yes, this is its strength |
| Checkpoint and resume | No | Yes (core feature) |
| Human-in-the-loop approval | Basic support | Robust `interrupt_before` |
| Multi-agent orchestration at scale | Use as node inside LangGraph | Yes |
| Type safety throughout | Strong | Moderate |
| Learning curve | Low (like FastAPI) | Medium (graph concepts) |
The pattern I reach for most: use Pydantic AI agents as the "worker" nodes inside a LangGraph graph. LangGraph handles the orchestration, routing, and state persistence. Each node calls a Pydantic AI agent with typed inputs and outputs. You get the best of both: LangGraph's workflow control with Pydantic AI's type safety at the task level.
Testing Pydantic AI Agents Without Burning API Credits
This section doesn't exist in most tutorials. Testing agents is different from testing regular functions because the LLM response is non-deterministic. Pydantic AI has a built-in solution: FunctionModel.
FunctionModel lets you replace the real LLM with a function that returns whatever you want. Your tests run instantly, cost nothing, and are deterministic:
```python
from pydantic_ai.models.function import AgentInfo, FunctionModel
from pydantic_ai.messages import (
    ModelMessage,
    ModelResponse,
    ToolCallPart,
    ToolReturnPart,
)

def mock_model(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
    """Simulate the LLM: call the search tool first, then produce the final
    structured output (Pydantic AI exposes the output schema as a tool,
    named 'final_result' by default)."""
    tool_already_ran = any(
        isinstance(part, ToolReturnPart) for m in messages for part in m.parts
    )
    if tool_already_ran:
        # Field names here must match your SupportResponse model
        return ModelResponse(parts=[
            ToolCallPart(
                tool_name='final_result',
                args={'answer': 'Refunds are accepted within 30 days.'},
            )
        ])
    # First call: simulate the LLM deciding to call a tool
    return ModelResponse(parts=[
        ToolCallPart(
            tool_name='search_knowledge_base',
            args={'query': 'refund policy', 'max_results': 3},
        )
    ])

def test_support_agent_calls_search():
    deps = SupportDeps(
        search_client=MockSearchClient(),
        user_id='test_user',
    )
    # override() is a context manager: the real model is swapped out
    # only inside this block
    with support_agent.override(model=FunctionModel(mock_model)):
        result = support_agent.run_sync('What is your refund policy?', deps=deps)
    # Verify the tool was called and output was validated
    assert isinstance(result.data, SupportResponse)
```
*FunctionModel makes agent testing deterministic and free — no API calls required*
For integration tests where you want to hit the real model but verify behavior at a higher level, Pydantic AI's built-in evaluation tools let you run test cases against your agent and check outputs against assertions. The official docs have examples of this under "Evals".
Cost Optimization in Production
Three things eat your token budget with Pydantic AI agents:
1. Retry loops. Every validation failure triggers a retry. If your result type is too strict or your prompt is ambiguous, you can end up paying for 3 to 5 model calls per request. Track your result.usage() across a sample of real calls. Anything averaging over 1.2 calls is a warning sign that your schema or prompt needs work.
```python
result = await agent.run(user_message, deps=deps)
usage = result.usage()
print(f"Requests: {usage.requests}, Input tokens: {usage.input_tokens}")
```
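You can sanity-check that 1.2 threshold offline against a sample of logged `usage.requests` values — a hypothetical helper, nothing Pydantic AI provides:

```python
def average_requests(samples: list[int], warn_over: float = 1.2) -> tuple[float, bool]:
    """Average model calls per agent run over a sample of usage.requests
    values; flags schemas/prompts that trigger too many validation retries.
    (Illustrative helper; the 1.2 threshold is the rule of thumb above.)"""
    avg = sum(samples) / len(samples)
    return avg, avg > warn_over

# e.g. 10 runs where two needed one retry each:
avg, too_high = average_requests([1, 1, 2, 1, 1, 1, 2, 1, 1, 1])
print(avg, too_high)
```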
2. Tool descriptions. The docstring of every tool goes into every system prompt. If you have 12 tools each with 200-word docstrings, you're paying for 2,400 tokens of tool descriptions on every single call. Be ruthless: keep docstrings under 50 words and use the parameter descriptions only for non-obvious fields.
3. Message history growth. If you're passing multi-turn history, tokens grow linearly with conversation length. For most business workflows, conversations beyond 10 turns are rare. Add a hard limit or summarization step at the 8-turn mark.
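A deliberately simple sketch of that hard limit, as plain Python you could apply to the list from `result.all_messages()` before the next run. The two-messages-per-turn assumption is mine, and a production version might summarize dropped turns instead of discarding them:

```python
def cap_history(messages: list, max_turns: int = 8, per_turn: int = 2) -> list:
    """Keep only the most recent turns, assuming roughly `per_turn`
    messages per user/assistant exchange. Oldest messages are dropped."""
    limit = max_turns * per_turn
    return messages if len(messages) <= limit else messages[-limit:]

history = [f'msg-{i}' for i in range(30)]  # stand-ins for message objects
trimmed = cap_history(history)
print(len(trimmed))  # 16 = 8 turns * 2 messages per turn
```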
Three Real Client Deployments
1. Legal Document Classifier (Legal Tech Client)
A legal tech startup needed to classify inbound contract documents by type, jurisdiction, and risk flags before routing to the right attorney. Previous approach: keyword matching with 200 rules. Accuracy: 67%. My implementation: a Pydantic AI agent with a DocumentClassification result type (type enum, jurisdiction string, risk_flags list, confidence float). Running on Bedrock Claude Haiku 4.5. Accuracy: 94%. Processing time: under 2 seconds per document.
The key was the confidence field. When the agent returns confidence < 0.8, the document goes to a human review queue instead of auto-routing. Before Pydantic AI, getting a reliable confidence score out of an LLM took 3 layers of prompt engineering. With structured output it's just a float field in the model.
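The routing itself is a few lines. A cut-down sketch with illustrative field names (the client's actual model carries more fields):

```python
from typing import List

from pydantic import BaseModel

class DocumentClassification(BaseModel):
    doc_type: str
    jurisdiction: str
    risk_flags: List[str]
    confidence: float

def route(result: DocumentClassification, threshold: float = 0.8) -> str:
    """Auto-route confident classifications; queue the rest for humans."""
    return 'auto_route' if result.confidence >= threshold else 'human_review'

low = DocumentClassification(
    doc_type='NDA', jurisdiction='US-CA', risk_flags=[], confidence=0.62
)
print(route(low))  # 'human_review'
```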
2. E-Commerce Customer Support Triage (DTC Brand)
An e-commerce client had 800 to 1,200 support tickets per day. They wanted to auto-resolve the 40% of tickets that were standard order status inquiries. I built a Pydantic AI agent with tools for order lookup, shipping carrier API calls, and CRM history. The TriageResult model includes action (auto-resolve, escalate, or needs-info), response_draft, confidence, and escalation_reason.
The dependency injection pattern meant the agent gets the customer's order history and past tickets injected from the request context. No separate retrieval step. The agent resolves 38% of tickets automatically (slightly under target due to some edge cases) with a 96% customer satisfaction rate on auto-resolved tickets. Their support team handles 750 fewer tickets per day.
3. CRM Lead Scoring (Real Estate Agency)
A real estate agency wanted AI-powered lead scoring that integrates with their custom CRM. The agent takes a lead profile, calls property interest lookup and local market data tools, and returns a LeadScore object with a numeric score (0 to 100), a tier (hot, warm, cold), a one-paragraph reasoning, and a list of recommended next actions. The scoring runs automatically on every new lead and on weekly rescores of the existing pipeline. The injection of the agent's own contact info into Deps means the same agent code generates recommendations personalized to different agents on their team.
Common Mistakes I See
Using run_sync() everywhere. It's fine for scripts. In a FastAPI app or Lambda handler, you want await agent.run(). The sync wrapper blocks your event loop.
Putting business logic inside result validators. Pydantic validators in your result type run on every retry. If a validator makes a database call, it runs 3 times on a failed validation. Put database calls in tools, not validators.
Over-specifying the system prompt. LLMs are good at inference. You don't need to explain JSON format, tell the model not to apologize, or add 500 words of rules. Your result_type specification already constrains the output format. Trust the validation loop.
Not setting a retries limit. The default retry count is generous. In production, set retries=2 on your agent and handle UnexpectedModelBehavior explicitly instead of letting the framework burn tokens on an agent that consistently can't satisfy your schema.
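For contrast, here is what a result-type validator *should* look like — pure and cheap. This uses Pydantic v2's `field_validator` on a trimmed-down version of the triage model from earlier (fields are illustrative):

```python
from pydantic import BaseModel, field_validator

class TriageResult(BaseModel):
    action: str
    confidence: float

    @field_validator('action')
    @classmethod
    def normalize_action(cls, v: str) -> str:
        # Pure, cheap normalization is fine here. Anything that touches a
        # database belongs in a tool, because this runs on every retry.
        return v.strip().lower().replace(' ', '-')

r = TriageResult(action='  Auto Resolve ', confidence=0.9)
print(r.action)  # 'auto-resolve'
```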
Citation Capsule: Pydantic AI hit 16,000 GitHub stars by April 2026 with the parent Pydantic library surpassing 10 billion downloads across all Python projects. According to the Pydantic AI GitHub repository, the latest release came on April 3, 2026. Amazon Web Services supports Pydantic AI in its Bedrock AgentCore documentation. For the parent Pydantic download milestone, see Pydantic's official blog post.
Frequently Asked Questions
Is Pydantic AI production-ready in 2026?
Yes. The framework has been in active production use since early 2025, reached its 1.x stable API in late 2025, and as of April 2026 is used by companies including those building on Amazon Bedrock AgentCore. The weekly release cadence means bugs get fixed fast, but the stable API means your code doesn't break between updates. I've been running it in production for 8 months without a breaking change.
Does Pydantic AI work with AWS Bedrock?
Yes, via the BedrockConverseModel class. Install with pip install "pydantic-ai[bedrock]", then initialize your agent with 'bedrock:anthropic.claude-haiku-4-5-20251001' or an explicit BedrockConverseModel instance. Credentials come from the standard boto3 chain. One caveat: structured output streaming does not work with Claude models on Bedrock — data arrives as a single chunk rather than progressively.
What is the difference between Pydantic AI and LangChain?
LangChain is a broad ecosystem covering everything from prompt templates to vector store integrations to agent frameworks. Pydantic AI is narrowly focused on one thing: type-safe agents with validated outputs. Pydantic AI has less surface area, a cleaner API, and first-class type checking. LangChain has more integrations and a larger community. For new projects I start with Pydantic AI and add LangChain integrations only when I need something specific it provides.
How does Pydantic AI handle tool errors?
Tool exceptions are caught automatically and passed back to the LLM as tool output, giving the model a chance to recover or try a different approach. You can also raise ModelRetry from inside a tool to explicitly signal the LLM should try different parameters. For errors you don't want the LLM to retry — like rate limit errors — raise a standard exception and it bubbles up to your caller.
Can Pydantic AI agents call other Pydantic AI agents?
Yes. You can call an agent from inside another agent's tool function, or use Pydantic AI agents as node functions inside a LangGraph graph. The nested agent gets its own dependency context. This pattern works well for orchestrator-worker setups where a top-level agent decides what sub-task to delegate and sub-agents handle the specialized work.
How do I test Pydantic AI agents without making real API calls?
Use FunctionModel from pydantic_ai.models.function. It lets you replace the LLM with a Python function that returns a ModelResponse. Your tests run instantly, are deterministic, and cost nothing. For tool-specific tests, mock the dependency objects in your Deps dataclass. The official docs also include an eval framework for higher-level behavioral testing against real models.
What models does Pydantic AI support?
Pydantic AI supports 20+ providers including OpenAI, Anthropic (direct and via Bedrock), Google Gemini, Cohere, Mistral, Groq, and others. You can also implement a custom model class for any provider with an HTTP API. The model-agnostic design means you can switch providers in one line without changing any agent or tool code.
Is Pydantic AI free to use?
Yes. Pydantic AI is open source (MIT license) and free to use. You pay only for the underlying LLM API calls — Bedrock, OpenAI, Anthropic, or whatever provider you configure. There is no hosted version or subscription fee from the Pydantic team. The optional Pydantic Logfire integration for observability has a free tier and paid plans.
If you want to see how I decide between Pydantic AI and full agent orchestration with state management, read my LangGraph tutorial. For understanding when you even need agents versus simpler automation, the agents vs automation guide covers my full decision framework. If you are ready to build something and want a second opinion on your architecture, the AI readiness assessment is a good starting point, or just reach out directly.