from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel
agent = Agent('test', system_prompt='Python expert')
result = agent.run_sync('Is f-string faster than .format()?', model=TestModel())
print(result.output) # → success (no tool calls)
When I first saw this run without an API key, I was mildly surprised. Same feeling as when I first used FastAPI — the structure was so intuitive it almost made me suspicious. That's PydanticAI in a nutshell.
Honestly, my first impression was "isn't this just Instructor with a wrapper?" Using it changed my mind. It's a framework built around the type system — the same philosophy FastAPI brought to web APIs, now applied to AI agents. Here's what I actually found when I installed and ran it, failed tests included.
Why PydanticAI — A Different Angle from the Comparison Post
I wrote a Python AI Agent Library Comparison covering PydanticAI, Instructor, and Smolagents. That post answers "which one to pick." This one answers "how do you actually build with PydanticAI."
Quick breakdown of where each library sits:
| Library | Core role | Agent loop | Type safety |
|---|---|---|---|
| Instructor | LLM output parsing | None | Structured output only |
| PydanticAI | Agent framework | Full support | Input + output + tools |
| LangGraph | Orchestration | Graph-based | Weak |
| CrewAI | Multi-agent teams | Role-based | Weak |
That second row is what makes the real difference. Types are maintained throughout the entire loop — LLM calls a tool, gets results, processes them. Runtime errors surface as IDE errors during development.
GitHub stars: 16K+. Built by the Pydantic team, so maintenance concerns are minimal.
Installation and Prerequisites
pip install pydantic-ai
Latest version as of today (April 29, 2026) is 1.88.0. Clean install.
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install pydantic-ai
Provider-specific packages install on demand:
pip install pydantic-ai[anthropic] # For Claude
pip install pydantic-ai[openai] # For GPT-4o
pip install pydantic-ai[google] # For Gemini
Requirements: Python 3.9+, pydantic v2 (v1 not supported). TestModel works without any API key.
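Given how fast this library moves, it's worth confirming which version actually got installed before debugging anything. A standard-library check, no assumptions about the package's own attributes:
from importlib.metadata import version
print(version('pydantic-ai'))  # e.g. 1.88.0 (pin this exact value in requirements.txt)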
First Agent: TestModel for Structure Validation
My go-to first step when building a new agent. Verify the structure before connecting a real API.
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel
agent = Agent(
'test',
system_prompt='You are a Python code reviewer. Be concise.',
)
result = agent.run_sync(
'Is f-string faster than .format()?',
model=TestModel() # No API key required
)
print(result.output) # → "success (no tool calls)"
print(result.usage()) # → RunUsage(input_tokens=64, output_tokens=4, requests=1)
TestModel makes no API calls. It's a test-only model for validating agent structure, tool wiring, and dependency injection. Use it in CI pipelines for zero-cost agent logic verification.
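Two related switches I lean on in CI (hedged, since names like these can shift between releases): pydantic_ai.models.ALLOW_MODEL_REQUESTS to hard-block accidental real calls, and agent.override() to pin TestModel for a whole test:
from pydantic_ai import models
from pydantic_ai.models.test import TestModel
models.ALLOW_MODEL_REQUESTS = False  # any attempt to reach a real provider now raises
with agent.override(model=TestModel()):
    result = agent.run_sync('Review this code')  # runs against TestModel regardless of the agent's default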
Switching to real Claude is a single-line change:
import os
os.environ['ANTHROPIC_API_KEY'] = 'sk-ant-...'
# Development: TestModel
result = agent.run_sync('Review this code', model=TestModel())
# Production: Real Claude
result = agent.run_sync('Review this code', model='anthropic:claude-sonnet-4-6')
Agent code stays identical. Only the model changes.
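In practice I wire that switch to an environment variable. A minimal sketch (the APP_ENV name is just my convention, not part of the library):
import os
from pydantic_ai.models.test import TestModel
# Hypothetical convention: use the real model only when APP_ENV=production
model = 'anthropic:claude-sonnet-4-6' if os.getenv('APP_ENV') == 'production' else TestModel()
result = agent.run_sync('Review this code', model=model)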
Structured Output: output_type Returns Pydantic Models
This is PydanticAI's core value proposition. Force the LLM to return a Pydantic model instance instead of free text.
Critical — v1.88.0 Breaking Change: The result_type parameter was renamed to output_type. Old docs or tutorials will give you:
pydantic_ai.exceptions.UserError: Unknown keyword arguments: `result_type`
I hit this directly. Confirmed via inspect.signature(Agent.__init__) that output_type is the correct name. Official docs still reference the old API in some places — watch out.
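The check itself is two lines and works for any keyword you're unsure about:
import inspect
from pydantic_ai import Agent
# Print the constructor's actual parameters: look for output_type vs result_type
print(inspect.signature(Agent.__init__))
With the name confirmed, here's structured output in action: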
from pydantic import BaseModel, Field
from pydantic_ai import Agent
class CodeReview(BaseModel):
severity: str = Field(description="'low', 'medium', or 'high'")
issues: list[str] = Field(description="List of issues found")
suggestions: list[str] = Field(description="Improvement suggestions")
score: int = Field(ge=0, le=100, description="Code quality score 0-100")
review_agent = Agent(
'anthropic:claude-sonnet-4-6',
output_type=CodeReview, # ← v1.88.0: NOT result_type
system_prompt='Review Python code and provide structured feedback.',
)
result = review_agent.run_sync('''
def get_user(id):
db = connect()
return db.query(f"SELECT * FROM users WHERE id={id}")
''')
print(type(result.output)) # → <class '__main__.CodeReview'>
print(result.output.severity) # → 'high'
print(result.output.score) # → 25
print(result.output.issues[0]) # → 'SQL injection vulnerability'
The return value is a Pydantic model instance, not a dict or string. IDE autocomplete works on the fields of result.output. The score field's 0-100 range is enforced by Pydantic automatically.
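That constraint isn't just documentation. If the model (or anything else) hands back score=150, Pydantic rejects it before your code sees it, which is exactly what triggers the retry loop described next:
from pydantic import ValidationError
try:
    CodeReview(severity='high', issues=[], suggestions=[], score=150)
except ValidationError as e:
    print(e)  # score: Input should be less than or equal to 100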
Automatic Retry Mechanism
When the LLM returns output that fails validation, PydanticAI feeds the ValidationError back to the LLM and requests a retry:
review_agent = Agent(
'anthropic:claude-sonnet-4-6',
output_type=CodeReview,
retries=3, # Retry up to 3 times on validation failure
output_retries=2 # Separate retry count for output validation
)
After 3 failures, UnexpectedModelBehavior is raised. In production, this handles models that occasionally return malformed output automatically.
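If the retry budget runs out you still need to decide what happens. One option is a neutral fallback object; the fallback values and the code_snippet variable here are just an example:
from pydantic_ai.exceptions import UnexpectedModelBehavior
try:
    review = review_agent.run_sync(code_snippet).output  # code_snippet: whatever you're reviewing
except UnexpectedModelBehavior:
    # Fall back to a conservative placeholder and flag it for human review
    review = CodeReview(severity='medium', issues=['automated review failed'],
                        suggestions=['review manually'], score=50)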
@agent.tool and Dependency Injection — FastAPI's Depends() Pattern
from pydantic_ai import Agent, RunContext
class AppDeps:
def __init__(self, db_url: str, user_id: int):
self.db_url = db_url
self.user_id = user_id
agent = Agent(
'anthropic:claude-sonnet-4-6',
deps_type=AppDeps,
system_prompt='You are a task management agent.',
)
# Async tool for I/O-heavy operations
@agent.tool
async def get_pending_tasks(ctx: RunContext[AppDeps], limit: int = 5) -> list[dict]:
"""Get list of pending tasks"""
return [
{"id": f"task_{i}", "title": f"Task {i}", "priority": "high"}
for i in range(limit)
]
# Sync tool for fast computation
@agent.tool
def calculate_priority_score(
ctx: RunContext[AppDeps],
urgency: int,
importance: int
) -> float:
"""Calculate task priority score"""
return urgency * 0.6 + importance * 0.4
deps = AppDeps(db_url="postgresql://localhost/taskdb", user_id=42)
result = agent.run_sync("Pick the highest priority urgent task", deps=deps)
@agent.tool reads the function signature (parameter types, defaults) and docstring to auto-generate the JSON Schema it passes to the LLM. Write limit: int = 5 and the LLM knows the parameter accepts an integer with a default of 5.
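For get_pending_tasks above, the generated tool definition looks roughly like this; it's an illustration of the idea, not the exact wire format, and the ctx parameter is excluded because it's the RunContext rather than an LLM-visible argument:
tool_definition = {
    "name": "get_pending_tasks",
    "description": "Get list of pending tasks",  # taken from the docstring
    "parameters": {
        "type": "object",
        "properties": {
            "limit": {"type": "integer", "default": 5},  # derived from `limit: int = 5`
        },
    },
}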
Message flow is 4 stages, accessible via result.all_messages():
1. ModelRequest → system_prompt + user_prompt
2. ModelResponse → ToolCallPart(tool_name='get_pending_tasks', ...)
3. ModelRequest → ToolReturnPart(content=[{...}])
4. ModelResponse → Final answer
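You can see those four stages directly by dumping the message history after a run:
# Each entry is a ModelRequest or ModelResponse; its parts show what happened at that step
for message in result.all_messages():
    print(type(message).__name__, [type(part).__name__ for part in message.parts])
# ModelRequest  ['SystemPromptPart', 'UserPromptPart']
# ModelResponse ['ToolCallPart']
# ModelRequest  ['ToolReturnPart']
# ModelResponse ['TextPart']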
TestModel vs FunctionModel — Test Strategy
I found TestModel's critical limitation while testing in the sandbox. Worth documenting.
TestModel's Limitation
TestModel returns minimum values: 'a' for str, 0 for int. Fine for structure tests, but strict custom validators will fail:
from pydantic import BaseModel, field_validator
from pydantic_ai import Agent
from pydantic_ai.exceptions import UnexpectedModelBehavior
from pydantic_ai.models.test import TestModel
class UserProfile(BaseModel):
    email: str
    @field_validator('email')
    @classmethod
    def valid_email(cls, v):
        if '@' not in v:  # TestModel returns 'a' — always fails
            raise ValueError('Email must contain @')
        return v
agent = Agent('test', output_type=UserProfile, retries=3)
try:
    result = agent.run_sync('...', model=TestModel())
except UnexpectedModelBehavior as e:
    print(e)  # → Exceeded maximum retries (3) for output validation
This isn't a bug. TestModel is for structure validation, not business logic validator testing.
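If you do want the validator itself covered, it doesn't need an agent or an LLM at all; plain Pydantic plus pytest does it:
import pytest
from pydantic import ValidationError
def test_email_validator_rejects_missing_at():
    with pytest.raises(ValidationError):
        UserProfile(email='not-an-email')
def test_email_validator_accepts_valid_address():
    assert UserProfile(email='dev@example.com').email == 'dev@example.com'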
FunctionModel for Precise Control
Use FunctionModel when you have validators or need to test tool response handling:
from pydantic_ai.models.function import FunctionModel
from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart
from pydantic_ai.settings import ModelSettings
import json
def mock_valid_response(messages: list[ModelMessage], settings: ModelSettings) -> ModelResponse:
"""Provide exact response for testing"""
data = {"email": "test@example.com", "name": "Test User"}
return ModelResponse(parts=[TextPart(content=json.dumps(data))])
agent = Agent(FunctionModel(mock_valid_response), output_type=UserProfile)
result = agent.run_sync("...")
assert result.output.email == "test@example.com"
Complete Test Strategy
class TestMyAgent:
def test_structure(self):
"""Agent structure validation — TestModel"""
result = my_agent.run_sync("test", model=TestModel())
assert result is not None
def test_tool_called(self):
"""Tool invocation — TestModel + call_tools"""
result = my_agent.run_sync(
"Get data from DB",
deps=test_deps,
model=TestModel(call_tools=['query_database'])
)
assert 'query_database' in result.output
def test_response_processing(self):
"""Response logic — FunctionModel"""
def mock_fn(messages, settings):
return ModelResponse(parts=[TextPart(content='{"email": "t@t.com"}')])
result = my_agent.run_sync("...", model=FunctionModel(mock_fn))
assert result.output.email == "t@t.com"
Multi-Provider Switching
One of PydanticAI's tangible strengths: swap providers by changing a model string, nothing else.
review_agent = Agent(
system_prompt='Senior Python developer reviewing code.',
output_type=CodeReview,
)
# Same agent, different providers at runtime
result_claude = review_agent.run_sync(code, model='anthropic:claude-sonnet-4-6')
result_gpt = review_agent.run_sync(code, model='openai:gpt-4o')
result_gemini = review_agent.run_sync(code, model='google-gla:gemini-2.5-flash')
result_local = review_agent.run_sync(code, model='ollama:llama3.3')
result_groq = review_agent.run_sync(code, model='groq:llama-3.3-70b-versatile')
From a context engineering standpoint, the system_prompt and output_type schema are the core context — designing so the model is swappable above that layer is good architecture.
Cost comparison pattern:
import time
providers = {
'claude': 'anthropic:claude-sonnet-4-6',
'gpt4o': 'openai:gpt-4o',
'gemini': 'google-gla:gemini-2.5-flash',
}
for name, model in providers.items():
start = time.time()
result = review_agent.run_sync(code, model=model)
elapsed = time.time() - start
print(f"{name}: score={result.output.score}, latency={elapsed:.2f}s, "
f"tokens={result.usage().input_tokens + result.usage().output_tokens}")
Honest Assessment — What I Liked and What Fell Short
What worked well:
- Type safety makes a real difference in practice. Change the output_type schema and the IDE flags every related error immediately
- @agent.tool auto-generates JSON Schema from function signatures — no manual tool spec rewriting
- TestModel + FunctionModel combination enables complete unit testing without any API calls
- deps_type makes dependency injection explicit, making mock swaps in tests clean
Where it falls short:
- Breaking changes like result_type → output_type keep landing up through v1.88.0. The library isn't in a stable phase yet. I had to use inspect.signature(Agent.__init__) to verify the actual parameter name — that's a sign the docs lag behind the code
- Streaming structured output is still beta. Parsing into a Pydantic model while the LLM generates partial output is tricky, and the current implementation is unstable
- Hard dependency on Pydantic v2. If you're on a v1 legacy codebase, migration cost is real
- Logfire integration (Pydantic team's monitoring service) is the official observability path but it's paid. Direct OpenTelemetry connection is possible but not officially documented well (a rough sketch follows after this list)
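On that last point, here's the rough shape of a direct OpenTelemetry hookup I'd start from. Treat it strictly as a sketch: the OTel SDK calls are standard, but verify that Agent.instrument_all() (and the per-agent instrument=True flag) exist in whatever pydantic-ai version you've pinned.
# Sketch: export PydanticAI spans to any OTLP-compatible backend instead of Logfire.
# Assumes opentelemetry-sdk + opentelemetry-exporter-otlp-proto-http are installed,
# and that Agent.instrument_all() is available in your pydantic-ai version.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from pydantic_ai import Agent
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint='http://localhost:4318/v1/traces')))
trace.set_tracer_provider(provider)
Agent.instrument_all()  # or per agent: Agent('anthropic:claude-sonnet-4-6', instrument=True)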
Reading Production AI Agent Design Principles alongside this post clarifies what criteria matter most when choosing an agent framework.
Summary: Core Patterns at a Glance
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.test import TestModel
from pydantic_ai.models.function import FunctionModel
from pydantic_ai.messages import ModelMessage, ModelResponse, TextPart
from pydantic_ai.settings import ModelSettings
import json
# 1. Structured output model
class ReviewResult(BaseModel):
score: int = Field(ge=0, le=100)
verdict: str # 'approve', 'request_changes'
summary: str
# 2. Dependency type
class AgentDeps:
def __init__(self, repo_name: str, author: str):
self.repo_name = repo_name
self.author = author
# 3. Agent definition
agent = Agent(
system_prompt='Code review agent.',
output_type=ReviewResult, # v1.88.0: output_type (not result_type)
deps_type=AgentDeps,
retries=3,
)
# 4. Tool registration
@agent.tool
def get_context(ctx: RunContext[AgentDeps]) -> dict:
"""Get repository context"""
return {"repo": ctx.deps.repo_name, "author": ctx.deps.author}
# 5. Testing
## Structure test (no API)
result = agent.run_sync("Review this",
deps=AgentDeps("my-repo", "jangwook"),
model=TestModel())
## Response logic test (FunctionModel)
def mock(messages, settings):
return ModelResponse(parts=[TextPart(content=json.dumps({
"score": 85, "verdict": "approve", "summary": "Clean code"
}))])
result = agent.run_sync("Review this",
deps=AgentDeps("my-repo", "jangwook"),
model=FunctionModel(mock))
assert result.output.verdict == "approve"
# 6. Production: swap model only
result = agent.run_sync("Review this",
deps=AgentDeps("my-repo", "jangwook"),
model='anthropic:claude-sonnet-4-6')
Next Steps
For TypeScript stacks, Building a Claude Streaming Agent with Vercel AI SDK covers a similar approach.
If you're taking PydanticAI to production, recommended order:
- Define the return schema with output_type first
- Manage DB connections and HTTP clients as deps_type
- Add external API integrations via @agent.tool
- Test progressively: TestModel → FunctionModel → real model
- Configure retry strategy with retries=3 and output_retries=2
- Pin the version (pydantic-ai==1.88.0). This library changes frequently
The PydanticAI GitHub repo updates fast. Reading the CHANGELOG before the official docs saves real debugging time. Speaking from experience — this library isn't at 1.0 yet, but for Python agent stacks, it's currently the most consistent type-safe option available.
