Tang Weigang

Posted on Jun 27

Do Not Treat Pydantic AI as an Agent Magic Layer

#ai #agents #python #pydantic

Pydantic AI is easy to describe as another Python agent framework. That is technically true, but it misses the useful part. The project becomes interesting when you treat it as a way to make agent behavior inspectable: typed outputs, dependency injection, tool schemas, provider boundaries, traces, evals, and human approval points in one engineering surface.

The Doramagic pydantic-ai manual classifies it as an Agent SDK and runtime. The best fit is not "anyone who wants a chatbot." It is developers building observable, testable, multi-tool agent applications. The bad fit is also important: if you only need one prompt or a simple API call, Pydantic AI may be more machinery than you need, and if your environment cannot isolate tool permissions, it is a poor place to start.

My first rule for adopting it would be simple: do not start by asking the agent to do a large real task. Start by proving that a minimal agent can run with fake tools, temporary dependencies, typed output, and no unexpected side effects.

The real object is a workflow contract

The value of Pydantic AI shows up when your agent has to do more than produce text. It is most useful when:

the result needs to validate against a Pydantic BaseModel;
runtime state has to be passed through a typed RunContext;
tool arguments must be schema-checked before execution;
provider-specific behavior needs to stay behind a clear adapter boundary;
traces need to show tool choices, retries, failures, and branches;
evals need to catch regressions before release;
some tool calls require approval before they execute.

That is a different problem from "make the model answer nicely." It is closer to "make the model-driven workflow reviewable."

Structured output is not just prettier JSON

The README example pattern with output_type=SupportOutput is easy to underestimate. It is not only a formatting trick. It turns the model response into a business object that the rest of the application can inspect.

For example:

from pydantic import BaseModel, Field


class SupportOutput(BaseModel):
    support_advice: str
    block_card: bool
    risk: int = Field(ge=0, le=10)  # validation rejects values outside 0..10

Now block_card is not something the caller has to infer from a paragraph. It is a field that can be reviewed, logged, tested, and gated. If the output fails validation, the framework can feed that error back to the model and retry.

The boundary is just as important: a typed field is not a policy. If risk=8 triggers a real action, the team still needs to define what risk means, who can approve the next step, and which action is allowed.
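To make the difference between a typed field and a policy concrete, here is a minimal sketch using plain Pydantic. The threshold `APPROVAL_THRESHOLD`, the action names, and the helpers `is_valid` and `next_step` are assumptions for illustration; neither the source nor Pydantic AI defines them. Validation only tells you the shape is right. The policy function is where a team would write down what risk means.

```python
from pydantic import BaseModel, Field, ValidationError


class SupportOutput(BaseModel):
    support_advice: str
    block_card: bool
    risk: int = Field(ge=0, le=10)


# Hypothetical threshold: the team, not the framework, has to choose this.
APPROVAL_THRESHOLD = 8


def is_valid(payload: dict) -> bool:
    """Return True when the payload satisfies the SupportOutput schema."""
    try:
        SupportOutput.model_validate(payload)
    except ValidationError:
        return False
    return True


def next_step(output: SupportOutput) -> str:
    """Map a validated output to an action name. Nothing is executed here."""
    if output.block_card or output.risk >= APPROVAL_THRESHOLD:
        return "needs_human_approval"
    return "send_advice"
```

A payload with `risk=11` fails validation before any policy runs, while a valid `risk=8` passes validation and is then routed to approval by the policy, which is a separate, reviewable piece of code.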

Tool calls should start with fake tools

Pydantic AI tools are normal Python functions registered through decorators, with arguments validated against a schema. That is a good engineering surface. It also means the first mistake can be expensive if the tool has real permissions.

My adoption sequence would be:

register one fake tool that returns a fixed value;
pass temporary dependencies through RunContext;
verify that the agent selects the expected tool;
verify that arguments match the intended schema;
only then replace the fake tool with a real implementation.

Do not give the first agent production API keys, write access, browser automation, or a broad filesystem view. The Doramagic boundary card says the first use should start with least privilege, a temporary environment, and rollback. That is the right default.

`RunContext` is a state channel, not a junk drawer

The typed dependency pattern is one of the cleanest parts of Pydantic AI. It lets tools and dynamic instructions access runtime state without global variables. But it can also become a quiet permission leak.

I would split dependencies into three buckets:

required state: current task, current user scope, allowed data range;
read-only configuration: model choice, thresholds, feature flags;
forbidden state: long-lived secrets, full user tables, production write clients, and unrelated internal documents.

If a tool needs a user id, pass a user id. Do not pass the entire account object. If it only needs read access, do not pass a write-capable client. A stronger agent framework makes state minimization more important, not less.
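As a rough illustration of that minimization, the sketch below uses only the standard library. `Account`, `SupportDeps`, and `deps_for` are hypothetical names, and the fields are assumptions chosen to match the three buckets above. The dependency object is frozen and carries only an id, a data range, and a read-only threshold.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Stand-in for a full account record that should NOT reach the agent."""
    user_id: int
    email: str
    api_token: str


@dataclass(frozen=True)
class SupportDeps:
    """The only state tools may see: required scope plus read-only config."""
    user_id: int            # required state
    allowed_days: int       # required state: how far back tools may read
    risk_threshold: int = 8  # read-only configuration


def deps_for(account: Account, allowed_days: int) -> SupportDeps:
    """Copy the minimum needed out of the account; drop everything else."""
    return SupportDeps(user_id=account.user_id, allowed_days=allowed_days)
```

Because `SupportDeps` is frozen and has no token or email field, a tool that receives it cannot read the secret or mutate the scope, whatever the model asks for.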

Traces should explain decisions, not dump everything

Pydantic AI's observability surface is one reason it is useful for serious agent work. For agents, the trace is often more important than the final sentence. A correct answer can still hide an unsafe path: the wrong tool, too many retries, weak retrieval, swallowed errors, or a side effect that should have waited for approval.

A useful trace should answer:

Which tool was selected?
Which context or source was used?
Where did the run retry, fail, or branch?
Did structured output validation fail and recover?
Was any side effect attempted?

But observability has a data boundary. The Doramagic manual calls out community concerns around large OpenTelemetry attributes such as serialized request parameters. That is a reminder to decide what enters traces, what gets redacted, and what never gets logged.

The goal is to preserve the decision path, not to archive sensitive prompts, private documents, secrets, or oversized context.
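One way to enforce that boundary is to scrub attributes before they are attached to a span. The sketch below is standard-library Python and not part of Pydantic AI or OpenTelemetry. The key names, the length limit, and `scrub_attributes` are assumptions for illustration.

```python
MAX_ATTR_CHARS = 200  # hypothetical cap on attribute size
REDACTED_KEYS = {"prompt", "api_key", "document_text"}  # keep key, hide value
DROPPED_KEYS = {"password"}  # never recorded at all


def scrub_attributes(attrs: dict[str, object]) -> dict[str, object]:
    """Return a copy of attrs that is safe to attach to a trace."""
    clean: dict[str, object] = {}
    for key, value in attrs.items():
        if key in DROPPED_KEYS:
            continue  # excluded entirely
        if key in REDACTED_KEYS:
            clean[key] = "[redacted]"  # the decision path stays visible
            continue
        text = str(value)
        if len(text) > MAX_ATTR_CHARS:
            # Oversized values are stored as a truncated string.
            clean[key] = text[:MAX_ATTR_CHARS] + "...[truncated]"
        else:
            clean[key] = value  # short values pass through unchanged
    return clean
```

The tool name and retry count survive, so the trace still explains what happened, while prompts and oversized serialized parameters do not leave the process intact.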

Toolsets, MCP, and capabilities need an allowlist

Pydantic AI's toolsets, MCP support, capabilities, deferred loading, and human-in-the-loop tools are powerful because they let an agent load external capabilities in a structured way. They are also exactly where teams should slow down.

Before enabling a capability, I would write a small allowlist:

which toolsets this agent may see;
which tools are disabled by default;
which tools require approval;
what happens when a tool fails;
whether tool output can enter the final answer;
where tool calls are audited.

If an agent can see every tool all the time, you do not have a capability system. You have a permission problem waiting for the wrong prompt.
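An allowlist like this can start as a few lines of plain Python that run before any tool is handed to the agent. The sketch below is default-deny and uses only the standard library. `ToolRule`, `ALLOWLIST`, `decide`, and the tool names are hypothetical, and it does not use Pydantic AI's own toolset or approval APIs.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolRule:
    enabled: bool = False           # tools are disabled unless stated otherwise
    requires_approval: bool = True  # and gated unless stated otherwise


# Hypothetical tool names, used only for illustration.
ALLOWLIST = {
    "lookup_balance": ToolRule(enabled=True, requires_approval=False),
    "block_card": ToolRule(enabled=True, requires_approval=True),
    "delete_account": ToolRule(),  # listed, but disabled by default
}


def decide(tool_name: str) -> Decision:
    """Default-deny: unknown or disabled tools are never exposed."""
    rule = ALLOWLIST.get(tool_name)
    if rule is None or not rule.enabled:
        return Decision.DENY
    if rule.requires_approval:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```

The useful property is that a tool nobody wrote a rule for is denied, so adding a capability requires an explicit, reviewable change.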

A host rule I would load first

If I were asking Claude Code, Codex, Cursor, or another AI coding host to help with Pydantic AI, I would give it this rule before asking for code:

You may help design a Pydantic AI agent, but first state:
1. whether the target is chat, RAG, a tool-using agent, or a multi-agent workflow;
2. whether output needs a Pydantic BaseModel contract;
3. which state is allowed in deps and which state is forbidden;
4. each tool's permission level, input schema, failure behavior, and approval rule;
5. which trace fields are recorded, redacted, or excluded;
6. which smoke check or eval proves the agent did not cross its boundary.

Do not claim Pydantic AI is installed or working locally without a separate run log.
Do not use real secrets, production write access, or broad filesystem access unless the user explicitly approves it.
Do not treat a prompt preview as a real project run.

That rule is more useful than "build me an agent." It turns the task into a reviewable boundary contract.

A sane first day

A practical first day would be deliberately small:

create a temporary directory;
follow the official quick start in isolation;
define one Agent with one simple BaseModel output;
add one fake tool that returns a fixed value;
pass temporary dependencies through RunContext;
capture the tool path or trace;
write a smoke check that verifies the tool call, output validation, and no unexpected side effect.
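The "no unexpected side effect" part of that smoke check can be approximated with a temporary directory and a before/after snapshot. This is a standard-library sketch with hypothetical names (`snapshot`, `run_in_sandbox`, `fake_tool_run`, `leaky_run`). It only detects files created inside the temporary directory, so network calls or writes elsewhere would need their own checks.

```python
import tempfile
from pathlib import Path


def snapshot(root: Path) -> set[str]:
    """Return every path under root, relative to root."""
    return {str(p.relative_to(root)) for p in root.rglob("*")}


def run_in_sandbox(task) -> tuple[object, set[str]]:
    """Run task(workdir) in a temp directory; return its result and new paths."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        before = snapshot(root)
        result = task(root)
        created = snapshot(root) - before
    return result, created


def fake_tool_run(workdir: Path) -> float:
    """Stand-in for an agent run that should touch nothing."""
    return 123.45


def leaky_run(workdir: Path) -> float:
    """Stand-in for a run with an unexpected write."""
    (workdir / "out.txt").write_text("oops")
    return 0.0
```

In a real smoke check, `task` would wrap the agent run with its fake tool, and the test would fail if the set of created paths is not empty.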

This is not flashy. It is useful because it answers the first adoption question: can this framework make my agent behavior more inspectable?

Pydantic AI's strength is not that it makes agents feel magical. Its strength is that it gives agent work a shape: types, tools, runtime state, traces, evals, approval points, and stop conditions. That is the part worth loading into an AI host.

Reference roles

Upstream project: pydantic/pydantic-ai, the source for code, installation, release behavior, and API facts, https://github.com/pydantic/pydantic-ai
Doramagic project page: an independent capability asset for AI hosts, https://doramagic.ai/en/projects/pydantic-ai/
Doramagic manual: a practical reading path for agents, providers, structured outputs, toolsets, MCP, traces, evals, and pitfalls, https://doramagic.ai/en/projects/pydantic-ai/manual/

DEV Community
