Lynkr

Posted on Jun 9

How to Make PydanticAI Agents Cheaper with Lynkr

#ai #devtools #python #opensource

PydanticAI is one of the cleanest ways to build structured LLM agents in Python. But once those agents start doing real work — tool calls, validation retries, structured outputs, and multi-step flows — the token bill climbs faster than most teams expect.

Lynkr fits underneath that stack as an LLM gateway. It does not replace PydanticAI. It makes the model layer under it cheaper and easier to control with tier routing, prompt caching, and provider flexibility.

Founder disclosure: I built Lynkr, so take that into account. I’ll keep this practical and focus on where the fit is real.

Why PydanticAI is compelling in the first place

I spent time going through PydanticAI because it solves a problem a lot of Python agent frameworks make messy: keeping agent code structured without giving up flexibility.

What stood out to me is that PydanticAI is built around the same things Python teams already care about in production:

typed agents
structured outputs
dependency injection
tool calling
model/provider flexibility
observability and eval-friendly workflows
graph support for more complex control flow

The repo positions it as a production-grade Python agent framework, and that shows up quickly in the design. The README emphasizes model-agnostic support across OpenAI, Anthropic, Gemini, Bedrock, Ollama, Groq, OpenRouter, LiteLLM, and more. It also leans heavily into typed outputs, MCP integration, durable execution, and validation-driven retries.

That combination makes PydanticAI attractive for teams that want agent workflows to feel more like real Python systems and less like prompt spaghetti.

Where the token spend starts to leak

The part that matters economically is not whether the framework is good. PydanticAI is good.

The problem is that good structure does not automatically mean cheap execution.

In practice, cost starts leaking in a few predictable places:

repeated system instructions across multiple runs
the same output schema getting sent over and over
validation failures triggering retries
tools being selected or called in multiple rounds
expensive models getting used for easy intermediate steps
long workflows carrying too much repeated context forward

PydanticAI’s strengths can actually make this more visible.

If you use typed outputs, the model may need another pass when validation fails.
If you use tools, there can be multiple model turns around those tools.
If you use graphs or longer agent flows, repeated context starts compounding.
If you keep one premium model as the default for everything, simple steps inherit premium-model pricing for no good reason.

None of that is a PydanticAI flaw. It is just what happens when a framework makes it easier to build richer agent workflows.

Where Lynkr fits

The right way to understand Lynkr here is simple:

PydanticAI stays the application layer
Lynkr becomes the gateway layer underneath it

That means your Python agent logic does not need to become a mess of provider-specific conditionals just to get better economics.

You keep using PydanticAI for:

agent structure
typed outputs
tools
graphs
retries
application logic

And you use Lynkr for:

model routing
prompt caching
provider switching
centralized cost control

That separation matters because most teams do not want to rebuild their agent code every time they want to try a cheaper provider, add routing, or move one class of requests off an expensive model.

1. Route easy turns to cheaper models

One of the easiest ways to overspend in agent systems is to treat every turn like frontier reasoning.

A lot of PydanticAI work is not actually frontier reasoning.

Examples:

classification before the main task
extraction from predictable text
tool selection
formatting into a structured schema
intermediate planning
low-risk follow-up steps after a strong first pass

Those steps often do not need the best model in your stack.

Lynkr helps by putting routing under the agent, so easier turns can go to cheaper models while harder turns still escalate when they need to.

That is a much better cost shape than paying premium-model rates for every structured substep just because the app has one default model configured.

2. Stop paying repeatedly for the same context

This is the biggest recurring waste pattern in real agent systems.

A PydanticAI workflow often reuses a lot of stable prompt material:

system instructions
output schemas
tool descriptions
dependency-derived context
conversation framing that barely changes between turns

If that prompt material is sent again and again, the system keeps paying for mostly the same input.

This is where Lynkr’s caching layer matters.

Instead of treating every call as fully fresh, the gateway can cut down repeated prompt spend underneath the workflow. That matters more as the workflow gets longer, as the schema gets larger, or as the tool surface grows.

For small toy demos, this does not matter much.
For real agent workloads, it matters a lot.

3. Keep the app stable while changing the economics

One reason teams tolerate waste for too long is that optimizing the stack usually means rewriting too much application code.

PydanticAI already gives you a clean framework for the agent logic. The useful part of Lynkr is that it lets you change the economics without ripping that logic apart.

That gives you room to:

compare providers more easily
reduce lock-in
shift easy steps to cheaper models
keep premium models for the parts that actually need them
centralize model behavior across multiple agent workflows

So the win is not just lower cost. It is lower cost without turning your Python codebase into provider-routing glue.

Example: structured extraction plus tools

A simple example makes the fit clearer.

Say you have a PydanticAI workflow that does this:

user submits messy unstructured text
agent extracts typed fields into a schema
validation fails on one field and triggers a retry
agent calls a tool to enrich one part of the result
final typed response is returned to the app

That is a perfectly reasonable workflow.

It is also exactly the kind of flow where hidden waste appears:

the schema is repeated
instructions are repeated
the retry adds another paid turn
the tool step adds more model interaction
the same premium model may be used for all five stages

Under Lynkr, that workflow can be made cheaper in the places that usually do not need the strongest model every time.

The extraction/classification layer can be routed down.
Repeated prompt material can be cached.
The harder step can still route up if needed.

That is the real value: not changing what the workflow does, but changing how expensively it gets there.

What the integration shape looks like

I am intentionally keeping this part conceptual instead of pretending exact config syntax from memory.

The practical setup is:

PydanticAI points to the Lynkr base URL
Lynkr handles provider and routing behavior underneath
your agent code stays mostly the same

That is the integration story that matters.

The point is not “replace your framework.”
The point is “keep your framework, improve the model layer under it.”

Where Lynkr does not replace framework-level discipline

This part matters because it is where a lot of gateway writing becomes dishonest.

Lynkr can cut model cost and make provider switching easier, but it does not fix a badly designed agent workflow.

If a PydanticAI app is looping too much, retrying too aggressively, or making unnecessary tool calls, those problems still exist. The gateway can reduce the price of those mistakes. It does not remove them.

What Lynkr helps with is the economics and control layer around the workflow:

route cheaper models to simpler steps
keep expensive models for the calls that actually need them
cache repeated work
avoid getting locked to one provider
standardize how requests move across providers

What it does not do on its own:

redesign weak prompts
stop bad retry logic
fix overly chatty agent graphs
choose the right tool boundaries for your app
replace evaluation and tracing discipline

That matters because a lot of agent cost does not come from one expensive call. It comes from repeated mediocre decisions across a workflow.

PydanticAI is useful because it gives structure to the application layer. Lynkr is useful because it gives control to the model-routing layer. They solve different problems, and they work better together than separately.

Who should care

PydanticAI + Lynkr is a strong fit if:

you are running a meaningful number of agent calls
you want structured workflows in Python
you care about typed outputs and tool use
your workflows retry or branch often enough for costs to become visible
you want provider flexibility without constantly changing application code

Closing thought

PydanticAI solves the structure problem well. Lynkr helps solve the economics problem underneath it.

If you are building typed Python agents and starting to notice that retries, tools, and repeated context are quietly inflating cost, this is a very practical combination to test.

GitHub: https://github.com/Fast-Editor/Lynkr

If you are already using PydanticAI, I’d be curious where the spend is showing up first in your workflow.