Lynkr

Posted on Jun 7

Run CrewAI With 50% Lower LLM Cost Using Lynkr

#ai #opensource #devops #tutorial

If you are building multi-agent systems in Python, CrewAI is one of the biggest frameworks you need to know.

And if your CrewAI workloads are starting to get expensive, the simplest way to control that spend is to put an LLM gateway in front of them instead of wiring every agent directly to one provider.

In this article, I’ll explain what CrewAI is, why it got so popular, and how to use it with Lynkr so your agents can run with better model routing, caching, and lower cost.

I built Lynkr, so that part comes with the obvious founder disclosure. Still, CrewAI is worth understanding on its own because it has become one of the main entry points for people building agent systems in Python.

What is CrewAI?

CrewAI is an open-source Python framework for orchestrating multiple AI agents.

At the time of writing, the GitHub repo has 53k stars.

The project describes itself as a:

Fast and Flexible Multi-Agent Automation Framework

Its core idea is simple:

define agents with roles and goals
define tasks
decide how they collaborate
run them as a system instead of a single prompt chain

That is the mental model behind the name CrewAI: not one agent, but a crew of specialized agents working together.

Why CrewAI matters

A lot of agent demos are still just one prompt plus one tool call.

CrewAI matters because it pushes people toward more structured systems:

researcher agent
writer agent
reviewer agent
planner agent
execution agent

Each one can have a different role, context, and tool setup.

That makes it useful for:

research pipelines
content workflows
internal business automation
data gathering + summarization flows
agent handoff patterns
more production-style orchestration than “just call the model again”

The reason it got traction is that it sits in a nice middle ground:

higher-level than wiring every agent loop yourself
more concrete than vague "agent platform" marketing
easy enough for Python developers to start with quickly

The two big concepts in CrewAI: Crews and Flows

From the current repo README, CrewAI emphasizes two core concepts.

1. Crews

Crews are teams of agents collaborating with autonomy.

This is the “multi-agent” part most people think of first:

specialized roles
role-based collaboration
delegation
agents working together toward a result

2. Flows

Flows are the more controlled, event-driven side.

This is where CrewAI becomes more production-friendly:

execution paths
state management
conditional logic
integration with normal Python code
more deterministic orchestration when you need it

That combination is a big part of the pitch:

Crews for agent autonomy
Flows for production control

Why CrewAI gets expensive fast

This part usually becomes obvious after the first real project.

A single-agent script is one thing.

A multi-agent system is different.

Costs grow because you now have:

multiple agents making separate LLM calls
handoffs between agents
intermediate summaries
retries
reflection/replanning
tool use across several steps
repeated context being passed around the system

So the problem is not just “what model am I using?”

It becomes:

do all agents need the same expensive model?
should the planner use the same model as the formatter?
how much repeated context is being resent?
can simple routing/classification work go to cheaper models?
can repeated flows benefit from cache hits?

That is exactly the kind of workload where a gateway layer starts making sense.

Where Lynkr fits

If CrewAI is the orchestration layer, Lynkr can sit underneath it as the LLM gateway.

That means your architecture becomes:

CrewAI agents / flows
        ↓
      Lynkr
        ↓
Ollama / OpenRouter / Bedrock / OpenAI / Azure / Databricks / others

Instead of wiring each agent stack directly to one provider, you point your model traffic at one gateway endpoint and let that layer decide what happens next.

Why use Lynkr with CrewAI?

This is the important part.

The real benefit is not just “use any provider.”

That is table stakes now.

The better reason is that Lynkr gives you three strong levers for agent workloads:

1. Prompt caching

Multi-agent systems resend a lot of context.

That can include:

system prompts
task descriptions
agent roles and backstories
previous step context
the same instructions reused across repeated runs

Lynkr’s caching layer helps reduce the amount of repeated input you pay for.

For agent systems, that matters a lot more than it does in one-off chat prompts.

2. Tier routing

Not every step in a CrewAI workflow deserves your strongest model.

Examples:

Use a cheaper/faster model for:

classification
routing
formatting
deterministic transformation
simple extraction
narrow sub-tasks

Use a stronger model for:

planning
reasoning-heavy synthesis
ambiguous task decomposition
final high-stakes output

This is exactly what tier routing is for.

3. One stable model endpoint

Once your agents grow from a prototype into a system, you usually want:

one model boundary
one place to switch providers
one place to add failover
one place to add policy and cost control

That is what a gateway layer gives you.

What Lynkr says it does well today

From the current Lynkr README, the main cost/performance claims are:

53% fewer tokens on tool-heavy requests
87.6% compression on large JSON tool results
171ms semantic cache hits
automatic tier routing
zero code changes at the client boundary once the endpoint is swapped

Those numbers come from coding-tool workloads, not specifically a published CrewAI benchmark.

So the honest framing is:

I am not claiming a public CrewAI benchmark showing exactly 50% lower cost on every workload
I am saying CrewAI has the exact kind of multi-step agent workload where these levers matter most

That is why “50% lower cost” is a fair headline shape for the category, but the actual result will depend on how your CrewAI system is built.

How to get started with CrewAI

From the current CrewAI README, installation starts like this:

uv pip install crewai

If you also want the tools extras:

uv pip install 'crewai[tools]'

The project also provides a CLI starter for creating a new crew project:

crewai create crew <project_name>

That scaffolds a project with:

main.py
crew.py
agents.yaml
tasks.yaml
.env

So CrewAI is designed to be used as a real project structure, not just a single script.

A simple mental model for CrewAI code

A better way to think about CrewAI is:

define who each agent is
define what each task needs done
define how work moves between agents
then execute the whole workflow as one coordinated system

That is the real shift from a normal single-agent app.

You are not just prompting one model repeatedly.
You are designing a small working system with roles, handoffs, and outputs.

A minimal conceptual example looks like:

from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Find the best information on a topic",
    backstory="You are great at gathering relevant details"
)

writer = Agent(
    role="Writer",
    goal="Turn research into a clear output",
    backstory="You write concise, structured summaries"
)

research_task = Task(
    description="Research the latest browser agent frameworks",
    agent=researcher
)

write_task = Task(
    description="Write a short technical summary from the research",
    agent=writer
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task]
)

result = crew.kickoff()
print(result)

That is not copied from their exact starter file, but it reflects the basic CrewAI model:

roles
tasks
orchestration

How to use CrewAI with Lynkr

The practical pattern is straightforward:

install CrewAI
install and start Lynkr
point the model calls used by your CrewAI stack at Lynkr instead of directly at one provider
let Lynkr handle routing/caching/provider flexibility underneath

1. Install Lynkr

npm install -g lynkr

2. Configure Lynkr

A simple cloud-backed setup from the current Lynkr README looks like this:

# .env
MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=your-key
FALLBACK_ENABLED=false
PORT=8081
PROMPT_CACHE_ENABLED=true
SEMANTIC_CACHE_ENABLED=true

Then start Lynkr:

lynkr start

If you want local-first testing, Lynkr also supports local backends like:

Ollama
llama.cpp
LM Studio

That is useful for CrewAI because some low-value steps can run cheaply or locally, while harder reasoning tasks can still escalate.

3. Route CrewAI’s model traffic through Lynkr

The exact code depends on which model client you use with CrewAI.

The architecture is the important part:

CrewAI model client → Lynkr base URL → actual provider(s)

Because Lynkr gives you an OpenAI-compatible gateway surface, the integration is most natural when your CrewAI model configuration can target an OpenAI-style endpoint.

That lets you keep CrewAI as the orchestration layer while Lynkr becomes the control plane for model choice and cost behavior.

A better way to think about model assignment in CrewAI

Here is where most teams leave money on the table.

They do this:

planner agent → expensive model
researcher agent → same expensive model
formatter agent → same expensive model
reviewer agent → same expensive model

That is easy, but wasteful.

A better shape is:

planner → strong reasoning model
researcher → medium model
summarizer → medium or cheap model
formatter → cheap model
repeated workflows → cached through gateway

The point is not that every step should be cheap.

The point is that different agent roles have different model requirements.

CrewAI already encourages role specialization.

Lynkr makes it easier to pair that with cost specialization.

A concrete example

Imagine a CrewAI workflow for market research.

You have:

one agent gathering raw sources
one agent extracting facts
one agent writing the report
one agent reviewing for quality

Without a gateway, teams often default to one premium model for all four.

With Lynkr underneath, the better pattern is:

gather/extract → cheaper tier
writing → medium tier
review/final reasoning → stronger tier
repeated report skeleton/context → cache where possible

That is a much more rational cost shape.

Why this matters more for CrewAI than normal apps

A normal app may only hit the LLM a few times.

A CrewAI system can explode the number of calls because the framework is designed around multiple agents and structured orchestration.

So the value of a gateway grows with:

number of agents
number of task handoffs
amount of repeated context
number of production runs
number of providers you want to evaluate

That is why CrewAI is such a good fit for the “put a gateway underneath it” pattern.

What Lynkr does not replace

Important distinction:

CrewAI is still the orchestration framework
Lynkr is still the LLM gateway

Lynkr does not replace CrewAI’s agent/task/flow model.

It complements it by making the model layer cheaper and more flexible.

Honest tradeoffs

It is worth being direct here.

A gateway adds another infrastructure layer.

That is worth it when:

you have multiple agents
you care about spend
you want provider flexibility
you are moving toward production usage

It may not be worth it when:

you are just learning CrewAI
you are running a toy example once
simplicity matters more than control

So I would not tell every beginner to add a gateway on day one.

But once your CrewAI project becomes real, the gateway question shows up quickly.

Final take

CrewAI is one of the most important open-source frameworks in the multi-agent Python ecosystem right now.

It gives you a useful structure for building agent systems with:

roles
tasks
crews
flows
production-style orchestration

And if those systems are getting expensive, Lynkr is a practical way to put a cost-and-routing layer underneath them.

That gives you:

one stable model endpoint
provider flexibility
caching for repeated context
tier routing for different agent roles
a better chance of keeping multi-agent systems affordable as they scale

If you want to try the stack:

CrewAI: https://github.com/crewAIInc/crewAI
Lynkr: https://github.com/Fast-Editor/Lynkr

If you are already running CrewAI in production, I think the right question is not:

“What is the best model?”

It is:

“Which parts of my agent system actually deserve the expensive model?”

DEV Community