Lynkr

Posted on Jun 6 • Edited on Jun 7

I Benchmarked Lynkr Against LiteLLM on the Same Backends.

#ai #webdev #devops #opensource

I Benchmarked Lynkr Against LiteLLM on the Same Backends. Lynkr Was Cheaper for Tool-Heavy Workloads

Founder disclosure: I built Lynkr, so take this as a technical benchmark write-up, not a neutral industry report. The numbers below come from the same backend providers on both gateways.

If you're routing AI coding traffic through a gateway, just switching providers is not enough. The real savings come from reducing the tokens that ever reach the model in the first place.

I ran Lynkr and LiteLLM against the same backends — Ollama locally, Moonshot, and Azure OpenAI — across 9 scenarios. On the scenarios that actually look like agentic coding work, Lynkr was cheaper because it does three things before forwarding the request upstream: smart tool selection, TOON compression, and semantic caching.

The short version

Lynkr was measurably better on the cost-sensitive parts of the workload:

Smart tool selection: 53% fewer input tokens, 52% lower cost
TOON JSON compression: 87.6% fewer billed tokens on a large tool result, 50% lower cost
Semantic cache: 171ms cache-hit response vs 3,282ms on the repeat query path
Tier routing: escalated hard prompts to stronger models instead of blindly sending everything to the cheapest route

Area	Lynkr result	Why it mattered
Tool selection	53% fewer tokens	Removes irrelevant tool schemas
TOON compression	87.6% fewer tokens	Shrinks large JSON tool outputs
Semantic cache	171ms cache hit	Avoids repeat model calls
Tier routing	Escalates hard prompts	Doesn’t over-optimize for cheapest path

This matters if you're running Claude Code, Codex, Cursor, or similar agent workflows where tools, file reads, grep output, and repeated context dominate your token bill.

Setup

Same benchmark inputs, same providers, same request shape.

Machine: macOS on Apple Silicon
Lynkr: v9.3.2 on Node 20
LiteLLM: v1.87.1 on Python 3.12
Backends used: Ollama local, Moonshot, Azure OpenAI
Scenarios: 9 total across simple prompts, tools, history, cache, and routing

Each scenario sent the same HTTP request to both gateways at POST /v1/messages.

Where Lynkr wins

1) Smart tool selection

A lot of coding requests are read-only, but the model still gets handed the full tool universe: write, edit, bash, git, file ops, everything.

Lynkr classifies the request first and strips irrelevant tool schemas before forwarding upstream. So a read-only question does not pay to carry write-capable tools.

Benchmark setup: 14 tool definitions attached to every request, which is pretty realistic for a Claude Code or Cursor style session.

Lynkr: 959 billed input tokens, $0.0044
LiteLLM: 2,085 billed input tokens, $0.0091

Result: 53% fewer input tokens and 52% lower cost on the same model and prompt.

This is the kind of optimization that compounds because it happens before every downstream model call.

2) TOON compression for tool results

Tool-heavy workflows often blow up because of structured JSON, not because the user wrote a long prompt.

Lynkr's TOON path compresses large JSON payloads before they hit the provider. Plain text goes through unchanged. The useful effect is that file reads, grep arrays, tool traces, and other structured outputs stop dominating the request.

Benchmark setup: a Bash tool returning 60 grep results as a JSON array, roughly 3,400 tokens unoptimized.

Lynkr: 427 billed input tokens, $0.009, 12s latency
LiteLLM: 3,458 billed input tokens, $0.018, 12s latency

Result: 87.6% token reduction and 50% lower cost at the same latency.

That last part matters. This was not a tradeoff where cost improved because the request got slower. Compression happened in-process and the wall-clock result stayed flat.

3) Semantic cache

The easiest cheap request is the one that never reaches the model.

Lynkr computes embeddings for the incoming prompt and returns a cached response when a semantically similar request shows up again. In the benchmark, the second prompt was just a paraphrase of the first:

"Explain TCP vs UDP"
"What is the difference between TCP and UDP?"

Cold run vs cache hit

Lynkr cold: 2,857 tokens, 1,891ms
Lynkr cache hit: served from cache in 171ms
LiteLLM repeat path: 54 tokens, 3,282ms

The important part is not just token avoidance. The response time dropped from 1.9s to 171ms, about 11x faster.

For interactive tooling, that difference is felt immediately.

4) Tier routing that looks at complexity, not just price

LiteLLM has routing. But in this benchmark configuration it was using cost-based-routing, which means the gateway optimizes for cheap first.

That works for simple questions. It breaks when the prompt genuinely needs a stronger model.

Lynkr scores requests across 15 dimensions — token size, reasoning markers, code complexity, risk signals, and agentic traits — then routes automatically.

In the benchmark:

Simple prompt: "What does git stash do?"
- Lynkr routed to minimax-m2.5
- LiteLLM routed to local Ollama
Complex prompt: JWT vs cookies security analysis for a banking architecture
- Lynkr escalated to moonshot-v1-auto
- LiteLLM still sent it to local Ollama

That is the difference between "cheap by default" and "cheap when appropriate."

Why this benchmark matters more than a generic proxy comparison

A lot of gateway comparisons collapse into "who can talk to more providers." That is table stakes now.

The more important question is:

What does the gateway do to reduce spend before the request hits the model?

That is where Lynkr is different in practice.

It stacks three cost levers:

Tool pruning so irrelevant tool schemas do not ride along
TOON compression so large structured tool output stops inflating prompts
Semantic cache so repeated or near-repeated requests do not call the model again

Then it adds tier routing on top, so the remaining requests go to the right model for the job.

That stack is why the benchmark result is interesting. It is not just "Lynkr can route too." It is that Lynkr changes the size and shape of the request before routing even happens.

Cost projection at 100,000 requests/month

Using the large JSON tool-result test as a representative tool-heavy scenario:

LiteLLM: about $818/month
Lynkr: about $409/month

So on equal footing, same backend, same model class, Lynkr came out roughly 50% cheaper.

That is the distinction I'd care about if I were evaluating an LLM gateway for coding agents. Not whether the gateway has another provider adapter, but whether it reduces the number of tokens my provider ever sees.

What about Portkey?

Portkey is good at a different layer of the stack.

It is stronger on managed observability, prompt management, and governance. But this benchmark was not measuring dashboarding or policy UX. It was measuring request-path optimization.

On that axis, Lynkr is doing something Portkey does not really center on:

automatic complexity detection
semantic caching
token compression
drop-in routing for coding-tool workloads

So I would not frame this as "Portkey but cheaper." They solve different primary problems.

Important caveats

To keep this honest, there are a few things worth stating clearly.

1) This is not a neutral benchmark

I built Lynkr. So the burden is on me to be explicit about methodology and where the numbers come from.

2) LiteLLM can look cheaper in headline totals

If LiteLLM routes everything to a free local model, the raw total can look lower. But that is not the useful comparison.

The fair comparison is same backend, same prompt, same model class. On those apples-to-apples paths, Lynkr was cheaper because it sent fewer tokens upstream.

3) Lynkr adds system-level context

In this benchmark, Lynkr injected a system prompt with memory and agent instructions, which added about 2,800 tokens of overhead in some scenarios. That is why comparing estimated raw request size to billed tokens can be misleading.

The correct comparison is billed tokens between Lynkr and LiteLLM on the same scenario.

Who this is for

Lynkr is for teams running things like:

Claude Code
Codex
Cursor
Hermes
custom agents using an OpenAI-compatible endpoint

If your real problem is reducing spend on coding workflows without rewriting client-side integrations, the benchmark result is pretty simple:

Lynkr wins when the workload includes tools, structured outputs, repeated prompts, and mixed-complexity requests.

That is exactly what real coding-agent traffic looks like.

Reproducibility

The benchmark script is reproducible from the Lynkr repo root:

node benchmark-tier-routing.js

Versions used in this run:

Lynkr v9.3.2
LiteLLM v1.87.1

Final takeaway

If all you want is a gateway that forwards requests, Lynkr is not interesting.

If you want a gateway that makes coding traffic cheaper before it reaches the model, that is where Lynkr starts to separate.

The three levers that mattered in this benchmark were:

tool selection
TOON compression
semantic cache

And on top of that, tier routing kept the hard prompts from being sent to the wrong model just because it was cheaper.

If you want to dig into it, the repo is here:

GitHub: https://github.com/Fast-Editor/Lynkr

If you test it against your own coding workload, I would genuinely like to know where it holds up and where it doesn't.

DEV Community