Lynkr vs claude-code-router: Static Rules vs a Complexity Classifier

#opensource #ai #claude #devtools

Disclosure: I'm the author of Lynkr. claude-code-router is a genuinely good project that pioneered this category — this is a technical comparison of two different approaches, not a takedown. Where CCR is the better choice, I say so.

If you want to keep the Claude Code harness but route requests to other models, you have two main self-hosted options today: claude-code-router (CCR, ~35k stars, the incumbent) and Lynkr. They solve the same problem with fundamentally different architectures, and which one fits you depends on how much you want to configure versus delegate.

The core difference in one paragraph

CCR routes by scenario rules you write. It has slots — default, background, think, longContext (triggered above a token threshold), webSearch, image — and you assign a model to each. It's predictable, transparent, and entirely under your control.

Lynkr routes by scoring the request itself. Every request gets a 0–100 complexity score computed from 13 weighted dimensions — token count, technical keyword density, tool complexity, multi-step reasoning markers, conversation depth, ambiguity, and so on — and lands in a tier (SIMPLE/MEDIUM/COMPLEX/REASONING) you've mapped to models. You configure the tiers once; the classifier decides per-request.

Where CCR wins

Maturity and ecosystem. 35k stars, ~730k monthly npm downloads, 20+ provider transformers, custom JS plugins, a web UI, and in-session /model switching. If you hit a weird provider quirk, someone has already hit it.
Predictability. A rule is a rule. If you want "long contexts always go to Gemini", CCR expresses that in one line and never surprises you.
Claude Code specialization. CCR does one client deeply. Lynkr supports Claude Code, Cursor, Codex CLI, Cline, and Continue — breadth costs some depth.

Where the rule-based approach breaks down

Browse CCR's issue tracker (~1,000 open issues) and one complaint dominates: tool-calling breakage on downgraded models — failed file edits, broken git operations, agents going in circles. The root cause usually isn't CCR's code. It's that static rules can't see what the request needs:

A short prompt ("fix the auth bug in session.js") looks cheap by token count — but it's an agentic, tool-heavy task that a small local model will fumble.
A long context triggers the longContext rule — but if it's 60k tokens of grep output around a trivial question, an expensive long-context model is wasted money.

Token counts and scenario names are proxies. The thing you actually care about — can a cheap model handle this without breaking the session? — requires looking at the request's structure.

What Lynkr does differently

Three things, all absent from CCR by design (it aims to be a lean router):

1. The complexity classifier. Requests with agentic signals (write/edit/bash tool availability, prior tool results in the conversation, sequential-step language) score into higher tiers even when they're short. Trivia stays local even when the context is long. Force-patterns short-circuit both ways: greetings never hit the cloud; security-critical analysis never gets downgraded. The design goal is exactly the failure mode above — route down only when the answer will still work.

2. Token optimization on the wire. Lynkr strips tool schemas the request can't use (measured: 53% fewer tokens on a realistic 14-tool Claude Code request) and compresses large JSON tool results before they hit the model (measured: 3,458 → 427 tokens on a 60-match grep result). CCR forwards requests as-is.

3. Semantic caching. Paraphrased repeat questions are served from an embedding cache in ~171ms with zero tokens billed.

Honest comparison table

	claude-code-router	Lynkr
Routing logic	Scenario rules + token threshold	13-dimension complexity score → tiers
Configuration	Per-scenario, per-provider (flexible, verbose)	Pick 4 tier models via `lynkr init` wizard
Tool-schema stripping	No	Yes (−53% measured)
JSON tool-result compression	No	Yes (TOON + field stripping)
Semantic cache	No	Yes
Clients	Claude Code (deep)	Claude Code, Cursor, Codex CLI, Cline, Continue
Provider transformers/plugins	20+, custom JS	13 providers built-in
Ecosystem maturity	~35k stars, huge community	Young (~500 stars), one maintainer
In-session model switching	Yes (`/model`)	No (automatic per-request)
License	MIT	Apache-2.0

Which should you use?

You want explicit control and a battle-tested ecosystem → CCR. It's the safe default and its community is unmatched.
You're tired of tuning rules, or your cheap-model sessions keep breaking → try Lynkr. The classifier exists precisely because static rules degrade on agentic workloads.
Your bill is dominated by tool output and repeated context → Lynkr, regardless of routing preference; the compression and caching layers work even if you route everything to one model.

Both are self-hosted, free, and take five minutes to try. Run your own workload through each and compare the token logs — that's the only benchmark that matters. Mine are reproducible here: github.com/Fast-Editor/Lynkr.

Top comments (2)

Alex Shev • Jul 5

Static routing rules are attractive because they are easy to understand, but the complexity classifier idea maps better to real work. The hard part is making the classifier explainable enough to debug when it chooses wrong.

Dipankar Sarkar • Jul 5

The classifier-vs-rules split is real, but I would push on where the classifier's own errors land. A static rule that sends longContext to Gemini is wrong in a way you can see and fix in one line. A 13-dimension score that misreads a short-but-hard prompt as SIMPLE downgrades it silently, and you get exactly the tool-calling breakage you cite CCR for, except now the cause is buried in a weight you cannot grep.

So the question that decides this for me is not accuracy, it is observability. Do you expose the per-request score breakdown, and can I pin a request to a tier when the classifier gets it wrong? A rule is debuggable at 3am. A black-box score is not, unless you make it one.

The other thing: does the classifier learn from outcomes, a downgraded request that then failed and retried on a bigger model, or is it a fixed heuristic? If it is fixed you have not removed the tuning burden. You have moved it from config I can read into weights I cannot.