Quick question: how many API keys are in your .env right now just for AI coding tools?
If you use Claude Code (Anthropic key), Codex (OpenAI key), and Cursor (another OpenAI key) — that's three providers, three billing accounts, three rate limit systems, zero flexibility.
I built Lynkr to collapse all of that into one proxy.
## What It Does
```
Claude Code ──┐
Codex CLI ────┤
Cursor ───────┤──→ Lynkr (localhost:8081) ──→ Any LLM Provider
Cline ────────┤
Continue ─────┤
LangChain ────┤
Vercel AI ────┘
```
Lynkr auto-detects which tool is connecting and speaks its language:
- Anthropic Messages API for Claude Code
- OpenAI Responses API for Codex CLI
- OpenAI Chat Completions for everything else (Cursor, Cline, Continue, KiloCode, LangChain, Vercel AI SDK, any OpenAI-compatible client)
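The dialect split above amounts to path-based dispatch: each client announces itself by the endpoint it calls. This is an illustrative sketch, not Lynkr's actual detection code; the function name and the fallback value are invented:

```python
# Sketch of path-based dialect detection. The standard endpoint paths
# are real (Anthropic and OpenAI conventions), but detect_dialect and
# its return labels are hypothetical names for illustration.

def detect_dialect(path: str) -> str:
    """Map an incoming request path to the API dialect it implies."""
    if path.startswith("/v1/messages"):
        return "anthropic-messages"      # Claude Code
    if path.startswith("/v1/responses"):
        return "openai-responses"        # Codex CLI
    if path.startswith("/v1/chat/completions"):
        return "openai-chat"             # Cursor, Cline, LangChain, ...
    return "unknown"

print(detect_dialect("/v1/messages"))          # anthropic-messages
print(detect_dialect("/v1/chat/completions"))  # openai-chat
```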
## Setup
```bash
npm install -g lynkr
lynkr start
```
Then configure each tool to point at Lynkr:
```bash
# Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8081

# Codex CLI (~/.codex/config.toml)
# base_url = "http://localhost:8081/v1"

# Cursor
# Settings → Models → Base URL: http://localhost:8081/v1

# LangChain
# ChatOpenAI(base_url="http://localhost:8081/v1", api_key="sk-lynkr")

# Literally any OpenAI-compatible tool
# OPENAI_BASE_URL=http://localhost:8081/v1
```
All of them hit the same Lynkr instance. Same provider pool. Same routing. Same optimizations.
## 12+ Backends
Pick your provider:
```bash
# Free (local)
MODEL_PROVIDER=ollama

# Cheap cloud
MODEL_PROVIDER=openrouter   # 100+ models
MODEL_PROVIDER=deepseek     # 1/10 Anthropic cost
MODEL_PROVIDER=zai          # 1/7 Anthropic cost

# Enterprise cloud
MODEL_PROVIDER=bedrock      # AWS, 100+ models
MODEL_PROVIDER=vertex       # Google, Gemini 2.5
MODEL_PROVIDER=databricks   # Claude Opus 4.6
```
Or mix them across complexity tiers:
```bash
TIER_SIMPLE=ollama:qwen2.5-coder
TIER_MEDIUM=openrouter:deepseek-r1
TIER_COMPLEX=databricks:claude-sonnet-4-5
TIER_REASONING=vertex:gemini-2.5-pro
```
Simple requests (rename a variable) → free local model.
Complex requests (refactor auth across 23 files) → top-tier cloud model.
The routing engine makes this decision automatically using 5-phase complexity analysis — including Graphify, which reads your actual codebase AST across 19 languages to detect high-risk changes.
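As a rough illustration of what tier routing does, here is a toy router: the scoring heuristic, thresholds, and function name are invented for the sketch (and it omits the reasoning tier); Lynkr's real decision comes from the 5-phase analysis described above, not a word count.

```python
# Toy tier router -- illustrative only. The tiers mirror the config
# above; the complexity score is a made-up stand-in for Lynkr's
# actual 5-phase analysis.

TIERS = {
    "simple": "ollama:qwen2.5-coder",
    "medium": "openrouter:deepseek-r1",
    "complex": "databricks:claude-sonnet-4-5",
}

def route(prompt: str, files_touched: int) -> str:
    """Pick a backend from a crude complexity score."""
    score = len(prompt.split()) + 10 * files_touched
    if score < 20:
        return TIERS["simple"]
    if score < 100:
        return TIERS["medium"]
    return TIERS["complex"]

print(route("rename a variable", 1))        # ollama:qwen2.5-coder
print(route("refactor auth", 23))           # databricks:claude-sonnet-4-5
```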
## For Agent Builders: LangChain, CrewAI, AutoGen
This is where Lynkr shines for automation. If you're building agents that make hundreds of LLM calls per pipeline run, most of those calls are simple (read a file, parse JSON, format output). Only a few require deep reasoning.
Without Lynkr: every call hits GPT-4o at $15/MTok. 200 calls × $0.03 = $6/run.
With Lynkr: 140 calls hit free Ollama, 40 hit OpenRouter ($0.005 each), 20 hit Databricks ($0.02 each). Total: $0.60/run. 90% savings.
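The arithmetic checks out; reproducing it with the figures quoted above:

```python
# Cost comparison using the per-call rates quoted in the text.

without = 200 * 0.03                            # every call at GPT-4o rates
with_lynkr = 140 * 0 + 40 * 0.005 + 20 * 0.02   # Ollama + OpenRouter + Databricks
savings = 1 - with_lynkr / without

print(f"${without:.2f}/run vs ${with_lynkr:.2f}/run ({savings:.0%} saved)")
# → $6.00/run vs $0.60/run (90% saved)
```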
```python
# Nothing changes in your agent code
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8081/v1",
    api_key="sk-lynkr",
    model="auto",  # Lynkr routes based on complexity
)

# Your existing chains, agents, and tools work unchanged
agent_executor.invoke({"input": "Refactor the payment module"})
```
## Token Compression Stack
On top of routing, every request passes through 7 optimization phases:
- Smart tool selection — only relevant tools sent
- Code Mode — 100+ tool defs → 4 meta-tools (96% reduction, saves 16,800 tokens/request)
- Distill — delta rendering via Jaccard similarity (60-80% savings)
- Prompt cache — SHA-256 keyed LRU
- Memory dedup — removes repeated context across turns
- History compression — sliding window with structural dedup
- Headroom sidecar — optional ML compression (47-92%)
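The Jaccard similarity behind the Distill phase is easy to sketch. The token granularity (whitespace-split words) and the 0.6 threshold below are assumptions for illustration, not Lynkr's actual tuning:

```python
# Jaccard similarity over word sets: |A ∩ B| / |A ∪ B|.
# Granularity and threshold are illustrative assumptions.

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def should_send_delta(prev: str, curr: str, threshold: float = 0.6) -> bool:
    """If two contexts overlap heavily, send only the difference."""
    return jaccard(prev, curr) >= threshold

prev = "read file parse json format output"
curr = "read file parse json emit output"
print(round(jaccard(prev, curr), 3))   # 5 shared words / 7 total → 0.714
```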
## Enterprise: Circuit Breakers, Telemetry, Hot-Reload
For teams running this in production:
```bash
# Health check
curl http://localhost:8081/health

# List all providers and models
curl http://localhost:8081/v1/providers
curl http://localhost:8081/v1/models

# Routing analytics
curl http://localhost:8081/v1/routing/stats
curl http://localhost:8081/v1/routing/accuracy

# Change config without restart
curl -X POST http://localhost:8081/v1/admin/reload

# Prometheus metrics
curl http://localhost:8081/metrics
```
Circuit breakers auto-detect provider failures. After 5 failed requests, incoming calls fail instantly instead of timing out. Half-open probes test recovery every 60 seconds. When 2 probes succeed, traffic resumes. No manual intervention.
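That lifecycle (open after 5 failures, half-open probe after the timeout, close after 2 probe successes) maps to a small state machine. A minimal sketch with illustrative names, not Lynkr's implementation:

```python
# Minimal circuit breaker matching the thresholds described above.
# Class and method names are hypothetical.
import time

class CircuitBreaker:
    def __init__(self, fail_max=5, probe_successes=2, reset_timeout=60):
        self.fail_max = fail_max
        self.probe_successes = probe_successes
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.successes = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"   # let a recovery probe through
                return True
            return False                   # fail fast, skip the upstream call
        return True

    def record_failure(self):
        self.failures += 1
        self.successes = 0
        if self.failures >= self.fail_max or self.state == "half-open":
            self.state = "open"
            self.opened_at = time.monotonic()

    def record_success(self):
        if self.state == "half-open":
            self.successes += 1
            if self.successes >= self.probe_successes:
                self.state = "closed"      # recovery confirmed
                self.failures = 0
                self.successes = 0
        else:
            self.failures = 0
```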
## Get Started
```bash
npm install -g lynkr && lynkr start
```
699 tests. Apache 2.0. Node.js only. Zero infrastructure.
GitHub: github.com/Fast-Editor/Lynkr
If you're managing multiple AI coding tools or building LLM-powered agents, Lynkr consolidates everything into one proxy with intelligent routing and real cost savings.
Star it if it helps. PRs welcome.