Vishal VeeraReddy
One npm Install That Makes Every AI Coding Tool Work With Every LLM Provider

Quick question: how many API keys are in your .env right now just for AI coding tools?

If you use Claude Code (Anthropic key), Codex (OpenAI key), and Cursor (another OpenAI key) — that's three providers, three billing accounts, three rate limit systems, zero flexibility.

I built Lynkr to collapse all of that into one proxy.

What It Does

Claude Code ──┐
Codex CLI ────┤
Cursor ───────┤──→ Lynkr (localhost:8081) ──→ Any LLM Provider
Cline ────────┤
Continue ─────┤
LangChain ────┤
Vercel AI ────┘

Lynkr auto-detects which tool is connecting and speaks its language:

  • Anthropic Messages API for Claude Code
  • OpenAI Responses API for Codex CLI
  • OpenAI Chat Completions for everything else (Cursor, Cline, Continue, KiloCode, LangChain, Vercel AI SDK, any OpenAI-compatible client)
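The detection step can be pictured as simple path-based dispatch. This is a minimal illustration of the idea, not Lynkr's actual implementation:

```python
# Sketch of protocol auto-detection by request path (illustrative only;
# the real detection logic in Lynkr may use more signals than the path).

def detect_dialect(path: str) -> str:
    """Map an incoming request path to the API dialect the client speaks."""
    if path.startswith("/v1/messages"):
        return "anthropic-messages"       # Claude Code
    if path.startswith("/v1/responses"):
        return "openai-responses"         # Codex CLI
    return "openai-chat-completions"      # Cursor, Cline, LangChain, etc.
```

Once the dialect is known, the proxy can translate the request to whatever backend provider is configured and translate the response back.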

Setup

npm install -g lynkr
lynkr start

Then configure each tool to point at Lynkr:

# Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8081

# Codex CLI (~/.codex/config.toml)
# base_url = "http://localhost:8081/v1"

# Cursor
# Settings → Models → Base URL: http://localhost:8081/v1

# LangChain
# ChatOpenAI(base_url="http://localhost:8081/v1", api_key="sk-lynkr")

# Literally any OpenAI-compatible tool
# OPENAI_BASE_URL=http://localhost:8081/v1

All of them hit the same Lynkr instance. Same provider. Same routing. Same optimization.

12+ Backends

Pick your provider:

# Free (local)
MODEL_PROVIDER=ollama

# Cheap cloud
MODEL_PROVIDER=openrouter    # 100+ models
MODEL_PROVIDER=deepseek      # ~1/10 the cost of Anthropic
MODEL_PROVIDER=zai           # ~1/7 the cost of Anthropic

# Enterprise cloud
MODEL_PROVIDER=bedrock       # AWS, 100+ models
MODEL_PROVIDER=vertex        # Google, Gemini 2.5
MODEL_PROVIDER=databricks    # Claude Opus 4.6

Or mix them across complexity tiers:

TIER_SIMPLE=ollama:qwen2.5-coder
TIER_MEDIUM=openrouter:deepseek-r1
TIER_COMPLEX=databricks:claude-sonnet-4-5
TIER_REASONING=vertex:gemini-2.5-pro

Simple requests (rename a variable) → free local model.
Complex requests (refactor auth across 23 files) → top-tier cloud model.

The routing engine makes this decision automatically using 5-phase complexity analysis — including Graphify, which reads your actual codebase AST across 19 languages to detect high-risk changes.
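A toy version of the tier decision looks like this, assuming the analysis boils down to a scalar complexity score in [0, 1] plus a reasoning flag (the real 5-phase analysis is considerably more involved, and the thresholds here are invented for illustration):

```python
# Toy tier router mirroring the TIER_* config above. The score thresholds
# (0.3, 0.7) and the needs_reasoning flag are assumptions for illustration.

TIERS = {
    "simple":    "ollama:qwen2.5-coder",
    "medium":    "openrouter:deepseek-r1",
    "complex":   "databricks:claude-sonnet-4-5",
    "reasoning": "vertex:gemini-2.5-pro",
}

def route(score: float, needs_reasoning: bool = False) -> str:
    """Pick a provider:model pair from a complexity score."""
    if needs_reasoning:
        return TIERS["reasoning"]
    if score < 0.3:
        return TIERS["simple"]
    if score < 0.7:
        return TIERS["medium"]
    return TIERS["complex"]
```

The point is that the caller never picks a model; the proxy does, per request.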

For Agent Builders: LangChain, CrewAI, AutoGen

This is where Lynkr shines for automation. If you're building agents that make hundreds of LLM calls per pipeline run, most of those calls are simple (read a file, parse JSON, format output). Only a few require deep reasoning.

Without Lynkr: every call hits GPT-4o at $15/MTok. 200 calls × $0.03 = $6/run.

With Lynkr: 140 calls hit free Ollama, 40 hit OpenRouter ($0.005 each), 20 hit Databricks ($0.02 each). Total: $0.60/run. 90% savings.
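The arithmetic checks out as a quick back-of-envelope calculation:

```python
# Cost check for the numbers above: 140 free local calls, 40 cheap cloud
# calls at $0.005, 20 premium calls at $0.02, vs. 200 calls at $0.03.
calls = {"ollama": (140, 0.0), "openrouter": (40, 0.005), "databricks": (20, 0.02)}

with_lynkr = sum(n * cost for n, cost in calls.values())  # $0.60/run
without_lynkr = 200 * 0.03                                # $6.00/run
savings = 1 - with_lynkr / without_lynkr                  # 90%
```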

# Nothing changes in your agent code
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8081/v1",
    api_key="sk-lynkr",
    model="auto"  # Lynkr routes based on complexity
)

# Your existing chains, agents, and tools work unchanged
agent_executor.invoke({"input": "Refactor the payment module"})

Token Compression Stack

On top of routing, every request passes through 7 optimization phases:

  1. Smart tool selection — only relevant tools sent
  2. Code Mode — 100+ tool defs → 4 meta-tools (96% reduction, saves 16,800 tokens/request)
  3. Distill — delta rendering via Jaccard similarity (60-80% savings)
  4. Prompt cache — SHA-256 keyed LRU
  5. Memory dedup — removes repeated context across turns
  6. History compression — sliding window with structural dedup
  7. Headroom sidecar — optional ML compression (47-92%)

Enterprise: Circuit Breakers, Telemetry, Hot-Reload

For teams running this in production:

# Health check
curl http://localhost:8081/health

# List all providers and models
curl http://localhost:8081/v1/providers
curl http://localhost:8081/v1/models

# Routing analytics
curl http://localhost:8081/v1/routing/stats
curl http://localhost:8081/v1/routing/accuracy

# Change config without restart
curl -X POST http://localhost:8081/v1/admin/reload

# Prometheus metrics
curl http://localhost:8081/metrics

Circuit breakers auto-detect provider failures. After 5 failed requests, incoming calls fail instantly instead of timing out. Half-open probes test recovery every 60 seconds. When 2 probes succeed, traffic resumes. No manual intervention.
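The state machine behind that description (closed → open → half-open → closed) can be sketched in a few lines. The thresholds match the ones stated above; everything else is an assumed, simplified design, not Lynkr's code:

```python
# Minimal circuit breaker: open after 5 consecutive failures, allow a
# probe after 60 s, close again after 2 successful probes. Sketch only.
import time

class CircuitBreaker:
    def __init__(self, fail_threshold=5, probe_interval=60.0, probe_successes=2):
        self.fail_threshold = fail_threshold
        self.probe_interval = probe_interval
        self.probe_successes = probe_successes
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.probe_interval:
                self.state = "half-open"  # let one probe through
                return True
            return False                  # fail fast instead of timing out
        return True

    def record_failure(self) -> None:
        self.failures += 1
        self.successes = 0
        if self.failures >= self.fail_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()

    def record_success(self) -> None:
        if self.state == "half-open":
            self.successes += 1
            if self.successes >= self.probe_successes:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0
```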

Get Started

npm install -g lynkr && lynkr start

699 tests. Apache 2.0. Node.js only. Zero infrastructure.

GitHub: github.com/Fast-Editor/Lynkr

If you're managing multiple AI coding tools or building LLM-powered agents, Lynkr consolidates everything into one proxy with intelligent routing and real cost savings.

Star it if it helps. PRs welcome.
